Statistics 101: How To Actually Interpret Accuracy & Build Loads

Alistair · Dec 3, 2023

Hello folks.

I've been meaning to put something together on this topic for a while as it's a persistent blind spot within the shooting and reloading community. @Shooter375's recent thread on crimping finally gave me the incentive to put words to forum. Not calling him out specifically, just the latest in a long line of threads on reloading, grouping and what it all means. I think this'll be of interest to a few of our members and may spark some lively discussion.

This post will be long, and will be split into 2 main topics:

1. Basic statistics and why 3 round groups don't mean anything. This'll be math heavy (dull, I know!), but should be mildly interesting and give sufficient background for it all to make sense.
2. If 3 round groups mean nothing, then what can we do instead to guide load development? This section will be more practical and maybe it'll save people some time and money.

Here we go!

Topic 1 - Stats 'n' stuff.

This is a normal distribution curve:

Scary, I know. But it's a surprisingly useful thing. What this curve does is describe the distribution of data points within a sample using 2 main criteria: the mean of the data set and the standard deviation (SD). I will not go into the math of what SD is, but this link is here for those who are interested: https://www.mathsisfun.com/data/standard-deviation.html

This calculator allows you to calculate SD from a data set: https://www.calculator.net/standard-deviation-calculator.html
We'll get onto that later. Use 'Sample', not 'Population'.

This curve assumes something called 'normal' or Gaussian distribution. It assumes that data is spread about the mean in a specific way, with no skew to the high end, or the low end. Not all data does this, but normal distribution is called 'normal' for a reason. It is incredibly common in nature, and in complex systems. Examples of data that follow this trend include height of people within a population, IQ in a population, torque applied to a bolt by a torque wrench, and roughly 90% of all other examples that spring to mind. Some topical ones for us on this forum might include bullet weights within a single box of bullets or velocity of a given load.

The curve actually tells us quite a lot about the data and helps us make conclusions. The mean tells us the mid point, whilst the standard deviation tells us how spread out that data is about the mean. A small SD suggests that all data points are tightly clustered, giving a steeper curve. You'll notice in the above that we have lines at -3, -2, -1, 0 and so forth. Each of these describes one standard deviation, and as the figure shows, we can make clear statements about what percent of our data points fit within a certain number of SDs from the mean.

As an example, let's say you buy a box of shiny 300gr bullets. You weigh 20 of 'em and stick the values into the calculator above. You find that the mean is 300gr and the SD is 2gr. This tells us that in your box of 100 bullets, roughly 68 of them weigh between 298 and 302gr, about 95 of them weigh between 296 and 304gr, and practically all 100 of them (99.7%) weigh between 294gr and 306gr. 20 bullets (the sample) tells you enough about the entire box (population) to say this with confidence (real, statistical confidence, not the usual bollocks shooters say!).
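For anyone who would rather script this than use the web calculator, here is a minimal Python sketch of the bullet-weight example. The 20 weights are made-up numbers purely for illustration, and `sd_band` is a helper name chosen here, not something from the calculator. It uses the sample SD (the 'Sample' option), and the percentages are the usual normal-distribution approximations.

```python
import statistics

# Hypothetical weights (grains) for 20 bullets; invented for illustration only.
weights = [300.4, 298.1, 301.9, 299.6, 302.3, 297.5, 300.8, 299.2, 301.1, 300.0,
           298.7, 303.0, 299.9, 300.6, 297.9, 301.5, 300.2, 298.4, 302.0, 299.0]

mean = statistics.mean(weights)
sd = statistics.stdev(weights)  # sample SD (divides by n - 1), i.e. 'Sample' not 'Population'


def sd_band(mean, sd, k):
    """Return the (low, high) range lying within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)


print(f"mean = {mean:.2f} gr, sample SD = {sd:.2f} gr")
for k, share in ((1, "about 68%"), (2, "about 95%"), (3, "about 99.7%")):
    low, high = sd_band(mean, sd, k)
    print(f"{share} of the box is expected between {low:.1f} and {high:.1f} gr")
```

Plugging the round numbers from the example into `sd_band(300, 2, k)` reproduces the 298-302, 296-304 and 294-306gr ranges quoted above.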

Cool stuff, but why should you care?

Well, as with most complex systems, the grouping of a given load also follows this normal distribution. That's unsurprising. A rifle and its load is a complex system, made up of a whole lot of variables which themselves follow normal distributions. Bullet weight. Bullet diameter. Neck tension. Case volume. Specific energy of powder. Powder charge. Primer energy. All have a mean that they vary about randomly following the distribution above.

We can therefore imagine the group of a rifle to look something like this if you fired say 1000 rounds:

You have a normal distribution for 'x' (horizontal dispersion) and a normal distribution for 'y' (vertical dispersion) with the height of the little hill at any given (x,y) defining the relative number of those 1000 rounds that fall there. The middle of the hill would have coordinates 0,0 as it is 0" away from the center of the group both vertically and horizontally. You'll see that most rounds fall in the middle (mean +/-1 SD) but that some rounds fall further from the center at 2 or even 3 SD away from the mean center of the group.

So now you know what your group is doing, but what does it have to do with 3 round groups?

Well, let's say you have 2 loads, both of which have exactly the same mean (0,0) and the same SD (let's say 0.5" horizontally and vertically). They're identically accurate in every way. But obviously you don't know this. You haven't shot 'em yet.

You do what many shooters do. Take 3 rounds of one, shoot 'em, then take 3 rounds of the other and shoot them too. You then measure your group in MOA.

We can be pretty confident that one group will be smaller than the other by pure random chance. But does that mean that one is actually 'better'?

No, it does not.

Let's look at the distribution curves again. Let's say that for load #1 the three rounds you shot all landed within 1SD of the mean. That is perfectly plausible: each individual shot has roughly a 68% chance of landing there, so with only 3 shots in the group it happens often enough. That load posts a group of MAX 1" or 1MOA. You're pretty happy and you go hunting. Let's say for load #2 the three you randomly picked to shoot land further out, but still within 2SD of the mean. Each shot has about a 95% chance of doing that. The second load therefore gives a group of up to 2" or 2MOA. Twice what the first load gave you.

Obviously based on this you go shoot with load #1, boasting happily to your buddies at the bar about your 'sub-MOA' rifle.

But the two groups were equally accurate. If you performed the same test again, the results could completely flip the other way based on pure random chance. You've learned nothing.
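A quick way to see this for yourself is to simulate it. The sketch below is only an illustration under stated assumptions: shots are drawn from independent normal distributions in x and y with the same 0.5" SD used above, group size is measured as extreme spread (largest center-to-center distance), and the function names, trial count and 1.5x threshold are arbitrary choices made here. It counts how often two IDENTICAL loads print groups that differ by at least 1.5x.

```python
import itertools
import math
import random


def group_size(n_shots, sd, rng):
    """Extreme spread (largest centre-to-centre distance) of one simulated group."""
    shots = [(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n_shots)]
    return max((math.dist(a, b) for a, b in itertools.combinations(shots, 2)),
               default=0.0)


def fraction_misleading(n_shots, sd=0.5, ratio=1.5, trials=2000, seed=1):
    """Share of trials where one of two identical loads prints a group
    at least `ratio` times the size of the other's."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        a = group_size(n_shots, sd, rng)
        b = group_size(n_shots, sd, rng)
        if max(a, b) >= ratio * min(a, b):
            count += 1
    return count / trials


for n in (3, 5, 10, 20):
    print(f"{n:>2}-shot groups: {fraction_misleading(n):.0%} of comparisons differ by 1.5x or more")
```

The exact percentages depend on the assumptions, but the pattern to look for is that small groups from identical loads disagree far more often than large ones.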

And that sub-MOA rifle? It's not even a 1.5MOA rifle. The normal distribution tells us that if you fire 20 shots, your group is barely keeping under 2MOA. If you fire 100 rounds of that load over the following year, the actual group is more like 3MOA... ponder that next time you have a 'flier'.
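The way group size creeps up with shot count can also be sketched in a few lines. As before, this is a simulation under assumptions rather than a measurement: independent normal scatter in x and y, the same 0.5" SD, extreme spread as the group measure, and `mean_extreme_spread` is a name invented here. The absolute numbers it prints will not exactly match the rough figures above; what it is meant to show is the trend.

```python
import itertools
import math
import random


def mean_extreme_spread(n_shots, sd=0.5, trials=300, seed=2):
    """Average extreme spread over many simulated groups of n_shots rounds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shots = [(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n_shots)]
        total += max(math.dist(a, b) for a, b in itertools.combinations(shots, 2))
    return total / trials


for n in (3, 5, 20, 100):
    print(f"{n:>3} shots: average group about {mean_extreme_spread(n):.2f} inches")
```

Same load, same rifle, same SD throughout; only the number of shots in the group changes.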

It's the same story when people say their rifle 'likes' or 'doesn't like' a specific brand of ammo based on only a few shots, or pick a specific bullet because it's 'way more accurate'. Maybe it is, maybe it isn't. You don't know and you've done nothing to find out. You've just cherry picked a group of a few shots that happened to fall in the 68% confidence interval instead of the 95% confidence interval. It means nothing.

That brings us onto Topic 2 - What can be done?

So, we know from the above that using a small number of shots cannot accurately and truthfully distinguish between two loads in terms of their grouping. What can we do instead? Well adding more rounds to your group starts to give you a truer picture of what is going on, but ammo is expensive and no one wants to be sat at the bench doing load development and shooting 20 rounds of each load in a ladder test. Your barrel wouldn't last long either.

We have to be pragmatic here. We must accept that actually, you will never be able to find the 'true' best load. But we can truthfully and accurately understand what a load is doing and if it is good enough for our purposes.

I am proposing the following method to achieve that. It is not without its drawbacks, but it will be far more statistically robust. I welcome criticism and feedback.

Step 1. Choose a bullet, any bullet. You'll never know if it's the 'most accurate' so pick one you like, one whose performance you trust, one that is readily available and in budget. Stick with it.
Step 2. Do the same with powder and all other components. Again, you don't know 'the best' and you never will, but you can choose one you can source and I'll share a method later to see if it's 'good enough'.
Step 3. Choose a velocity. What energy do you want, do you have a figure in mind, a goal you'd like to meet or a speed that seems reasonable based on your loading data?
Step 4. Ladder test. 1 round of each powder charge. No worries about accuracy, testing purely for a charge to meet your velocity. Define your powder charge when you chrono the velocity you want.
Step 5. You now have your load. But is it 'acceptably' accurate? Load 25 rounds of this load. Use 5 to zero the rifle. Get a target with a clear and defined center point. Shoot the remaining 20 rounds at the target, measuring the velocity as you go. First sense check: what is your velocity SD like, and are you hitting your target energy? Second sense check: go to your target and measure the distance from the center of the target (which, as you zeroed, should be the center of the group) to the center of each bullet hole. Write down all 20 measurements. Enter these into the SD calculator above to get your SD for the load.
Step 6. You now have a mean (0,0) and an SD. As such, you can say (with actual confidence) how accurate the load is. For instance, if your SD is 0.2", you can say that about 99.7% of all rounds of that loading fall within 0.6" of the aim point (1.2MOA overall). If the SD is 0.5", about 99.7% of all rounds of that loading will fall within 1.5" of your aim point (3MOA overall). Is this accurate enough for what the rifle and load is intended to do?
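Steps 5 and 6 can be sketched in Python as below. Note this is a variant, not exactly the procedure described: it records the horizontal and vertical offset of each hole separately rather than a single distance, because distances from the center are never negative and so do not average out to zero the way x and y offsets do. The 20 hole positions are invented for illustration, `summarize_group` is a hypothetical helper name, and the 68/95/99.7% figures apply to each axis on its own under the normal assumption.

```python
import math
import statistics

# Hypothetical (x, y) offsets in inches of 20 bullet holes from the aim point.
holes = [(0.21, -0.10), (-0.35, 0.42), (0.05, 0.18), (0.48, -0.31), (-0.12, -0.55),
         (0.30, 0.27), (-0.44, -0.08), (0.02, 0.61), (-0.26, 0.14), (0.17, -0.39),
         (0.59, 0.06), (-0.08, -0.22), (-0.51, 0.33), (0.11, 0.09), (0.36, -0.47),
         (-0.19, 0.25), (0.04, -0.03), (-0.33, -0.29), (0.25, 0.50), (-0.15, 0.02)]


def summarize_group(holes):
    """Per-axis sample SD plus mean radius about the group centre."""
    xs = [x for x, _ in holes]
    ys = [y for _, y in holes]
    cx, cy = statistics.mean(xs), statistics.mean(ys)
    radii = [math.hypot(x - cx, y - cy) for x, y in holes]
    return {
        "center": (cx, cy),
        "sd_x": statistics.stdev(xs),   # sample SD, horizontal
        "sd_y": statistics.stdev(ys),   # sample SD, vertical
        "mean_radius": statistics.mean(radii),
    }


summary = summarize_group(holes)
print(f"group centre offset from aim point: ({summary['center'][0]:.2f}, {summary['center'][1]:.2f}) in")
print(f"mean radius: {summary['mean_radius']:.2f} in")
for k, share in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    print(f"about {share} of shots within +/-{k * summary['sd_x']:.2f} in horizontally "
          f"and +/-{k * summary['sd_y']:.2f} in vertically")
```

If the reported group center sits well away from (0, 0), that is a zeroing question rather than a load question, and worth sorting out before judging the SD.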

Bear in mind that this is the 99.7% interval; most rounds will be closer than this. 3MOA at 99.7% means that 68% of shots fall within 1MOA and 95% fall within 2MOA. That's minute of deer in my book, especially if your annual hunting round count is only 20 rounds or so.

If it is, you have a load. If it isn't, choose a variable at random (bullet, powder charge etc), change it and try steps 5 + 6 again. I think you'll find pretty quickly if you have an actually 'bad' bullet, velocity etc for the rifle, and actually, in most cases the method above will be 'adequate' in just one go.

So there you go. A load development strategy that is a. statistically valid and b. uses no more ammo than just shooting random 3 round groups and cherry picking something for no good reason. Heck, it might even save you time and money pointlessly shooting random load combinations to learn nothing until you get lucky with one random load that happens to throw out a small group through random chance.

Thoughts? I bet there are other (and better) statisticians on this forum than I, so please chime in!

Wyatt Smith · Dec 4, 2023

Thanks!

Tgood1 · Dec 4, 2023

very interesting, will definitely have to read that again

Shootist43 · Dec 4, 2023

Alistair, I share your "opinion" of three round groups. I generally like to shoot multiple 10 round groups and compare them. Sometimes they are months or even years apart. When it comes to an "Accuracy Load" I like to see the comparison shot from a number of rifles. I realize there is a "practical limit" for the test firing most of us are able to do. What number of shots and or groups of shots do you use in determining the accuracy of a rifle or the accuracy potential of a given load?

DieJager · Dec 4, 2023

Very interesting read. Not yet started reloading but definitely saving this thread. Thanks!

Altitude sickness · Dec 4, 2023

Shooting east and west or north and south.

During the Moon's perigee or apogee

Each planet perturbs the orbits of the other planets, making Kepler's ellipses approximately correct rather than exact. The Moon's orbit is strongly perturbed by the Sun in a number of ways. The Moon's orbit deviates from being a fixed ellipse in a number of ways. One result of these solar perturbations (and to a much lesser extent, perturbations from Venus and Jupiter, and to an even lesser extent, from the other planets) is that the Moon's orbit precesses in a number of ways.

One such precession is the apsidal precession. The line from the Earth to the point at which the Moon reaches perigee does not point to a fixed position in space. It instead precesses with a period of about 8.85 years. This is what results in the so-called supermoons, which occur when the Moon's orbit is close to perigee when the Moon is full.

Another such precession is the nodal precession. The line of nodes (where the Moon crosses from above to below the ecliptic, and vice versa), also precesses, but with a period of about 18.6 years. We only get eclipses when the Moon is very close to a node at a syzygy (either a full Moon, resulting in a lunar eclipse, or a new Moon, resulting in a solar eclipse).

You would need these added into your formula

Gravity and the planet's rotation are different through the days, weeks and years.

Groups over weeks, months and years at 100 yards and 1 mile are impacted by these

Your DOPE book would need to track all of these ATM effects

Sorry I couldn’t help myself

jnmullins · Dec 4, 2023

Hornady has several good podcasts/Youtube videos covering statistics, group size and mean radius. They go into fairly good detail.

sgt_zim · Dec 4, 2023

FWIW, I often weigh bullets before I start reloading them. I've found that Swift and Norma in particular are very consistent.

286 gr A Frames, the median for different batches is always 285.7 or 285.8, and almost everything weighs between 285.5 and 286.1

I get similar results with Norma "285" gr, though the mean and median tend to be around 285.6

Just weighed a box of Hornady 286 gr, weights range from 285.1 to 286.6. For precision shooting, that's a problem. For minute-of-lungs accuracy, they are all quite good enough out to as far as I care to shoot, which is usually no further than 300 yards.

BRICKBURN · Dec 4, 2023

Thanks for taking the time to write this up. I am sure it will foster discussion.

I have actually done most of your suggested steps for load development. No calipers though. Are the little holes all together or not?

Three under a quarter ($.25) at 100 yards was always the standard growing up.

"Is this accurate enough for what the rifle and load is intended to do?" (Spot on)

I am going to just keep rolling the dice with my three shot groups to check the rifle annually and continue to hope they knock those deer over. So, "Sub Deer Lung" accuracy will just have to do.

I will also continue to hope they produce that bullet, primer, brass and powder so I don't have to do it all over again.

flatwater bill · Dec 4, 2023

As principally a shotgunner over my lifetime I have shot and recorded thousands of rounds over the years. Kept careful records. Patterning is complex. It is often described in the hunting magazines in a cursory fashion, often sub optimally or just plain wrong. One cannot fully evaluate or understand patterns without a good knowledge of statistics. But my hunting partner Dave never did any of that. He just shot more ducks and geese than I did. Always. As a child I grew up with Model 94 Winchesters used as tools. They were hard to keep clean, and ammo was expensive. If we could put 2 shots into a 4 inch circle at 100 yards, we were good to go after deer. How times change. Keep up the good work.....loved the refresher......FWB

Shooter375 · Dec 4, 2023

@Alistair, I am happy to hear that my post inspired you to write this one. I enjoyed reading it. I agree with you that for accurate conclusions to be drawn, accurate data must be compiled and analyzed. I also agree with your point that the more data is compiled, the more certain a person can be that the same thing will happen next time, and the more accurate the conclusions drawn. I totally agree with the essence of your post, which in my opinion is that quantity is necessary to accurately theorize.

As for the never ending quest for quality data, collecting the data is the fun part. Hunting and shooting are hobbies of mine from which I draw great joy. It pleases me to work at becoming the best hunter and the best shot that I can be. It's fun. So because it is fun I tinker with my tools (rifles, components, reloading tools) to see what they can do, with my goal being to shoot the smallest groups at the range and hit the animals that I am hunting where I try to hit them, to kill them quickly so that I do not leave them lost and wounded. So with this goal in mind I try to remove as many variables as possible. One variable I seek to limit is the variance in process. I seek to develop consistent processes in reloading and shooting that I seek to repeat as accurately as I can. The next variable to limit is weight uniformity. So I weigh the bullets, cases, and powder charges with ever increasing quality scales. In theory, the less the variance is in the grouped ammunition, the less the variance will be when it is fired and the smaller the groups will be. I do this because I am on a quest to shoot groups that all land in the same bullet diameter hole. A feat that I have yet to achieve, although I have come very close (I'll post a picture below). Another option is to limit variables with the build quality of my rifles (examples include tightening the tolerances with quality actions, quality barrels, quality triggers, quality bedding etc...).

I do all this because it is fun. It is fun to constantly seek the never ending goal of perfection.

My best group ever!

Ballistic-X-Export-2021-09-06 15:31:18.109388.jpg

Altitude sickness · Dec 4, 2023

Very impressive 5 shot group

BourbonTrail · Dec 4, 2023

Oh a fake math, I mean, statistics thread!

I tease because of the manipulation that can occur with statistics as opposed to other mathematical fields.

My only recommendation:

Cp/Cpk is used for determining the stability of your process, so I would look at the parameters for setting and achieving Cp/Cpk > 1 for a desired grouping size (assuming we can skip Pp, Ppk). I.e., for airbag deployment to qualify for Cpk when I worked at Toyota, a minimum of 30 pieces needed to be tested in the exact same conditions.

Therefore, I recommend changing your 20 rounds (a typical box of ammo) to 30, to be statistically significant of process stability.

Granted, I will never go to that length, because the Cpk of my target GAF of 0 is 1.33.
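For readers who have not met Cp and Cpk, here is a minimal sketch of the textbook formulas: Cp = (USL - LSL) / (6 x SD) and Cpk = min(USL - mean, mean - LSL) / (3 x SD). Everything specific in it is an assumption made for illustration: the horizontal offsets are invented, the +/-1.5" spec limits stand in for "a desired grouping size", and the overall sample SD is used as a stand-in for the within-subgroup sigma a formal capability study would use.

```python
import statistics


def cp_cpk(values, lsl, usl):
    """Process capability indices from sample data and lower/upper spec limits."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # overall sample SD, used here as the sigma estimate
    cp = (usl - lsl) / (6 * sd)                   # spread versus tolerance width
    cpk = min(usl - mean, mean - lsl) / (3 * sd)  # also penalises an off-centre mean
    return cp, cpk


# Hypothetical horizontal offsets (inches) of 30 shots from the aim point.
offsets = [0.21, -0.35, 0.05, 0.48, -0.12, 0.30, -0.44, 0.02, -0.26, 0.17,
           0.59, -0.08, -0.51, 0.11, 0.36, -0.19, 0.04, -0.33, 0.25, -0.15,
           0.40, -0.29, 0.13, -0.02, 0.31, -0.47, 0.09, 0.22, -0.38, 0.06]

cp, cpk = cp_cpk(offsets, lsl=-1.5, usl=1.5)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

A Cpk above 1 under these assumptions means the 3SD spread sits inside the chosen limits on both sides.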

Alistair · Dec 4, 2023

Shootist43 said:
Alistair, I share your "opinion" of three round groups. I generally like to shoot multiple 10 round groups and compare them. Sometimes they are months or even years apart. When it comes to an "Accuracy Load" I like to see the comparison shot from a number of rifles. I realize there is a "practical limit" for the test firing most of us are able to do. What number of shots and or groups of shots do you use in determining the accuracy of a rifle or the accuracy potential of a given load?

As a rule I try and do at least 7, ideally 20. As @BourbonTrail notes, 30 is more statistically valid than my 20. 50 is even better.

Ultimately, the more shots you fire, the more true a picture of accuracy you 'see'. This is intuitive; firing extra shots never makes a group smaller, it can only ever keep it the same, or (more likely) make it bigger. Having lots of shots means you're more likely to 'see' one of those 2SD, or even 3SD results which gives you a better idea of what the outer edges of your group might be over lots and lots of shots (assuming it isn't the shooter's fault). It also strengthens the SD value you're establishing. I rely a lot more on establishing the SD, assuming the normal distribution and making conclusions based on that than I do on the absolute size of the posted group. I believe (and math agrees with me) that it is a more comprehensive and statistically valid measure of accuracy.

But, you get diminishing returns and ammo ain't cheap. If I were doing this for my job, or to publish in a journal, I want to be sure that my findings are statistically rigorous and defensible. If I'm spending my own money on the trials and the only person accountable for and impacted by the results is me, I'm much more chilled out.

In that context, I think that somewhere around 10-20 shots is a decent balance between marginally significant and reasonable in terms of time and cost. If I were shooting a 505Gibbs at $10 a pop, I think my scientific rigor might go out the window though!
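The diminishing returns can be put into rough numbers. For normally distributed data, the relative uncertainty of a sample SD is approximately 1 / sqrt(2(n - 1)). That is a standard large-sample approximation, so treat the small-n values as ballpark only; `sd_relative_error` is a name chosen here for the sketch.

```python
import math


def sd_relative_error(n):
    """Approximate relative standard error of a sample SD from n normal values."""
    return 1.0 / math.sqrt(2 * (n - 1))


for n in (3, 5, 10, 20, 30, 50):
    print(f"{n:>2} shots: SD estimate good to roughly +/-{sd_relative_error(n):.0%}")
```

Each step up in shot count buys less extra precision than the one before, which is the diminishing-returns trade-off described above.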

On that topic, I tailor my rigor to the application. Benchrest rifle, lots of rigor. Hunting rifle, rather less. Plinking rifle... just pick a load at random and crack on!

Again to quote BourbonTrail, it's absolutely fine to have a 'Give a F' value of 0. Just be honest with yourself about what your method can, and cannot, tell you, and don't make wild claims that your data cannot substantiate. Otherwise I'll have to do a 'Stats 202' thread, and NOBODY wants that!

Honestly, I think under the uncontrolled, real world conditions we actually test ammo, any load development testing is as much an assessment of the accuracy of the shooter as it is the accuracy of the load... Maybe that single shot 2.5" from the rest of the group is a true 3SD result and should be included. Maybe... you just cocked it up! Uncontrolled variables suck, and humans are made exclusively of suck in that context.

CBH Australia · Dec 4, 2023

Following

fourfive8 · Dec 4, 2023

Huge difference between statistical validity suited to judge target shooting vs hunting. For hunting- a 3-5 shot group may be perfectly correct where the N sample size is maybe five to ten, 3-5 shot groups, shot at different times and/or different days where the rifle condition is true to the average hunting condition- like cold and clean. A valid idea for judging a hunting load or rifle may require several outings at 3-5 shots each with a total N of 20-30.

For target shooting a valid statistical evaluation may require much longer shot strings and the appropriate N number and repetitions might look much different- maybe a few 20 shot strings, depending on the type of target shooting. But to statistically gauge a hunting load or rifle based on a continuous, long shot string may not at all reflect how good the load or rifle may be for hunting.

Most every time I go to the range, especially just before hunting season, I can witness someone trying to work up a load or do final scope adjustments for hunting. And usually I see them do the same thing- bang, bang, bang, bang, bang, etc. ... in long shot strings. After 20-50 shots and multiple "fine" adjustments of their scope settings they may have less idea about how well the load is working and how well the scope is sighted in for hunting than when they started.

Alistair · Dec 4, 2023

fourfive8 said:
Huge difference between statistical validity suited to judge target shooting vs hunting. For hunting- a 3-5 shot group may be perfectly correct where the N sample size is maybe five to ten, 3-5 shot groups, shot at different times and/or different days where the rifle condition is true to the average hunting condition- like cold and clean. A valid idea for judging a hunting load or rifle may require several outings at 3-5 shots each with a total N of 20-30.

For target shooting a valid statistical evaluation may require much longer shot strings and the appropriate N number and repetitions might look much different- maybe a few 20 shot strings, depending on the type of target shooting. But to statistically gauge a hunting load or rifle based on a continuous, long shot string may not at all reflect how good the load or rifle may be for hunting.

Most every time I go to the range, especially just before hunting season, I can witness someone trying to work up a load or do final scope adjustments for hunting. And usually I see them do the same thing- bang, bang, bang, bang, bang, etc. ... in long shot strings. After 20-50 shots and multiple "fine" adjustments of their scope settings they may have less idea about how well the load is working and how well the scope is sighted in for hunting than when they started.

I think you're confusing the number of shots in a group with the need to shoot all the shots at once.

You don't have to shoot strings to shoot a group. It's perfectly valid to do your 10 rounds, or 20 rounds at a rate of 1 shot a day. Still a group. Possibly better really, cold bore shots are more representative of real use. It'll just take ages, so strings are more convenient. 2 or 3 at a time with a break in between is a great compromise, so long as you still consider it one overall group and run the stats as such.

As an example, when I was doing my load development with my 375, I'd shoot 2 or 3, then take 20mins to plink with a 22, then do a couple more on the same target. I did that because it was a hunting load, so cold bore was representative, and also because I found trying to shoot 20 round strings of full fat 375 hunting loads off a bench a really good way to develop a flinch... It's still one group though and all the shots go into a single calculation of SD.

As far as hunting versus competition, there's actually no difference at all, in terms of the statistics. A group is a group. Accuracy is accuracy. The same number of shots yield the same statistical significance, the same number are needed for the same confidence in your conclusions.

Not to belabor this point, but remember that the group simply represents the MAXIMUM area that any ONE given round may fall. The number of shots you take does not define this area. It just gives you a snap shot of a couple of data points which you can then use to work out the real group size (which is intrinsic to the variability in the ammo and the rifle). You are merely discovering the SD by shooting many rounds, not defining it by doing so. It was always there in that load.

Most hunting expeditions take 1 shot. That one shot will fall somewhere in the group. Just the same as any one shot in a competition will fall somewhere in the group. If that's the first shot of the day, or the 50th shot in your string, doesn't matter (in theory). It'll fall within the group and you know that 99.7% of shots will fall within 3SD of the mean whilst 95% fall within 2SD and 68% within 1SD. As such, the methodology (from a statistical standpoint) is absolutely identical for both use cases. You're finding the same information.

fourfive8 · Dec 4, 2023

Nope, not confused at all.

There is a big error in the logic. The math cannot determine there is a cumulative variable built into the sampling. The math thinks the bullet impacts on the target are the result of a 100% random distribution. It cannot resolve cumulative variables caused by differential shot strings due to fouling, temperature, harmonics, etc. in the statistical analysis.

RockSlinger404 · Dec 5, 2023

If you use ballistic programs such as Quickload or Gordon's, it will significantly cut down the number of shots required to find your accurate load. OBT will provide a scientific reference point, which you can then fine tune with the results you see on paper. I personally use Quickload and then shoot a ladder test at 300m to confirm. However, with the ladder test you still have the small sample size issue. Ideally you want to shoot a ladder with 2 or 3 bullets at each charge weight to make it more statistically significant (i.e. 30 shots).

fourfive8 · Dec 5, 2023

Statistical analysis of shooting results whether it be precision to a point on a target, accuracy in the form of repeatability of group size or SD (consistency) of velocity are all useful tools for making a judgement. Just have to recognize the limitations of the statistical analysis that result from sampling bias(es) when interpreting the results.
