**ALPHARETTA, Ga.** – **Aug. 11, 2017** – *PRLog* — Six Sigma methodology is built on the premise that data holds the key to process improvement. Only through data-driven analysis and correction can a process be truly transformed into a lean, efficient set of steps that can be controlled and continued over time. The quality of the data drives the quality of the solutions, so gathering accurate data is critical to Six Sigma project success.

*Accurate data is critical*

Without accurate data, it is impossible to create effective, long-term improvements to a process. Six Sigma requires accurate data to measure and analyze the state of an existing process as a first step toward determining where the process can be improved. Once improvements are determined, the collection and analysis of accurate data is just as critical for ensuring ongoing control and verification of the process improvement results.

The standard technique for collecting data is sampling. It is a statistics-based process that strives to collect data from a portion of a particular population so that the data is representative of the population as a whole. The term population in this context does not necessarily refer to living things, but instead refers to the collection of items about which we want to know certain characteristics.

The sampling methodology used during any measurement activity must be unbiased and statistically representative of the population. This is especially important in Six Sigma projects because only with detailed and accurate data can a Six Sigma team develop and apply appropriate and effective process improvements.

*Representative sampling*

There are many ways to conduct representative sampling, some more effective than others for Six Sigma purposes.

The most common sampling methods include:

· Random sampling

· Systematic sampling

· Cluster sampling

· Stratified sampling

Each method has its own advantages and disadvantages, which we will discuss in the following sections.

*Random sampling*

The basis of random sampling is that each item in the desired population has an equal chance of being selected. This means that if you have 10,000 widgets in a production run and you intend to do a random sample to check for uniform size, each of those 10,000 widgets has an equal chance of being selected for inspection. It is not a haphazard process, but rather one that uses statistical methods to ensure the sample is truly random.

In most cases, the easiest way to do a random sample is to assign each item in the population a number, and then use a random-number generator or a table of random numbers to select which items to pull for evaluation. This method is excellent for purely statistical purposes, but in practice can be hard to implement. It requires an accurate list of the entire target population and can be expensive to conduct if the selected samples are located across a wide area.
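As a minimal sketch of this approach, the following Python snippet assigns each widget a number and lets a seeded random-number generator pick the sample. The population size of 10,000 comes from the example above; the sample size of 100 and the seed are illustrative assumptions, not values prescribed here.

```python
import random

# Hypothetical production run: 10,000 widgets, each assigned an ID 1..10000.
population = list(range(1, 10_001))

sample_size = 100  # illustrative sample size, chosen for this sketch
random.seed(42)    # fixed seed so the sketch is reproducible

# random.sample draws without replacement, so each widget has an equal
# chance of selection and no widget is inspected twice.
sample = random.sample(population, sample_size)

print(len(sample))       # 100
print(len(set(sample)))  # 100 -- no duplicates
```

In practice, the hard part is not the selection itself but building the accurate, numbered list of the whole population that this method depends on.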

*Systematic sampling*

This method is a variation on random sampling that focuses on selecting items from within a population using specific intervals. For instance, let’s say you have 1,000 bearings and you determine that a sample size of 75 is appropriate for measuring a certain characteristic of interest.

Using systematic sampling, you would do the following:

· 1,000 bearings ÷ 75 samples = 13.33 interval (round to 13)

· Select a random starting point within the population of 1,000 bearings

· From that random starting point, select every 13th bearing for the sample
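The steps above can be sketched in Python as follows. The 1,000 bearings, the target sample size of 75, and the rounded interval of 13 come from the example; the seed is an illustrative assumption. Note that with a rounded-down interval, stepping through all 1,000 bearings yields slightly more than 75 picks, so the sketch truncates to the target size.

```python
import random

population = list(range(1, 1_001))  # bearing IDs 1..1000
target_sample_size = 75

# 1,000 bearings / 75 samples = 13.33, rounded to an interval of 13
interval = len(population) // target_sample_size  # 13

random.seed(7)
start = random.randrange(interval)  # random starting point in the first interval

# From the random start, take every 13th bearing, then trim the
# one or two extra picks the rounded interval produces.
sample = population[start::interval][:target_sample_size]

print(len(sample))  # 75
```

Every pair of consecutive picks is exactly 13 bearings apart, which is what makes this method easy to carry out on a production line, and also what exposes it to the periodic-pattern risk described below.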

Systematic sampling is much easier to implement than true random sampling, and it spreads the sampling out more evenly across the entire target population. It does carry the danger, though, of being affected by a hidden or unforeseen pattern within the population. For example, suppose each batch of hot steel is supposed to contain enough material for 13 bearings, but in practice there is not quite enough, so the 13th bearing in each batch is slightly different from the rest. This pattern would interact with your systematic sample of every 13th bearing, skewing the data.

*Cluster sampling*

This method takes a slightly different approach. Rather than drawing items from anywhere in the population, as random sampling does, cluster sampling deliberately looks at items that are located close together.

Using cluster sampling, you would do the following:

· Divide the target population into clusters

· Randomly choose several clusters

· Within those randomly chosen clusters, select samples randomly or using some other valid method

Cluster sampling can be a time and money saver, depending on how spread out the entire target population is, but it does have disadvantages. Selecting items that are located close together may mean that those items are all very similar to each other and may not accurately represent the entire population. This method also has a larger sampling error than standard random sampling.

*Stratified sampling*

This method divides a population into strata (or groups) that are distinct from each other. For example, if you have a population of 500 lawn mowers, you could divide them into strata based on several different characteristics, such as engine size, paint color, where they were manufactured, which shift of workers manufactured them, etc.

Once the population is divided into strata, samples are then taken from each stratum using whichever valid sampling method is preferred. Stratified sampling is very precise and accurate when the strata are distinct from each other and the items within each stratum are similar to each other in terms of the characteristic you want to measure.

From an administrative standpoint, stratified sampling is easy to implement because the strata are usually easy to separate. Additionally, it is easier to train the people who will be pulling and/or examining the samples if the items within each stratum are similar to each other.
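The lawn-mower example above can be sketched in Python as follows. The population of 500 mowers comes from the example; stratifying by manufacturing shift, the randomly assigned shift labels, the 10% proportional allocation, and the seed are all illustrative assumptions for this sketch.

```python
import random
from collections import defaultdict

random.seed(11)

# Hypothetical: 500 lawn mowers, each tagged with the shift that built it.
mowers = [{"id": i, "shift": random.choice(["day", "swing", "night"])}
          for i in range(500)]

# Divide the population into strata by shift
strata = defaultdict(list)
for mower in mowers:
    strata[mower["shift"]].append(mower)

# Proportional allocation (one common choice): randomly sample
# 10% of each stratum, so every stratum is represented.
sample = []
for shift, group in strata.items():
    n = max(1, round(0.10 * len(group)))
    sample.extend(random.sample(group, n))

print(len(sample))  # about 50 (10% of 500)
```

Because every stratum contributes to the sample, no shift's output can be missed entirely, which is the precision advantage stratified sampling offers when the strata genuinely differ.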