When you work on a large set of data and want to select some data from the dataset, you find it really difficult to achieve it. Sometimes it provides bias approach to your data selection. To select a random sample from a population in Excel, you need to use an unbiased approach. This article will show you how to select a random sample from a population in Excel in an effective way. I hope you’ll go through the whole article and gain more knowledge about random sampling.
What Is Random Sample Selection in Excel?
Random sample selection is one of the most popular sample selection techniques where every sample has an equal chance to be selected. A random sample selection is an unbiased selection technique from a larger population. The main advantages of random sample selection are its unbiased approach and fairly ease of application.
As random sample selection from a population in Excel is a widely used application, we find out two different ways to do it. One is using Excel formulas and the other is using the data analysis tools. Undoubtedly, both methods are very informative and easy to digest. To show these processes we take a dataset that contains candidate and their CV ID. Here, we want to select some random CV IDs to call in an interview.
1. Using RAND Function to Select Random Sample from Population in Excel
Firstly, the widely used function to select a random sample from a population in Excel is by using the RAND function. The RAND function creates a random number from o to 1.
- First, cell D5 where you want to apply the RAND function.
- Now, write down the following formula :
- Press Enter then it’ll create a random value between 0 and 1.
- Double-click on the Fill handle icon to apply the formula down the column.
Here, you’ll find a new problem. In our last image, in cell C5 random CV ID gives a value but when we apply the function down the cell, the value in cell C5 changes. Don’t worry about this because these values will continuously change until you fix them.
- Copy the Random CV ID column and paste it to another adjacent column. Here, you need to paste only values.
- As we copy column D, Random CV ID, and paste it as a value in Column D, it’ll remove the RAND function’s continuously changing approach. Now, copy the values in Column E and paste them into column D, and also delete column E because there is no need for this column.
- By doing this, we can create stable values. Now, select the range of cells C5:D16.
- Now, go to the Data tab in the ribbon and click on Sort from Sort & Filter group.
- A Sort dialog box will appear. From there set the Sort by as Random CV ID, Sort On as Cell Values, and Order as Largest to Smallest. Finally, click on ‘OK’.
- Finally, we get a random sample from the dataset. You can choose any 5 or 10 as your random sample from the list.
Read More: Excel VBA: Random Selection from List
2. Applying Excel Formula to Select Random Sample from Population
To select a random sample from a population in Excel, we can use Excel formulas. As there is no built-in function to select a random sample in Excel, you can utilize the following functions to select a random sample in Excel.
2.1 Combining RANDBETWEEN and INDEX Functions
- First, select the cell where you want to apply the RANDBETWEEN. Write the following formula in the formula box.
- Press Enter to apply the function. It’ll give a random sample from the dataset.
- To apply random sample selection down the column, you can double-click on the Fill handle. But the problem is this method will provide duplicate results which are detrimental to your selection.
As a volatile function, the RANDBETWEEN function will recalculate the random sample with every change in your worksheet. To eliminate this problem, you copy the column and paste it into a new column as a value.
🔎 Formula Breakdown
RANDBETWEEN(1,ROWS($C$5:$C$16)): Here, the RANDBETWEEN function provides a random integer between two values. For lower values, you can use the number 1, and you can utilize the ROWS function in the upper values to find out the total row count. The RANDBETWEEN function returns a random value between 1 and the total row count in the dataset.
INDEX($C$5:$C$16,RANDBETWEEN(1,ROWS($C$5:$C$16)),1): The INDEX function will extract the random row value which is returned by the RANDBETWEEN function. We put the column_num argument as 1 because we want to extract a random sample from the first column.
Read More: How to Randomly Select Rows in Excel
2.2 Merging INDEX with RANK Functions
In our last method, one of the main disadvantages is that it creates duplicate values at various times when you apply the formula to more than one cell. To eliminate this problem, we can use the RANK and INDEX functions Which provide duplicate free random sample selection.
- Firstly, use the RAND function to create a random sample between 0 and 1. Just like the first approach create the random sample using the RAND function and double click on the Fill handle icon down the column.
- As this RAND function is a volatile function, it fluctuates the value after every change to the worksheet. So, we need to copy the whole cell and paste it into another column as a value. It eliminates its volatility.
- Now, select cell E5 beside the RAND function applied cell.
- Write down the following formula:
- Press Enter to apply the formula.
- Double-click on the Fill Handle icon down the last position. Here, we take seven random samples from the dataset and all are unique values.
🔎 Formula Breakdown
RANK(D5,$D$:$D$16): The Rank function provides the rank of a random number in the same row. In our worksheet, cell E5 gets the number rank in D5 whereas, in cell E6, the number rank changes from cell D5 to D6 and provides the number rank in D6.
INDEX($C$5:$C$16,RANK(D5,$D$:$D$16),1): Now, the Rank function’s returned values are put in the INDEX function’s row_num argument and put 1 in the column_num argument to extract a random value.
2.3 Joining COUNTIF and INDEX Functions
There is a small problem in the RANK function. When you use a large dataset and you use the RANK function, you can use still get duplicate values in some cases. To solve this problem, you can use COUNTIF and INDEX functions which will provide the perfect random sample selection without any duplicate values.
- First, create the sample number by using the RAND function just like other methods. Now write the following formula in the formula box.
- Press enter to apply the formula and double-click on the Fill handle icon down the last value. You will see all the random samples are unique.
🔎 Formula Breakdown
COUNTIF($D$5:D5,D5) – 1: The COUNTIF function denotes the range and criteria. It provides a platform to count values under certain criteria. Here, we lock our first reference as $D$5 and other references change based on row. Here ‘– 1’ means to subtract the first one. This is the main function to eliminate duplicate values.
RANK.EQ(D5,$D$5:$D$16) + COUNTIF($D$5:D5,D5)-1: Here, RANK.EQ provides a rank number against other values. This formula provides the rank CV ID uniquely in descending order. Cell D5 denotes the value whose rank you want to find, whereas, $D$5:$D$16 provides the cell reference.
INDEX($C$5:$C$16,RANK.EQ(D5,$D$5:$D$16) + COUNTIF($D$5:D5,D5) – 1,1): The INDEX function will extract value from the RANK.EQ function return value.
3. Selecting Random Sample from Population Using Data Analysis Tool
Lastly, we can select a random sample using the Data Analysis tool. This tool is very helpful and easy to use.
- First, go to the File tab and select More.
- Now, in the More section, select Options.
- Excel Options dialog box will appear. From there, click on Add-ins, and in the Add-ins section, select Excel Add-ins from the Manage option and click on ‘Go’.
- In the Add-ins dialog box, click on Analysis ToolPak and click on ‘OK’.
- Now, go to the Data tab, and in the Analysis section select Data Analysis.
- Data Analysis dialog box will appear and from the Analysis Tools section, select Sampling and click on ‘OK’.
- In the Sampling dialog box, provide your input range. Select the Random sampling method and put your preferred Number of Samples and select Output Range from Output Options. Click on ‘OK’.
- It’ll select some random samples from a population in Excel.
Read More: Random Selection Based on Criteria in Excel
Download Practice Workbook
Download this practice workbook.
We have discussed three useful methods to select a random sample from a population in Excel. All the methods are really helpful and easy to digest. I hope you enjoy this article.