Hello, I have a large data set and I want to get random samples that include each ID, ACCN, ACCT_CNT, and GROUP. Please see the sample below.
One ID can have multiple ACCNs, and ACCT_CNT counts the accounts within each ID.
ID | ID 2 | ACCT_CNT | GROUP |
1 | 123 | 1 | |
2 | 1234 | 1 | |
2 | 1235 | 2 | |
5 | 1789 | 1 | |
5 | 1790 | 2 | |
3 | 1236 | 1 | |
3 | 1237 | 2 | |
3 | 1238 | 3 | |
4 | 1561 | 1 | D |
4 | 1562 | 2 | D |
4 | 1563 | 3 | D |
4 | 1564 | 4 | D |
4 | 1565 | 5 | D |
6 | 1910 | 1 | D |
6 | 1920 | 2 | D |
6 | 1930 | 3 | D |
6 | 1940 | 4 | D |
6 | 1950 | 5 | D |
6 | 1960 | 6 | D |
I tried PROC SURVEYSELECT to get random samples by ID and ACCT_CNT, but I couldn't figure it out.
Did you try the OUTALL option?
OUTALL includes all observations from the sampling frame in the OUT= output data set. By default, the output data set includes only those units selected for the sample. When you specify the OUTALL option, the output data set includes all observations in the sampling frame along with a variable (Selected) that indicates each observation’s selection status. The value of Selected is 1 for an observation that is selected or 0 for an observation that is not selected. For information about the contents of the output data set, see the section Sample Output Data Set.
The OUTALL option is available for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, METHOD=SEQ, and METHOD=BERNOULLI) and for METHOD=POISSON.
If you specify a sample size of 0 for a stratum, PROC SURVEYSELECT omits this stratum from the sampling frame. By default, PROC SURVEYSELECT also omits this stratum from the output data set when you specify the OUTALL option. You can specify the OUTALL(ZEROSTRATA) option to include strata that have sample sizes of 0 in the output data set. For more information, see the description of the SAMPSIZE= option.
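As a rough sketch of how OUTALL might be used, assuming an input data set called `have` (the data set names, sample size, and seed are placeholders, not taken from this thread):

```sas
/* Hypothetical names: HAVE is the input table, FLAGGED is the output. */
proc surveyselect data=have out=flagged
                  method=srs      /* simple random sampling */
                  n=5             /* placeholder sample size */
                  seed=12345      /* fixed seed so the draw is repeatable */
                  outall;         /* keep every row and add the Selected flag */
run;

/* Keep only the rows that were drawn (Selected = 1). */
data sample;
   set flagged;
   where Selected = 1;
run;
```

Note that this draws individual rows, so an ID with several accounts can end up only partly in the sample.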
I tried that, but the random samples in the output are not complete. For example, for ID 4 in the table only one record is output. I want all 5 observations for that ID to be included in the random sample.
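One possible way to keep every row of a selected ID together is to treat ID as the sampling unit (cluster), so that whole IDs are drawn rather than single rows. The sketch below assumes a reasonably recent SAS/STAT release that supports the SAMPLINGUNIT statement. The data set names `have` and `sample_ids`, the sample size, and the seed are placeholders. How rows with a blank GROUP are handled as a stratum is worth checking in the documentation.

```sas
/* Hypothetical names: HAVE is the input table, SAMPLE_IDS is the output. */
/* The STRATA statement expects the input sorted by the strata variable.  */
proc sort data=have;
   by group id;
run;

proc surveyselect data=have out=sample_ids
                  method=srs
                  n=1            /* placeholder: number of IDs to draw per GROUP value */
                  seed=12345;    /* fixed seed so the draw is repeatable */
   strata group;                 /* sample separately within each GROUP value */
   samplingunit id;              /* draw whole IDs (clusters), not single rows */
run;
```

If this works as intended, drawing ID 4 would bring all five of its rows into `sample_ids` instead of just one. The STRATA statement can be dropped if IDs should be drawn from the whole table regardless of GROUP.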