I have data like the following (but millions of rows, this is just an excerpt):
DATA have;
input individual & $char10. firm year num;
length individual $10;
DATALINES;
100004-77 1001 2004 3
100004-77 1004 2004 3
100004-77 1006 2004 3
100004-77 1008 2004 3
100004-77 1013 2004 3
100004-77 1015 2004 3
119561-14 1003 2006 5
119561-14 1004 2006 5
119561-14 1005 2006 5
119561-14 1008 2006 5
119561-14 1009 2006 5
119561-14 1012 2006 5
119561-14 1014 2006 5
119561-14 1015 2006 5
;
RUN;
I want to randomly select a sample for each individual year group based on the number in num, e.g., for individual=100004-77 and year=2004, I want to pick a random sample of size 3. I tried the following, but it does not work:
proc surveyselect data=have out=want
seed=1234 method=srs n=num;
strata individual year;
run;
I need to automate this because I have considerably many more rows, so I can't simply create two different datasets based on each individual-year combination then manually enter the sample size. Thank you.
/*Make a dataset for this purpose.*/
DATA have;
input individual & $char10. firm year num;
length individual $10;
DATALINES;
100004-77 1001 2004 3
100004-77 1004 2004 3
100004-77 1006 2004 3
100004-77 1008 2004 3
100004-77 1013 2004 3
100004-77 1015 2004 3
119561-14 1003 2006 5
119561-14 1004 2006 5
119561-14 1005 2006 5
119561-14 1008 2006 5
119561-14 1009 2006 5
119561-14 1012 2006 5
119561-14 1014 2006 5
119561-14 1015 2006 5
;
RUN;
proc sql;
create table SampleSize as
select distinct individual,year ,num as SampleSize
from have;
quit;
proc surveyselect data=have out=want
seed=1234 method=srs n=SampleSize;
strata individual year;
run;
/*Make a dataset for this purpose.*/
DATA have;
input individual & $char10. firm year num;
length individual $10;
DATALINES;
100004-77 1001 2004 3
100004-77 1004 2004 3
100004-77 1006 2004 3
100004-77 1008 2004 3
100004-77 1013 2004 3
100004-77 1015 2004 3
119561-14 1003 2006 5
119561-14 1004 2006 5
119561-14 1005 2006 5
119561-14 1008 2006 5
119561-14 1009 2006 5
119561-14 1012 2006 5
119561-14 1014 2006 5
119561-14 1015 2006 5
;
RUN;
proc sql;
create table SampleSize as
select distinct individual,year ,num as SampleSize
from have;
quit;
proc surveyselect data=have out=want
seed=1234 method=srs n=SampleSize;
strata individual year;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.