I have data like the following (but millions of rows, this is just an excerpt):
DATA have;
input individual & $char10. firm year num;
length individual $10;
DATALINES;
100004-77 1001 2004 3
100004-77 1004 2004 3
100004-77 1006 2004 3
100004-77 1008 2004 3
100004-77 1013 2004 3
100004-77 1015 2004 3
119561-14 1003 2006 5
119561-14 1004 2006 5
119561-14 1005 2006 5
119561-14 1008 2006 5
119561-14 1009 2006 5
119561-14 1012 2006 5
119561-14 1014 2006 5
119561-14 1015 2006 5
;
RUN;
I want to randomly select a sample for each individual year group based on the number in num, e.g., for individual=100004-77 and year=2004, I want to pick a random sample of size 3. I tried the following, but it does not work:
proc surveyselect data=have out=want
seed=1234 method=srs n=num;
strata individual year;
run;
I need to automate this because I have considerably many more rows, so I can't simply create two different datasets based on each individual-year combination then manually enter the sample size. Thank you.
/*Make a dataset for this purpose.*/
DATA have;
input individual & $char10. firm year num;
length individual $10;
DATALINES;
100004-77 1001 2004 3
100004-77 1004 2004 3
100004-77 1006 2004 3
100004-77 1008 2004 3
100004-77 1013 2004 3
100004-77 1015 2004 3
119561-14 1003 2006 5
119561-14 1004 2006 5
119561-14 1005 2006 5
119561-14 1008 2006 5
119561-14 1009 2006 5
119561-14 1012 2006 5
119561-14 1014 2006 5
119561-14 1015 2006 5
;
RUN;
proc sql;
create table SampleSize as
select distinct individual,year ,num as SampleSize
from have;
quit;
proc surveyselect data=have out=want
seed=1234 method=srs n=SampleSize;
strata individual year;
run;
/*Make a dataset for this purpose.*/
DATA have;
input individual & $char10. firm year num;
length individual $10;
DATALINES;
100004-77 1001 2004 3
100004-77 1004 2004 3
100004-77 1006 2004 3
100004-77 1008 2004 3
100004-77 1013 2004 3
100004-77 1015 2004 3
119561-14 1003 2006 5
119561-14 1004 2006 5
119561-14 1005 2006 5
119561-14 1008 2006 5
119561-14 1009 2006 5
119561-14 1012 2006 5
119561-14 1014 2006 5
119561-14 1015 2006 5
;
RUN;
proc sql;
create table SampleSize as
select distinct individual,year ,num as SampleSize
from have;
quit;
proc surveyselect data=have out=want
seed=1234 method=srs n=SampleSize;
strata individual year;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.