Hi,
I am using PROC SURVEYSELECT to draw samples from some large datasets so that I can validate the data. I need code that lets me increase my sample size without changing the records that were already selected. I always use a fixed seed so that I can reproduce my sample if needed.
For example:
I have a table X with 10 records and I do a sample of 2...
Dataset X
Field
100
120
320
560
125
888
215
214
698
563
I obtain the sample dataset Y
215
320
Now I want to increase my sample size to 4, but I want to make sure that 215 and 320 are still part of the new sample dataset. I would like something like this:
215
320
563
100
Can anyone help me? The code that I am currently using is this:
PROC SURVEYSELECT
DATA=R_ALL_IDPROOFSTATUS (KEEP=ID COVERAGE)
METHOD=SRS
OUT=R_ALL_IDPROOFSTATUS
SEED=26
N=15 ;
RUN;
Start with the larger size first (N=4). To get the smaller sample, just use the first few elements of the bigger sample:
PROC SURVEYSELECT DATA=Have METHOD=SRS OUT=WantBig
SEED=26 N=4;
RUN;
data WantSmall;
set WantBig(obs=2);
run;
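One caveat with obs=2: as far as I know, METHOD=SRS writes the selected records in the same order as they appear in the input data, so the first two rows of WantBig are the two selected records that occur earliest in Have, not a random pair. A possible alternative, sketched here on the assumption that WantBig was created by the step above, is to subsample WantBig with a second PROC SURVEYSELECT call. The seed 27 is arbitrary and is only there to keep the subsample reproducible.

```sas
/* Draw the small sample at random from the big one, */
/* instead of taking the first rows by position.     */
proc surveyselect data=WantBig method=srs out=WantSmall
    seed=27 n=2;
run;
```

Because WantSmall is drawn from WantBig, every record in the small sample is guaranteed to be in the big one.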
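You can extend your sample if you have a variable (inSample below) indicating whether each observation has already been selected. Put the already-selected observations in their own stratum and request all of them, so that only the additional records are drawn from the unselected stratum: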
data test;
input Field inSample;
datalines;
100 0
120 0
320 1
560 0
125 0
888 0
215 1
214 0
698 0
563 0
;
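/* STRATA processing requires the data sorted by the stratum variable */
proc sort data=test; by inSample; run;
%let newSampleSize=4;
/* One row per stratum: how many records to select from each */
proc sql;
create table sampleStrata as
/* unselected stratum: draw only the extra records needed */
select 0 as inSample, &newSampleSize - sum(inSample) as _nsize_ from test
union
/* selected stratum: keep every record already in the sample */
select 1 as inSample, sum(inSample) as _nsize_ from test;
quit;
/* OUTALL keeps all records and adds a Selected flag (1 = in the sample) */
proc surveyselect data=test sampsize=sampleStrata out=newSample outall seed=78986;
strata inSample;
run;
proc print data=newSample noobs; run;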
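Because OUTALL keeps every record, newSample still contains all 10 rows, with a Selected variable marking the sampled ones. A minimal sketch of a follow-up step is shown here; the dataset names extendedSample and test2 are only illustrative. It extracts the four sampled records and carries the flag forward in case the sample needs to be extended again later.

```sas
/* Keep only the records flagged as selected */
data extendedSample;
    set newSample;
    where Selected = 1;
    keep Field;
run;

/* Optional: rebuild the input for a further extension, */
/* treating everything selected so far as inSample = 1  */
data test2;
    set newSample;
    inSample = Selected;
    keep Field inSample;
run;
```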
Exclude the records you have already sampled from the original data, then run PROC SURVEYSELECT again on the remainder and append the two samples:
PROC SURVEYSELECT
DATA=sashelp.class
METHOD=SRS
OUT=R_ALL_IDPROOFSTATUS1
SEED=26
N=2 ;
RUN;
proc print;run;
proc sql;
create table temp as
select * from sashelp.class
except
select * from R_ALL_IDPROOFSTATUS1;
quit;
PROC SURVEYSELECT
DATA=temp
METHOD=SRS
OUT=R_ALL_IDPROOFSTATUS2
SEED=26
N=2 ;
RUN;
data want;
set R_ALL_IDPROOFSTATUS1 R_ALL_IDPROOFSTATUS2;
run;
proc print;run;
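Note that EXCEPT compares entire rows and also removes duplicate rows from the result, so it only works cleanly when the sample has exactly the same columns as the source and the source has no fully duplicated records. If the data has a unique key, excluding by key may be more robust. Here is a sketch using sashelp.class, on the assumption that Name identifies each record; for the original poster's data the key would presumably be ID.

```sas
/* Remove already-sampled records by key rather than by whole-row match */
proc sql;
    create table temp as
    select *
    from sashelp.class
    where Name not in (select Name from R_ALL_IDPROOFSTATUS1);
quit;
```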