ismahero2
Obsidian | Level 7

Hi,

 

I am using PROC SURVEYSELECT to draw samples from some large datasets so that I can validate the data. I need code that lets me increase my sample size without changing the records that were already selected. I always use a fixed seed number so I can replicate my sample if needed.

For example:

I have a table X with 10 records and I do a sample of 2...

Dataset X

Field
100
120
320
560
125
888
215
214
698
563

 

I obtain the sample dataset Y:

215
320

 

Now I want to increase my sample size to 4, but I want to make sure that 215 and 320 are still part of the new sample dataset. I would like something like this:

215
320
563
100

 

Can anyone help me? The code that I am currently using is this one:

PROC SURVEYSELECT
DATA=R_ALL_IDPROOFSTATUS (KEEP=ID COVERAGE)
METHOD=SRS
OUT=R_ALL_IDPROOFSTATUS
SEED=26
N=15 ;
RUN;

3 Replies
Rick_SAS
SAS Super FREQ

Start with the larger size first (N=4). To get the smaller sample, just use the first few elements of the bigger sample:

 

PROC SURVEYSELECT DATA=Have METHOD=SRS OUT=WantBig
SEED=26 N=4; 
RUN;

data WantSmall;
set WantBig(obs=2);
run;
PGStats
Opal | Level 21

You can extend your sample if you have a variable (inSample below) indicating whether each observation has already been selected. Put the selected observations in a separate stratum and sample all of them:

 

data test;
input Field inSample;
datalines;
100 0
120 0 
320 1
560 0
125 0
888 0
215 1 
214 0 
698 0
563 0
;

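/* The STRATA statement requires the data sorted by the stratum variable */
proc sort data=test; by inSample; run;

%let newSampleSize=4;

/* Per-stratum sample sizes: stratum 1 keeps every already-selected obs,
   stratum 0 supplies the remaining (newSampleSize - number already selected) */
proc sql;
create table sampleStrata as
select 0 as inSample, &newSampleSize - sum(inSample) as _nsize_ from test
union
select 1 as insample, sum(inSample) from test;
quit;

/* OUTALL keeps all obs and adds a Selected flag (1 = in the new sample) */
proc surveyselect data=test sampsize=sampleStrata out=newSample outall seed=78986;
strata inSample;
run;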

proc print noobs; run;
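
If the earlier sample only exists as a separate dataset, the inSample flag has to be built first. A minimal sketch, assuming the full data is in a table named X, the earlier sample in a table named Y (the names from the question), and that Field uniquely identifies a record:

```sas
/* Assumption: X = full data, Y = earlier sample, Field is a unique key */
proc sql;
create table test as
select x.Field,
       /* a PROC SQL comparison evaluates to 1 (true) or 0 (false) */
       (x.Field in (select Field from Y)) as inSample
from X as x
order by inSample;  /* sorted by the stratum variable, as STRATA expects */
quit;
```

The resulting table test can take the place of the DATA step and PROC SORT in the example. If Field is not unique, use whatever key does identify a record instead.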
PG
Ksharp
Super User

Exclude the records you have already sampled from the original data, run PROC SURVEYSELECT again on the remainder, and then append the two samples.

 

 

 

PROC SURVEYSELECT
DATA=sashelp.class
METHOD=SRS
OUT=R_ALL_IDPROOFSTATUS1
SEED=26
N=2 ;
RUN;
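proc print;run;

/* Remove the rows already sampled. Note: EXCEPT compares every column
   and also drops duplicate rows from the result. */
proc sql;
create table temp as
 select * from sashelp.class
 except
 select * from R_ALL_IDPROOFSTATUS1;
quit;

/* Draw the additional records from the remainder */
PROC SURVEYSELECT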
DATA=temp
METHOD=SRS
OUT=R_ALL_IDPROOFSTATUS2
SEED=26
N=2 ;
RUN;
/* Append the new records to the original sample */
data want;
 set R_ALL_IDPROOFSTATUS1 R_ALL_IDPROOFSTATUS2;
run;
proc print;run;

 

 

 

