random sample from another data set

New Contributor
Posts: 2
I have no idea how to get started on this. I was thinking of creating an array with DO loops to pull the observations, but I'm not sure where to start, as I am very new to SAS.

 

Please let me know if you need anything else. I will be attaching a sample of my code soon.

 

Thanks!


Accepted Solutions
Solution
Friday
PROC Star
Posts: 851

Re: random sample from another data set

I would probably do this in two separate PROC SURVEYSELECT steps, as shown below.

I built an example data set from sashelp.cars and assigned arbitrary mileage categories for demonstration purposes.

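/* Build demo data: label each car by city mileage, then write every row
   100 times so that each category has enough rows to sample from */
data cars;
	set sashelp.cars;
	length mileage $20;
	if mpg_city<16 then mileage='Bad';
	else if 16<=mpg_city<25 then mileage='Fair';
	else if 25<=mpg_city<30 then mileage='Good';
	else mileage='Excellent';
	do i=1 to 100; output; end;
	drop i;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=cars;
	by mileage;
run;

/* Fixed sample size: 200 rows from each of the two outer categories */
proc surveyselect data=cars out=sample1 method=srs n=200 noprint;
	where mileage in ('Bad', 'Excellent');
	strata mileage;
run;

/* Fixed sampling rate: 1% of the rows in each of the two middle categories */
proc surveyselect data=cars out=sample2 method=srs samprate=0.01 noprint;
	where mileage in ('Fair', 'Good');
	strata mileage;
run;

/* Stack the two samples into one data set */
data finalsample;
	set sample1 sample2;
run;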
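
A quick way to confirm that the combined sample has the intended shape is to tabulate it by category. This is only a minimal sketch and assumes the finalsample data set created by the code above; 'Bad' and 'Excellent' should each show 200 rows, and 'Fair' and 'Good' about 1% of their category sizes.

```sas
/* Count the sampled rows in each mileage category */
proc freq data=finalsample;
	tables mileage / nocum;
run;
```

If the sample needs to be reproducible, a SEED= option (any positive integer of your choosing) can be added to each PROC SURVEYSELECT statement; without it, every run draws a different sample.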

 


All Replies
Super User
Posts: 20,222

Re: random sample from another data set

Try PROC SURVEYSELECT.

 

 

Respected Advisor
Posts: 4,973

Re: random sample from another data set

Look at the STRATA statement and the secondary input data set feature of PROC SURVEYSELECT.

PG
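
For readers unfamiliar with the secondary input data set mentioned above, here is a rough sketch of one way it could be used. It assumes the cars data set from the accepted solution (already sorted by mileage). The sizes data set, its sample-size values, the sample_all output name, and the seed are all made up for illustration. PROC SURVEYSELECT reads the per-stratum sample sizes from a variable named _NSIZE_ when SAMPSIZE= points to a data set.

```sas
/* Hypothetical per-stratum sample sizes; the numbers are illustrative only */
data sizes;
	length mileage $20;
	input mileage $ _nsize_;
	datalines;
Bad 200
Excellent 200
Fair 50
Good 50
;
run;

/* The strata must appear in the same order as in the input data set */
proc sort data=sizes;
	by mileage;
run;

/* One step: each stratum gets the sample size listed in the sizes data set */
proc surveyselect data=cars out=sample_all method=srs
		sampsize=sizes seed=12345 noprint;
	strata mileage;
run;
```

This draws a fixed count from every stratum in a single step. It does not mix fixed counts and sampling rates the way the two-step solution does; SAMPRATE= accepts a secondary data set in the same manner (with the rates in a variable named _RATE_), but a single step uses one or the other.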