jerry898969
Pyrite | Level 9
I'm having a hard time grasping how to use SURVEYSELECT.

I have a data table from which I need to take the top 20 rows, then one random row out of each 10-row chunk until I have gone through the entire dataset.

So I have the first 20 then from 21 to 30 I need to randomly pick a row. Then from 31 to 40 I need to randomly pick another row and so on till it goes through my dataset.

Is this possible using SURVEYSELECT?

Any help would be greatly appreciated

Thank You
data_null__
Jade | Level 19
I think you need to stratify the observations: the first 20 in one stratum, then blocks of 10 until EOF. You will also need a SAMPSIZE= data set containing _NSIZE_ and the strata variable. I think this is what you want.

[pre]
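/* Build two data sets in one pass:                             */
/*   SHOES - the input rows plus a STRATA variable              */
/*   SIZE  - one row per stratum with its sample size _NSIZE_   */
data shoes size(keep=strata _nsize_);
   /* Stratum 1: the first 20 rows, all of which are selected */
   strata = 1;
   _nsize_ = 20;
   output size;
   do i = 1 to _nsize_ until(eof);
      link set;
      output shoes;
   end;
   /* Remaining strata: blocks of 10 rows, one row sampled from each */
   _nsize_ = 1;
   do i = 1 by 1 until(eof);
      link set;
      if mod(i,10) eq 1 then do;   /* first row of a new block */
         strata + 1;
         output size;
      end;
      output shoes;
   end;
   stop;
set:
   set sashelp.shoes end=eof;
   return;
run;
proc print data=size;
run;

/* Draw the sample: _NSIZE_ rows from each stratum */
proc surveyselect data=shoes sampsize=size;
   strata strata;
run;
proc print;
run;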
[/pre]
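As a side note, PROC SURVEYSELECT draws a different sample on each run unless the seed is fixed. A minimal sketch, assuming the SHOES and SIZE data sets created above; the seed value 12345 and the output name PICKED are arbitrary choices for illustration, not part of the original reply:

```sas
/* SEED= makes the draw repeatable; OUT= names the sample data set. */
/* 12345 and PICKED are arbitrary, illustrative choices.            */
proc surveyselect data=shoes sampsize=size seed=12345 out=picked;
   strata strata;
run;

proc print data=picked;
run;
```

Rerunning with the same seed and the same input should return the same rows, which helps when the selection has to be audited later.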
jerry898969
Pyrite | Level 9
_null_,
Thank you so much

I just got out of a meeting where they changed the requirements. Now I have to take the first 20 then the first row out of 10 row chunks.

So the first 20, then rows 21, 31, 41, and so on until the end. Do I still need to use SURVEYSELECT for this, or can I do it in a simpler way?

Thank you again for your quick response.
data_null__
Jade | Level 19
You don't need surveyselect if you don't need a random sample.

[pre]
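dm 'clear log; clear output;';   /* clear the log and output windows */
data sample;
   /* rows 1-20 (or fewer if the table is shorter) */
   do i = 1 to min(20, nobs);
      link set;
      output;
   end;
   /* then rows 21, 31, 41, ... until the end of the table */
   do i = 21 by 10 while(i le nobs);
      link set;
      output;
   end;
   stop;   /* required: POINT= access never reaches end-of-file */
set:
   set sashelp.shoes point=i nobs=nobs;   /* direct access by row number */
   return;
run;
proc print;
run;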
[/pre]
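If reading every row is acceptable, the same selection can probably be written with a subsetting IF on the automatic variable _N_. This is a sketch of an alternative, not the POINT= approach above; SAMPLE_N is a made-up data set name:

```sas
data sample_n;
   set sashelp.shoes;
   /* keep rows 1-20, then rows 21, 31, 41, ... (row number mod 10 = 1) */
   if _n_ le 20 or mod(_n_, 10) eq 1;
run;
```

It trades the direct access of POINT= for shorter code, which should be fine for small tables.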
jerry898969
Pyrite | Level 9
_null_,
Thank you for your reply.

I have another issue that just popped up. How can I do this based on a user column?

I have a user column and I have to take the top 20 rows per user and the first row of every 10 for each user.

I do apologize for my newbie questions.

Thanks for all the help
data_null__
Jade | Level 19
This has to read all the obs in the data, but if you don't have a huge data set it may be good enough. If you want to read just the selected obs let me know, but that is a little harder.

[pre]
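data sample2;
   /* i counts rows within the current region (BY group) */
   do i=1 by 1 until(last.region);
      set sashelp.shoes;
      by region;
      /* keep rows 1-20 of the group, then rows 21, 31, 41, ... */
      if 1 le i le 20 or mod(i,10) eq 1 then output;
   end;
run;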
[/pre]
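The BY statement expects the input to be sorted (or indexed) by the grouping variable, which SASHELP.SHOES happens to be for REGION. For your own table, a sketch might look like the following, assuming a table named HAVE with a column named USER; both names are hypothetical stand-ins for your data:

```sas
/* HAVE and USER are hypothetical names; substitute your own.        */
/* PROC SORT keeps the original row order within each user by        */
/* default (EQUALS), so "top 20" still means the first 20 as stored. */
proc sort data=have out=have_sorted;
   by user;
run;

data sample_by_user;
   do i = 1 by 1 until(last.user);
      set have_sorted;
      by user;
      /* rows 1-20 of each user, then rows 21, 31, 41, ... */
      if i le 20 or mod(i, 10) eq 1 then output;
   end;
   drop i;
run;
```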
jerry898969
Pyrite | Level 9
_null_,
Thank you so much for your help. I only had a bit of time to test it, but it seems to be dead on. My full set is only around 1100 rows, so it's small enough for this to work well.


Thanks again
