BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Anita_n
Pyrite | Level 9

Dear all, 

I have this sample dataset :

 

data test;
input pat_id 2. sex $2. age 3. var1 $2. var2 $2. var3 $2. var4 $2. year 5.;
datalines;
1  F 25 A B C D 2001
1  F 25 E F G D 2002
2  F 35 C M N D 2010
2  F 35 E F V W 2020
15 M 55 A B C D 2011
15 M 55 E F G D 2010
15 M 55 U B C D 2011
15 M 55 j F K D 2010
15 M 55 X B C D 2009
15 M 55 E Y G D 2008
15 M 55 F Y T D 2008
11 F 60 A B C D 2001
11 F 60 E F G D 2002
11 F 60 U B C D 2015
11 F 60 j F K D 2004
11 F 60 X B C D 2010
11 F 60 E Y G D 2014
11 F 60 F Y T D 2008
11 F 60 S F G D 2003 11 F 60 V B G D 2012 11 F 60 K F K Q 2000 11 F 60 Z B M D 2011 11 F 60 U Y G S 2010 11 F 60 X Y T O 2009 ; run;

 

I wish to select firstly, all IDs which occurs less than 5 times with corresponding variables in another dataset(newly created). After that I wish to select randomly from the list of the ids which occurs more than 5 times 5 ids with corresponding variables. 

A row should not be selected twice. If already selected it should be flagged so that. it wouldn't be selected again.

Any help?

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Anita_n,

 


@Anita_n wrote:

I wish to select firstly, all IDs which occurs less than 5 times with corresponding variables in another dataset(newly created). After that I wish to select randomly from the list of the ids which occurs more than 5 times 5 ids with corresponding variables.


I guess you mean 5 observations, not 5 IDs, from those IDs with more than 5 observations and also that you want all 5 observations from those IDs with exactly 5 observations (if there were any). In this case PROC SURVEYSELECT with the SELECTALL option (and the selection method "simple random sampling," which is the default) meets the requirements:

proc sort data=test;
by pat_id;
run;

proc surveyselect data=test
method=srs n=5 selectall /* outall */
seed=2718 out=want(drop=SelectionProb SamplingWeight);
strata pat_id;
run;

Use the OUTALL option (commented out above) if you want all observations from TEST in dataset WANT with a 0-1 flag variable (named Selected) indicating whether or not an observation belongs to the random sample. Otherwise, WANT contains only the random sample.

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

So you want to stratify your sample based on number of times that they appear?

You can use PROC FREQ (or PROC SQL) to count the number of observations per ID.

proc freq data=test;
  tables id / noprint out=count;
run;

Do you just want to sample from the unique ID list?

Or do you want to sample observations from the original file?

 

Show an example of the output you want for the given example input.

 

FreelanceReinh
Jade | Level 19

Hello @Anita_n,

 


@Anita_n wrote:

I wish to select firstly, all IDs which occurs less than 5 times with corresponding variables in another dataset(newly created). After that I wish to select randomly from the list of the ids which occurs more than 5 times 5 ids with corresponding variables.


I guess you mean 5 observations, not 5 IDs, from those IDs with more than 5 observations and also that you want all 5 observations from those IDs with exactly 5 observations (if there were any). In this case PROC SURVEYSELECT with the SELECTALL option (and the selection method "simple random sampling," which is the default) meets the requirements:

proc sort data=test;
by pat_id;
run;

proc surveyselect data=test
method=srs n=5 selectall /* outall */
seed=2718 out=want(drop=SelectionProb SamplingWeight);
strata pat_id;
run;

Use the OUTALL option (commented out above) if you want all observations from TEST in dataset WANT with a 0-1 flag variable (named Selected) indicating whether or not an observation belongs to the random sample. Otherwise, WANT contains only the random sample.

Anita_n
Pyrite | Level 9

@FreelanceReinh: Thanks, that is exactly what I wanted

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1768 views
  • 2 likes
  • 3 in conversation