DeepakSwain
Pyrite | Level 9

Hi there, 

I need your kind help to take a subset of my dataset with a desired number of records from specific categories. My dataset has 25 records: 5 apple, 4 banana, 6 grape, 6 orange and 4 pears. 

I want to take a random subset which will have 2 apple, 2 banana, 3 grape, 3 orange and 3 pears. 

data have;
input id $ catgeory $;
datalines;
101 apple
102 orange
103 grape
104 grape
105 pears
106 apple
107 orange
108 banana
109 grape
110 pears
111 apple
112 orange
113 banana
114 banana
115 pears
116 apple
117 orange
118 grape
119 banana
120 orange
121 pears
122 apple
123 orange
124 grape
125 grape
;
run;

Thank you in advance for your kind help.

Swain
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One way:

proc sort data=have;
   by catgeory;
run;

proc surveyselect data=have out=want
   sampsize=(2 2 3 3 3 ); 
   /* the order of values in the SAMPSIZE must match sorted order of the STRATA variable*/
   strata catgeory;
run;

The strata are defined by the combinations of values of the STRATA variable(s), and the input data must be sorted by those variables.

The output data set will contain the selected records along with the probability of selection and the sampling weight, in case they are needed for an analysis later. If you don't want those, drop the variables SelectionProb and SamplingWeight.
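As a minimal sketch of dropping those two variables at selection time, the DROP= data set option can be placed on OUT=. This assumes HAVE is already sorted by catgeory (the spelling used in the posted data step). The SEED= value is arbitrary and is only there so the same sample can be drawn again; METHOD=SRS just makes the simple random sampling explicit.

```sas
/* Assumes HAVE is already sorted by catgeory (see the PROC SORT step above) */
proc surveyselect data=have
      out=want(drop=SelectionProb SamplingWeight) /* omit the extra variables */
      method=srs            /* simple random sampling without replacement */
      seed=12345            /* arbitrary seed, only for reproducibility */
      sampsize=(2 2 3 3 3); /* apple banana grape orange pears (sorted order) */
   strata catgeory;
run;

/* Check how many records were selected in each category */
proc freq data=want;
   tables catgeory;
run;
```

The PROC FREQ step is only a quick check that each category has the requested number of records.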


2 REPLIES
Reeza
Super User
Are those numbers fixed or is it roughly 50% of cases that you want by category? In that case you can still use SURVEYSELECT but specify a rate instead of the hardcoded values.
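A sketch of the rate-based variant, assuming a 50% rate in every category is acceptable. The output data set name want_rate and the seed value are made up for illustration. The number of records selected per category depends on how the procedure rounds rate times stratum size, so the counts may not match 2/2/3/3/3 exactly.

```sas
/* The STRATA variable still requires sorted input */
proc sort data=have;
   by catgeory;
run;

proc surveyselect data=have out=want_rate
      method=srs      /* simple random sampling without replacement */
      seed=12345      /* arbitrary seed, only for reproducibility */
      samprate=0.5;   /* one rate, applied within every stratum */
   strata catgeory;
run;
```

If exact counts per category are required, the SAMPSIZE= list from the accepted solution is the safer choice.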
