Hi there,
I need your kind help to take a subset of my dataset with a specified number of records from each category. My dataset has 25 records: 5 apple, 4 banana, 6 grape, 6 orange and 4 pears.
I want to take a random subset with 2 apple, 2 banana, 3 grape, 3 orange and 3 pears.
data have;
input id $ catgeory $;
datalines;
101 apple
102 orange
103 grape
104 grape
105 pears
106 apple
106 orange
108 banana
109 grape
110 pears
111 apple
112 orange
113 banana
114 banana
115 pears
116 apple
117 orange
118 grape
119 banana
120 orange
121 pears
122 apple
123 orange
124 grape
125 grape
;
run;
Thank you in advance for your kind help.
One way:
proc sort data=have;
   by catgeory;
run;

proc surveyselect data=have out=want
   sampsize=(2 2 3 3 3); /* the order of values in SAMPSIZE must match the sorted order of the STRATA variable */
   strata catgeory;
run;
The strata are the combinations of values of the STRATA variable(s), and the input data must be sorted by those variables.
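Because the SAMPSIZE list is positional, it can be easy to pair a size with the wrong category. One possible alternative is to pass the sizes in a data set, so each size sits next to its category name. The sketch below assumes a helper data set called sizes (a name made up for this example) that holds the strata variable plus a size variable named _NSIZE_, which is the name PROC SURVEYSELECT looks for in a SAMPSIZE= data set. Both data sets still need to be in the same sorted order.

```sas
/* Hypothetical helper data set: one row per category with the requested sample size */
data sizes;
   input catgeory $ _nsize_;
   datalines;
apple 2
banana 2
grape 3
orange 3
pears 3
;
run;

/* Both data sets must list the strata in the same order */
proc sort data=have;  by catgeory; run;
proc sort data=sizes; by catgeory; run;

/* Sample sizes are read from the _NSIZE_ variable of the SAMPSIZE= data set */
proc surveyselect data=have out=want sampsize=sizes;
   strata catgeory;
run;
```

This is only a sketch of one option; the positional list works just as well for five categories.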
The output data set contains the selected records along with each record's probability of selection and sampling weight, in case you need them for a later analysis. If you don't want those, drop the variables SelectionProb and SamplingWeight.
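As a rough illustration of dropping those two variables, the sketch below uses a DROP= data set option on the output data set. It also adds a SEED= value so the same sample is drawn on every run; the seed 12345 is an arbitrary choice made for this example, not something from the original post. The PROC FREQ step is just a quick check that each category has the requested count.

```sas
proc sort data=have;
   by catgeory;
run;

/* Same selection as above, without the probability and weight variables */
proc surveyselect data=have
                  out=want(drop=SelectionProb SamplingWeight)
                  sampsize=(2 2 3 3 3)
                  seed=12345;
   /* seed=12345 is an arbitrary value, assumed here only for reproducibility */
   strata catgeory;
run;

/* Check the number of selected records per category */
proc freq data=want;
   tables catgeory / nocum nopercent;
run;
```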