Hi guys,
I have a table of 10,000 records which has a variable “type” that has 2 values only (yes and no)
I am supposed to create 6 groups out of this table and all the selections need to be random ( at least the first 4 group have to be created by picking randomly)
Group 1 will have 100 randomly selected records with type=yes and then I label it with group=’Group1’
Group 2 will have 100 randomly selected records with type=no and gets a label group=’Group2’
Group 3 will have 100 randomly selected records with type=yes and group=’Group3’
Group 4 will have 100 randomly selected records with type=no and group=’Group4’
The remaining 9,600 records will go in group5 (type=yes) and group6 (type=no)
Records cannot overlap
Any ideas please?
If you prefer to program this yourself, rather than using the "black box" approach of PROC SURVEYSELECT, you can do it by assigning a random number to each observation:
data want;
set have;
random_number = ranuni(12345);
run;
proc sort data=want;
by type random_number;
run;
data want;
set want;
by type;
if first.type then counter=1;
else counter + 1;
if type='yes' then do;
if counter <= 100 then group='Group1';
else if counter <= 200 then group='Group3';
else group='Group5';
end;
else if type='no' then do;
if counter <= 100 then group='Group2';
else if counter <= 200 then group='Group4';
else group='Group6';
end;
drop counter random_number;
run;
Good luck.
Not really seeing the difference between groups 1 / 3 and 2/4 unless you mean you need two disjoint subsets with 100 each of Yes and two sets with no.
Proc surveyselect will do this. Your strata is the variable with yes/no values (assuming NO other values), sample size is 100 and REPS = 2. The output data set will include a variable to indicate which replicate each record is in within the strata values.
If you you OUTALL you will have all the information you need to assign your group values by examining the variable SELECTED which will have values of 1 for those selected and 0 otherwise(group 5). Strata = Yes and Replicate = 1 or 2 go to Groups 1 and 3.
Adding your group variable will require a pass through a datastep.
thanks Ballardw
I heard of this Proc surveyselect a lot and did try to use it but it seems like within this proc i cannot create a new variable. Is that really the case or did i do something wrong?
I mentioned you would have to take another pass through the data to add another variable. The only variables that surveyselect adds are related to weighting and selection, not general calculation. However the combination of strata, replicate and selected sufficiently identify the records such that depending on what will be done with the data the group variable could be moot. Any other general manipulation not involving those variables should be done prior to selection.
If you prefer to program this yourself, rather than using the "black box" approach of PROC SURVEYSELECT, you can do it by assigning a random number to each observation:
data want;
set have;
random_number = ranuni(12345);
run;
proc sort data=want;
by type random_number;
run;
data want;
set want;
by type;
if first.type then counter=1;
else counter + 1;
if type='yes' then do;
if counter <= 100 then group='Group1';
else if counter <= 200 then group='Group3';
else group='Group5';
end;
else if type='no' then do;
if counter <= 100 then group='Group2';
else if counter <= 200 then group='Group4';
else group='Group6';
end;
drop counter random_number;
run;
Good luck.
thanks Astounding,
thanks guys
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.