Randomly selected records

Accepted Solution Solved
Reply
Super Contributor
Super Contributor
Posts: 444
Accepted Solution

Randomly selected records

Hi guys,

I have a table of 10,000 records which has a variable “type” that has 2 values only (yes and no)

I am supposed to create 6 groups out of this table and all the selections need to be random ( at least the first 4 group have to be created by picking randomly)

Group 1 will have 100 randomly selected records with type=yes and then I label it with group=’Group1’

Group 2 will have 100 randomly selected records with type=no and gets a label group=’Group2’

Group 3 will have 100 randomly selected records with type=yes and group=’Group3’

Group 4 will have 100 randomly selected records with type=no and group=’Group4’

The remaining 9,600 records will go in group5 (type=yes) and group6 (type=no)

Records cannot overlap

Any ideas please?


Accepted Solutions
Solution
‎12-08-2014 05:04 PM
Super User
Posts: 5,372

Re: Randomly selected records

If you prefer to program this yourself, rather than using the "black box" approach of PROC SURVEYSELECT, you can do it by assigning a random number to each observation:

data want;

   set have;

   random_number = ranuni(12345);

run;

proc sort data=want;

   by type random_number;

run;

data want;

   set want;

   by type;

   if first.type then counter=1;

   else counter + 1;

   if type='yes' then do;

      if counter <= 100 then group='Group1';

      else if counter <= 200 then group='Group3';

      else group='Group5';

   end;

   else if type='no' then do;

      if counter <= 100 then group='Group2';

      else if counter <= 200 then group='Group4';

      else group='Group6';

   end;

   drop counter random_number;

run;

Good luck.

View solution in original post


All Replies
Super User
Posts: 11,144

Re: Randomly selected records

Not really seeing the difference between groups 1 / 3 and 2/4 unless you mean you need two disjoint subsets with 100 each of Yes and two sets with no.

Proc surveyselect will do this. Your strata is the variable with yes/no values (assuming NO other values), sample size is 100 and REPS = 2. The output data set will include a variable to indicate which replicate each record is in within the strata values.

If you you OUTALL you will have all the information you need to assign your group values by examining the variable SELECTED which will have values of 1 for those selected and 0 otherwise(group 5). Strata = Yes and Replicate = 1 or 2 go to Groups 1 and 3.

Adding your group variable will require a pass through a datastep.

Super Contributor
Super Contributor
Posts: 444

Re: Randomly selected records

thanks Ballardw

I heard of this Proc surveyselect a lot and   did  try to use  it but it seems like within this proc i cannot create a  new variable. Is that really the case  or did i do something wrong?

Super User
Posts: 11,144

Re: Randomly selected records

I mentioned you would have to take another pass through the data to add another variable. The only variables that surveyselect adds are related to weighting and selection, not general calculation. However the combination of strata, replicate and selected sufficiently identify the records such that depending on what will be done with the data the group variable could be moot. Any other general manipulation not involving those variables should be done prior to selection.

Solution
‎12-08-2014 05:04 PM
Super User
Posts: 5,372

Re: Randomly selected records

If you prefer to program this yourself, rather than using the "black box" approach of PROC SURVEYSELECT, you can do it by assigning a random number to each observation:

data want;

   set have;

   random_number = ranuni(12345);

run;

proc sort data=want;

   by type random_number;

run;

data want;

   set want;

   by type;

   if first.type then counter=1;

   else counter + 1;

   if type='yes' then do;

      if counter <= 100 then group='Group1';

      else if counter <= 200 then group='Group3';

      else group='Group5';

   end;

   else if type='no' then do;

      if counter <= 100 then group='Group2';

      else if counter <= 200 then group='Group4';

      else group='Group6';

   end;

   drop counter random_number;

run;

Good luck.

Super Contributor
Super Contributor
Posts: 444

Re: Randomly selected records


thanks Astounding,

thanks guys

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 259 views
  • 6 likes
  • 3 in conversation