BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Hi guys,

I have a table of 10,000 records which has a variable “type” that has 2 values only (yes and no)

I am supposed to create 6 groups out of this table and all the selections need to be random ( at least the first 4 group have to be created by picking randomly)

Group 1 will have 100 randomly selected records with type=yes and then I label it with group=’Group1’

Group 2 will have 100 randomly selected records with type=no and gets a label group=’Group2’

Group 3 will have 100 randomly selected records with type=yes and group=’Group3’

Group 4 will have 100 randomly selected records with type=no and group=’Group4’

The remaining 9,600 records will go in group5 (type=yes) and group6 (type=no)

Records cannot overlap

Any ideas please?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

If you prefer to program this yourself, rather than using the "black box" approach of PROC SURVEYSELECT, you can do it by assigning a random number to each observation:

data want;

   set have;

   random_number = ranuni(12345);

run;

proc sort data=want;

   by type random_number;

run;

data want;

   set want;

   by type;

   if first.type then counter=1;

   else counter + 1;

   if type='yes' then do;

      if counter <= 100 then group='Group1';

      else if counter <= 200 then group='Group3';

      else group='Group5';

   end;

   else if type='no' then do;

      if counter <= 100 then group='Group2';

      else if counter <= 200 then group='Group4';

      else group='Group6';

   end;

   drop counter random_number;

run;

Good luck.

View solution in original post

5 REPLIES 5
ballardw
Super User

Not really seeing the difference between groups 1 / 3 and 2/4 unless you mean you need two disjoint subsets with 100 each of Yes and two sets with no.

Proc surveyselect will do this. Your strata is the variable with yes/no values (assuming NO other values), sample size is 100 and REPS = 2. The output data set will include a variable to indicate which replicate each record is in within the strata values.

If you you OUTALL you will have all the information you need to assign your group values by examining the variable SELECTED which will have values of 1 for those selected and 0 otherwise(group 5). Strata = Yes and Replicate = 1 or 2 go to Groups 1 and 3.

Adding your group variable will require a pass through a datastep.

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

thanks Ballardw

I heard of this Proc surveyselect a lot and   did  try to use  it but it seems like within this proc i cannot create a  new variable. Is that really the case  or did i do something wrong?

ballardw
Super User

I mentioned you would have to take another pass through the data to add another variable. The only variables that surveyselect adds are related to weighting and selection, not general calculation. However the combination of strata, replicate and selected sufficiently identify the records such that depending on what will be done with the data the group variable could be moot. Any other general manipulation not involving those variables should be done prior to selection.

Astounding
PROC Star

If you prefer to program this yourself, rather than using the "black box" approach of PROC SURVEYSELECT, you can do it by assigning a random number to each observation:

data want;

   set have;

   random_number = ranuni(12345);

run;

proc sort data=want;

   by type random_number;

run;

data want;

   set want;

   by type;

   if first.type then counter=1;

   else counter + 1;

   if type='yes' then do;

      if counter <= 100 then group='Group1';

      else if counter <= 200 then group='Group3';

      else group='Group5';

   end;

   else if type='no' then do;

      if counter <= 100 then group='Group2';

      else if counter <= 200 then group='Group4';

      else group='Group6';

   end;

   drop counter random_number;

run;

Good luck.

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9


thanks Astounding,

thanks guys

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 627 views
  • 6 likes
  • 3 in conversation