Dave

Hi,

I want to do a random selection based on the numeric IDs in column 1. Each ID repeats many times, because multiple entries are associated with it. I want to select IDs at random and then make sure we pick up all the transactions for each selected ID, so that we have the full set of entries for every randomly selected ID. Thanks

Example:

   ID      Other fields etc
   1001
   1001
   1001
   1004
   1004
   1005
   1006
   1006
   1006
   1006
   etc

I want to make sure if 1006 is selected randomly then we get all the rows/entries for ID 1006 in the random output file.

Thanks

SteveDenham

data have;
     length othervar $6;
     input ID othervar $;
     datalines;
1001 a
1001 b
1001 c
1004 d
1004 dd
1005 e
1006 f
1006 g
1006 h
1006 i
;
/*the closing semicolon must be on its own line; a data line containing a semicolon ends the data and is not read*/

/*strip down to unique ids*/
proc sort data=have out=temp1 nodupkey;
     by ID;
run;

/*generate uniform random variable for each unique ID*/
/*I'm sure PROC SURVEYSELECT would be a good choice here if you want a fixed number of IDs, but this is a little more general, and a lot less precise*/
data temp2;
     set temp1;
     ranno=ranuni(1);
     /*so that this is repeatable, I specified a fixed seed*/
     if ranno>0.3;
     /*this keeps roughly 70% of the IDs; you'll need to fiddle with the cutoff so that you get approximately the right number in your sample*/
     keep ID ranno;
run;

/*join back to HAVE so every row for each selected ID is kept*/
proc sql;
     create table want as
     select a.* from have a, temp2 b
     where a.ID=b.ID
     order by a.ID;
quit;

This should provide a good start...

Edited to improve readability. I hope this works. Cutting and pasting from the SAS editor doesn't work very nicely.
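The comment in the code above mentions PROC SURVEYSELECT for the case where you want a fixed number of IDs. A minimal sketch of that variant follows, under stated assumptions: the sample size of 2 IDs and the seed 12345 are arbitrary illustrative values, and the dataset names ids and sampled_ids are hypothetical. The key idea is the same as above: sample from the de-duplicated list of IDs, never from the raw rows, then join back to pick up every row for each selected ID.

```sas
/* Example data from the question (ID plus one other field) */
data have;
  length othervar $6;
  input ID othervar $;
  datalines;
1001 a
1001 b
1001 c
1004 d
1004 dd
1005 e
1006 f
1006 g
1006 h
1006 i
;

/* One record per ID: the ID, not the row, is the sampling unit */
proc sort data=have(keep=ID) out=ids nodupkey;
  by ID;
run;

/* Simple random sample of exactly 2 IDs (illustrative SAMPSIZE= and SEED= values) */
proc surveyselect data=ids out=sampled_ids
                  method=srs sampsize=2 seed=12345;
run;

/* Pull back every row of HAVE whose ID was sampled */
proc sql;
  create table want as
  select h.*
  from have as h
  where h.ID in (select ID from sampled_ids)
  order by h.ID;
quit;
```

Unlike the RANUNI cutoff, this returns exactly SAMPSIZE= IDs on every run, and the fixed SEED= makes the draw repeatable.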

Dave

Thanks. This is my first attempt with SAS, and if possible I would like to know whether this is doable via dialog boxes. Here is a copy of the code the Random Sample dialog box generated, using strata based on the ID. Is there a small adjustment to this code that would make it work?

/* -------------------------------------------------------------------
     Code generated by SAS Task
     Generated on: Saturday, July 02, 2011 at 4:15:00 PM
     By task: Random Sample
     Input data: WORK.Hfile1
     Server: Local
     ------------------------------------------------------------------- */

%_eg_conditional_dropds(WORK.SORTTempTableSorted, WORK.RANDRandomSampleHfile1);

PROC SORT
     DATA=WORK.Hfile1()
     OUT=WORK.SORTTempTableSorted;
     BY ID;
RUN;

PROC SURVEYSELECT DATA=WORK.SORTTempTableSorted
     OUT=WORK.RANDRandomSampleHfile1
     METHOD=SRS
     RATE=%SYSEVALF(10/100);
     STRATA ID / ALLOC=PROP;
RUN;
QUIT;

%_eg_conditional_dropds(WORK.SORTTempTableSorted);
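In the generated code, STRATA ID makes every ID its own stratum, so SURVEYSELECT draws roughly 10% of the rows inside each ID. The output therefore tends to contain a few rows from many IDs instead of all rows from a sample of IDs. One possible adjustment, sketched below under assumptions, is to drop the STRATA statement, run the 10% sample on a de-duplicated list of IDs, and merge the selected IDs back onto the full table. The stand-in WORK.Hfile1 built in the first DATA step (50 hypothetical IDs with 1 to 4 rows each), the seed 2011, and the names UniqueIDs, SampledIDs and Hfile1Sorted are illustrative only; with the real data, skip that first step.

```sas
/* Hypothetical stand-in for WORK.Hfile1: IDs 1001-1050, each with 1-4 rows */
data work.Hfile1;
  call streaminit(2011);
  do ID = 1001 to 1050;
    nrows = ceil(4*rand('uniform'));
    do row = 1 to nrows;
      output;
    end;
  end;
  drop nrows;
run;

/* Step 1: one record per ID */
proc sort data=work.Hfile1(keep=ID) out=work.UniqueIDs nodupkey;
  by ID;
run;

/* Step 2: 10% simple random sample of IDs; note there is no STRATA statement */
proc surveyselect data=work.UniqueIDs out=work.SampledIDs
                  method=srs rate=0.10 seed=2011;
run;

/* Step 3: sort the full table and keep every row whose ID was sampled */
proc sort data=work.Hfile1 out=work.Hfile1Sorted;
  by ID;
run;

data work.RANDRandomSampleHfile1;
  merge work.Hfile1Sorted(in=inAll)
        work.SampledIDs(in=inSample keep=ID);
  by ID;
  if inAll and inSample;
run;
```

The MERGE with IN= flags is just a second way to do the join-back; the PROC SQL join in the earlier reply works equally well.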
