Programming the statistical procedures from SAS

Random selection based on variable in column 1

Occasional Contributor
Posts: 7

Hi,

I want to draw a random sample based on the numeric IDs in column 1. Each ID is repeated many times, because every entry (transaction) is stored as a separate row under its ID. I want to select IDs at random and then pick up all the transactions for each selected ID, so that the output holds the full set of entries for every ID chosen. Thanks

Example:

ID      Other fields etc
1001
1001
1001
1004
1004
1005
1006
1006
1006
1006
etc

I want to make sure if 1006 is selected randomly then we get all the rows/entries for ID 1006 in the random output file.

Thanks

Respected Advisor
Posts: 2,655

Re: Random selection based on variable in column 1

data have;
     input ID othervar :$6.;
     /*the closing semicolon must sit on its own line, otherwise the last data row is not read*/
     datalines;
1001 a
1001 b
1001 c
1004 d
1004 dd
1005 e
1006 f
1006 g
1006 h
1006 i
;
run;

/*strip down to unique IDs*/
proc sort data=have out=temp1 nodupkey;
     by ID;
run;

/*generate a uniform random number for each unique ID*/
/*I'm sure PROC SURVEYSELECT would be a good choice here if you want a fixed number of IDs, but this is a little more general, and a lot less precise*/
data temp2;
     set temp1;
     /*so that this is repeatable, I specified a fixed seed*/
     ranno=ranuni(1);
     /*this cutoff keeps roughly 70% of the IDs; you'll need to fiddle with it so that you get approximately the right number in your sample*/
     if ranno>0.3;
     keep ID ranno;
run;

/*pull every row of HAVE that belongs to a selected ID*/
proc sql;
     create table want as select a.* from have a, temp2 b where a.ID=b.ID;
quit;

This should provide a good start...

Edited to improve readability. I hope this works. Cutting and pasting from the SAS editor doesn't work so nicely.
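For the fixed-number case the reply mentions, a minimal sketch with PROC SURVEYSELECT might look like the following. It assumes the HAVE data set and ID variable from the code above; the table names ids, picked and want_fixed, the sample size of 2 and the seed 12345 are illustrative choices, not part of the original reply.

```sas
/* Sketch only: draw a fixed number of whole IDs, then fetch all their rows. */
/* Assumes WORK.HAVE with an ID variable, as created in the reply above.     */

/* 1. One row per distinct ID */
proc sql;
     create table ids as
     select distinct ID from have;
quit;

/* 2. Simple random sample of 2 IDs (sampsize and seed are illustrative) */
proc surveyselect data=ids out=picked method=srs sampsize=2 seed=12345 noprint;
run;

/* 3. Keep every row of HAVE whose ID was drawn */
proc sql;
     create table want_fixed as
     select a.*
     from have as a inner join picked as b
     on a.ID = b.ID
     order by a.ID;
quit;
```

Sampling from the list of distinct IDs, rather than from the transaction rows, gives every ID the same chance of selection no matter how many rows it has.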

Occasional Contributor
Posts: 7

Thanks. This is my first attempt with SAS, and if possible I would like to know whether this is doable via the dialog boxes. Here is a copy of the code generated by the Random Sample dialog box, using strata based on the ID. Is there a small adjustment to this code that would make it work?

/* -------------------------------------------------------------------
     Code generated by SAS Task

     Generated on: Saturday, July 02, 2011 at 4:15:00 PM
     By task: Random Sample
     Input data: WORK.Hfile1
     Server: Local
     ------------------------------------------------------------------- */

%_eg_conditional_dropds(WORK.SORTTempTableSorted, WORK.RANDRandomSampleHfile1);

PROC SORT
     DATA=WORK.Hfile1()
     OUT=WORK.SORTTempTableSorted;
     BY ID;
RUN;

PROC SURVEYSELECT DATA=WORK.SORTTempTableSorted
     OUT=WORK.RANDRandomSampleHfile1
     METHOD=SRS
     RATE=%SYSEVALF(10/100);
     STRATA ID / ALLOC=PROP;
RUN;
QUIT;

%_eg_conditional_dropds(WORK.SORTTempTableSorted);
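A note on why the generated code does not do what is wanted: a STRATA statement samples rows within each ID separately, so it tends to return a few rows from every ID instead of all rows from a few IDs. One possible adjustment, offered as a sketch and not as the dialog's own output, is to treat each ID as a sampling unit (cluster). This assumes a SAS/STAT release that supports the SAMPLINGUNIT statement (12.1 or later, to my knowledge); the seed value is an illustrative choice.

```sas
/* Sketch only: select about 10% of the IDs and keep all rows for each one. */
/* Assumption: SAS/STAT 12.1 or later, where SAMPLINGUNIT is available.     */
/* Data set names are taken from the generated code above.                  */
PROC SURVEYSELECT DATA=WORK.Hfile1
     OUT=WORK.RANDRandomSampleHfile1
     METHOD=SRS
     RATE=0.10
     SEED=12345;          /* illustrative fixed seed, for repeatability */
     SAMPLINGUNIT ID;     /* each ID is one unit; replaces STRATA ID    */
RUN;
```

With SAMPLINGUNIT, the output data set should contain every row belonging to the selected IDs. On older releases, sampling from a list of distinct IDs and joining back to the full table achieves the same result.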
