Hi,
I want to draw a random sample based on the numeric IDs in column 1. Each ID number is repeated many times, once for every entry associated with it. I want to select IDs at random and then pick up all the transactions for each selected ID, so that we have the full set of entries for every ID chosen. Thanks
Example:
ID Other fields etc
1001
1001
1001
1004
1004
1005
1006
1006
1006
1006
etc
I want to make sure that if 1006 is selected randomly, we get all the rows/entries for ID 1006 in the random output file.
Thanks
data have;
input ID othervar $6.;
datalines;
1001 a
1001 b
1001 c
1004 d
1004 dd
1005 e
1006 f
1006 g
1006 h
1006 i
;
run;
/*strip down to unique ids*/
proc sort data=have out=temp1 nodupkey;
by ID;
run;
/*generate uniform random variable for each unique ID*/
/*I'm sure PROC SURVEYSELECT would be a good choice here if you want a fixed number of IDs, but this is a little more general, and a lot less precise*/
data temp2;
set temp1;
ranno=ranuni(1);
/*so that this is repeatable, I specified a fixed seed*/
if ranno>0.3;
/*this cutoff keeps roughly 70% of the IDs; adjust it so that you get approximately the right number in your sample*/
keep ID ranno;
run;
proc sql;
create table want as select a.* from have a, temp2 b where a.ID=b.ID;
quit;
This should provide a good start...
Edited to improve readability. I hope this works. Cutting and pasting from the SAS editor doesn't work so nicely
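The comment in the code above mentions PROC SURVEYSELECT as an option when a fixed number of IDs is wanted. A minimal sketch of that variant, assuming the `have` data set from the example. The data set names `ids`, `picked` and `want2`, the seed value and the sample size of 2 are illustrative choices, not part of the original post:

```sas
/*one row per ID*/
proc sort data=have(keep=ID) out=ids nodupkey;
by ID;
run;
/*simple random sample of exactly 2 IDs; fixed seed so the draw is repeatable*/
proc surveyselect data=ids out=picked method=srs sampsize=2 seed=12345 noprint;
run;
/*pull every row that belongs to a selected ID*/
proc sql;
create table want2 as
select a.* from have a inner join picked b on a.ID=b.ID
order by a.ID;
quit;
```

Because the sampling is done on the de-duplicated ID list, each ID has the same chance of selection no matter how many rows it has.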
Thanks. This is my first attempt with SAS and, if possible, I would like to know whether this is doable via dialog boxes. Here is a copy of the log from the dialog box using strata based on the ID. Is there a small adjustment to be made to this log that would make it work?
/* -------------------------------------------------------------------
Code generated by SAS Task
Generated on: Saturday, July 02, 2011 at 4:15:00 PM
By task: Random Sample
Input data: WORK.Hfile1
Server: Local
------------------------------------------------------------------- */
%_eg_conditional_dropds(WORK.SORTTempTableSorted, WORK.RANDRandomSampleHfile1);
PROC SORT
DATA=WORK.Hfile1()
OUT=WORK.SORTTempTableSorted;
BY ID;
RUN;
PROC SURVEYSELECT DATA=WORK.SORTTempTableSorted
OUT=WORK.RANDRandomSampleHfile1
METHOD=SRS
RATE=%SYSEVALF(10/100);
STRATA ID / ALLOC=PROP;
RUN;
QUIT;
%_eg_conditional_dropds(WORK.SORTTempTableSorted);
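Note that STRATA ID draws the sample separately within each ID, so the generated code above samples rows inside every ID rather than sampling IDs. One possible adjustment is sketched below. It assumes a SAS/STAT release in which PROC SURVEYSELECT supports the SAMPLINGUNIT statement (alias CLUSTER), which may not be available in older installations, and the SEED= value is an arbitrary addition for repeatability:

```sas
PROC SURVEYSELECT DATA=WORK.SORTTempTableSorted
OUT=WORK.RANDRandomSampleHfile1
METHOD=SRS
RATE=%SYSEVALF(10/100)
SEED=12345;
/*treat each ID as one sampling unit instead of a stratum*/
SAMPLINGUNIT ID;
RUN;
```

With the STRATA statement removed, about 10% of the IDs are selected, and the output should contain the rows of each selected ID. It is worth checking this on a small table first. If SAMPLINGUNIT is not available, sampling a de-duplicated list of IDs and joining back to the full table gives the same kind of result.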