Anita_n
Pyrite | Level 9

Dear all, 

I would like to know how I can randomly select IDs from a data set, using this sample data as an example:

 

data test;
input pat_id 2. sex $2. age 3. var1 $2. var2 $2. var3 $2. var4 $2. year 5.;
datalines;
1  F 25 A B C D 2001
1  F 25 E F G D 2002
2  F 35 C M N D 2010
2  F 35 E F V W 2020
15 M 55 A B C D 2011
15 M 55 E F G D 2010
15 M 55 U B C D 2011
15 M 55 j F K D 2010
15 M 55 A B C D 2009
15 M 55 E Y G D 2008
15 M 55 F Y T D 2008
11 F 60 A B C D 2001
11 F 60 E F G D 2002
11 F 60 U B C D 2015
11 F 60 j F K D 2004
11 F 60 X B C D 2010
11 F 60 C M N D 2014
11 F 60 F Y T D 2008
11 F 60 S F G D 2003
11 F 60 V B G D 2012
11 F 60 K F K Q 2000
11 F 60 Z B M D 2011
11 F 60 U Y G S 2010
11 F 60 X Y T O 2009
;
run;

I got this as a solution:

proc sort data=test;
by pat_id;
run;

proc surveyselect data=test
method=srs n=5 selectall /* outall */
seed=2718 out=want(drop=SelectionProb SamplingWeight);
strata pat_id;
run;

 

I have another question: if a combination of var1 and var2 values has already been selected for one ID (row), it shouldn't be selected again for a different ID.

For example: var1=A and var2=B has already been selected for ID 1, so it shouldn't be selected again for ID 15 (that row will not be selected).

 

pat_id sex age var1 var2 var3 var4 year
1 F 25 A B C D 2001
1 F 25 E F G D 2002
2 F 35 C M N D 2010
2 F 35 E F V W 2020
11 F 60 E F G D 2002
11 F 60 E Y G D 2014
11 F 60 F Y T D 2008
11 F 60 S F G D 2003
11 F 60 U Y G S 2010
15 M 55 A B C D 2011
15 M 55 E F G D 2010
15 M 55 U B C D 2011
15 M 55 X B C D 2009
15 M 55 F Y T D 2008

 

 

4 REPLIES
jimbarbour
Meteorite | Level 14

You could sort your results after your SURVEYSELECT like so:

PROC SORT DATA=Have
          OUT=Want
          NODUPKEY
          NOEQUALS;
    BY Var1 Var2;
RUN;

In that way, there would be only unique combinations of Var1 and Var2 in your final results.

 

Jim 

Anita_n
Pyrite | Level 9

Sorry, that didn't solve the problem. Is there a way to tell PROC SURVEYSELECT that if an identical observation has already been selected, it shouldn't be selected a second time, regardless of the ID (something like SELECT DISTINCT)?

Patrick
Opal | Level 21

Before the introduction of Proc Surveyselect there was already some really smart SAS code around for sampling. This code gets harder and harder to find - but it's still around.

 

The way I found it today:

1. Went to the SAS knowledge base: https://support.sas.com/en/knowledge-base.html 

2. Searched for "random sample"

 

Using the data step code found under "Sample 24722: Simple random sample without replacement", I added a bit of logic to skip values that were already selected, using a hash lookup table to collect previously selected values. Here is what I came up with.

/* WORK.EASTHIGH is a data base of student grade point averages                */
/* from East High School. Grade runs from 9 through 100 in this version,       */
/* with 100 or more students per grade.                                        */

data EastHigh;
  format GPA 3.1;
  do Grade=9 to 100;
    do StudentID=1 to 100+int(201*ranuni(432098));
      GPA=2.0 + (2.1*ranuni(34280));
      output;
    end;
  end;
run;


/*  Method 3: Using SAS DATA Step with no sort required  */

data sample3(drop=k n);

  /* change to sample code: hash to keep track of values already selected */
  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('grade');
      h1.defineDone();
    end;
  
  /* Initialize K to the number of sample obs needed and N to the */
  /*  total number of obs in the data set.                        */
  retain k 15 n;
  if _n_=1 then n=total;
  set EastHigh nobs=total;

  /* To randomly select the first observation for the sample, use the */
  /* fact that each obs in the data set has an equal chance of being  */
  /* selected: k/n. If a random number between 0 and 1 is less than   */
  /* or equal to k/n, we select that obs for our sample and also      */
  /* adjust k, the number of obs still needed to complete the         */
  /* sample.                                                          */

   if ranuni(1230498) <= k/n then
    do;
      /* change to sample code: only execute below if value not already selected */
      if h1.check() ne 0 then
        do;
          h1.add();
          output;
          k=k-1;
        end;
    end;

  /* At every iteration, adjust N, the number of obs left to */
  /* sample from.                                            */
  n=n-1;

  /* Once the desired number of sample points are taken, stop iterating */
  if k=0 then stop;
run;

title "Method 3: DATA step, no sort ";
proc print data=sample3;
run;
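
The same hash lookup idea can be pointed at the TEST data set from the question. The following is only a minimal sketch, not tested against the real data: it assumes the goal is at most 5 rows per pat_id, and the names SHUFFLED, WANT_UNIQUE, SEEN, _rand and _taken are made up for illustration. Rows are shuffled within each pat_id and then taken in that order, skipping any (var1, var2) pair that an earlier row already used.

```sas
/* Sketch only: assumes WORK.TEST from the question exists.           */
/* Step 1: attach a random sort key to every row.                     */
data shuffled;
  set test;
  if _n_ = 1 then call streaminit(2718);   /* fixed seed for repeatability */
  _rand = rand('uniform');
run;

/* Step 2: random order within each pat_id.                           */
proc sort data=shuffled;
  by pat_id _rand;
run;

/* Step 3: take up to 5 rows per pat_id, skipping (var1, var2) pairs  */
/* that were already taken by any earlier row.                        */
data want_unique(drop=_rand _taken);
  if _n_ = 1 then
    do;
      dcl hash seen();
      seen.defineKey('var1', 'var2');
      seen.defineDone();
    end;

  set shuffled;
  by pat_id;

  if first.pat_id then _taken = 0;

  /* check() returns 0 when the key is already in the hash */
  if _taken < 5 and seen.check() ne 0 then
    do;
      seen.add();
      _taken + 1;      /* sum statement: _taken is retained across rows */
      output;
    end;
run;
```

Because the data step walks through pat_id in sorted order, lower IDs get first pick of each (var1, var2) pair, and a later ID can end up with fewer than 5 rows. This is a greedy selection, so the selection probabilities are not the ones PROC SURVEYSELECT would report.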

 

Anita_n
Pyrite | Level 9

Thanks @Patrick for the code. I could finally solve this myself using method=sys and control.

All the same, I will also try to understand your code, probably for future use.
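
The final code was not posted in the thread. As a rough guess at what a METHOD=SYS call with a CONTROL statement might look like for the TEST data set, here is a minimal sketch; the choice of var1 and var2 as control variables and the output name WANT_SYS are assumptions, not the poster's actual solution.

```sas
/* Sketch only: assumes WORK.TEST from the question.                  */
/* STRATA requires the input to be sorted by the strata variable.     */
proc sort data=test;
  by pat_id;
run;

/* Systematic sampling within each pat_id. The CONTROL statement      */
/* orders the rows of each stratum by var1 and var2 before selection, */
/* so the picks are spread across the (var1, var2) values.            */
proc surveyselect data=test
                  method=sys n=5 selectall
                  seed=2718 out=want_sys;
  strata pat_id;
  control var1 var2;
run;
```

Note that this spreads the selection across var1 and var2 values within each ID; by itself it does not guarantee that a pair chosen for one ID is never chosen for another.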
