How to select randomly observation from each 'groups'

Contributor
Posts: 66

Hi all,

 

I have hospital admission data for a two-year period. An example of the data is shown below (I have included only the variables relevant to my question):

date            id
04/05/2012      1
05/07/2012      1
16/07/2012      1
07/12/2013      1
05/08/2012      2
12/10/2012      2
01/05/2012      3
06/05/2012      3
08/06/2012      3
12/10/2012      3
16/11/2012      3
01/01/2013      3

 

For each patient (id) I need to select the first and the last admission during the time period, and then randomly select one more from the remaining admissions. For example, for ID=3 the first admission is dated 01/05/2012 and the last is dated 01/01/2013; how can I randomly select one more admission from the rest?

 

Thank you.


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 13,329

Re: How to select randomly observation from each 'groups'


@viollete wrote:

Do I have to use the STRATA statement in PROC SURVEYSELECT? There are more patients who had more than two admissions during the time period, and for each of these patients I want to randomly select one admission.


Your STRATA variable would be ID, and you tell PROC SURVEYSELECT to select one observation per stratum (SAMPSIZE=1).

Here is my take on this problem:

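data have;
   /* date is read with the DDMMYY10. informat; id is read as character */
   input date ddmmyy10. id $;
   format date ddmmyy10.;
datalines;
04/05/2012      1
05/07/2012      1
16/07/2012      1
07/12/2013      1
05/08/2012      2
12/10/2012      2
01/05/2012      3
06/05/2012      3
08/06/2012      3
12/10/2012      3
16/11/2012      3
01/01/2013      3
;
run;

/* BY-group processing below requires the data sorted by id, then date */
proc sort data=have;
   by id date;
run;

/* Split: first and last admission per id go to FIRSTLAST, the rest to OTHERS */
data firstlast others;
   set have;
   by id;
   if first.id or last.id then output firstlast;
   else output others;
run;

/* Randomly select one of the remaining admissions per id */
proc surveyselect data=others
     out=othersamp
     sampsize=1
     ;
   strata id;
run;

/* Recombine the first/last admissions with the random pick, in date order */
data want;
   merge firstlast
         othersamp
   ;
   by id date;
run;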

Note the use of a data step to provide the example data in a form that others can run. Also note the code box, opened with the forum's {I} icon: pasting code there keeps the message window from reformatting it and visually separates code from narrative.

 

 

You will end up with two additional variables in WANT that show the selection probability and sampling weight. They are populated only for the randomly sampled observations and are missing for the first and last observations.
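
If those two variables are not needed, they can be dropped during the merge. This is only a minimal sketch: it assumes the variables carry the usual PROC SURVEYSELECT output names SelectionProb and SamplingWeight, which are not stated in this thread, so check the contents of OTHERSAMP first.

```sas
/* Assumes FIRSTLAST and OTHERSAMP were created by the steps above */
/* and that both are sorted by id and date.                        */
data want;
   merge firstlast
         /* SelectionProb and SamplingWeight are assumed names; verify with PROC CONTENTS */
         othersamp(drop=SelectionProb SamplingWeight)
   ;
   by id date;
run;
```

The DROP= dataset option raises an error if a listed variable does not exist, which makes a wrong assumption about the names easy to spot.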

 

Do not come to us when an ID ends up with only one record: that happens when the ID has only a single observation in the starting data...
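
One way to see in advance which patients cannot yield three records is to count admissions per id. The following is a rough sketch, not part of the solution above; the names few_admissions and n_admissions are made up for illustration.

```sas
/* List ids with fewer than 3 admissions: after removing the first  */
/* and last admission, nothing is left for PROC SURVEYSELECT to pick. */
proc sql;
   create table few_admissions as      /* hypothetical dataset name  */
   select id,
          count(*) as n_admissions     /* hypothetical variable name */
   from have
   group by id
   having count(*) < 3;
quit;
```

With the example data, id 2 has only two admissions, so it contributes no row to OTHERS and keeps just its first and last records in WANT.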


All Replies
Super User
Posts: 9,427

Re: How to select randomly observation from each 'groups'

Sort the data by id and date, then use a data step to output the first and last observations per id to one dataset and the remaining observations to another, and run PROC SURVEYSELECT over the remainder:

data firstlast other;
  set have;
  by id;
  if first.id or last.id then output firstlast;
  else output other;
run;
proc surveyselect...;
run;

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_surveyselec...

 

Contributor
Posts: 66

Re: How to select randomly observation from each 'groups'

Do I have to use the STRATA statement in PROC SURVEYSELECT? There are more patients who had more than two admissions during the time period, and for each of these patients I want to randomly select one admission.

Super User
Posts: 9,427

Re: How to select randomly observation from each 'groups'

You really need to explain your issue. Post test data in the form of a data step, and show what the output should look like. You can apply a WHERE clause to the data before taking the first and last records. Once first and last are set aside, what remains are the other visits within the window, from which you can sample 1 record per id.
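
The WHERE suggestion could look roughly like the sketch below. The window boundaries are hypothetical, since the thread only says "a two year period", and HAVE is assumed to be sorted by id and date already.

```sas
/* WHERE filters the rows before BY-group processing, so first.id  */
/* and last.id refer to the first and last visit inside the window. */
data firstlast other;
   set have;
   where '01JAN2012'd <= date <= '31DEC2013'd;   /* hypothetical window */
   by id;
   if first.id or last.id then output firstlast;
   else output other;
run;
```

OTHER then holds only in-window visits that are neither first nor last, ready to be sampled at 1 record per id.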
