🔒 This topic is solved and locked.
Student77
Obsidian | Level 7

Hello, I have longitudinal data with multiple health records on multiple dates per patient. I created a numeric month variable (values 1-12) to simplify things, since this is for a time series.

For now I am trying to randomly select one record per patient per month. It's OK to have multiple records for a person, as long as only one is chosen per month. I've been trying to figure out PROC SURVEYSELECT but cannot get it to select per patient/per month.

 

id  pt_id  month  ...
1   XXY    1
2   XXY    2
3   XXY    1
4   ZZH    2
5   ZZH    2
6   KKJ    3
7   KKJ    4
8   KKJ    3
9   KKJ    5
10  KKJ    5
11  KKJ    6

Any suggestions?

 

Thanks

1 ACCEPTED SOLUTION

Tom
Super User

How large is your input dataset?  How many patient*month (or patient*quarter) combinations are there?

What type of variable is QUARTER (or month)? If it is actually a date value with an attached format that displays only the quarter or year-quarter, then you might have many more combinations than you thought.

Also, how many of your patient*month combinations already have only one observation?
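The combination counts asked about here can be checked with a quick PROC SQL summary. This is a sketch on the sample data from the question, assuming the variable names pt_id and month from that sample (the real data set udt.aim2b uses patid); combo_sizes is just an illustrative table name.

```sas
/* Sample data from the original question */
data have;
  input id pt_id $ month;
  datalines;
1  XXY 1
2  XXY 2
3  XXY 1
4  ZZH 2
5  ZZH 2
6  KKJ 3
7  KKJ 4
8  KKJ 3
9  KKJ 5
10 KKJ 5
11 KKJ 6
;
run;

proc sql;
  /* Number of records in each patient*month combination */
  create table combo_sizes as
    select pt_id, month, count(*) as n_records
    from have
    group by pt_id, month;

  /* Total combinations, and how many already have a single record */
  select count(*) as n_combinations,
         sum(n_records = 1) as n_single_record
    from combo_sizes;
quit;
```

On the sample data the second query should report 7 combinations, 3 of which (XXY month 2, KKJ month 4, KKJ month 6) have a single record.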

 

I would first try turning off the listing output of PROC SURVEYSELECT by adding the NOPRINT option. Perhaps your session is just locking up because you are generating pages and pages of output.

 

If you have a lot of patient*month combinations with only one observation already, the SAS log might get really large with the notes that PROC SURVEYSELECT generates when that happens.
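A minimal sketch of the PROC SURVEYSELECT route with NOPRINT added, run against the sample data from the question. It assumes the variable names pt_id and month from that sample (the real data set udt.aim2b uses patid), and the seed value is arbitrary. The PROC SORT step is included because STRATA processing expects the input to be sorted by the strata variables.

```sas
/* Sample data from the original question */
data have;
  input id pt_id $ month;
  datalines;
1  XXY 1
2  XXY 2
3  XXY 1
4  ZZH 2
5  ZZH 2
6  KKJ 3
7  KKJ 4
8  KKJ 3
9  KKJ 5
10 KKJ 5
11 KKJ 6
;
run;

/* Sort by the strata variables before stratified selection */
proc sort data=have;
  by pt_id month;
run;

/* Simple random sample of 1 record per patient*month stratum;  */
/* NOPRINT suppresses the listing output, SEED= (arbitrary here) */
/* makes the draw repeatable                                    */
proc surveyselect data=have out=want method=srs sampsize=1
                  seed=20240101 noprint;
  strata pt_id month;
run;
```

For the sample data this should select 7 records, one per patient*month combination. WANT also carries the SelectionProb and SamplingWeight columns that PROC SURVEYSELECT adds.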

 

If you are doing an SRS (simple random sample), then perhaps just do it yourself instead.

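proc sql;
  /* Attach a uniform random number to every record and order the */
  /* view by patient, quarter, and that random number              */
  create view for_selection as
    select *, rand('uniform') as rand
    from udt.aim2b
    order by patid, quarter, rand
  ;
quit;

/* Keep the first (randomly ordered) record of each patient*quarter */
data aim2_random_b;
  set for_selection;
  by patid quarter;
  if first.quarter;
run;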
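Because the view draws new random numbers each time it is read, the pick changes from run to run. If a reproducible selection is needed, one option (an alternative, not part of Tom's reply) is to build the random key in a DATA step, where CALL STREAMINIT can fix the seed, and then sort and keep FIRST.month. This is a sketch on the question's sample data; the data set names with_key and want, the variable rand_key, and the seed are all arbitrary.

```sas
/* Sample data from the original question */
data have;
  input id pt_id $ month;
  datalines;
1  XXY 1
2  XXY 2
3  XXY 1
4  ZZH 2
5  ZZH 2
6  KKJ 3
7  KKJ 4
8  KKJ 3
9  KKJ 5
10 KKJ 5
11 KKJ 6
;
run;

/* Attach a seeded uniform random key to every record */
data with_key;
  if _n_ = 1 then call streaminit(20240101);  /* arbitrary seed */
  set have;
  rand_key = rand('uniform');
run;

/* Within each patient*month, put the records in random order */
proc sort data=with_key;
  by pt_id month rand_key;
run;

/* Keep the first (randomly ordered) record of each patient*month */
data want(drop=rand_key);
  set with_key;
  by pt_id month;
  if first.month;
run;
```

Rerunning with the same seed gives the same selection; changing the seed gives a different random pick, still with exactly one record per patient*month.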

 

 


9 REPLIES
Reeza
Super User

Please show the code you've tried in PROC SURVEYSELECT.

 



Student77
Obsidian | Level 7
*randomly select one record per quarter to represent pt;
proc surveyselect data=udt.aim2b outall out=aim2_random_b sampsize=1;
strata patid quarter;
run;

but I'm unsure about the sample size. Either way, every time I try to run it, it freezes after a while and shuts down.
Reeza
Super User
Your code shows a variable called quarter, but the data you provided initially has months. Does that matter?
Reeza
Super User

 

data have;
input id  pt_id $  month;
cards;
1  XXY      1
2  XXY      2
3  XXY      1
4  ZZH      2
5  ZZH      2
6  KKJ      3
7  KKJ      4
8  KKJ      3
9  KKJ      5
10 KKJ     5
11 KKJ     6
;;;;

proc sort data=have;
by month;
run;

proc surveyselect data=have method=srs sampsize=1 out=want;
strata month;
run;

 

 

You need to remove the patid from the STRATA statement. Otherwise you're saying pick one per patient per month. 

 

Student77
Obsidian | Level 7
When I used this code, it gave me back just one observation - I think that's what sampsize=1 means.

I'm trying to get data showing for ex:
1 XXY 1
2 XXY 2
4 ZZH 2
6 KKJ 3
7 KKJ 4
9 KKJ 5
11 KKJ 6


So a single patient can be included multiple times, but only once PER month, and I'm trying to randomly select that one record per month, per patient.

Student77
Obsidian | Level 7
It's about 60,000 records, and I would say that about 50% of the patient*month combinations already have only 1 record. (Sorry, ignore "quarter".)
I had already tried NOPRINT, because yes, it kept printing endless log info. I created an indicator for month instead of dealing with date formats.

I tried this code and I got:
"
ERROR: Function RANDOM could not be located.
ERROR: SQL View WORK.FOR_SELECTION could not be processed.
"
Is this an issue with my SAS?

Thanks for your help

Student77
Obsidian | Level 7
This worked! I just needed to change random() to rand().


Discussion stats
  • 9 replies
  • 1392 views
  • 1 like
  • 3 in conversation