Hello, I have longitudinal data with multiple health records on multiple dates per patient. To simplify, and because this is for a time series, I created a numeric variable for month (values: 1-12).
For now I am trying to randomly select one record per patient per month. So it's OK to have multiple records for a person, as long as only one is chosen per month. I've been trying to figure out PROC SURVEYSELECT but cannot get it to select per patient/per month.
id pt_id month ...
1 XXY 1
2 XXY 2
3 XXY 1
4 ZZH 2
5 ZZH 2
6 KKJ 3
7 KKJ 4
8 KKJ 3
9 KKJ 5
10 KKJ 5
11 KKJ 6
Any suggestions?
Thanks
Accepted Solutions
How large is your input dataset? How many patient*month (or patient*quarter) combinations are there?
What type of variable is QUARTER (or month)? If it is actually a date value with an attached format that displays only the quarter or year-quarter, then you might have many more combinations than you think.
Also, how many of your patient*month combinations already have only one observation?
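One way to answer these questions is to count the strata directly. This is only a sketch: it assumes the dataset and variable names used elsewhere in this thread (udt.aim2b, patid, quarter), and the names strata_counts, n_obs, n_strata and n_singletons are made up for illustration. GROUP BY works on the stored values rather than the formatted ones, so a formatted date would show up here as far more groups than expected.

```sas
proc sql;
  /* One row per patient*quarter combination, with its record count */
  create table strata_counts as
    select patid, quarter, count(*) as n_obs
    from udt.aim2b
    group by patid, quarter;

  /* Total number of combinations, and how many hold a single record */
  select count(*)       as n_strata,
         sum(n_obs = 1) as n_singletons
    from strata_counts;
quit;
```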
I would first try turning off the listing output of PROC SURVEYSELECT by adding the NOPRINT option. Perhaps your session is locking up because you are generating pages and pages of output.
If many patient*month combinations already have only one observation, the SAS log might get really large from the notes that PROC SURVEYSELECT generates when that happens.
If you are doing an SRS (simple random sample), then perhaps just do it yourself instead.
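proc sql;
  /* Attach a random sort key to every record. The function is RAND, not RANDOM. */
  create view for_selection as
    select *, rand('uniform') as rand
    from udt.aim2b
    order by patid, quarter, rand
  ;
quit;
/* Keep the first record in each patient*quarter group, i.e. the one with the smallest key */
data aim2_random_b;
  set for_selection;
  by patid quarter;
  if first.quarter;
run;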
Please show the code you've tried in PROC SURVEYSELECT.
proc surveyselect data=udt.aim2b outall out=aim2_random_b sampsize=1;
strata patid quarter;
run;
I'm unsure about the sample size, though. Either way, every time I try to run it, SAS freezes after a while and shuts down.
data have;
input id pt_id $ month;
cards;
1 XXY 1
2 XXY 2
3 XXY 1
4 ZZH 2
5 ZZH 2
6 KKJ 3
7 KKJ 4
8 KKJ 3
9 KKJ 5
10 KKJ 5
11 KKJ 6
;;;;
proc sort data=have;
by month;
run;
proc surveyselect data=have method=srs sampsize=1 out=want;
strata month;
run;
You need to remove the patid from the STRATA statement. Otherwise you're saying pick one per patient per month.
I'm trying to get output like this, for example:
1 XXY 1
2 XXY 2
4 ZZH 2
6 KKJ 3
7 KKJ 4
9 KKJ 5
11 KKJ 6
So a single patient can be included multiple times, but only once PER month.
I'm trying to randomly select that one record per month, per patient.
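For reference, a stratified draw that matches this description could look like the sketch below. It uses the small sample data from this thread; the seed value is an arbitrary choice for reproducibility, not something from the original posts. The input is sorted by both STRATA variables first, OUTALL is left off so that only the selected records are written, and NOPRINT suppresses the printed output.

```sas
data have;
  input id pt_id $ month;
  cards;
1 XXY 1
2 XXY 2
3 XXY 1
4 ZZH 2
5 ZZH 2
6 KKJ 3
7 KKJ 4
8 KKJ 3
9 KKJ 5
10 KKJ 5
11 KKJ 6
;
run;

/* STRATA processing expects the data sorted by the strata variables */
proc sort data=have;
  by pt_id month;
run;

/* One simple random draw from each pt_id*month stratum */
proc surveyselect data=have method=srs sampsize=1
                  seed=12345   /* arbitrary seed, assumed for illustration */
                  noprint out=want;
  strata pt_id month;
run;
```

With this sample data, WANT should contain seven rows, one for each pt_id*month combination.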
I had already tried NOPRINT, because yes, it kept printing endless log info. I created an indicator for month instead of dealing with date formats.
I tried this code and I got:
"
ERROR: Function RANDOM could not be located.
ERROR: SQL View WORK.FOR_SELECTION could not be processed.
"
Is this an issue with my SAS?
Thanks for your help
Just an issue with my memory.
The function is RAND(), not RANDOM(). Use rand('uniform') in the SELECT.
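If the selection needs to be reproducible, the same idea can be written with DATA steps so the random stream can be seeded. This is a sketch under assumptions: it reuses the HAVE dataset (id, pt_id, month) built earlier in this thread, and the dataset name keyed, the variable u and the seed value are made up for illustration.

```sas
/* Attach a seeded uniform(0,1) sort key to every record */
data keyed;
  set have;
  if _n_ = 1 then call streaminit(20240101);  /* arbitrary seed, assumed */
  u = rand('uniform');
run;

/* Shuffle records within each patient*month group */
proc sort data=keyed;
  by pt_id month u;
run;

/* Keep the first record of each group, i.e. the smallest key */
data want (drop=u);
  set keyed;
  by pt_id month;
  if first.month;
run;
```

Rerunning with the same seed and the same input order gives the same selection; changing the seed gives a different draw.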