Hello Everyone!
In my below sample, I would like to find the previous date of lab sample collected on Sdate. After that, retrieve all invalid records where previous date is greater than sample sdate.
I am not getting proper result. Lag function is not working when PID got changed.
I have some done error in the programming. can anybody help me.
data hospital;
input patno serial lab $ Sdate :date9. visit;
format sdate date9.;
cards;
1001 1 HCG 25JAN2011 1
1001 2 HCG 20JAN2011 2
1001 3 HCG . 5
1001 2 SBP 24MAR2011 2
1001 1 SBP 20MAR2011 2
1002 1 SBP . 2
1002 3 SBP 19FEB2012 8
1002 2 SBP 16FEB2012 4
1003 1 DBP 25MAR2011 4
1003 2 DBP . 5
1003 3 DBP 15MAR2011 6
1003 4 DBP . 8
1004 1 CPK 12MAR2011 1
1004 2 CPK 15MAR2011 4
1005 1 LDL 18JUN2012 1
1005 2 LDL 14JUN2012 2
1005 3 LDL . 3
1005 4 LDL . 4
1005 5 LDL 16JUN2012 5
1005 6 LDL 17JUN2012 6
;
run;
proc sort data=hospital; by patno lab serial;run;
data report;
set hospital(keep=patno serial lab Sdate);
by patno lab serial;
prev_date=lag(sdate);
format prev_date date9.;
run;
data error;
set report;
where prev_date is not missing and sdate is not missing;
if prev_date >sdate;
run;
OK, I can see clearer now.
data report;
set hospital (keep=patno serial lab Sdate);
by patno lab serial;
retain prev_date;
format prev_date date9.;
if first.lab then prev_date = .;
output;
if sdate ne . then prev_date = sdate;
run;
Now prev_date should always be the last non-missing sdate within the "lab" group.
data report;
set hospital(keep=patno serial lab Sdate);
by patno lab serial;
prev_date=lag(sdate);
if first.lab then prev_date = .; * or first.patno?;
format prev_date date9.;
run;
Thanks Kurt for your help. But still i am not getting correct output.
For the pid (1003), lab sample is DBP with first collection of sdate as 25MAR2011 and last collection of sdate as 15MAR2011 .
Therefore previous record of 15MAR2011 is 25MAR2011
When I use the logic of "if prev_date >sdate;", I could not capture this invalid record.
Could you please help once or any other programming approach to consider
patno serial lab Sdate
1003 1 DBP 25MAR2011
1003 2 DBP .
1003 3 DBP 15MAR2011
1003 4 DBP .
OK, I can see clearer now.
data report;
set hospital (keep=patno serial lab Sdate);
by patno lab serial;
retain prev_date;
format prev_date date9.;
if first.lab then prev_date = .;
output;
if sdate ne . then prev_date = sdate;
run;
Now prev_date should always be the last non-missing sdate within the "lab" group.
Thank you very much for providing nice logic. Its working fine for all records except below scenario.
Suppose I change the dataset slightly by adding one more record (1001 4 HCG 24JAN2011 7). In that cases the program is not capturing last record.
The last record should capture because previous occurrence of first lab test date is 25JAN2011 and last occurence is 24JAN2011.
As 25JAN2011 >24JAN2011, the record with serial no '4' satisfy the condition.
May I request you to check once.
1001 1 HCG 25JAN2011 1
1001 2 HCG 20JAN2011 2
1001 3 HCG . 5
1001 4 HCG 24JAN2011 7
data hospital;
input patno serial lab $ Sdate :date9. visit;
format sdate date9.;
cards;
1001 1 HCG 25JAN2011 1
1001 2 HCG 20JAN2011 2
1001 3 HCG . 5
1001 4 HCG 24JAN2011 7
1001 2 SBP 24MAR2011 2
1001 1 SBP 20MAR2011 2
1002 1 SBP . 2
1002 3 SBP 12FEB2012 8
1002 2 SBP 16FEB2012 4
1003 1 DBP 25MAR2011 4
1003 2 DBP . 5
1003 3 DBP 15MAR2011 6
1003 4 DBP . 8
1004 1 CPK 12MAR2011 1
1004 2 CPK 15MAR2011 4
1005 1 LDL 18JUN2012 1
1005 2 LDL 14JUN2012 2
1005 3 LDL . 3
1005 4 LDL . 4
1005 5 LDL 16JUN2012 5
1005 6 LDL 17JUN2012 6
;
run;
proc sort data=hospital; by patno lab serial;run;
data report;
set hospital (keep=patno serial lab Sdate);
by patno lab serial;
retain prev_date;
format prev_date date9.;
if first.lab then prev_date = .;
output;
if sdate ne . then prev_date = sdate;
run;
data error;
set report;
where prev_date is not missing and sdate is not missing;
if prev_date >sdate;
run;
Output
patno | serial | lab | Sdate | prev_date |
1001 | 2 | HCG | 20-Jan-11 | 25-Jan-11 |
1002 | 3 | SBP | 12-Feb-12 | 16-Feb-12 |
1003 | 3 | DBP | 15-Mar-11 | 25-Mar-11 |
1005 | 2 | LDL | 14-Jun-12 | 18-Jun-12 |
1001 | 4 | HCG | 24-JAN-11 | 20-JAN-11 |
1005 | 5 | LDL | 16-JUN-12 | 18-Jun-12 |
1005 | 6 | LDL | 17-JUN-12 | 18-Jun-12 |
So you will move the detection of illegal records into the "retain" data step and discard "illegal" values so that the previous one is kept retained.
Yes, I think I need to use the retain statement several times.
How can I proceed.
Try this:
data report;
set hospital (keep=patno serial lab Sdate);
by patno lab serial;
retain prev_date;
format prev_date date9.;
if first.lab then prev_date = .;
output;
if sdate ne . and sdate > prev_date then prev_date = sdate;
run;
Just for clarification: Which of these 4 records would you like to be marked as "invalid"?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.