Hi All,
I am working on a data set that has PatientID, Visit_Date, and Age. I need to check data quality by flagging those who have a decrease in age from an earlier visit_date to a later visit_date, as this is not possible. My data set looks like the following example:
PatientID | Visit_Date | Age |
1 | 1/2/2003 | 25 |
1 | 2/4/2005 | 23 |
1 | 3/7/2006 | 26 |
2 | 3/8/2003 | 50 |
2 | 5/12/2004 | 51 |
2 | 5/13/2005 | 52 |
2 | 5/14/2006 | 53 |
3 | 4/6/2020 | 72 |
3 | 12/7/2020 | 65 |
where observations in PatientID 1 and 3 would need to be flagged. How should I go about doing this? Could I use the lag function? I am fairly unfamiliar with the lag function, so any help would be greatly appreciated! Thank you in advance!
One way using DIF function:
Data want; set have; ageflag = (dif(age)<0); by patientid; if first.patientid then ageflag=.; run;
DIF returns current value - value on previous record. SAS will return 1 for true and 0 for false, so the comparison will set a flag of 1 when the value on the current record is less than for the previous.
Since you wouldn't want to compare one patient's age with another patient then the first patientid sets the flag to missing (could use 0 but missing is more appropriate in many cases).
The above solution assumes the data is sorted by patientid and date of visit.
If not sorted by patientid but grouped you could use NOTSORTED on the BY patientid statement.
DIF is closely related to LAG but since it is very common to do what you requested, the difference between current and previous, SAS supplies the DIF function as well.
One way using DIF function:
Data want; set have; ageflag = (dif(age)<0); by patientid; if first.patientid then ageflag=.; run;
DIF returns current value - value on previous record. SAS will return 1 for true and 0 for false, so the comparison will set a flag of 1 when the value on the current record is less than for the previous.
Since you wouldn't want to compare one patient's age with another patient then the first patientid sets the flag to missing (could use 0 but missing is more appropriate in many cases).
The above solution assumes the data is sorted by patientid and date of visit.
If not sorted by patientid but grouped you could use NOTSORTED on the BY patientid statement.
DIF is closely related to LAG but since it is very common to do what you requested, the difference between current and previous, SAS supplies the DIF function as well.
Thank you so much! This worked perfectly!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.