Hello,
I have an example subset of 1 patient that where I created 2 separate dates. That is "Date" and "Pending_Date". I also created a variable where each row is either Pending='YES' or Pending='NO'. Below is my table of what I have and what I want. How can I code if Date is less than Pending_Date then exclude='yes' then I would drop my exclusions.
I tried it using an if then statement but it will only compare dates right next to each other (ex 3/29/2023 > blank date) but i want it to take into consideration the other dates as well. For example Obs 3 is a pending date and not greater Obs 1,2 4,5, and 8 so that wont get excluded. However, Obs 6 is a pending date and is greater than Obs 1,2, and 4 which is how I have my "Want" table.
Again I want my exclusion to be " Pending_Date > Date then Exclude='YES' " but I want all the "Dates" to be taken into consideration.
Whats the best way to approach this?
Thank you in advance!
Have
Obs | Patient_ID | Pending | Pending_Date | Date |
1 | AB1 | NO | 3/29/22 | |
2 | AB1 | NO | 3/29/22 | |
3 | AB1 | YES | 3/29/22 | |
4 | AB1 | NO | 3/29/22 | |
5 | AB1 | NO | 4/5/22 | |
6 | AB1 | YES | 4/5/22 | |
7 | AB1 | YES | 4/5/22 | |
8 | AB1 | NO | 4/5/22 |
Want
Obs | Patient_ID | Pending | Pending_Date | Date |
3 | AB1 | YES | 3/29/22 | |
5 | AB1 | NO | 4/5/22 | |
6 | AB1 | YES | 4/5/22 | |
6 | AB1 | YES | 4/5/22 | |
8 | AB1 | NO | 4/5/22 |
It's not clear what PENDING date you are using as a criterion. Are all dates compared to the highest PENDING date found, or just to the closest upcoming pending date?
I'd use SQL for this:
data have;
infile datalines dsd dlm=' ';
input Obs Patient_ID $ Pending $ Pending_Date :mmddyy8. Date :mmddyy8.;
format Pending_Date Date mmddyy8.;
datalines;
1 AB1 NO 3/29/22
2 AB1 NO 3/29/22
3 AB1 YES 3/29/22
4 AB1 NO 3/29/22
5 AB1 NO 4/5/22
6 AB1 YES 4/5/22
7 AB1 YES 4/5/22
8 AB1 NO 4/5/22
;
run;
PROC SQL;
CREATE TABLE excludes AS
SELECT DISTINCT a.*
FROM have a, have b
WHERE a.Patient_ID eq b.Patient_ID
AND b.pending_date IS NOT NULL
AND a.date IS NOT NULL
AND b.pending_date > a.date
ORDER BY a.obs
;
CREATE TABLE want AS
SELECT *
FROM have
EXCEPT
SELECT *
FROM excludes
;
DROP TABLE excludes;
QUIT;
- Cheers -
Pass through each ID twice. In the first pass, read only the PENDING='YES' observations, and retain the last such observation with variable _MAX_PENDING_DATE. On the second pass output only cases with PENDING='YES' or DATE not preceding _MAX_PENDING_DATE:
data have;
infile datalines truncover;
input Obs Patient_ID $ Pending $ Pending_Date :mmddyy8. Date :mmddyy8.;
format Pending_Date Date mmddyy8.;
datalines;
1 AB1 NO . 3/29/22
2 AB1 NO . 3/29/22
3 AB1 YES 3/29/22
4 AB1 NO . 3/29/22
5 AB1 NO . 4/5/22
6 AB1 YES 4/5/22
7 AB1 YES 4/5/22
8 AB1 NO . 4/5/22
run;
data want (drop=_:);
set have (in=firstpass where=(pending='YES'))
have (in=secondpass);
by patient_id;
retain _max_pending_date ;
if firstpass=1 then _max_pending_date=pending_date;
if secondpass=1 and (pending='YES' or date>=_max_pending_date);
run;
This program assumes the (1) data are sorted by patient_id, (2) that every patient_id has at least one pending_date observation, and (3) the last pending_date value within a patient_id is always the maximum pending_date. It makes no assumptions about the order of the other DATE values.
oh yes good point, it's also possible to simply compare to the max since earlier observations must be dropped anyway
PROC SQL;
CREATE TABLE want AS
SELECT *
FROM have
GROUP BY Patient_ID
HAVING pending='YES' or Date>=max(Pending_Date)
ORDER BY obs
;
QUIT;
- Cheers -
@Oligolas wrote:
oh yes good point, it's also possible to simply compare to the max since earlier observations must be dropped anyway
PROC SQL; CREATE TABLE want AS SELECT * FROM have GROUP BY Patient_ID HAVING pending='YES' or Date>=max(Pending_Date) ORDER BY obs ; QUIT;
"Comparing to the max" actually means reading all the data twice, once to get the max, and once to compare.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.