Hi, this is a question that has evolved out of curiosity…just learning SAS! If you see the table below there are duplicates and assuming there is a hierarchy, I would want to retain the most severe case on the duplicate injury date in the order of Home(least severe)<Clinic<Hospital(more severe). How would I go about retaining only the most severe case if the injury date is the same?
ID | Injury date | Location of injury |
1 | Jan02,2007 | Home |
1 | Jan02, 2007 | Clinic |
2 | Feb01,2000 | Home |
2 | Feb01,2000 | Hospital |
2 | Feb01,2000 | Home |
There are many ways to do this.
Here is an easy to understand approach
data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;
data help;
set have;
if Locationofinjury="Home" then severity=1;
else if Locationofinjury="Clinic" then severity=2;
else severity=3;
run;
proc sort data=help;
by ID severity;
run;
data want(drop=severity);
set help;
by ID severity;
if last.ID;
run;
There are many ways to do this.
Here is an easy to understand approach
data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;
data help;
set have;
if Locationofinjury="Home" then severity=1;
else if Locationofinjury="Clinic" then severity=2;
else severity=3;
run;
proc sort data=help;
by ID severity;
run;
data want(drop=severity);
set help;
by ID severity;
if last.ID;
run;
And what about if there are other unique (not duplicate) injury dates in the table too? Such as:
ID | Injury date | Location of injury |
1 | Jan02,2007 | Home |
1 | Jan02, 2007 | Clinic |
2 | Feb01,2000 | Home |
2 | Feb06,2000 | Hospital |
2 | Feb01,2000 | Clinic |
Sort by injury_date as well and use last.injury date in the last data step like this
data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb06,2000 Hospital
2 Feb01,2000 Home
;
data help;
set have;
if Locationofinjury="Home" then severity=1;
else if Locationofinjury="Clinic" then severity=2;
else severity=3;
run;
proc sort data=help;
by ID Injury_date severity;
run;
data want(drop=severity);
set help;
by ID Injury_date severity;
if last.Injury_date;
run;
Create a new numeric variable that us 1 for Home, 2 for clinic and 3 for Hospital.
Sort by id, date and this new variable.
Then run a data step with
by id date;
if last.date;
Hi @learn2 This is a situation where you typically need or better off having look up table in your DB/DWH design right in the schema. The look up table should be flexible enough to update periodically. Of course programatically fixing can be done but may not be a robust practice.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.