Hi, this is a question that has evolved out of curiosity…just learning SAS! If you see the table below there are duplicates and assuming there is a hierarchy, I would want to retain the most severe case on the duplicate injury date in the order of Home(least severe)<Clinic<Hospital(more severe). How would I go about retaining only the most severe case if the injury date is the same?
ID | Injury date | Location of injury |
1 | Jan02,2007 | Home |
1 | Jan02, 2007 | Clinic |
2 | Feb01,2000 | Home |
2 | Feb01,2000 | Hospital |
2 | Feb01,2000 | Home |
There are many ways to do this.
Here is an easy to understand approach
data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;
data help;
set have;
if Locationofinjury="Home" then severity=1;
else if Locationofinjury="Clinic" then severity=2;
else severity=3;
run;
proc sort data=help;
by ID severity;
run;
data want(drop=severity);
set help;
by ID severity;
if last.ID;
run;
There are many ways to do this.
Here is an easy to understand approach
data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;
data help;
set have;
if Locationofinjury="Home" then severity=1;
else if Locationofinjury="Clinic" then severity=2;
else severity=3;
run;
proc sort data=help;
by ID severity;
run;
data want(drop=severity);
set help;
by ID severity;
if last.ID;
run;
And what about if there are other unique (not duplicate) injury dates in the table too? Such as:
ID | Injury date | Location of injury |
1 | Jan02,2007 | Home |
1 | Jan02, 2007 | Clinic |
2 | Feb01,2000 | Home |
2 | Feb06,2000 | Hospital |
2 | Feb01,2000 | Clinic |
Sort by injury_date as well and use last.injury date in the last data step like this
data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb06,2000 Hospital
2 Feb01,2000 Home
;
data help;
set have;
if Locationofinjury="Home" then severity=1;
else if Locationofinjury="Clinic" then severity=2;
else severity=3;
run;
proc sort data=help;
by ID Injury_date severity;
run;
data want(drop=severity);
set help;
by ID Injury_date severity;
if last.Injury_date;
run;
Create a new numeric variable that us 1 for Home, 2 for clinic and 3 for Hospital.
Sort by id, date and this new variable.
Then run a data step with
by id date;
if last.date;
Hi @learn2 This is a situation where you typically need or better off having look up table in your DB/DWH design right in the schema. The look up table should be flexible enough to update periodically. Of course programatically fixing can be done but may not be a robust practice.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.