BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
learn2
Fluorite | Level 6

Hi, this is a question that has evolved out of curiosity…just learning SAS! If you see the table below there are duplicates and assuming there is a hierarchy, I would want to retain the most severe case on the duplicate injury date in the order of Home(least severe)<Clinic<Hospital(more severe). How would I go about retaining only the most severe case if the injury date is the same?

 

ID

Injury date

Location of injury

1

Jan02,2007

Home

1

Jan02, 2007

Clinic

2

Feb01,2000

Home

2

Feb01,2000

Hospital

2

Feb01,2000

Home

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

There are many ways to do this. 

 

Here is an easy to understand approach

 

data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;

data help;
   set have;
   if Locationofinjury="Home" then severity=1;
   else if Locationofinjury="Clinic" then severity=2;
   else severity=3;
run;

proc sort data=help;
   by ID severity;
run;

data want(drop=severity);
   set help;
   by ID severity;
   if last.ID;
run;

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

There are many ways to do this. 

 

Here is an easy to understand approach

 

data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;

data help;
   set have;
   if Locationofinjury="Home" then severity=1;
   else if Locationofinjury="Clinic" then severity=2;
   else severity=3;
run;

proc sort data=help;
   by ID severity;
run;

data want(drop=severity);
   set help;
   by ID severity;
   if last.ID;
run;
learn2
Fluorite | Level 6

And what about if there are other unique (not duplicate) injury dates in the table too? Such as:

ID

Injury date

Location of injury

1

Jan02,2007

Home

1

Jan02, 2007

Clinic

2

Feb01,2000

Home

2

Feb06,2000

Hospital

2

Feb01,2000

Clinic

PeterClemmensen
Tourmaline | Level 20

Sort by injury_date as well and use last.injury date in the last data step like this

 

data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb06,2000 Hospital
2 Feb01,2000 Home
;

data help;
   set have;
   if Locationofinjury="Home" then severity=1;
   else if Locationofinjury="Clinic" then severity=2;
   else severity=3;
run;

proc sort data=help;
   by ID Injury_date severity;
run;

data want(drop=severity);
   set help;
   by ID Injury_date severity;
   if last.Injury_date;
run;
Kurt_Bremser
Super User

Create a new numeric variable that us 1 for Home, 2 for clinic and 3 for Hospital.

Sort by id, date and this new variable.

Then run a data step with

by id date;
if last.date;
novinosrin
Tourmaline | Level 20

Hi @learn2   This is a situation where you typically need or better off having look up table in your DB/DWH design right in the schema. The look up table should be flexible enough to update periodically. Of course programatically fixing can be done but may not be a robust practice.