Solved: Re: Retaining only what is needed

learn2 · Posted 01-26-2019 02:11 PM

Hi, this is a question that has evolved out of curiosity…just learning SAS! If you see the table below there are duplicates and assuming there is a hierarchy, I would want to retain the most severe case on the duplicate injury date in the order of Home(least severe)<Clinic<Hospital(more severe). How would I go about retaining only the most severe case if the injury date is the same?

ID	Injury date	Location of injury
1	Jan02,2007	Home
1	Jan02, 2007	Clinic
2	Feb01,2000	Home
2	Feb01,2000	Hospital
2	Feb01,2000	Home

PeterClemmensen · Posted 01-26-2019 02:20 PM

There are many ways to do this.

Here is an easy to understand approach

data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;

data help;
   set have;
   if Locationofinjury="Home" then severity=1;
   else if Locationofinjury="Clinic" then severity=2;
   else severity=3;
run;

proc sort data=help;
   by ID severity;
run;

data want(drop=severity);
   set help;
   by ID severity;
   if last.ID;
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 01-26-2019 02:20 PM

There are many ways to do this.

Here is an easy to understand approach

data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb01,2000 Hospital
2 Feb01,2000 Home
;

data help;
   set have;
   if Locationofinjury="Home" then severity=1;
   else if Locationofinjury="Clinic" then severity=2;
   else severity=3;
run;

proc sort data=help;
   by ID severity;
run;

data want(drop=severity);
   set help;
   by ID severity;
   if last.ID;
run;

The DATA to DATA Step Macro
Blog: SASnrd

learn2 · Posted 01-26-2019 02:30 PM

And what about if there are other unique (not duplicate) injury dates in the table too? Such as:

ID	Injury date	Location of injury
1	Jan02,2007	Home
1	Jan02, 2007	Clinic
2	Feb01,2000	Home
2	Feb06,2000	Hospital
2	Feb01,2000	Clinic

PeterClemmensen · Posted 01-26-2019 02:36 PM

Sort by injury_date as well and use last.injury date in the last data step like this

data have;
input ID $ Injury_date:$20. Locationofinjury $;
infile datalines dlm=' ';
datalines;
1 Jan02,2007 Home
1 Jan02,2007 Clinic
2 Feb01,2000 Home
2 Feb06,2000 Hospital
2 Feb01,2000 Home
;

data help;
   set have;
   if Locationofinjury="Home" then severity=1;
   else if Locationofinjury="Clinic" then severity=2;
   else severity=3;
run;

proc sort data=help;
   by ID Injury_date severity;
run;

data want(drop=severity);
   set help;
   by ID Injury_date severity;
   if last.Injury_date;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 01-26-2019 02:23 PM

Create a new numeric variable that us 1 for Home, 2 for clinic and 3 for Hospital.

Sort by id, date and this new variable.

Then run a data step with

by id date;
if last.date;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

novinosrin · Posted 01-26-2019 02:42 PM

Hi @learn2 This is a situation where you typically need or better off having look up table in your DB/DWH design right in the schema. The look up table should be flexible enough to update periodically. Of course programatically fixing can be done but may not be a robust practice.