BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following: 

 

 

data DB1;
input ID :$20. (Start End)(:date9.) Absence_reason :$20.;
format Start End date9.;
cards;
0001 01JAN2015 06FEB2015 vacation 
0001 02JAN2015 02JAN2015 vacation 
0001 13APR2015 31DEC2015 sick leave 
0002 01JAN2017 12JUL2017 vacation 
0002 12JUN2017 18JUN2017 vacation 
...;

 

I would like to remove records having the same (!)  "absence reason" when the period falls into a larger one. For example for 0001, the row 02JAN2015 -  02JAN2015 should be removed because already in 01JAN2015 - 06FEB2015. The same for 0002 relative to 12JUN2017  -  18JUN2017. It is clearly an error in the file I received to perform the analysis. 

 

Desired output: 

 

 

data DB1;
input ID :$20. (Start End)(:date9.) Absence_reason :$20.;
format Start End date9.;
cards;
0001 01JAN2015 06FEB2015 vacation 
0001 13APR2015 31DEC2015 sick leave 
0002 01JAN2017 12JUL2017 vacation 
...;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data DB1;
input ID :$20. (Start End)(:date9.) Absence_reason $20.;
format Start End date9.;
cards;
0001 01JAN2015 06FEB2015 vacation 
0001 02JAN2015 02JAN2015 vacation 
0001 13APR2015 31DEC2015 sick leave 
0002 01JAN2017 12JUL2017 vacation 
0002 12JUN2017 18JUN2017 vacation 
;
data temp;
 set db1;
do date=start to end;
 output;
end;
drop start end;
format date date9.;
run;
proc sort data=temp out=temp2 nodupkey;
by id date Absence_reason;
run;
data temp3;
 set temp2;
if id ne lag(id) or date ne lag(date)+1 or Absence_reason ne lag(Absence_reason) then group+1;
run;
proc sql;
create table want as
select group,max(id) as id,min(date) as start format=date9.,max(date) as end format=date9.,max(Absence_reason) as Absence_reason
 from temp3
  group by group;
quit;

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

Is the problem always that there are two consecutive lines and one needs to be removed? Could there ever be three consecutive or non-consecutive lines where some need to be removed?

 

I don't understand the result for ID 0002, they two records overlap, why are they not combined?

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10

I don't understand your questions. Data periods should not be combined, but only removed when they are sub periods of a greater one(s) by the same absence reason. The records relative to 0002 overlap and this is the reason why I would like to remove the shortest one. Unfortunately it is a mistake made by the person that annotated the absence for the worker 0002.

PaigeMiller
Diamond | Level 26

Is it always two consecutive records where one needs to be removed? Could there ever be three records (consecutive or non-consecutive) where records need to be removed?

 

When there is partial overlap such as ID 0002, you want to keep the longest and remove the shortest?

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10
So: "Could there ever be 3 records....?" Yes.
"When there is partial overlap....?" Yes, the longest is the informative one I would like to retain.
Ksharp
Super User
data DB1;
input ID :$20. (Start End)(:date9.) Absence_reason $20.;
format Start End date9.;
cards;
0001 01JAN2015 06FEB2015 vacation 
0001 02JAN2015 02JAN2015 vacation 
0001 13APR2015 31DEC2015 sick leave 
0002 01JAN2017 12JUL2017 vacation 
0002 12JUN2017 18JUN2017 vacation 
;
data temp;
 set db1;
do date=start to end;
 output;
end;
drop start end;
format date date9.;
run;
proc sort data=temp out=temp2 nodupkey;
by id date Absence_reason;
run;
data temp3;
 set temp2;
if id ne lag(id) or date ne lag(date)+1 or Absence_reason ne lag(Absence_reason) then group+1;
run;
proc sql;
create table want as
select group,max(id) as id,min(date) as start format=date9.,max(date) as end format=date9.,max(Absence_reason) as Absence_reason
 from temp3
  group by group;
quit;
NewUsrStat
Lapis Lazuli | Level 10
Thank you very much! It works perfectly!

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1531 views
  • 1 like
  • 3 in conversation