BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following: 

data have;
  input ID :$20. Start :ddmmyy. End :ddmmyy. Label :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A 0001 02FEB1998 02FEB1998 A 0002 06FEB1998 08FEB1998 B 0002 03MAR2000 15MAR2000 A 0002 16MAR2000 18MAR2000 A

0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
..... ....................
;

I need the following:
data want;
set have;
ID Start End Label Days
0001 01JAN1998 22JAN1998 A 22
0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 2 0002 03MAR2000 18MAR2000 A 16
0003 01MAY2000 22MAY2000 C 22
..................................
;

Meaning, for each ID and for each condition summarise periods (update the end date and calculate the sum of the days) only if the periods differ of 1 day or they are consecutive like 18-20JAN and 15-16MAR, otherwise do nothing and only count the days in the interval.

Can anyone help me please? I tried to remove the cases where I have not to sum and then merge with the records where I have to sum but SAS stops running. I have 68000 records totally.

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
  input ID :$20. (Start End)(:date9.) Label $;
  format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A 
0001 20JAN1998 22JAN1998 A 
0001 02FEB1998 02FEB1998 A 
0002 06FEB1998 08FEB1998 B 
0002 03MAR2000 15MAR2000 A 
0002 16MAR2000 18MAR2000 A 
0003 01MAY2000 18MAY2000 C 
0003 19MAY2000 22MAY2000 C 
;

data temp(drop = s);
   set have;
   by ID;
   set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
   if last.ID then s = .;
   n = s - end;
run;

data want(drop = s _start n);
   do _N_ = 1 by 1 until (n > 2 | last.ID);
      set temp;
      by ID;
      if _N_ = 1 then _start = Start;
   end;

   Start = ifn(_start = ., Start, _start);
   days = End - Start + 1;
run;

 

Result:

 

ID    Start      End        Label  days
0001  01JAN1998  22JAN1998  A      22
0001  02FEB1998  02FEB1998  A      1
0002  06FEB1998  08FEB1998  B      3
0002  03MAR2000  18MAR2000  A      16
0003  01MAY2000  22MAY2000  C      22    

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
  input ID :$20. (Start End)(:date9.) Label $;
  format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A 
0001 20JAN1998 22JAN1998 A 
0001 02FEB1998 02FEB1998 A 
0002 06FEB1998 08FEB1998 B 
0002 03MAR2000 15MAR2000 A 
0002 16MAR2000 18MAR2000 A 
0003 01MAY2000 18MAY2000 C 
0003 19MAY2000 22MAY2000 C 
;

data temp(drop = s);
   set have;
   by ID;
   set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
   if last.ID then s = .;
   n = s - end;
run;

data want(drop = s _start n);
   do _N_ = 1 by 1 until (n > 2 | last.ID);
      set temp;
      by ID;
      if _N_ = 1 then _start = Start;
   end;

   Start = ifn(_start = ., Start, _start);
   days = End - Start + 1;
run;

 

Result:

 

ID    Start      End        Label  days
0001  01JAN1998  22JAN1998  A      22
0001  02FEB1998  02FEB1998  A      1
0002  06FEB1998  08FEB1998  B      3
0002  03MAR2000  18MAR2000  A      16
0003  01MAY2000  22MAY2000  C      22    
NewUsrStat
Lapis Lazuli | Level 10

Thank you dottor Clemmensen! Finally it works. I had only to manage some cases by removing them from the data.set, processing them separately and then I merged all and the result was what I was looking for. Thank you so much.

Kurt_Bremser
Super User

Do a "look-ahead":

data have;
  input ID :$20. Start :date9. End :date9. Label :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;

data want;
merge
  have
  have (firstobs=2 drop=end rename=(id=_id start=_start label=_label))
;
retain __start;
if _n_ = 1 then __start = start;
if
  id = _id and label = _label and _start - end gt 2
  or id ne _id
  or label ne _label
then do;
  start = __start;
  days = end - start + 1;
  output;
  __start = _start;
end;
drop _:;
run;

proc print data=want noobs;
run;

Result:

ID	Start	End	Label	days
0001	1998-01-01	1998-01-22	A	22
0001	1998-02-02	1998-02-02	A	1
0002	1998-02-06	1998-02-08	B	3
0002	2000-03-03	2000-03-18	A	16
0003	2000-05-01	2000-05-22	C	22

Please test your example data code before posting; I had to change the informats for the dates.

NewUsrStat
Lapis Lazuli | Level 10
Sorry dott. Bremen. I was quite in trouble with timing.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1544 views
  • 1 like
  • 3 in conversation