BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following: 

data have;
  input ID :$20. Start :ddmmyy. End :ddmmyy. Label :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A 0001 02FEB1998 02FEB1998 A 0002 06FEB1998 08FEB1998 B 0002 03MAR2000 15MAR2000 A 0002 16MAR2000 18MAR2000 A

0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
..... ....................
;

I need the following:
data want;
set have;
ID Start End Label Days
0001 01JAN1998 22JAN1998 A 22
0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 2 0002 03MAR2000 18MAR2000 A 16
0003 01MAY2000 22MAY2000 C 22
..................................
;

Meaning, for each ID and for each condition summarise periods (update the end date and calculate the sum of the days) only if the periods differ of 1 day or they are consecutive like 18-20JAN and 15-16MAR, otherwise do nothing and only count the days in the interval.

Can anyone help me please? I tried to remove the cases where I have not to sum and then merge with the records where I have to sum but SAS stops running. I have 68000 records totally.

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
  input ID :$20. (Start End)(:date9.) Label $;
  format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A 
0001 20JAN1998 22JAN1998 A 
0001 02FEB1998 02FEB1998 A 
0002 06FEB1998 08FEB1998 B 
0002 03MAR2000 15MAR2000 A 
0002 16MAR2000 18MAR2000 A 
0003 01MAY2000 18MAY2000 C 
0003 19MAY2000 22MAY2000 C 
;

data temp(drop = s);
   set have;
   by ID;
   set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
   if last.ID then s = .;
   n = s - end;
run;

data want(drop = s _start n);
   do _N_ = 1 by 1 until (n > 2 | last.ID);
      set temp;
      by ID;
      if _N_ = 1 then _start = Start;
   end;

   Start = ifn(_start = ., Start, _start);
   days = End - Start + 1;
run;

 

Result:

 

ID    Start      End        Label  days
0001  01JAN1998  22JAN1998  A      22
0001  02FEB1998  02FEB1998  A      1
0002  06FEB1998  08FEB1998  B      3
0002  03MAR2000  18MAR2000  A      16
0003  01MAY2000  22MAY2000  C      22    

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
  input ID :$20. (Start End)(:date9.) Label $;
  format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A 
0001 20JAN1998 22JAN1998 A 
0001 02FEB1998 02FEB1998 A 
0002 06FEB1998 08FEB1998 B 
0002 03MAR2000 15MAR2000 A 
0002 16MAR2000 18MAR2000 A 
0003 01MAY2000 18MAY2000 C 
0003 19MAY2000 22MAY2000 C 
;

data temp(drop = s);
   set have;
   by ID;
   set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
   if last.ID then s = .;
   n = s - end;
run;

data want(drop = s _start n);
   do _N_ = 1 by 1 until (n > 2 | last.ID);
      set temp;
      by ID;
      if _N_ = 1 then _start = Start;
   end;

   Start = ifn(_start = ., Start, _start);
   days = End - Start + 1;
run;

 

Result:

 

ID    Start      End        Label  days
0001  01JAN1998  22JAN1998  A      22
0001  02FEB1998  02FEB1998  A      1
0002  06FEB1998  08FEB1998  B      3
0002  03MAR2000  18MAR2000  A      16
0003  01MAY2000  22MAY2000  C      22    
NewUsrStat
Lapis Lazuli | Level 10

Thank you dottor Clemmensen! Finally it works. I had only to manage some cases by removing them from the data.set, processing them separately and then I merged all and the result was what I was looking for. Thank you so much.

Kurt_Bremser
Super User

Do a "look-ahead":

data have;
  input ID :$20. Start :date9. End :date9. Label :$20.;
  format start end yymmdd10.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;

data want;
merge
  have
  have (firstobs=2 drop=end rename=(id=_id start=_start label=_label))
;
retain __start;
if _n_ = 1 then __start = start;
if
  id = _id and label = _label and _start - end gt 2
  or id ne _id
  or label ne _label
then do;
  start = __start;
  days = end - start + 1;
  output;
  __start = _start;
end;
drop _:;
run;

proc print data=want noobs;
run;

Result:

ID	Start	End	Label	days
0001	1998-01-01	1998-01-22	A	22
0001	1998-02-02	1998-02-02	A	1
0002	1998-02-06	1998-02-08	B	3
0002	2000-03-03	2000-03-18	A	16
0003	2000-05-01	2000-05-22	C	22

Please test your example data code before posting; I had to change the informats for the dates.

NewUsrStat
Lapis Lazuli | Level 10
Sorry dott. Bremen. I was quite in trouble with timing.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1403 views
  • 1 like
  • 3 in conversation