Hi guys,
suppose to have the following:
data have; input ID :$20. Start :ddmmyy. End :ddmmyy. Label :$20.; format start end yymmdd10.; cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A 0001 02FEB1998 02FEB1998 A 0002 06FEB1998 08FEB1998 B 0002 03MAR2000 15MAR2000 A 0002 16MAR2000 18MAR2000 A0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
..... ....................
;
I need the following:
data want;
set have;
ID Start End Label Days0001 01JAN1998 22JAN1998 A 22
0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 2 0002 03MAR2000 18MAR2000 A 16
0003 01MAY2000 22MAY2000 C 22
..................................
;
Meaning, for each ID and for each condition summarise periods (update the end date and calculate the sum of the days) only if the periods differ of 1 day or they are consecutive like 18-20JAN and 15-16MAR, otherwise do nothing and only count the days in the interval.
Can anyone help me please? I tried to remove the cases where I have not to sum and then merge with the records where I have to sum but SAS stops running. I have 68000 records totally.
Thank you in advance
Try this
data have;
input ID :$20. (Start End)(:date9.) Label $;
format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;
data temp(drop = s);
set have;
by ID;
set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
if last.ID then s = .;
n = s - end;
run;
data want(drop = s _start n);
do _N_ = 1 by 1 until (n > 2 | last.ID);
set temp;
by ID;
if _N_ = 1 then _start = Start;
end;
Start = ifn(_start = ., Start, _start);
days = End - Start + 1;
run;
Result:
ID Start End Label days 0001 01JAN1998 22JAN1998 A 22 0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 3 0002 03MAR2000 18MAR2000 A 16 0003 01MAY2000 22MAY2000 C 22
Try this
data have;
input ID :$20. (Start End)(:date9.) Label $;
format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;
data temp(drop = s);
set have;
by ID;
set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
if last.ID then s = .;
n = s - end;
run;
data want(drop = s _start n);
do _N_ = 1 by 1 until (n > 2 | last.ID);
set temp;
by ID;
if _N_ = 1 then _start = Start;
end;
Start = ifn(_start = ., Start, _start);
days = End - Start + 1;
run;
Result:
ID Start End Label days 0001 01JAN1998 22JAN1998 A 22 0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 3 0002 03MAR2000 18MAR2000 A 16 0003 01MAY2000 22MAY2000 C 22
Thank you dottor Clemmensen! Finally it works. I had only to manage some cases by removing them from the data.set, processing them separately and then I merged all and the result was what I was looking for. Thank you so much.
Anytime 🙂
Do a "look-ahead":
data have;
input ID :$20. Start :date9. End :date9. Label :$20.;
format start end yymmdd10.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;
data want;
merge
have
have (firstobs=2 drop=end rename=(id=_id start=_start label=_label))
;
retain __start;
if _n_ = 1 then __start = start;
if
id = _id and label = _label and _start - end gt 2
or id ne _id
or label ne _label
then do;
start = __start;
days = end - start + 1;
output;
__start = _start;
end;
drop _:;
run;
proc print data=want noobs;
run;
Result:
ID Start End Label days 0001 1998-01-01 1998-01-22 A 22 0001 1998-02-02 1998-02-02 A 1 0002 1998-02-06 1998-02-08 B 3 0002 2000-03-03 2000-03-18 A 16 0003 2000-05-01 2000-05-22 C 22
Please test your example data code before posting; I had to change the informats for the dates.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.