- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi guys,
suppose to have the following:
data have; input ID :$20. Start :ddmmyy. End :ddmmyy. Label :$20.; format start end yymmdd10.; cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A 0001 02FEB1998 02FEB1998 A 0002 06FEB1998 08FEB1998 B 0002 03MAR2000 15MAR2000 A 0002 16MAR2000 18MAR2000 A0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
..... ....................
;
I need the following:
data want;
set have;
ID Start End Label Days0001 01JAN1998 22JAN1998 A 22
0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 2 0002 03MAR2000 18MAR2000 A 16
0003 01MAY2000 22MAY2000 C 22
..................................
;
Meaning, for each ID and for each condition summarise periods (update the end date and calculate the sum of the days) only if the periods differ of 1 day or they are consecutive like 18-20JAN and 15-16MAR, otherwise do nothing and only count the days in the interval.
Can anyone help me please? I tried to remove the cases where I have not to sum and then merge with the records where I have to sum but SAS stops running. I have 68000 records totally.
Thank you in advance
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this
data have;
input ID :$20. (Start End)(:date9.) Label $;
format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;
data temp(drop = s);
set have;
by ID;
set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
if last.ID then s = .;
n = s - end;
run;
data want(drop = s _start n);
do _N_ = 1 by 1 until (n > 2 | last.ID);
set temp;
by ID;
if _N_ = 1 then _start = Start;
end;
Start = ifn(_start = ., Start, _start);
days = End - Start + 1;
run;
Result:
ID Start End Label days 0001 01JAN1998 22JAN1998 A 22 0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 3 0002 03MAR2000 18MAR2000 A 16 0003 01MAY2000 22MAY2000 C 22
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this
data have;
input ID :$20. (Start End)(:date9.) Label $;
format Start End date9.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;
data temp(drop = s);
set have;
by ID;
set have(firstobs = 2 keep = start rename = start = s) have(obs=1 drop=_all_);
if last.ID then s = .;
n = s - end;
run;
data want(drop = s _start n);
do _N_ = 1 by 1 until (n > 2 | last.ID);
set temp;
by ID;
if _N_ = 1 then _start = Start;
end;
Start = ifn(_start = ., Start, _start);
days = End - Start + 1;
run;
Result:
ID Start End Label days 0001 01JAN1998 22JAN1998 A 22 0001 02FEB1998 02FEB1998 A 1 0002 06FEB1998 08FEB1998 B 3 0002 03MAR2000 18MAR2000 A 16 0003 01MAY2000 22MAY2000 C 22
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you dottor Clemmensen! Finally it works. I had only to manage some cases by removing them from the data.set, processing them separately and then I merged all and the result was what I was looking for. Thank you so much.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Anytime 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do a "look-ahead":
data have;
input ID :$20. Start :date9. End :date9. Label :$20.;
format start end yymmdd10.;
cards;
0001 01JAN1998 18JAN1998 A
0001 20JAN1998 22JAN1998 A
0001 02FEB1998 02FEB1998 A
0002 06FEB1998 08FEB1998 B
0002 03MAR2000 15MAR2000 A
0002 16MAR2000 18MAR2000 A
0003 01MAY2000 18MAY2000 C
0003 19MAY2000 22MAY2000 C
;
data want;
merge
have
have (firstobs=2 drop=end rename=(id=_id start=_start label=_label))
;
retain __start;
if _n_ = 1 then __start = start;
if
id = _id and label = _label and _start - end gt 2
or id ne _id
or label ne _label
then do;
start = __start;
days = end - start + 1;
output;
__start = _start;
end;
drop _:;
run;
proc print data=want noobs;
run;
Result:
ID Start End Label days 0001 1998-01-01 1998-01-22 A 22 0001 1998-02-02 1998-02-02 A 1 0002 1998-02-06 1998-02-08 B 3 0002 2000-03-03 2000-03-18 A 16 0003 2000-05-01 2000-05-22 C 22
Please test your example data code before posting; I had to change the informats for the dates.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content