I have a database which contains IDs , dates and costs like this:
| ID | date | cost | Difference |
| ID1 | 25/02/2020 | 20 | 1 |
| ID1 | 26/02/2020 | 10 | . |
| ID2 | 01/05/2020 | 15 | 6 |
| ID2 | 07/05/2020 | 23 | . |
| ID3 | 03/01/2019 | 18 | 1 |
| ID3 | 04/01/2019 | 17 | 1 |
| ID3 | 05/01/2019 | 11 | . |
what I want to do is to sum the cost by ID and by dates with gap of 1 to 6 days, so the required data would be as below:
| ID | date | cost |
| ID1 | 25/02/2020 | 30 |
| ID2 | 01/05/2020 | 38 |
| ID3 | 03/01/2019 | 46 |
What I tried to do is to create another date variable that takes the same date if we have gaps, and then do the sum by this date variable and the ID with this code but it doesn't work:
data new; set old; format new_date yymmdd10.; if (1<=lag_Diff<=6 ) then new_date=lag_date; if (1<=lag_Diff<=6 and 1<=lag_lag_Diff<=6) then new_date=lag_lag_date; if new_date eq . then new_date=date; run;
I would be thankful if you help me 🙂
First, create a "group date", then run PROC SUMMARY:
data have;
input ID $ date :ddmmyy10. cost Difference;
format date yymmdd10.;
datalines;
ID1 25/02/2020 20 1
ID1 26/02/2020 10 .
ID2 01/05/2020 15 6
ID2 07/05/2020 23 .
ID3 03/01/2019 18 1
ID3 04/01/2019 17 1
ID3 05/01/2019 11 .
;
data grouped;
set have;
by id;
retain gapdate;
format gapdate yymmdd10.;
if first.id
then gapdate = date;
else if date - gapdate > 6 then gapdate = date;
run;
proc summary data=grouped;
by id gapdate;
var cost;
output
out=want (
drop=_:
rename=(gapdate=date)
)
sum()=
;
run;
proc print data=want noobs;
run;
Result:
ID date cost ID1 2020-02-25 30 ID2 2020-05-01 38 ID3 2019-01-03 46
Edit: changed the conditions in the second data step to avoid a "missing" NOTE.
First, create a "group date", then run PROC SUMMARY:
data have;
input ID $ date :ddmmyy10. cost Difference;
format date yymmdd10.;
datalines;
ID1 25/02/2020 20 1
ID1 26/02/2020 10 .
ID2 01/05/2020 15 6
ID2 07/05/2020 23 .
ID3 03/01/2019 18 1
ID3 04/01/2019 17 1
ID3 05/01/2019 11 .
;
data grouped;
set have;
by id;
retain gapdate;
format gapdate yymmdd10.;
if first.id
then gapdate = date;
else if date - gapdate > 6 then gapdate = date;
run;
proc summary data=grouped;
by id gapdate;
var cost;
output
out=want (
drop=_:
rename=(gapdate=date)
)
sum()=
;
run;
proc print data=want noobs;
run;
Result:
ID date cost ID1 2020-02-25 30 ID2 2020-05-01 38 ID3 2019-01-03 46
Edit: changed the conditions in the second data step to avoid a "missing" NOTE.
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.