Hello,
I have a large data set with customer IDs and daily cumulative usage readings. I'm looking to calculate the daily usage per customer from the cumulative usage and add it as a variable in the data set. Any suggestions on how to even begin doing this? Thank you!
HAVE
ID date cumulative_reading
1 10/1/19 100
1 10/2/19 150
1 10/3/19 200
2 10/1/19 100
2 10/2/19 500
2 10/3/19 900
WANT
ID Date Cumulative_reading Daily Usage
1 10/1/19 100 50
1 10/2/19 150 50
1 10/3/19 200
2 10/1/19 100 400
2 10/2/19 500 400
2 10/3/19 900
Much easier if you put the value on the current record instead of the previous one.
data want;
set have ;
by id ;
daily_usage=dif(cumulative_reading);
if first.id then daily_usage=.;
run;
Results:
cumulative_ daily_ Obs id date reading usage 1 1 2019-10-01 100 . 2 1 2019-10-02 150 50 3 1 2019-10-03 200 50 4 2 2019-10-01 100 . 5 2 2019-10-02 500 400 6 2 2019-10-03 900 400
You can even decide if you want the first record's value to be missing or set it to the first value.
if first.id then daily_usage=cumulative_reading;
Results:
cumulative_ daily_ Obs id date reading usage 1 1 2019-10-01 100 100 2 1 2019-10-02 150 50 3 1 2019-10-03 200 50 4 2 2019-10-01 100 100 5 2 2019-10-02 500 400 6 2 2019-10-03 900 400
data want;
set have;
by id;
prev_cumulative=lag(cumulative_reading);
if first.id then daily=cumulative_reading;
else daily=cumulative_reading-prev_cumulative;
drop prev_cumulative;
run;
This assumes the data is properly sorted.
Also, you might want to re-think your example, there is no logical way to get a daily of 50 for the first row.
Much easier if you put the value on the current record instead of the previous one.
data want;
set have ;
by id ;
daily_usage=dif(cumulative_reading);
if first.id then daily_usage=.;
run;
Results:
cumulative_ daily_ Obs id date reading usage 1 1 2019-10-01 100 . 2 1 2019-10-02 150 50 3 1 2019-10-03 200 50 4 2 2019-10-01 100 . 5 2 2019-10-02 500 400 6 2 2019-10-03 900 400
You can even decide if you want the first record's value to be missing or set it to the first value.
if first.id then daily_usage=cumulative_reading;
Results:
cumulative_ daily_ Obs id date reading usage 1 1 2019-10-01 100 100 2 1 2019-10-02 150 50 3 1 2019-10-03 200 50 4 2 2019-10-01 100 100 5 2 2019-10-02 500 400 6 2 2019-10-03 900 400
You need to do a "look-ahead":
data have;
input ID date :mmddyy10. cumulative_reading;
format date yymmddd10.;
datalines;
1 10/1/19 100
1 10/2/19 150
1 10/3/19 200
2 10/1/19 100
2 10/2/19 500
2 10/3/19 900
;
options mergenoby=nowarn;
data want;
merge
have
have (
firstobs=2
keep=id cumulative_reading
rename=(id=nextid cumulative_reading=nextval)
)
;
if id = nextid then daily_usage = nextval - cumulative_reading;
drop nextid nextval;
run;
proc print data=want noobs;
run;
Result:
cumulative_ daily_ ID date reading usage 1 2019-10-01 100 50 1 2019-10-02 150 50 1 2019-10-03 200 . 2 2019-10-01 100 400 2 2019-10-02 500 400 2 2019-10-03 900 .
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.