Hi,
I have a SA dataset as below. Customer ID and their MoM balance(Apr17- Monthyear). I wnat to calculate growth in each month. i.e, May growth is SAV_May17-SAV-Apr17
Customer ID | SAV_APR17 | SAV_May17 | SAV_Jun17 |
A | 100 | 200 | 30 |
B | 200 | 300 | 500 |
CD | 100 | 600 | 1000 |
Output:
Customer ID | May growth | Jun growth |
A | 100 | -170 |
B | 100 | 200 |
CD | 500 | 400 |
How to do this in SAS?
Please note that I can change column names.
Thanks and Regards
Sandhya S
Please try the below code
data have;
input Customer_ID$ SAV_APR17 SAV_May17 SAV_Jun17;
cards;
A 100 200 30
B 200 300 500
CD 100 600 1000
;
data want;
set have;
array vals(3) SAV_APR17 SAV_May17 SAV_Jun17;
array vals2(3) may june july;
do i = 1 to 3;
if i<=3 then vals2(i)=vals(i)-vals(i+(i<3));
end;
run;
You suffer from an unfavorable data structure; with a longitudinal structure, the code is a breeze:
So let's first bring your wide dataset (probably coming from an Excel sheet) into form, and make SAS dates out of your columns names:
data have_wide;
input Customer_ID$ SAV_APR17 SAV_May17 SAV_Jun17;
cards;
A 100 200 30
B 200 300 500
CD 100 600 1000
;
proc transpose data=have_wide out=trans (rename=(col1=sav));
by customer_id;
var sav:;
run;
data have;
set trans;
period = input(substr(_name_,5),monyy5.);
format period yymms7.;
drop _name_;
run;
We then make sure that the dataset is sorted, and create the wanted result:
proc sort data=have;
by customer_id period;
run;
data want;
set have;
by customer_id;
growth = sav - lag(sav);
if first.customer_id then growth = .;
run;
You can see how simple the data step that does all the work becomes, as soon as you have an intelligent data structure. See Maxims 19 & 33.
So you get 500000 * 27 observations, each having an observation size of
length(id) + 8 (date) + 8(sav) + 8 (growth), or
500000 * 27 * 34 (if id is 10 bytes long).
This amounts to 459000000 bytes, or approx. 450 MB. That's puny in all respects (think: about 100 photos on the SD card of your high-res camera).
You can even run such datasets through the rather limited University Edition.
And think about this: in your wide dataset, each month for each customer will need space, even if the customer has no value for that month. In a long dataset, you can omit missing or zero observations, as in most calculations you will do you can use the date values to calculate the timespan between observations. That offsets the space needed for the additional date (period) column.
Additionally the long form with an actual date value will allow things like graphs with date as the xaxis that the original data would not support.
Date values also can create groups of values just by changing the format. If you want to get an average balance for a quarter use any of the report or summary functions with the date variable and change the format to yyQ. Or a calendar summary by using applying the YEAR format to the same report.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.