I have a data set comparing year over year summary detail. The output on my difference calculation gets added to the final row. As seen below.
data have;
infile datalines dlm="09"x;
input YEAR COHORT $ TOTAL_E IP OP SNF HH HO PR DM ;
datalines;
2019 ALL 220383712 1318640 24958942 411863 1940778 6878 183047172 8699439
2020 ALL 216412261 1493011 24395788 388176 2094616 8989 178342349 9689332
;
run;
data want;
set have;
Total_change= dif(TOTAL_E)/lag(TOTAL_E) *100;
IP_change= dif(IP)/lag(IP) *100;
OP_change= dif(OP)/lag(OP) *100;
SNF_change= dif(SNF)/lag(SNF) *100;
HH_change= dif(HH)/lag(HH) *100;
HO_change= dif(HO)/lag(HO) *100;
PR_change= dif(PR)/lag(PR) *100;
DM_change= dif(DM)/lag(DM) *100;
run;
What I would like to do is add these changes to a new row and round the decimal to format as -1.82 or 30.69 or even a percent format. The final data set i'm trying to achieve is something like below
YEAR | COHORT_TYPE_DESC | TOTAL_E | IP | OP | SNF | HH | HO | PR | DM |
2019 | ALL | 220,383,712 | 1,318,640 | 24,958,942 | 411,863 | 1,940,778 | 6,878 | 183,047,172 | 8,699,439 |
2020 | ALL | 216,412,261 | 1,493,011 | 24,395,788 | 388,176 | 2,094,616 | 8,989 | 178,342,349 | 9,689,332 |
Difference | -1.802 | 13.224 | -2.256 | -5.751 | 7.927 | 30.692 | -2.570 | 11.379 |
I'm not sure what to do with the year or how to label it as its a numeric variable, i tried running an insert through proc sql and it didn't like the negative notation i was adding to the decimals.
A question first.
Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?
If so then consider
data want; length cohort $ 10.; set have; output; dt = dif(TOTAL_E); di = dif(ip); do = dif(op); ds = dif(snf); dh = dif(hh); dho = dif(ho); dp = dif(pr); ddm = dif(dm); lt = lag(TOTAL_E); li = lag(ip); lo = lag(op); ls = lag(snf); lh = lag(hh); lho = lag(ho); lp = lag(pr); ldm = lag(dm); if _n_=2 then do; year=.; cohort = "Difference"; TOTAL_E = round( dt/lt *100, 0.01); IP= round( di/li *100, 0.01); OP= round( do/lo *100, 0.01); SNF=round( ds/ls *100, 0.01); HH= round( dH/lH *100, 0.01); HO= round( dHO/lHO *100, 0.01); PR= round( dp/lp *100, 0.01); DM= round( ddm/ldm *100, 0.01); output; end; drop dt--ldm; run;
You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.
The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.
The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.
The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.
I may catch some criticism for this answer but ...
Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.
@PaigeMiller Hahaha, I wish i could but this will be a running process going forward so i'd like to fine tune a process to do this.
@PaigeMiller wrote:
I may catch some criticism for this answer but ...
Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.
Had a similar heretical thought with the "display last row as percent" desired result.
A question first.
Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?
If so then consider
data want; length cohort $ 10.; set have; output; dt = dif(TOTAL_E); di = dif(ip); do = dif(op); ds = dif(snf); dh = dif(hh); dho = dif(ho); dp = dif(pr); ddm = dif(dm); lt = lag(TOTAL_E); li = lag(ip); lo = lag(op); ls = lag(snf); lh = lag(hh); lho = lag(ho); lp = lag(pr); ldm = lag(dm); if _n_=2 then do; year=.; cohort = "Difference"; TOTAL_E = round( dt/lt *100, 0.01); IP= round( di/li *100, 0.01); OP= round( do/lo *100, 0.01); SNF=round( ds/ls *100, 0.01); HH= round( dH/lH *100, 0.01); HO= round( dHO/lHO *100, 0.01); PR= round( dp/lp *100, 0.01); DM= round( ddm/ldm *100, 0.01); output; end; drop dt--ldm; run;
You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.
The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.
The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.
The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.