BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bknitch
Quartz | Level 8

I have a data set comparing year over year summary detail. The output on my difference calculation gets added to the final row. As seen below.

 

SAS_OUTPUT.JPG

 

data have;
	infile datalines dlm="09"x;
	input YEAR COHORT $ TOTAL_E  IP OP SNF HH HO PR DM ;
	datalines;
2019	ALL	220383712	1318640	24958942	411863	1940778	6878	183047172	8699439
2020	ALL	216412261	1493011	24395788	388176	2094616	8989	178342349	9689332
;
run;

data want;
set have;
Total_change= dif(TOTAL_E)/lag(TOTAL_E) *100;
IP_change= dif(IP)/lag(IP) *100;
OP_change= dif(OP)/lag(OP) *100;
SNF_change= dif(SNF)/lag(SNF) *100;
HH_change= dif(HH)/lag(HH) *100;
HO_change= dif(HO)/lag(HO) *100;
PR_change= dif(PR)/lag(PR) *100;
DM_change= dif(DM)/lag(DM) *100;
run;

What I would like to do is add these changes to a new row and round the decimal to format as -1.82 or 30.69 or even a percent format.  The final data set i'm trying to achieve is something like below 

 

YEAR COHORT_TYPE_DESC TOTAL_E IP OP SNF HH HO PR DM
2019 ALL          220,383,712          1,318,640                24,958,942                  411,863            1,940,778                6,878          183,047,172          8,699,439
2020 ALL          216,412,261          1,493,011                24,395,788                  388,176            2,094,616                8,989          178,342,349          9,689,332
  Difference -1.802 13.224 -2.256 -5.751 7.927 30.692 -2.570 11.379

 

I'm not sure what to do with the year or how to label it as its a numeric variable, i tried running an insert through proc sql and it didn't like the negative notation i was adding to the decimals. 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

A  question first.

Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?

If so then consider

data want;
   length cohort $ 10.;
   set have;
   output;
   dt = dif(TOTAL_E);
   di = dif(ip);
   do = dif(op);
   ds = dif(snf);
   dh = dif(hh);
   dho = dif(ho);
   dp = dif(pr);
   ddm = dif(dm);
   lt = lag(TOTAL_E);
   li = lag(ip);
   lo = lag(op);
   ls = lag(snf);
   lh = lag(hh);
   lho = lag(ho);
   lp = lag(pr);
   ldm = lag(dm);

   if _n_=2 then do;
      year=.;
      cohort = "Difference";
      TOTAL_E  = round( dt/lt *100, 0.01);
      IP= round( di/li *100, 0.01);
      OP= round( do/lo *100, 0.01);
      SNF=round( ds/ls *100, 0.01);
      HH= round( dH/lH *100, 0.01);
      HO= round( dHO/lHO *100, 0.01);
      PR= round( dp/lp *100, 0.01);
      DM= round( ddm/ldm *100, 0.01);
      output;
  end;
  drop dt--ldm;
run;

You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.

The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.

The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.

The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.

 

 

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

I may catch some criticism for this answer but ...

 

Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.

--
Paige Miller
bknitch
Quartz | Level 8

@PaigeMiller Hahaha, I wish i could but this will be a running process going forward so i'd like to fine tune a process to do this. 

ballardw
Super User

@PaigeMiller wrote:

I may catch some criticism for this answer but ...

 

Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.


Had a similar heretical thought with the "display last row as percent" desired result.

 

 

ballardw
Super User

A  question first.

Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?

If so then consider

data want;
   length cohort $ 10.;
   set have;
   output;
   dt = dif(TOTAL_E);
   di = dif(ip);
   do = dif(op);
   ds = dif(snf);
   dh = dif(hh);
   dho = dif(ho);
   dp = dif(pr);
   ddm = dif(dm);
   lt = lag(TOTAL_E);
   li = lag(ip);
   lo = lag(op);
   ls = lag(snf);
   lh = lag(hh);
   lho = lag(ho);
   lp = lag(pr);
   ldm = lag(dm);

   if _n_=2 then do;
      year=.;
      cohort = "Difference";
      TOTAL_E  = round( dt/lt *100, 0.01);
      IP= round( di/li *100, 0.01);
      OP= round( do/lo *100, 0.01);
      SNF=round( ds/ls *100, 0.01);
      HH= round( dH/lH *100, 0.01);
      HO= round( dHO/lHO *100, 0.01);
      PR= round( dp/lp *100, 0.01);
      DM= round( ddm/ldm *100, 0.01);
      output;
  end;
  drop dt--ldm;
run;

You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.

The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.

The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.

The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.

 

 

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1180 views
  • 1 like
  • 3 in conversation