Solved: Adding row of summary differences to final table

bknitch · Posted 06-16-2020 05:20 PM

I have a data set comparing year over year summary detail. The output on my difference calculation gets added to the final row. As seen below.

data have;
	infile datalines dlm="09"x;
	input YEAR COHORT $ TOTAL_E  IP OP SNF HH HO PR DM ;
	datalines;
2019	ALL	220383712	1318640	24958942	411863	1940778	6878	183047172	8699439
2020	ALL	216412261	1493011	24395788	388176	2094616	8989	178342349	9689332
;
run;

data want;
set have;
Total_change= dif(TOTAL_E)/lag(TOTAL_E) *100;
IP_change= dif(IP)/lag(IP) *100;
OP_change= dif(OP)/lag(OP) *100;
SNF_change= dif(SNF)/lag(SNF) *100;
HH_change= dif(HH)/lag(HH) *100;
HO_change= dif(HO)/lag(HO) *100;
PR_change= dif(PR)/lag(PR) *100;
DM_change= dif(DM)/lag(DM) *100;
run;

What I would like to do is add these changes to a new row and round the decimal to format as -1.82 or 30.69 or even a percent format. The final data set i'm trying to achieve is something like below

YEAR	COHORT_TYPE_DESC	TOTAL_E	IP	OP	SNF	HH	HO	PR	DM
2019	ALL	220,383,712	1,318,640	24,958,942	411,863	1,940,778	6,878	183,047,172	8,699,439
2020	ALL	216,412,261	1,493,011	24,395,788	388,176	2,094,616	8,989	178,342,349	9,689,332
	Difference	-1.802	13.224	-2.256	-5.751	7.927	30.692	-2.570	11.379

I'm not sure what to do with the year or how to label it as its a numeric variable, i tried running an insert through proc sql and it didn't like the negative notation i was adding to the decimals.

ballardw · Posted 06-16-2020 05:56 PM

A question first.

Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?

If so then consider

data want;
   length cohort $ 10.;
   set have;
   output;
   dt = dif(TOTAL_E);
   di = dif(ip);
   do = dif(op);
   ds = dif(snf);
   dh = dif(hh);
   dho = dif(ho);
   dp = dif(pr);
   ddm = dif(dm);
   lt = lag(TOTAL_E);
   li = lag(ip);
   lo = lag(op);
   ls = lag(snf);
   lh = lag(hh);
   lho = lag(ho);
   lp = lag(pr);
   ldm = lag(dm);

   if _n_=2 then do;
      year=.;
      cohort = "Difference";
      TOTAL_E  = round( dt/lt *100, 0.01);
      IP= round( di/li *100, 0.01);
      OP= round( do/lo *100, 0.01);
      SNF=round( ds/ls *100, 0.01);
      HH= round( dH/lH *100, 0.01);
      HO= round( dHO/lHO *100, 0.01);
      PR= round( dp/lp *100, 0.01);
      DM= round( ddm/ldm *100, 0.01);
      output;
  end;
  drop dt--ldm;
run;

You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.

The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.

The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.

The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.

View solution in original post

PaigeMiller · Posted 06-16-2020 05:34 PM

I may catch some criticism for this answer but ...

Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.

--
Paige Miller

bknitch · Posted 06-16-2020 05:41 PM

@PaigeMiller Hahaha, I wish i could but this will be a running process going forward so i'd like to fine tune a process to do this.

ballardw · Posted 06-16-2020 07:26 PM

@PaigeMiller wrote:

I may catch some criticism for this answer but ...

Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.

Had a similar heretical thought with the "display last row as percent" desired result.

ballardw · Posted 06-16-2020 05:56 PM

A question first.

Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?

If so then consider

data want;
   length cohort $ 10.;
   set have;
   output;
   dt = dif(TOTAL_E);
   di = dif(ip);
   do = dif(op);
   ds = dif(snf);
   dh = dif(hh);
   dho = dif(ho);
   dp = dif(pr);
   ddm = dif(dm);
   lt = lag(TOTAL_E);
   li = lag(ip);
   lo = lag(op);
   ls = lag(snf);
   lh = lag(hh);
   lho = lag(ho);
   lp = lag(pr);
   ldm = lag(dm);

   if _n_=2 then do;
      year=.;
      cohort = "Difference";
      TOTAL_E  = round( dt/lt *100, 0.01);
      IP= round( di/li *100, 0.01);
      OP= round( do/lo *100, 0.01);
      SNF=round( ds/ls *100, 0.01);
      HH= round( dH/lH *100, 0.01);
      HO= round( dHO/lHO *100, 0.01);
      PR= round( dp/lp *100, 0.01);
      DM= round( ddm/ldm *100, 0.01);
      output;
  end;
  drop dt--ldm;
run;

You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.

The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.

The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.

The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.

Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Re: Adding row of summary differences to final table

Click image to register for webinar

Classroom Training Available!