Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Adding row of summary differences to final table

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 06-16-2020 05:20 PM
(442 views)

I have a data set comparing year over year summary detail. The output on my difference calculation gets added to the final row. As seen below.

```
data have;
infile datalines dlm="09"x;
input YEAR COHORT $ TOTAL_E IP OP SNF HH HO PR DM ;
datalines;
2019 ALL 220383712 1318640 24958942 411863 1940778 6878 183047172 8699439
2020 ALL 216412261 1493011 24395788 388176 2094616 8989 178342349 9689332
;
run;
data want;
set have;
Total_change= dif(TOTAL_E)/lag(TOTAL_E) *100;
IP_change= dif(IP)/lag(IP) *100;
OP_change= dif(OP)/lag(OP) *100;
SNF_change= dif(SNF)/lag(SNF) *100;
HH_change= dif(HH)/lag(HH) *100;
HO_change= dif(HO)/lag(HO) *100;
PR_change= dif(PR)/lag(PR) *100;
DM_change= dif(DM)/lag(DM) *100;
run;
```

What I would like to do is add these changes to a new row and round the decimal to format as -1.82 or 30.69 or even a percent format. The final data set i'm trying to achieve is something like below

YEAR | COHORT_TYPE_DESC | TOTAL_E | IP | OP | SNF | HH | HO | PR | DM |

2019 | ALL | 220,383,712 | 1,318,640 | 24,958,942 | 411,863 | 1,940,778 | 6,878 | 183,047,172 | 8,699,439 |

2020 | ALL | 216,412,261 | 1,493,011 | 24,395,788 | 388,176 | 2,094,616 | 8,989 | 178,342,349 | 9,689,332 |

Difference | -1.802 | 13.224 | -2.256 | -5.751 | 7.927 | 30.692 | -2.570 | 11.379 |

I'm not sure what to do with the year or how to label it as its a numeric variable, i tried running an insert through proc sql and it didn't like the negative notation i was adding to the decimals.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

A question first.

Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?

If so then consider

data want; length cohort $ 10.; set have; output; dt = dif(TOTAL_E); di = dif(ip); do = dif(op); ds = dif(snf); dh = dif(hh); dho = dif(ho); dp = dif(pr); ddm = dif(dm); lt = lag(TOTAL_E); li = lag(ip); lo = lag(op); ls = lag(snf); lh = lag(hh); lho = lag(ho); lp = lag(pr); ldm = lag(dm); if _n_=2 then do; year=.; cohort = "Difference"; TOTAL_E = round( dt/lt *100, 0.01); IP= round( di/li *100, 0.01); OP= round( do/lo *100, 0.01); SNF=round( ds/ls *100, 0.01); HH= round( dH/lH *100, 0.01); HO= round( dHO/lHO *100, 0.01); PR= round( dp/lp *100, 0.01); DM= round( ddm/ldm *100, 0.01); output; end; drop dt--ldm; run;

You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.

The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.

The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.

The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.

4 REPLIES 4

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I may catch some criticism for this answer but ...

Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@PaigeMiller wrote:

I may catch some criticism for this answer but ...

Export to Excel and then its simple. It would probably take longer to program this in SAS than it would take to export to Excel and do the calculations there.

Had a similar heretical thought with the "display last row as percent" desired result.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

A question first.

Does your actual data consist of exactly two rows of data that requires adding a summary row after the second?

If so then consider

data want; length cohort $ 10.; set have; output; dt = dif(TOTAL_E); di = dif(ip); do = dif(op); ds = dif(snf); dh = dif(hh); dho = dif(ho); dp = dif(pr); ddm = dif(dm); lt = lag(TOTAL_E); li = lag(ip); lo = lag(op); ls = lag(snf); lh = lag(hh); lho = lag(ho); lp = lag(pr); ldm = lag(dm); if _n_=2 then do; year=.; cohort = "Difference"; TOTAL_E = round( dt/lt *100, 0.01); IP= round( di/li *100, 0.01); OP= round( do/lo *100, 0.01); SNF=round( ds/ls *100, 0.01); HH= round( dH/lH *100, 0.01); HO= round( dHO/lHO *100, 0.01); PR= round( dp/lp *100, 0.01); DM= round( ddm/ldm *100, 0.01); output; end; drop dt--ldm; run;

You don't want a new format because a format is per variable and would change the displayed values of the first two rows of values. Round will get the appearance.

The _n_ is an automatic variable that indicates the number of times the data step iterates, so in this case can be used as a row counter.

The additional variables for the DIF and LAG are needed because if you place the function calls inside an "if" structure the last value used is the last time the if was true. Which for your data would never be the case. So get the values before the IF, calculate the rate and then discard the dif and lag values.

The default length of cohort with the data step shown will be 8 characters which is to short to hold the word "Difference" so the Length statement before the SET establishes a length long enough to hold the desired word. If your actual data has a longer length for cohort that would not be needed.

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.