turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Merging Summary Statistics with the Original Data

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 08:51 AM

I am learning about how to merge summary statistics with the original data in order to compute fraction values which need e.g. an overall mean or a mean per category. It could also be done e.g. with proc sql, but my problem is not the logic, but why I get at the final report a line I do not want to have.

I have a data set "mergesummary":

Max Flight running 1930

Zip Fit Leather walking 2250

Zoom Airborne running 4150

Light Step walking 1130

Max Step Woven walking 2230

Zip Sneak c-train 1190

I read this data with an infile and input statement. Then I sort it:

data mergesummary;

infile 'D:\SAS\running.txt';

input stylename $15. +1 typeex $7. +1 totalsales 4.;

run;

proc sort data=mergesummary;

by typeex;

run;

Then I calculate the summaries and use them to output the SUM of totalsales into a new data set "summarydata":

proc means NOPRINT data=mergesummary ;

var totalsales;

class typeex;

output out=summarydata SUM(totalsales)=Totalcategory;

run;

Now I sort again:

proc sort data=summarydata;

by typeex;

run;

Then I merge the original data with the summary data set:

data finareport;

merge mergesummary summarydata;

by typeex;

PercentageEach=totalsales/totalcategory;

run;

It all works fine, so I have no problem here!

Then I use a proc print to show the results:

proc print data=finareport;

by typeex;

ID typeex;

var stylename totalsales totalcategory percentageeach;

format PercentageEach Percent9.3;

run;

This looks like:

At the beginning the first row is just showing the total sum of all sales (the variable unfortunately is called totalsales, so it shows the total sum of totalsales). This is due to the fact that is has been calculated and output to the "summarydata" set. However, due to matching (since there is no match and it has no typeex entry) I thought it will drop out. I have the same solution in a book and there the line is not shown? But they use the same code?

Where is my mistake and how can I get rid of this row?

Accepted Solutions

Solution

04-21-2015
09:01 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 09:01 AM

Would replacing class with by also work?

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 08:57 AM

It comes from PROC MEANS, if you don't want it use the NWAY option in PROC MEANS

There's no need to do a sort before your PROC MEANS.

Solution

04-21-2015
09:01 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 09:01 AM

Would replacing class with by also work?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 05:09 AM

Wow! Nice one! This is EXACTLY my mistake! I used a class statement, but the book uses a by statement. What's the reason for this? Why does class give me this and by not?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 06:00 AM

Third from last section in here:

Base SAS(R) 9.2 Procedures Guide

Yes, I have always used by statements in means until recently someone stated to use class. They are similar but have differing options.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 08:22 AM

CLASS doesn't require pre-sorting of the data, and it gives you all possible comibnations of the class variables, including the "null combination" which is the statistics for all of your data regardless of the level of the class variables. That is the first line that you see in your output from PROC MEANS, it is the sum across all of the data. The NWAY turns off this feature and only gives you the sums for each level of the class variables. In a lot of ways, CLASS is a much more powerful tool than using BY in PROC MEANS. CLASS does everything that BY does, so BY is a subset of CLASS.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 11:04 AM

And an occasionally beneficial side effect with NWAY is the data on output is sorted by the class variables.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-22-2015 05:07 AM

Thanks a lot that helped! However this is not included in the book solution... but thanks!

I have to sort before proc means, not because of proc means, but because later on I use a data merge with a by statement.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-21-2015 11:10 AM

Maybe this gets close to what you want

proc tabulate data=mergesummary;

class typeex;

class stylename;

var totalsales;

table typeex*stylename,

totalsales=''*(sum='Total Sales'*f=best6. pctsum<stylename>='Percentage Each'*f=best7.);

run;