04-21-2015 08:51 AM
I am learning about how to merge summary statistics with the original data in order to compute fraction values which need e.g. an overall mean or a mean per category. It could also be done e.g. with proc sql, but my problem is not the logic, but why I get at the final report a line I do not want to have.
I have a data set "mergesummary":
Max Flight running 1930
Zip Fit Leather walking 2250
Zoom Airborne running 4150
Light Step walking 1130
Max Step Woven walking 2230
Zip Sneak c-train 1190
I read this data with an infile and input statement. Then I sort it:
input stylename $15. +1 typeex $7. +1 totalsales 4.;
proc sort data=mergesummary;
Then I calculate the summaries and use them to output the SUM of totalsales into a new data set "summarydata":
proc means NOPRINT data=mergesummary ;
output out=summarydata SUM(totalsales)=Totalcategory;
Now I sort again:
proc sort data=summarydata;
Then I merge the original data with the summary data set:
merge mergesummary summarydata;
It all works fine, so I have no problem here!
Then I use a proc print to show the results:
proc print data=finareport;
var stylename totalsales totalcategory percentageeach;
format PercentageEach Percent9.3;
This looks like:
At the beginning the first row is just showing the total sum of all sales (the variable unfortunately is called totalsales, so it shows the total sum of totalsales). This is due to the fact that is has been calculated and output to the "summarydata" set. However, due to matching (since there is no match and it has no typeex entry) I thought it will drop out. I have the same solution in a book and there the line is not shown? But they use the same code?
Where is my mistake and how can I get rid of this row?
04-22-2015 05:09 AM
Wow! Nice one! This is EXACTLY my mistake! I used a class statement, but the book uses a by statement. What's the reason for this? Why does class give me this and by not?
04-22-2015 06:00 AM
Third from last section in here:
Yes, I have always used by statements in means until recently someone stated to use class. They are similar but have differing options.
04-22-2015 08:22 AM
CLASS doesn't require pre-sorting of the data, and it gives you all possible comibnations of the class variables, including the "null combination" which is the statistics for all of your data regardless of the level of the class variables. That is the first line that you see in your output from PROC MEANS, it is the sum across all of the data. The NWAY turns off this feature and only gives you the sums for each level of the class variables. In a lot of ways, CLASS is a much more powerful tool than using BY in PROC MEANS. CLASS does everything that BY does, so BY is a subset of CLASS.
04-22-2015 05:07 AM
Thanks a lot that helped! However this is not included in the book solution... but thanks!
I have to sort before proc means, not because of proc means, but because later on I use a data merge with a by statement.
04-21-2015 11:10 AM
Maybe this gets close to what you want
proc tabulate data=mergesummary;
totalsales=''*(sum='Total Sales'*f=best6. pctsum<stylename>='Percentage Each'*f=best7.);
Need further help from the community? Please ask a new question.