So the following comes from http://support.sas.com/kb/43/975.html.
What if we add data by year and I want to show MSRP and Invoice as rows and show by year as the column where a user can scan across to see how MSRP has changed across years or one could compute a col of differences between years?
How would the following code change or would it be so overwhelming it would be easier to do with tabulate writing it out to a dataset?
Multiple Rows Summarizing Type Variable
Origin Type MSRP Invoice
Asia Hybrid $59,760 $55,288
SUV $739,225 $672,912
Sedan $2,139,813 $1,954,101
Sports $552,681 $503,556
Truck $163,069 $150,412
Wagon $254,581 $234,875
Europe SUV $483,460 $442,913
Sedan $3,353,380 $3,096,312
Sports $1,655,970 $1,501,327
Wagon $454,215 $420,043
USA SUV $864,730 $781,696
Sedan $2,307,495 $2,125,714
Sports $407,315 $370,302
Truck $435,524 $392,390
Wagon $156,420 $144,451
TOTALS Hybrid $59,760 $55,288
SUV $2,087,415 $1,897,521
Sedan $7,800,688 $7,176,127
Sports $2,615,966 $2,375,185
Truck $598,593 $542,802
Wagon $865,216 $799,369
data cars1;
set sashelp.cars;
dummy1 = 1;
dummy2 = 1;
dummy3 = 1;
dummy4 = 1;
dummy5 = 1;
dummy6 = 1;
run;
proc report data=cars1 nowd;
column dummy1 dummy2 dummy3 dummy4 dummy5 dummy6
origin type msrp invoice;
define dummy1 / group noprint;
define dummy2 / group noprint;
define dummy3 / group noprint;
define dummy4 / group noprint;
define dummy5 / group noprint;
define dummy6 / group noprint;
define origin / group;
define type / group;
define msrp / format=dollar12.;
define invoice / format=dollar12.;
break after dummy1 / summarize;
break after dummy2 / summarize;
break after dummy3 / summarize;
break after dummy4 / summarize;
break after dummy5 / summarize;
break after dummy6 / summarize;
/* Calculate subtotals */
compute invoice; *Use the last variable on the COLUMN statement;
if upcase(type) = "HYBRID" then do;
msrptyp6 + msrp.sum;
invotyp6 + invoice.sum;
end;
else if upcase(type) = "SUV" then do;
msrptyp5 + msrp.sum;
invotyp5 + invoice.sum;
end;
else if upcase(type) = "SEDAN" then do;
msrptyp4 + msrp.sum;
invotyp4 + invoice.sum;
end;
else if upcase(type) = "SPORTS" then do;
msrptyp3 + msrp.sum;
invotyp3 + invoice.sum;
end;
else if upcase(type) = "TRUCK" then do;
msrptyp2 + msrp.sum;
invotyp2 + invoice.sum;
end;
else if upcase(type) = "WAGON" then do;
msrptyp1 + msrp.sum;
invotyp1 + invoice.sum;
end;
endcomp;
/* Reassign to subtotal values */
compute after dummy6;
origin = "TOTALS";
type = "Hybrid";
msrp.sum = msrptyp6;
invoice.sum = invotyp6;
endcomp;
compute after dummy5;
type = "SUV";
msrp.sum = msrptyp5;
invoice.sum = invotyp5;
endcomp;
compute after dummy4;
type = "Sedan";
msrp.sum = msrptyp4;
invoice.sum = invotyp4;
endcomp;
compute after dummy3;
type = "Sports";
msrp.sum = msrptyp3;
invoice.sum = invotyp3;
endcomp;
compute after dummy2;
type = "Truck";
msrp.sum = msrptyp2;
invoice.sum = invotyp2;
endcomp;
compute after dummy1;
type = "Wagon";
msrp.sum = msrptyp1;
invoice.sum = invotyp1;
endcomp;
run;
If the difference could be as simple as the largest -smallest for a given group then the RANGE statistic is a likely candidate. But if actual order is important then I suspect preprocessing may be a better way to go in general. The headaches about trying to get the correct columns into a calculation in proc report when you want to use different columns if data is missing is not a trivial exercise.
A common issue being a "year to date" with monthly columns, similar to what I think you describe. Adjusting to get the column sums(means, ranges or what ever) shifting as the year goes on..
Depending on how you need to summarize the data I often find proc summary helpful as I can get multiple groups of summaries using the class statement and then watching the _type_ variable for which levels.
And often the report procedure will do a pretty good job of certain forms of transposition.
Good luck.
Not fan of spending the effort to place formulae into Excel cells as part of a solution because as soon as I do I know someone will ask for the output in PDF, or RTF, or make a chart to go along with a table in HTML, PDF or RTF output. Or use it to feed another procedure...
I don't think you are providing a very clear description of what you want.
@TimPrice wrote:
So the following comes from http://support.sas.com/kb/43/975.html.
What if we add data by year and I want to show MSRP and Invoice as rows and show by year as the column where a user can scan across to see how MSRP has changed across years or one could compute a col of differences between years?
You example should include something related to year in the data since that matches your description.
I suspect you might be looking for something like a YEAR variable used in an across role spanning other columns.
If you could make a small table that shows a desired appearance then we might be able to work toward it.
For example using the same data set (without all those dummy variables I am not quite sure why they were needed)
Proc report data=sashelp.cars; column origin type, msrp,(max min mean) ; define origin /group; define type/across; run;
If the data set had a YEAR variable you could replace TYPE above with the YEAR and the result would have block of statistics for MSRP for each year.
To be clear here is a partial sample of what the report might look like:
Multiple Rows Summarizing Type Variable |
|||||
year | |||||
Origin | Type | 2017 | 2018 | Difference | |
Asia | Hybrid | MSRP | $59,760 | $60,760 | $1,000 |
Invoice | $55,288 | $56,288 | $1,000 | ||
Sedan | MSRP | $2,139,813 | $2,140,900 | $1,087 | |
Invoice | $1,954,101 | $1,956,800 | $2,699 |
@TimPrice wrote:
To be clear here is a partial sample of what the report might look like:
Multiple Rows Summarizing Type Variable
year Origin Type 2017 2018 Difference Asia Hybrid MSRP $59,760 $60,760 $1,000 Invoice $55,288 $56,288 $1,000 Sedan MSRP $2,139,813 $2,140,900 $1,087 Invoice $1,954,101 $1,956,800 $2,699
Since you are showing what would have to be a calculated column you are calling difference what would that actual contain if you have data from three years? Largest - smallest? Last year - First year? (which gets a bit problematic inside a report if all years are not present for all records).
For my preferences to get something similar to that appearance I would likely use proc tabulate instead of report.
Here is an example of creating a small data set simulating 3 years of data based on the SASHELP.CARS data set and then a summary. I am not claiming this to be a good simulation just something that generates 3 different values for each year.
Since you have not shown what the statistic requested might be I am following with the sum of msrp and invoice for the years and a range calculation for the period of data.
data cars2015; set sashelp.cars; (keep=origin type msrp invoice); where origin='Asia' and type in ('Sedan' 'SUV') and make in ('Acura' 'Kia'); year=2015; run; data cars2016; set cars2015 ; msrp = msrp*(1+rand('uniform')); invoice = invoice*(1+rand('uniform')); year=2016; run; data cars2017; set cars2016 ; msrp = msrp*(1+rand('uniform')); invoice = invoice*(1+rand('uniform')); year=2017; run; data newcarset; set cars2015 cars2016 cars2017; run; proc tabulate data=newcarset format=dollar12.; class origin type year; var msrp invoice; table origin*type*(msrp invoice), year * sum=' ' All='Difference'*range=' ' ; run;
Ballard,
First, thank you so much for taking the time to respond.
I agree the straight forward solution would be to use tabulate. And we could use whatever statistic, as we could always substitute in whatever we wanted. I was using this data as a simple example, my exact application would be to show last month and current month or qtr over qtr change. Then my ultimate goal was to use proc report to write a formula into the last cell involving the differences into an excel spreadsheet. But the more I think about it, I'm thinking I might have to preprocess my data and transpose to use proc report.
If the difference could be as simple as the largest -smallest for a given group then the RANGE statistic is a likely candidate. But if actual order is important then I suspect preprocessing may be a better way to go in general. The headaches about trying to get the correct columns into a calculation in proc report when you want to use different columns if data is missing is not a trivial exercise.
A common issue being a "year to date" with monthly columns, similar to what I think you describe. Adjusting to get the column sums(means, ranges or what ever) shifting as the year goes on..
Depending on how you need to summarize the data I often find proc summary helpful as I can get multiple groups of summaries using the class statement and then watching the _type_ variable for which levels.
And often the report procedure will do a pretty good job of certain forms of transposition.
Good luck.
Not fan of spending the effort to place formulae into Excel cells as part of a solution because as soon as I do I know someone will ask for the output in PDF, or RTF, or make a chart to go along with a table in HTML, PDF or RTF output. Or use it to feed another procedure...
Hi:
Speaking up for PROC REPORT, here. With a different structure to the data, PROC REPORT can do that same/similar report as TABULATE using YEAR as an ACROSS item. The data does have to be structured differently. PROC REPORT uses the "NEW" data in this program and TABULATE uses the "CARS" data in this program:
data new2015(keep=origin type make year varname varvalue)
cars2015;
set sashelp.cars(keep=origin type make msrp invoice);
where origin='Asia' and
type in ('Sedan' 'SUV') and
Make in ('Acura' 'Kia');
year=2015;
output cars2015;
varname = 'MSRP ';
varvalue = msrp;
output new2015;
varname = 'Invoice';
varvalue = invoice;
output new2015;
run;
data new2016(keep=origin type make year varname varvalue)
cars2016;
set cars2015 ;
msrp = msrp*(1+rand('uniform'));
invoice = invoice*(1+rand('uniform'));
year=2016;
output cars2016;
varname = 'MSRP ';
varvalue = msrp;
output new2016;
varname = 'Invoice';
varvalue = invoice;
output new2016;
run;
data new2017(keep=origin type make year varname varvalue)
cars2017;
set cars2016 ;
msrp = msrp*(1+rand('uniform'));
invoice = invoice*(1+rand('uniform'));
year=2017;
output cars2017;
varname = 'MSRP ';
varvalue = msrp;
output new2017;
varname = 'Invoice';
varvalue = invoice;
output new2017;
run;
data newcarset;
set cars2015 cars2016 cars2017;
run;
data altcarset;
set new2015 new2016 new2017;
run;
Then, here are the results from both TABULATE and REPORT (I added some more statistics to REPORT):
and here's the REPORT code using the newly structured data (TABULATE code is the same as already posted):
proc report data=altcarset spanrows;
title 'REPORT';
column origin type varname varvalue,year
varvalue=vdiff varvalue=vmean varvalue=vmin varvalue=vmax;
define origin / group style(column)=Header;
define type / group style(column)=Header;
define varname / group ' ' order=data style(column)=Header;
define year / across 'Year';
define varvalue/ sum f=dollar12. ' ';
define vdiff / 'Difference' range f=dollar12.;
define vmean / mean 'Average' f=dollar12.;
define vmin / min 'Min' f=dollar12.;
define vmax / max 'Max' f=dollar12.;
run;
Depends on the structure of the data and whether you need any of the other PROC REPORT features.
Cynthia
Cynthia,
I have followed you on several other posts over the last year or so and have told others about your solutions. Thank you for coming to the rescue! I like this solution and I think this is what I will use as my template.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.