Help using Base SAS procedures

Proc Report - group variables - multiple across variable

Reply
Contributor
Posts: 58

Proc Report - group variables - multiple across variable

Hello, I'm trying to get a fairly extensive report of group variables. I have one variable for the column, then looking to have 3 variables (and all 8 permutations) broken up across. Not sure what i'm doing wrong here:

[pre]proc report data=testdata;
Column OutsBefore Baserunners Count BattedBallVelocity;
Define OutsBefore / Group;
Define Baserunners / Across;
Define Count / Across;
Define BattedBallVelocity / Across;
run;
[/pre]
SAS Super FREQ
Posts: 8,866

Re: Proc Report - group variables - multiple across variable

Hi:
I'm not sure what you want to accomplish. With your current code, you will probably not get what you want. I'm guessing that you want COUNT and BATTEDBALLVELOCITY -underneath- each unique value (column) for the BASERUNNERS variable???? Not entirely sure what you mean by 3 variables and 8 permutations "broken up across".

You may need to use the comma operator in the COLUMN statement to get the values of numeric variables nested underneath your ACROSS variable(s). Run the code below and review the output. Consider the difference between report #1 (probably wrong) and report #2 or #3. And, then look at the way the comma operator and parentheses are used to make report #2 and #3 happen.

cynthia
[pre]
ods html file='c:\temp\how_to_across.html' style=sasweb;
proc report data=sashelp.class nowd;
title '1) probably wrong across output (no crossing in column statement)';
column sex age height;
define sex / group;
define age / across;
define height / across;
run;

proc report data=sashelp.class nowd;
title '2) want avg height UNDER each age (use comma for crossing)';
column sex age,height;
define sex / group;
define age / across;
define height / mean;
run;

proc report data=sashelp.class nowd;
title '3) want avg height and median weight UNDER each gender';
column age sex,(height weight);
define age / group;
define sex / across;
define height / mean;
define weight / median;
run;

ods _all_ close;
[/pre]
Contributor
Posts: 58

Re: Proc Report - group variables - multiple across variable

Posted in reply to Cynthia_sas
You know, i'm thinking i really don't want to be using proc report. i want the summary stats of this data set nested within each other. i.e. i want the table to show me howmany BallinPlayTypes there were within each type of BattedBallVelocity which is in turn broken down under each type of Count, etc.

I'm thinking i need to use arrays?
SAS Super FREQ
Posts: 8,866

Re: Proc Report - group variables - multiple across variable

Sounds like PROC TABULATE to me, if your variables are all CLASS variables;
cynthia
Contributor
Posts: 58

Re: Proc Report - group variables - multiple across variable

Posted in reply to Cynthia_sas
here's a better look of what i'm trying to do:

http://support.sas.com/forums/thread.jspa?messageID=40410鷚

proc tabulate still make sense? only way to learn all the nuances of a language are this way, but what a pain when you're trying to get a project done . . .
Contributor
Posts: 58

Re: Proc Report - group variables - multiple across variable

Just read thru proc report, and i think this is the way to go. Really appreciate all your help!!!!
SAS Super FREQ
Posts: 8,866

Re: Proc Report - group variables - multiple across variable

Hi:
It depends on how necessary the labelling on the left is. PROC TABULATE makes true cross-tabular tables...so you can put some extra information into the "box" that's formed at the intersection of the column headers and the row header area. PROC REPORT, on the other hand, can also do cross-tabs, but there's no box area to label the way there is with TABULATE.

I did post a code sample for you under your new post that showed the output you wanted.

cynthia
Contributor
Posts: 58

Re: Proc Report - group variables - multiple across variable

Posted in reply to Cynthia_sas
Hey, Proc Tabulate has worked just fine. Now I have a different issue: my data is divided out into 70 data sets. Is there an easy way to incorporate ods with a macro to create, say, an excel spreadsheet of the data (anything with ods really).
SAS Super FREQ
Posts: 8,866

Re: Proc Report - group variables - multiple across variable

Hi:
I'm not sure what you're asking.

If you want your output from PROC TABULATE to be openable with Excel then you just have to use your ODS "sandwich", as I showed in my response to your other posting. There are a variety of ODS destinations you could choose from to hold the results of your PROC TABULATE, such as:
[pre]
ods msoffice2k file='c:\temp\nested_tab.xls' style=sasweb;
ods rtf file='c:\temp\nested_tab.rtf';
ods pdf file='c:\temp\nested_tab.pdf';
ods csv file='c:\temp\nested_tab.csv';
ods tagsets.excelxp file='c:\temp\nested_tab.xls' style=sasweb;

... your proc tabulate code ...

ods _all_ close;
[/pre]

When you create output with ODS, you are either creating proprietary format files, such as PDF -- which is meant to be opened with Adobe Acrobat Reader. When you use ODS RTF or ODS HTML or ODS MSOFFICE2K, you are creating ASCII text files that can be opened with a Word processor, such as Microsoft Word (RTF and HTML files respectively).

If you want to use ODS to create files that can be opened with Excel, you have 3 basic choices:
1) create a CSV file (ODS CSV) and open the CSV file with Excel
2) create an HTML file (ODS HTML and ODS MSOFFICE2K) and open the HTML file with Excel or
3) create a Spreadsheet Markup Language XML file (using ODS TAGSETS.EXCELXP) which creates Spreadsheet Markup Language XML and open the XML file with Excel

When you create an HTML file or an XML file with ODS and give that file an extension of .XLS, you are NOT creating a true, binary Excel file...you are merely creating a file that Excel knows how to open and render. The extension of .XLS, just "fools" the Windows Registry into launching Excel when you double click on the file icon. So, you have to decide which destination you want. (I included ODS RTF and ODS PDF in the above "sandwich" invocation, just for comparison purposes.)

Other considerations are:
1) there is no color or fonts with ODS CSV output -- no graph objects like logos can be included in CSV output. All output goes into 1 worksheet;
2) HTML files do use ODS colors and fonts. You can use graph objects like logos or charts with HTML files. All output goes into 1 worksheet;
3) Spreadsheet Markup Language XML files do use ODS colors and fonts. You cannot use graph objects like logos or charts with SpreadsheetML XML files. However, you can create multi-sheet workbooks when you use TAGSETS.EXCELXP to create SpreadsheetML XML files.

I'm not sure what you're asking when you say "my data is divided into 70 datasets" -- does that mean you want to create 1 SAS file from all 70 datasets and then use the big file with PROC TABULATE????

If you want to append or concatenate SAS datasetes together vertically, then you would need to use a SET statement. For example, SASHELP.PRDSAL2 and SASHELP.PRDSAL3 are related datasets. The issue is that there's a variable called MONYR in PRDSAL2 and the same variable is called DATE in PRDSAL3. So if I want to concatenate the 2 datasets together, I have to rename one of the variables to be the same as what is in the other file. So, code like that shown in the SAS Log below will do both the RENAME of the variable and the concatenate of the 2 files together:
[pre]
7006 data all_sales;
7007 set sashelp.prdsal2
7008 sashelp.prdsal3(rename=(date=monyr));
7009 label monyr='Month/Year';
7010 run;

NOTE: There were 23040 observations read from the data set
SASHELP.PRDSAL2.
NOTE: There were 11520 observations read from the data set
SASHELP.PRDSAL3.
NOTE: The data set WORK.ALL_SALES has 34560 observations and 11
variables.
NOTE: DATA statement used (Total process time):
real time 0.15 seconds
cpu time 0.03 seconds

[/pre]

Note how 11520 obs were read from SASHELP.PRDSAL3 and 23040 obs were read from SASHELP.PRDSAL2. The new file, WORK.ALL_SALES has 34560 observations and represents all the obs from the first file in the SET statement concatenated with all the obs from the second file in the SET statement. Now, if I wanted to run PROC TABULATE, I would use WORK.ALL_SALES in my PROC TABULATE step. If you have 70 datasets and your issue is that you don't want to type all 70 names, then there are a couple of different techniques you could use to help with that task.

Or, you could mean that you have 70 different datasets and you want to run 70 different PROC TABULATE steps -- using some kind of SAS Macro program.

Or you could mean that you've already brought your 70 datasets together and you want to create 70 total worksheets in 1 or more Excel workbooks.

Or, you could mean that you want to have each of your 70 different datasets run with PROC TABULATE and create 70 separate files that can be opened in Excel.

Can you clarify what it is that you want/need to do???

cynthia
Contributor
Posts: 58

Re: Proc Report - group variables - multiple across variable

Posted in reply to Cynthia_sas
It's a baseball project. Essentially, the field is divided into 70 different zones. Then every play which takes place within a zone (a hit or an out made in the field basically) is characterized by several different variables (number of outs currently, how hard the ball is hit, etc.). I then need the summary stats from each of these zones. So i have the data cleaned up and subsetted into 70 different sas data sets.

Now i want to use macros to run the darn thing. i have zero experience with macros and am learning on the fly. I got frustrated last night and basically created a proc tabulate for each of the zones (this is what happens when one hits midnight or so after a long ass day, lol). basically, i couldn't figure out how to get the "out=" to run correctly. It'll have to do for now. I think i'm pretty well set. I've created a macro to transpose some of the rows (proc tabulate doesn't format a sas data set the same way it does the output and when you use a different format, you can't convert it to a sas data set because it populates like this:
[pre]b_run Count Vel Outs Err Hit Out
0 even 1 0 0 7 24
1 0 4 17
2 0 5 8
2 0 0 16 5
1 0 11 4
2 0 11 5
3 0 0 31 0
1 0 19 0
2 0 18 0
hitter 1 0 0 4 16
1 0 0 11
2 0 0 3
2 0 0 5 2
1 0 4 2
2 0 8 1
3 0 0 20 0
1 0 12 0
2 0 6 0
pitche 1 0 0 5 7
1 0 6 10
2 0 2 11
2 0 1 7 4
1 1 6 3
2 0 5 1
3 0 0 12 0
1 0 11 0
2 0 10 0

[/pre]

Well, this means you have to fill in 0 for the baserunner(b_run) variable, then even/hitter/pitcher for Count, etc to be able to make this SAS data set. Anyway, i have a workaround for it.

Now i can actually get to the business of analyzing the data . . .

Thanks for all your help!!!!
SAS Super FREQ
Posts: 8,866

Re: Proc Report - group variables - multiple across variable

Hi,

If you used PROC REPORT, you could create a "display" version of b_run Count and Vel that would populate down each row. (see below how in PROC REPORT, #1 has AGE at the beginning for each group and for #2, AGE is populated on every row).

It sounds to me like you want to run the same PROC TABULATE on 70 different datasets ...one for each time zone?? Do you also want to create 70 different Excel workbooks or 1 workbook with 70 different sheets?? It almost sounds to me that in spite of the fact that you said you wanted ODS -- it sounds more like you want PROC EXPORT or ODS CSV. It seems you don't want a formatted PROC TABULATE report in Excel....it sounds like you want the equivalent of a dataset in Excel -- with all the values filled in on every row.

Personally, I would not go down the "adding a variable value with a macro" route -- which is way more complex -- until I had exhausted other methods, like PROC REPORT.

Can you post a sample of one of your TABULATES that you've tried??? And the ODS "wrapper" code that you've used?? I think the solution is not going to be a macro solution.

Also, can you articulate what you want to see in Excel --
1) Do you want to see colors and fonts or no colors and fonts?
2) Do you want to see a "report" format in Excel or do you want to see values populated on every row?
3) Do you want 1 workbook with 70 worksheets -- one for each dataset
4) Do you want to see 70 workbooks -- one for each dataset.
5) Are you using PROC EXPORT or ODS and if ODS, which destination are you using ODS CSV, ODS CSVALL, ODS HTML or ODS MSOFFICE2K or ODS TAGSETS.EXCELXP??

cynthia

ps...to run the code below, copy from the forum posting window and paste into Microsoft Word (to maintain the line breaks and indenting). Then copy and paste again from Word into SAS in order to run the code and see the results.

[pre]
** PROC REPORT makes display version of AGE and SEX;
** using MSOFFICE2K and ODS CSVALL -- no macro required;

ods msoffice2k file='c:\temp\compare_in_Excel.xls' style=sasweb;
ods csvall file='c:\temp\compare_in_CSV.csv';

proc report data=sashelp.class nowd;
where age in (11,12,13);
title '1) Duplicated values of AGE and SEX will be suppressed';
column age sex name height;
define age / order;
define sex / order;
define name / display;
define height/ display;
run;

proc report data=sashelp.class nowd;
where age in (11,12,13);
title '2) Display values on every row for AGE and SEX';
column age dispage sex dispsex name height;
define age / order noprint;
define dispage / computed 'Age';
define sex / order noprint;
define dispsex / computed 'Sex';
define name / display;
define height/ display;
compute before age;
holdage = age;
endcomp;
compute dispage ;
dispage = holdage;
endcomp;
compute before sex;
holdsex = sex;
endcomp;
compute dispsex / character length=1;
dispsex = holdsex;
endcomp;
run;

ods _all_ close;

[/pre]
Contributor
Posts: 58

Re: Proc Report - group variables - multiple across variable

Posted in reply to Cynthia_sas
So I think you should change your handle. Cynthia is a really pretty name, but "IamtheSASGoddess" seems to be quite appropriate . . . lol. Really impressed with your knowledge base and thanks for the help . . .

But I'm guessing you don't know a ton about baseball. Here's a link to the zone map i was referring to. http://www.retrosheet.org/location.htm
Essentially a ball is hit in a zone and it is either fielded for an out or it goes for a hit. Now, this can happen under a number of different situations, hence the categorical variables which i'm evaluating. So I have sorted the raw data (and filtered it; took me a LONG time) and then subsetted each event into one of the zones on the map. Then within eacch zone, i then evaluate how good the model is at identifying who is a good fielder and who isn't. To do so, i need the group data/summary stats.

So yes, an excel spreadsheet with 70 sheets would be perfect.

Ok, here's what i've got from proc tabulate.
[pre] proc tabulate data=_9S out = _9SSummary;
options missing = 0;
class OutsBefore baserunners Count BattedBallVelocity PlayResult;
table baserunners*Count*OutsBefore*BattedBallVelocity*BallInPlayType,
PlayResult*N;
run;[/pre]

As for ODS, i was simply outputting a new sas dataset when i executed proc tabulate, as you can see above. Then running proc transpose . . .

[pre] proc transpose data=&dsname(drop = _Type_ _Page_ _Table_) out=&dsname(drop = _Name_);
by baserunners Count OutsBefore BattedBallVelocity;
id PlayResult;
run;[/pre]


Didn't realize proc report had so many options. Oh, and as for the ODS wrapper, i could care less about what it is. Excel i guess because that's what most "laypeople" are familiar with.

I then run the macro for transpose which i mentioned. This has been easy. Not clean, but it's worked.
SAS Super FREQ
Posts: 8,866

Re: Proc Report - group variables - multiple across variable

Hi:
Thanks for the compliment! You're right, I know nothing about baseball. Well, I know that guys pitch, hit and run. And from my friends who are baseball fans, I have heard that AAA games can be more fun than the "big league" games.

If your only issue is the empty cells on every row when you use TABULATE. I'd probably just use PROC REPORT. (But if you're happy with your PROC TABULATE, TRANSPOSE and macro solution, and if you get the numbers you expect, than that is good.)

I'd recommend TAGSETS.EXCELXP for your ODS wrapper...It will create one workbook with multiple worksheets. Since it would automatically create one worksheet for every BY group, I'd be tempted to put all 70 datasets together into 1 dataset and make sure that you had a unique BY variable for ZONE and then design something like this -- using BY ZONE in the PROC REPORT code. (Otherwise, you'll have to design your macro program to work 'inside' the TAGSETS.EXCELXP wrapper code.)

cynthia

[pre]
** make some fake baseball data for 6 zones;
** the numbers in the cells will be all the same;
data baseball;
do zone = '23F', '25F', '58D', '78D', '89D', '8XD';
do baserunners = 0, 1;
do count = 'even ', 'hitter ', 'pitcher';
do battedballvelocity = 1,2,3;
do outsbefore = 0, 1, 2;
do PlayResult = 'err', 'hits', 'out';
output; output; output; output; output;
output; output; output; output; output;
end;
end;
end;
end;
end;
end;
run;

proc sort data=baseball;
by zone baserunners count battedballvelocity outsbefore ;
run;

ods listing close;
options nobyline;
title;
ods tagsets.excelxp file='c:\temp\allzones.xls' style=sasweb
options(embedded_titles='yes');

proc report data=baseball nowd
style(header)={cellwidth=1in}
style(column)={cellwidth=1in};
by zone;
title 'Zone #byval(zone)';
column baserunners count BattedBallVelocity OutsBefore d_run d_cnt d_Vel d_outs n,PlayResult;
define baserunners / group noprint;
define count / group noprint;
define BattedBallVelocity / group noprint;
define OutsBefore/group noprint;
define d_run / computed 'baserunners';
define d_cnt / computed 'Count';
define d_vel / computed 'BattedBallVelocity';
define d_outs/computed 'OutsBefore';
define PlayResult / across;
compute before baserunners;
holdb = baserunners;
endcomp;
compute before count;
length holdc $7;
holdc = count;
endcomp;
compute before BattedBallVelocity;
holdv = BattedBallVelocity;
endcomp;
compute before OutsBefore;
holdo = OutsBefore;
endcomp;
compute d_run;
d_run = holdb;
endcomp;
compute d_cnt / character length=7;
d_cnt = holdc;
endcomp;
compute d_vel;
d_vel = holdv;
endcomp;
compute d_outs;
d_outs = holdo;
endcomp;
run;
ods _all_ close;

title; options byline;

[/pre]
Contributor
Posts: 58

Re: Proc Report - group variables - multiple across variable

Posted in reply to Cynthia_sas
Hey thanks again as always! Was wrestling with a bunch of this stuff and tied up at work and finally got to working through the suggestion you had.

For the most part, it pretty much wroked, but for some reason it totals the hits/errors/outs as all having values of 10. So i.e., the first part works fine; it creates a separate data set with only the variables i am using in my model. I then sort, no problems. But then the last step has each permutation having an equal number of hits, errors and outs, all totalling either 10 or 20. I'm wondering if the compute statement has to have something summing the results so that null values don't cause problems. Here's a sample of the results:
[pre] baserunners Count BBVelocity OutsBefor err hit out
5 hitt 1 0 10 10 10
5 hitt 1 1 10 10 10
5 hitt 1 2 10 10 10
5 hitt 2 0 10 10 10
5 hitt 2 1 10 10 10
5 hitt 2 2 10 10 10
5 hitt 3 0 10 10 10
5 hitt 3 1 10 10 10
5 hitt 3 2 10 10 10
5 pitc 1 0 10 10 10
5 pitc 1 1 10 10 10
5 pitc 1 2 10 10 10
5 pitc 2 0 10 10 10
5 pitc 2 1 10 10 10
5 pitc 2 2 10 10 10
5 pitc 3 0 10 10 10
5 pitc 3 1 10 10 10
5 pitc 3 2 10 10 10
6 even 1 0 10 10 10
6 even 1 1 10 10 10
6 even 1 2 10 10 10
6 even 2 0 10 10 10
6 even 2 1 10 10 10[/pre]

Thanks as always!

Message was edited by: CharlesR

Message was edited by: CharlesR

Message was edited by: CharlesR

Message was edited by: CharlesR

Message was edited by: CharlesR

Message was edited by: CharlesR Message was edited by: CharlesR
SAS Super FREQ
Posts: 8,866

Re: Proc Report - group variables - multiple across variable

Hi:
Well, like I said, I know nothing about the data -- so I made a bunch of nested DO loops. At the very, very inside of ALL the nested DO loops are 10 output statements. That means that there will be 10 observations. So there's nothing wrong with PROC REPORT -- it is reporting EXACTLY what it sees....10 observations for EACH. I figured you'd run the program once and then use a version of the program on your own data. I didn't know how to make "true" baseball data (and in truth, probably wouldn't have gotten it right anyway. As I said, some guy pitches, other guys bat, everybody runs around and when someone hits it out of the park, everybody cheers and they play Take Me Out To the Ballgame on the loud speaker. Bad food, lots of beer, too much yelling and what is it with all the spitting?)

You'd have to run a version of MY program without MY data, but with YOUR data. Looks like I may have not coded the right length for the COUNT variable. Look at the program. Change the number of OUTPUT statements from 10 to 1 and you'll only see the number 1 in each cell. You get the idea.

cynthia
Ask a Question
Discussion stats
  • 15 replies
  • 2589 views
  • 0 likes
  • 2 in conversation