- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In PROC Report, is it possible to group using a computed variable?
For example, in the PROC below, I'm creating a computed variable (rbcgen) based on the values of RBC and GENDER. Assuming, of course, that I change the other variable types from display to group or analysis, would it be possible to group by rbcgen?
Tasha
P.S. I know how to do this in a DATA step (and other workarounds). I'm just using a very simple example to ask the question about the capabilities of PROC Report.
data bloodRBC;
infile datalines;
input gender $ rbc;
datalines;
M 5.5
M 4.8
M 6.2
M 6.5
M 5.9
M 5.2
M 5.9
F 5.9
F 4.5
F 5.2
F 5.5
F 4.3
F 3.9
F 4.5
;
proc report data=bloodRBC nowd;
column gender rbc rbcgen;
define gender / display;
define rbc / display;
define rbcgen / computed;
compute rbcgen / character length=10;
if rbc = . then rbcgen = 'N/A';
else if gender = 'M' and rbc lt 4.7 then rbcgen = 'Low RBC';
else if gender = 'M' and 4.7 le rbc le 6.1 then rbcgen = 'Normal';
else if gender = 'M' and rbc gt 6.1 then rbcgen = 'High RBC';
else if gender = 'F' and rbc lt 4.2 then rbcgen = 'Low RBC';
else if gender = 'F' and 4.2 le rbc le 5.4 then rbcgen = 'Normal';
else if gender = 'F' and rbc gt 5.4 then rbcgen = 'High RBC';
endcomp;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I understand what you're saying, but I disagree. Grouping and sorting based on computed values might seem like a basic functionality, but when you understand how PROC REPORT builds a report, then you will understand why it is not there.
Basically, PROC REPORT builds a report 1 single row at a time. It knows when it is at the beginning of a GROUP and it knows which GROUP comes first based on the ORDER= option that you might specify or the order that the data are in. PROC REPORT does go through a pre-processing phase, but that phase is not exactly what you might imagine. For example, if you have ACROSS variables, REPORT will create absolute column numbers (_c2_, _c3_) for the ACROSS variable values. It does some summarizing, but not to the computed item level.
So again, the basic concept is that PROC REPORT only builds one report row at a time. So for computed columns, there is no visibility of the row that came before or the row that will come after. There is no "post processing" phase -- so if the COMPUTED items are created on every row and there is no post processing, then there is no opportunity to "sort" by the computed item. You can make 2 passes through the data. Let's assume that you are NOT reading data in from a flat file and that you already have the BLOODRBC file. How would the 2 passes look. Well, you could do them with a DATA step or PROC SQL, but I'm going to suggest making both passes with PROC REPORT. A little known feature of PROC REPORT is that it can create an output dataset, which you could use in a subsequent step.
But key to figuring out your two passes through the data to produce your report will be to understand how you want the final report to appear. Do you still want to see 1 report row for every observation in the input dataset , or do you want to see a summarized report perhaps with an average and a count? Can you describe your desired report and the output destination you want (RTF, PDF, HTML)?
cynthia
Possible output #1:
rbcgen gender rbc
Low RBC F 3.9
Normal M 5.5
M 4.8
M 5.9
M 5.2
M 5.9
F 4.5
F 5.2
F 4.3
F 4.5
High RBC M 6.2
M 6.5
F 5.9
F 5.5
OR
Possible output report #2:
Average
rbcgen gender RBC Count
Low RBC F 3.9 1
Normal F 4.6 4
M 5.5 5
High RBC F 5.7 2
M 6.4 2
5.3 14
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think it is. Plus it would group in alphabetical order which may not be what you want anyways, so I'd specify it outside of proc report somehow.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm guessing that it's not, based on some other posts that I've seen, but I'm really, really hoping that I'm wrong.
Grouping and sorting based on computed values seems like a basic functionality that is oddly missing from Proc Report.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I understand what you're saying, but I disagree. Grouping and sorting based on computed values might seem like a basic functionality, but when you understand how PROC REPORT builds a report, then you will understand why it is not there.
Basically, PROC REPORT builds a report 1 single row at a time. It knows when it is at the beginning of a GROUP and it knows which GROUP comes first based on the ORDER= option that you might specify or the order that the data are in. PROC REPORT does go through a pre-processing phase, but that phase is not exactly what you might imagine. For example, if you have ACROSS variables, REPORT will create absolute column numbers (_c2_, _c3_) for the ACROSS variable values. It does some summarizing, but not to the computed item level.
So again, the basic concept is that PROC REPORT only builds one report row at a time. So for computed columns, there is no visibility of the row that came before or the row that will come after. There is no "post processing" phase -- so if the COMPUTED items are created on every row and there is no post processing, then there is no opportunity to "sort" by the computed item. You can make 2 passes through the data. Let's assume that you are NOT reading data in from a flat file and that you already have the BLOODRBC file. How would the 2 passes look. Well, you could do them with a DATA step or PROC SQL, but I'm going to suggest making both passes with PROC REPORT. A little known feature of PROC REPORT is that it can create an output dataset, which you could use in a subsequent step.
But key to figuring out your two passes through the data to produce your report will be to understand how you want the final report to appear. Do you still want to see 1 report row for every observation in the input dataset , or do you want to see a summarized report perhaps with an average and a count? Can you describe your desired report and the output destination you want (RTF, PDF, HTML)?
cynthia
Possible output #1:
rbcgen gender rbc
Low RBC F 3.9
Normal M 5.5
M 4.8
M 5.9
M 5.2
M 5.9
F 4.5
F 5.2
F 4.3
F 4.5
High RBC M 6.2
M 6.5
F 5.9
F 5.5
OR
Possible output report #2:
Average
rbcgen gender RBC Count
Low RBC F 3.9 1
Normal F 4.6 4
M 5.5 5
High RBC F 5.7 2
M 6.4 2
5.3 14
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Cynthia,
I meant basic from a user standpoint, as in "Of course I'd want to be able to do that." I know that doesn't always correspond with easy to implement.
Right now I'm not working on a specific report in particular. I'm actually putting together some training documentation, and I was looking for an easy way to group based on computed values, since I thought that might be a question that would come up.
Thanks!
Tasha
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, Tasha:
I have posted about my paper "Creating Complex Reports" (from SGF 2008) before. It has a similar (but not exactly the same example) of ordering on a summarized column. As in, you want to summarize sales variable by region and then order by descending sales within each region group. Still needs 2 passes through the data. There's an example in that paper.
cynthia