Hi everyone,
I'm trying to generate a summary table grouped by certain categories using the list report wizard in Enterprise Guide 4.3 / Office Add-In and I'm puzzled by the way missing values are treated. The output table doesn't show the missing values, instead it contains two lines with the same categories.
Here's a piece of code generated by the wizard as part of a proc report that aroused my suspicion:
define myvar / group 'myvar' missing;
compute GP_Brarisk_CK;
if myvar ne ' ' then hold2=myvar;
if myvar eq ' ' then myvar=hold2;
endcomp;
My question: Is there any way to modify the list report so that it treats the missing values as a separate category? Is there an alternative method that does what I want and that's available via the SAS add-in for Excel? Thanks!
Hi:
Before we talk about PROC REPORT. Let's talk about your data -- it becomes important to understand the full COLUMN statement - -which variables are coming from your data and which variables are being computed. And important to understand what values are missing, truly missing, missing numeric variables, missing character variables that you are using for grouping or ordering or ???
When you talk about "missing values" keep in mind that the MISSING option does not refer to missing values for numeric variables (such as a . for a missing SALARY or missing . for AGE). If MYVAR is a GROUP item, then is it possible that you will have space or missing as a valid value for MYVAR???? Or, are you referring to the places where the report does not display duplicate or repeated values for your data in the final report.
The code that is generated by the List Report Wizard sometimes makes assumptions about your report or inserts extra code that might not be needed if you were using just regular PROC REPORT and writing your own code...so knowing what choices you picked in the selection panes is important too. Did you say that you wanted a summary report and/or did you click the choice to show the values on every row.
Let's look at some data:
grade name advisor donation donation_num
------ ------ ------- -------- ------------
junior Alan Jones 25 101
junior Barb 10 102 <----
junior Carl Wilson 20 103
junior Dave Jones 10 104
junior Edna 5 105 <----
senior Alfred Wilson 20 106
senior Bob Wilson 15 107
senior Cathy Jones 10 108
senior Dana 25 109 <----
senior Edgar Jones 25 110
senior Fran Wilson 20 111
You can see that 3 donations have missing values for the ADVISOR column (donation 102, 105 and 109), but that all the other columns have values. The MISSING option in PROC REPORT only says that you want to see the 3 blank observations for ADVISOR, if ADVISOR is listed as a GROUP or ORDER item on your report and you use MISSING on the DEFINE statement for ADVISOR.
So here's a default report using both ADVISO and GRADE as ORDER variables.
advisor grade donation donation_num
Jones junior 25 101
10 104
senior 10 108
25 110
Wilson junior 20 103
senior 20 106
15 107
20 111
Notice in the above report, that you are seeing almost, but not all of the data rows that I initially showed above. Without the MISSING option in my code (or as an option), donations 102, 105 and 109 are NOT on the above report. All the other rows are here. But some of the rows show spaces where "Jones" or "Wilson" or "junior" or "senior" would have been repetitive. The List Report Wizard (and PROC REPORT) make the assumption that in a production quality report, you do not want to see duplicated values repeated on every report row.
Now, if I USE the MISSING option specified on a DEFINE statement for advisor, notice how the donations for 102, 105 and 109 now appear with blanks for the advisor name. These 3 observations are truly missing the ADVISOR name. This is the ONLY thing that the MISSING option should do:
advisor grade donation donation_num
junior 10 102 ** these 3 rows are from using the
5 105 ** MISSING option with the DEFINE
senior 25 109 ** statement for ADVISOR
Jones junior 25 101
10 104
senior 10 108
25 110
Wilson junior 20 103
senior 20 106
15 107
20 111
I put ** on the rows with missing values for ADVISOR. All the other blank spaces under advisor and grade rows are NOT missing. PROC REPORT suppresses the repetitious display of duplicate values. So you don't see Jones or Wilson on every report row. You also do not see junior on senior on every report row. The syntax that you show (using MYVAR and using HOLD2) is typically syntax that is sometimes used by the List Report Wizard to "fill" in these normally empty rows. This syntax usually has NOTHING to do with MISSING values or the MISSING option. As you can see, on the row for donation 104, the repetition of Jones is suppressed by default and the repetition of junior is suppressed by default.
You might want to work with Tech Support on this question, because someone needs to look at your data and at the code generated by the List Report Wizard and at the output and help you figure out whether you are getting the right results from the List Report Wizard. If I wanted to "fill in" the rows for ADVISOR and GRADE on my report, I could have produced this:
advisor grade donation donation_num
junior 10 102
junior 5 105
senior 25 109
Jones junior 25 101
Jones junior 10 104
Jones senior 10 108
Jones senior 25 110
Wilson junior 20 103
Wilson senior 20 106
Wilson senior 15 107
Wilson senior 20 111
Notice that for the 3 rows with missing values for ADVISOR, there was nothing to "fill in" but for the rows with grade, I could "fill in" junior and senior where appropriate. Tech Support is better able to look at your data and look at the generated code and help you figure out this issue. And, since you say you are using the Office Add-in, that may complicate things a bit in terms of how you fix the issue.
To open a track with Tech Support, fill out the form at this link:
http://support.sas.com/ctx/supportform/createForm
cynthia
Thanks a lot for your detailed reply; it is very helpful to understand these ideas. However, I'm still puzzled by the result I get.
I have extracted a simple example data set that shows the same phenomenon:
amount country myvar
-------- ------- -----
1 BEL 11
100 LUX
10.000 BEL
1.000.000 LUX 56
10 BEL 32
1 BEL 11
100 LUX
What I did: Using the list report wizard and without changing any of the settings, I created a summary report on this data set. "myvar" is a character field that happens to contain numeric values.
Here's what I got:
amount country myvar
-------- ------- -----
10.000 BEL
2 BEL 11
10 BEL 32
200 LUX 32
1.000.000 LUX 56
Now I'm wondering why I'm getting '32' where I would expect a missing value; I'm also wondering why, of the three nonempty values, it is this value that is displayed in the 4th line.
For completeness, here's the code (generated by Enterprise Guide 4.3):
proc report data=WORK.mydata nowd;
column Amount, SUM=Amount_SUM Country myvar;
define Amount / analysis SUM 'Amount' format=COMMAX22.2 missing;
define Amount_SUM / format=COMMAX22.2;
define Country / group 'Country' missing;
compute Country;
if Country ne ' ' then hold1=Country;
if Country eq ' ' then Country=hold1;
endcomp;
define myvar / group 'myvar' missing;
compute myvar;
if myvar ne ' ' then hold2=myvar;
if myvar eq ' ' then myvar=hold2;
endcomp;
run;
quit;
It seems to me that the list report wizard is maybe not what I want; is there anything similar that produces the result I would expect, i.e., something like a pivot table, but with each value shown in each line of the result? In SQL this would be a simple "group by"/"sum" statement, but as I said, I'd like to do this using the Excel add-in. Thanks for your suggestions!
Hi:
I'm curious. Are you building a stored process using EG and then trying to run the stored process using the Add-in for Microsoft Office? Or are you using the List Report Wizard from inside the Add-in? Or, are you building the code in EG before using the Wizard inside AMO? I'm just trying to get a full picture of your process.
You are correct, that you should not be seeing the "32" on the row for LUX. And, you are also correct that the extra code added by EG is responsible for making you get the "32" on the row for LUX (carried forward from the row for BEL). EG generally inserts code based on the choices you make in the Wizard, however, and I am not on a machine where I can test all the point and click choices that would have generated the code you posted.
However, if you edit the generated code to REMOVE the COMPUTE block (not the DEFINE statement), but just the COMPUTE block for MYVAR, you would get this:
Amount Country myvar
-------------- ------------ -----
10.000 BEL
2 BEL 11
10 BEL 32
200 LUX
1.000.000 LUX 56
for your report. If you would open a track with Tech Support, they would help you figure out the correct click-path to generate the code without the COMPUTE block. To open a track with Tech Support, fill out the form at this link:
http://support.sas.com/ctx/supportform/createForm
cynthia
Hello and thank you for your reply. I'm using the wizard from inside the Excel Add-in; I ran the same Wizard in Enterprise Guide only to copy the SAS code, there's no stored process involved. Sorry for the confusion. I might try Tech Support next, thanks for your help though.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!