BookmarkSubscribeRSS Feed
Calcite | Level 5
Hi all,

I've a dataset with 4 variables. I want to create a report in excel that shows the count of missing value for each variable, & the count of duplicate variables. I can get the desired values using proc FREQ, but m not sure how to export/print the results to excel.

Here is the sample data:
Surname Firstname DOB AccountNo
Lee Brett 12/01/1960 1234
Smith Adam 1234
L Jason 14/05/1944 454664

The report should look something like:
Surname Missing 0 0.0%
Firstname Missing 0 0.0%
DOB Missing 1 33.33%
AccountNo Missing 0 0.0%
Duplicate 1 33.33%
You say that you can get the desired values using PROC FREQ, can you show your FREQ program?? Have you used the feature of PROC FREQ that will create an output dataset??? PROC FREQ, by default, would probably not create your desired report and you might need to create an output dataset from PROC FREQ and then organize your results so they are in the right form for PROC PRINT or PROC REPORT.

As for how you would get your report into Excel, you would either use:
1) PROC EXPORT (uses SAS dataset)
2) LIBNAME engine for Excel (uses SAS dataset)
3) ODS HTML or ODS MSOFFICE2K (uses procedure output)
4) ODS TAGSETS.EXCELXP (uses procedure output)
5) ODS CSV or ODS CSVALL (uses procedure output)

Since you posted your question to the ODS and Reporting forum, I figure you are planning to use ODS to create an HTML or XML or CSV file that Excel can open and render.

Calcite | Level 5
Hi Cynthia,

Currently I m manually pasting the PROC FREQ results to excel. But I m trying to create a dataset from PROC FREQ using macro(not sure how at this stage- my actual data has 28 variable so I want to use macro) And the next step will be to use ODS MSOFFICE2K.

The problem is that I haven't use much of macro or ODS so need assistance from someone.
The first rule of using SAS Macro variables and SAS Macro programs is to understand the code you want to generate (since the SAS Macro facility only "types" or generates code that gets sent to the compiler).

So, I asked whether you had already created an OUTPUT dataset from PROC FREQ. Before we go down the macro road, let's look at an output dataset created by ODS and PROC FREQ.

Let's say that I have manipulated SASHELP.CLASS to have the following missing data (first 4 rows are altered to have different variable values missing):
Data with Missing Values and Dup Height from SASHELP.CLASS

Obs Name Sex Age Height Weight

1 M . 69.0 112.5
2 Alice F 13 . 84.0
3 F 13 65.3 .
4 Carol 14 62.8 102.5
5 Henry M 14 63.5 102.5

Also note that observation 5 has a duplicate value for the WEIGHT variable (102.5).

So, if we generate PROC FREQ for the entire dataset (using _ALL_) and a format that translates a blank to "Missing" and a . to "Missing", then we get this for the output dataset:
1) What is in Dataset created by FREQ?

Table F_Name F_Sex F_Age F_Height F_Weight Frequency Percent
Table Name Missing 2 10.53
Alice 1 5.26
Carol 1 5.26
Henry 1 5.26
James 1 5.26
Jane 1 5.26
Janet 1 5.26
Jeffrey 1 5.26
John 1 5.26
Joyce 1 5.26
Judy 1 5.26
Louise 1 5.26
Mary 1 5.26
Philip 1 5.26
Robert 1 5.26
Ronald 1 5.26
Thomas 1 5.26
William 1 5.26

Table Sex Missing 1 5.26
F 8 42.11
M 10 52.63

Table Age Missing 1 5.26
11 2 10.53
12 5 26.32
13 3 15.79
14 3 15.79
15 4 21.05
16 1 5.26

Table Height Missing 1 5.26
51.3 1 5.26
56.3 1 5.26
57.3 1 5.26
57.5 1 5.26
59 1 5.26
59.8 1 5.26
62.5 2 10.53
62.8 1 5.26
63.5 1 5.26
64.3 1 5.26
64.8 1 5.26
65.3 1 5.26
66.5 2 10.53
67 1 5.26
69 1 5.26
72 1 5.26

Table Weight Missing 1 5.26
50.5 1 5.26
77 1 5.26
83 1 5.26
84 2 10.53
84.5 1 5.26
85 1 5.26
90 1 5.26
99.5 1 5.26
102.5 2 10.53
112 2 10.53
112.5 2 10.53
128 1 5.26
133 1 5.26
150 1 5.26

It is possible that once you have an output dataset, you can generate the report you need without the use of SAS Macro variables or code. You might be able to generate the report for "Missing" with just a simple WHERE clause added to the PROC REPORT. First, though, you have to understand how the creation of an OUTPUT dataset using ODS can simplify the task you want to accomplish. Toward this end, I suggest that you look at the doc for PROC FREQ and/or look for user-group papers on all you can do with PROC FREQ.

Note that the final report is shown using the ODS MSOFFICE2K "sandwich". This code will create an HTML file that Excel can open and render. The use of the .XLS file extension "fools" the Windows registry into launching Excel when you double-click on the output file. If you look inside the file with Notepad, you will see that it is really an HTML file.

data class;
set sashelp.class;
if _n_ = 1 then do;
name = ' ';
age = .;
else if _n_ = 2 then do;
height = .;
else if _n_ = 3 then do;
weight = .;
name = ' ';
else if _n_ = 4 then do;
sex = ' ';

proc print data=class(obs=5);
title 'Partial Data with Missing Values and Dup Height from SASHELP.CLASS';

proc format;
value $cms ' '='Missing';
value nms . = 'Missing';

options ls=256 nodate nonumber nocenter ps=200;
ods listing close;
ods output onewayfreqs=work.owf;
proc freq data=class ;
tables _all_ / missing;
format _character_ $cms. _numeric_ nms.;

ods listing;
ods msoffice2k file='c:\temp\freq_report.xls' style=sasweb;
proc report data=owf nowd;
column Table F_Name F_Sex F_Age F_Height F_Weight Frequency Percent;
define Table / order order=data f=$15.;
define F_Name / 'F_Name';
define F_Sex / 'F_Sex';
define F_Age / 'F_Age';
define F_Height / 'F_Height' f=$10.;
define F_Weight / 'F_Weight' f=$10.;
define Frequency / display f=10.;
define Percent / display f=8.2;
break after Table /;
compute after Table;
line ' ';
title '1) What is in Dataset created by FREQ?';
ods msoffice2k close;
Calcite | Level 5
Thanks Cynthia.But I would like to take a step back and work on PROC FREQ at this stage.

Ideally I am looking to get the count of each variable and a count of missing values for each variable. And for some variables I want to do PROC FREQ & to check if there is any invalid data. Finally I want to have all these results in an excel table.

Lets say from CLASS table , I want to have results somewhat like:
Name Total no 5 X%
Missing 2 X%

Sex Total no 5 X%
M 2 X%
F 2 X%
Missing 2 X%

Weight Total no 5 X%
Invalid 2 X% (invalid - for this scenario we are saying where the count of Age is > 1)

Is there any way of writing multiple statements and get the results populated in excel?
When you use the forum posting mechanism, certain characters, such as the LT sign (<) or the GT sign (>) are interpreted as HTML tags. Sometimes, the use of the < or the > inside a forum posting can cause your post to appear truncated or badly formatted.

You can avoid this behavior by altering what you write and using &lt; for < and &gt; for > symbols. This previous forum posting outlines other oddities and "protections" that you can apply when you make posts in order to protect > and < symbols. It talks about the LT and GT symbols and also discusses how to emphasize code and how to surround code snippets with [pre] and [/pre]毙

As I said, once you understand how to create an output dataset from PROC FREQ, you might have to use a DATA step program to restructure or "massage" your data. You will probably not get your desired report layout from one pass of PROC FREQ. For example, you show some kind of "post-processing" logic here, when you say:
Weight Total no 5 X%
Invalid 2 X% (invalid - for this scenario we are saying where the count of Age is > 1)

There was nothing in my code that considered AGE in setting the % or label for the WEIGHT variable, so I don't understand the added criteria you list "where the count of Age is > 1" -- if you want to change the label for the missing values for WEIGHT, you would have to create and use a different user-defined format for the WEIGHT variable. And, it looks like the requirement to report "duplicates" has dropped out of this new version of the report.

You might want to get some of your information from PROC MEANS or from the NLEVELS option of PROC FREQ. For example, if you add the NLEVELS option to the PROC FREQ shown above for WORK.CLASS, you would see this information on the "levels" of values for the variables:
The FREQ Procedure

Number of Variable Levels

Missing Nonmissing
Variable Levels Levels Levels
Name 18 1 17
Sex 3 1 2
Age 7 1 6
Height 17 1 16
Weight 15 1 14

Or, you may want to consider the use of PROC MEANS or PROC TABULATE along with PROC FREQ and PROC REPORT, as described in some of these papers and book excerpts:

But the bottom line is that you will need to understand what you get by default from the various SAS procedures and then take that default and possibly restructure the output from the procedures to produce the final report you want. You might want to treat character variables differently from numeric variables, you might want to consider incompatible combinations of "missing" such as when AGE is missing, then WEIGHT is invalid for that observation, etc, etc.

Calcite | Level 5
Thanks Cynthia. But I think I have gone bit off the track here.

Ideally I want to generate a report in excel that gets the count of each variable, count of missing values for each variable and for some variables I want to do proc FREQ.

Let's say if my data is likeSASHELP.CLASS, i need a report that looks something like:

Name Total count 5 x%
Missing 2 x%

Sex Total count 5 x%
Missing 1 x%
M 2 x%
F 2 x%

Weight Total count 5 x%
Invalid 2 x%(invalid- where the count of weight is greater than 1)

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2 in conversation