SAS DATA QUALITY TESTING

06-28-2017 09:19 AM

Hi masters,

I have a dataset with about 50 character variables, about 50 numeric variables, and millions of observations. I would like a SAS report that shows, for each character variable, the count and percentage of each category (including missing values), and, for each numeric variable, summary statistics such as count, mean, and standard deviation.

Could anyone tell me which SAS procedures I should use to get this kind of report? Or is there any SAS code that can summarize the character and numeric variables separately?

Also, could anyone tell me how to export the missing values for each variable to a new dataset?

Many thanks.


06-28-2017 10:12 AM

It would help if you could provide a small data set with maybe 5 of each type of variable and 10 or so rows of data and what you would expect the output to look like.

One thing to note is that SAS will not know which variables are "categories" and which should have summary statistics calculated.

I would start with:

proc freq data=have;

tables _character_;

run;

This will produce one table for each character variable, showing the count and percentage of each value along with a cumulative count and percentage. A note after each table reports how many records had a missing value for that variable, if any.

Proc means data=have n mean min max std;

var _numeric_;

run;

This will provide the requested statistics for each numeric variable.

_character_ and _numeric_ are special variable list names that SAS maintains for each type of variable.
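Since the question also asks for missing values to appear in the report, here is a minimal variation on the two steps above. It assumes the same placeholder dataset name `have`. The MISSING option on the TABLES statement makes PROC FREQ treat missing values as a category of their own (so they get a count and a percentage), and NMISS adds the number of missing values to the PROC MEANS output.

```sas
/* "have" is the placeholder dataset name used above */
proc freq data=have;
  /* MISSING: include missing values as a level in each table */
  tables _character_ / missing;
run;

proc means data=have n nmiss mean std min max;
  /* NMISS: number of missing values for each numeric variable */
  var _numeric_;
run;
```

With millions of observations, character variables that have many distinct values (IDs, free text) will produce very long tables, so it may be worth listing only the genuinely categorical variables in the TABLES statement.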


06-28-2017 10:56 AM

Thank you.

I just read a paper showing that a SAS macro can solve this problem. However, it is not working well on my PC. (This is the link: https://www.mwsug.org/proceedings/2011/pharma/MWSUG-2011-PH03.pdf)

I have also copied the beginning of the code below. Could you explain in detail how to pass the xlsfile, indata, and outfile paths to the macro? I tried creating new Excel files on my D drive and did this:

%macro onedata (xlsfile=D:\Scorecard Jinbo\input, indata=training_dataset, outfile=D:\Scorecard Jinbo\, trim=0.1);

but without success.

Could you help me with this?

The code:

/******User guide******

xlsfile: Input the path and excel file name which the missing and outlier is output to.

indata: Input the data name which you want to check.

outfile: Input the path and file name for rtf report.

trim: Input the percentage you want to trim when calculate mean and SD.

***********************************************************/;

/********** Check for one dataset**********/

ods listing close;

%macro onedata (xlsfile=, indata=, outfile=, trim=);

/*Create content table*/

proc contents data=&indata

out=tmp(keep=memname name type label);

run;

/*Check frequency for Character variable, mean for numerical variable*/

/*Get the number of character and numerical variable*/

proc freq data=tmp;

table type;

ods output Freq.Table1.OneWayFreqs=tmp1(keep=type frequency);

run;
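One likely reason the attempt above fails: the values were typed into the `%macro` definition line, which only declares the macro and its parameters. A macro definition normally stays exactly as published (`%macro onedata (xlsfile=, indata=, outfile=, trim=);` through its closing `%mend`), and the values are supplied in a separate call after the full definition has been submitted. The sketch below shows such a call. The file names `input.xls` and `report.rtf` are assumptions for illustration only, and whether the macro expects file extensions or quoted paths depends on the rest of the code in the paper, which is not shown here.

```sas
/* Submit the complete, unmodified macro definition from the paper first. */
/* Then call the macro with your own values.                              */
/* input.xls and report.rtf are hypothetical file names.                  */
%onedata(
  xlsfile = D:\Scorecard Jinbo\input.xls,
  indata  = training_dataset,
  outfile = D:\Scorecard Jinbo\report.rtf,
  trim    = 0.1
);
```

The dataset named in `indata` must already exist in the current SAS session (for example as WORK.training_dataset) before the call runs.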


06-28-2017 11:04 AM

PROC FREQ and PROC MEANS are very good, and PROC UNIVARIATE can also be useful, providing additional statistics in the same way.

```
PROC UNIVARIATE DATA=sashelp.class;
VAR _numeric_;
OUTPUT
OUT=WORK.output
N=N
NMISS=NMiss
MEAN=Mean
MIN=Min
MAX=Max
MEDIAN=Median
STDMEAN=StdMean
SUM=Total;
RUN;
```
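For the part of the question about exporting missing values to a new dataset, one commonly used approach is to collapse every value into "Missing" or "Not Missing" with a format and capture the PROC FREQ tables through ODS OUTPUT. This is only a sketch: the dataset names `have` and `work.miss_counts` and the format names `missfmt` are placeholders, and the numeric format below treats only the ordinary missing value (`.`) as missing, not special missing values such as `.A`.

```sas
/* Formats that collapse all values into two groups */
proc format;
  value $missfmt ' ' = 'Missing' other = 'Not Missing';
  value  missfmt  .  = 'Missing' other = 'Not Missing';
run;

/* Capture the one-way frequency tables in a dataset */
ods output OneWayFreqs=work.miss_counts;

proc freq data=have;
  format _character_ $missfmt. _numeric_ missfmt.;
  /* MISSING: count the "Missing" group as a regular level */
  tables _all_ / missing nocum;
run;
```

The resulting dataset has one row per variable and group, with the count in Frequency and the share in Percent, so it can be filtered or exported like any other SAS dataset.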


06-28-2017 10:48 AM

If you are using Enterprise Guide (or the Add-in for Microsoft Office), try the Characterize Data task.

Or you might find Explorer data useful.