
SAS DATA QUALITY TESTING

Contributor
Posts: 34


Hi masters,

 

I have a dataset with about 50 character variables, about 50 numeric variables, and millions of observations. I would like a SAS report that shows, for each character variable, the count and percentage of each category (including missing values), and, for each numeric variable, summary statistics such as count, mean, and standard deviation.

 

Could anyone tell me which SAS procedures I should use to produce this kind of report? Or is there SAS code that can summarize the character and numeric variables separately?

 

Also, could anyone tell me how to export the missing values for each variable to a new dataset?

 

Many thanks.

Super User
Posts: 11,343

Re: SAS DATA QUALITY TESTING

Posted in reply to JinboZhao

It would help if you could provide a small data set with maybe 5 of each type of variable and 10 or so rows of data and what you would expect the output to look like.

One thing to note is that SAS will not know which variables are "categories" and which should have summary statistics calculated.
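One way to get a feel for which variables behave like categories is to count the distinct levels of each variable first. A minimal sketch, assuming the dataset is called have; level_counts is a placeholder name, not something from the original post:

```sas
/* Count distinct values per variable without printing the frequency tables. */
/* "have" and "level_counts" are placeholder names.                          */
ods output NLevels=level_counts;
proc freq data=have nlevels;
   tables _all_ / noprint;
run;

/* Variables with only a few levels are candidates for category-style reports. */
proc print data=level_counts;
run;
```

With millions of observations and nearly unique variables (IDs, for example) this step can be slow or memory hungry, so it may be worth dropping such variables from the TABLES statement.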

 

I would start with:

 

proc freq data=have;

   tables _character_;

run;

This produces one table for each character variable, showing the count and percentage of each value, the cumulative count and percentage, and a note below the table stating how many records had a missing value, if any.

 

Proc means data=have n mean min max std;

   var _numeric_;

run;

Will provide requested statistics for each numeric variable.

 

The _character_ and _numeric_ are special variable list names maintained by SAS for each type of variable.
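For the missing-value part of the question, one common pattern is to collapse every value into "Missing" or "Not missing" with a format and let PROC FREQ count them. This is only a sketch: the format names, miss_summary, and miss_rows are made-up names, and it assumes the dataset has both character and numeric variables.

```sas
/* Collapse every value into "Missing" or "Not missing".                 */
/* Special numeric missings (.A-.Z, ._) fall under "Not missing" here.   */
proc format;
   value $missfmt ' ' = 'Missing' other = 'Not missing';
   value  missfmt  .  = 'Missing' other = 'Not missing';
run;

/* One-way tables for all variables; MISSING keeps the missing level in  */
/* the counts and percentages. ODS OUTPUT saves the tables as a dataset. */
ods output OneWayFreqs=miss_summary;
proc freq data=have;
   format _character_ $missfmt. _numeric_ missfmt.;
   tables _all_ / missing nocum;
run;

/* If "export missing values" means the rows themselves: keep every      */
/* observation that has at least one missing value in any variable.      */
data miss_rows;
   set have;
   if cmiss(of _all_) > 0;
run;
```

The miss_summary dataset stacks all the tables, so it has one value column per variable and may need a little reshaping before reporting.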

Contributor
Posts: 34

Re: SAS DATA QUALITY TESTING

Thank you.

I just read a paper showing that a SAS macro can solve this problem. However, it is not working well on my PC. (This is the link: https://www.mwsug.org/proceedings/2011/pharma/MWSUG-2011-PH03.pdf)

 

I have also copied the beginning of the code below. Could you explain in detail how to pass the xlsfile, indata, and outfile paths to the macro? I created new Excel files on my D drive and tried this:

%macro onedata (xlsfile=D:\Scorecard Jinbo\input, indata=training_dataset, outfile=D:\Scorecard Jinbo\, trim=0.1); 

but it did not succeed.

 

 

Could you help me with this?

 

The code:

 

/******User guide******
xlsfile: Input the path and excel file name which the missing and outlier is output to.
indata: Input the data name which you want to check.
outfile: Input the path and file name for rtf report.
trim: Input the percentage you want to trim when calculate mean and SD.
***********************************************************/;


/********** Check for one dataset**********/
ods listing close;
%macro onedata (xlsfile=, indata=, outfile=, trim=);

/*Create content table*/
proc contents data=&indata
out=tmp(keep=memname name type label);
run;

/*Check frequency for Character variable, mean for numerical variable*/

/*Get the number of character and numerical variable*/
proc freq data=tmp;
table type;
ods output Freq.Table1.OneWayFreqs=tmp1(keep=type frequency);
run;
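A note on calling the macro: a statement that begins with %macro starts a macro definition, so putting values on that line only sets parameter defaults and runs nothing. A macro is run by invoking its name with a percent sign. The sketch below assumes the complete definition from the paper (through its %mend) has already been submitted and that training_dataset exists in the WORK library. The outfile value is a hypothetical file name, since the user guide comment asks for a path plus a file name; whether the macro expects file extensions is not visible in the excerpt above.

```sas
/* Run this only after submitting the full %macro onedata ... %mend block. */
/* "report" in outfile= is a hypothetical file name.                       */
%onedata(xlsfile=D:\Scorecard Jinbo\input, indata=training_dataset, outfile=D:\Scorecard Jinbo\report, trim=0.1)
```

If the call still fails, the SAS log usually shows which step inside the macro stopped; adding options mprint; before the call makes the generated code visible there.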

 

 

SAS Employee
Posts: 174

Re: SAS DATA QUALITY TESTING

PROC FREQ and PROC MEANS are very good, and PROC UNIVARIATE can also be useful; it provides additional statistics in the same way.

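PROC UNIVARIATE DATA=sashelp.class;
   VAR _numeric_;
   /* Names after each keyword pair up with the VAR variables in order, */
   /* so a single name covers only the first numeric variable.          */
   OUTPUT
      OUT=WORK.output
      N=N
      NMISS=NMiss
      MEAN=Mean
      MIN=Min
      MAX=Max
      MEDIAN=Median
      STDMEAN=StdMean   /* standard error of the mean */
      SUM=Total;
RUN;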
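When there are many numeric variables, it can be easier to get one row per variable from PROC MEANS. A small sketch, assuming SAS 9.3 or later (for STACKODSOUTPUT); num_stats is a placeholder name:

```sas
/* One row per numeric variable, one column per statistic. */
/* "num_stats" is a placeholder output name.               */
ods output Summary=num_stats;
proc means data=sashelp.class n nmiss mean std min max median sum stackodsoutput;
   var _numeric_;
run;
```

This avoids listing an output name for every variable, which the OUTPUT statement in PROC UNIVARIATE would need in order to cover all of them.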
SAS Employee
Posts: 174

Re: SAS DATA QUALITY TESTING

Posted in reply to JinboZhao

If you are using Enterprise Guide (or the Add-In for Microsoft Office), try the Characterize Data task.

 

[Screenshot: EG Characterize Data task]

 

[Screenshot: EG Characterize Data task output]

 

Or you might find Explorer data useful.

 

[Screenshot: EG Explorer data]

 
