06-28-2017 09:19 AM
Now I have a dataset, including about 50 character variables and about another 50 numeric variables and millions of observations. I would like to have a SAS report, which will include the counts and percentages of each category under each character variable (the count, the category, the percentage and also the missing values), and the statistical property for each numerical variables (like count, mean, standard deviation, and etc).
Could anyone tell me, what SAS procedures I should use to get this kind of report? Or, is there any SAS code can count those character and numeric variables separately?
Moreover, could anyone tell me, how could I export missing values for each variable to a new dataset?
06-28-2017 10:12 AM
It would help if you could provide a small data set with maybe 5 of each type of variable and 10 or so rows of data and what you would expect the output to look like.
One thing to note is that SAS will not know which variables are "categories" and which should have summary statistics calculated.
I would start with:
proc freq data=have;
This will provide a single table for each character variable, the count and percentage of each occurence, a cumulative count and percentage and a note after the table how many records had a missing value, if any for each, character variable.
Proc means data=have n mean min max std;
Will provide requested statistics for each numeric variable.
The _character_ and _numeric_ are special variable list names maintained by SAS for each type of variable.
06-28-2017 10:56 AM
I just read one paper, it showed that SAS macro can solved this problem. However, it is not working good in my pc. (This is the link. https://www.mwsug.org/proceedings/2011/pharma/MWSUG-2011-PH03.pdf)
I also copied the beginning part of the codeing in the following. Is there any possible, you could explain me in detail how to inuput the xlsfile, indata, outfile path to my macro? I tried to create new excels in my D drive, and did like this :
%macro onedata (xlsfile=D:\Scorecard Jinbo\input, indata=training_dataset, outfile=D:\Scorecard Jinbo\, trim=0.1);
but not success.
Could you help me about this?
xlsfile: Input the path and excel file name which the missing and outlier is output to.
indata: Input the data name which you want to check.
outfile: Input the path and file name for rtf report.
trim: Input the percentage you want to trim when calculate mean and SD.
/********** Check for one dataset**********/
ods listing close;
%macro onedata (xlsfile=, indata=, outfile=, trim=);
/*Create content table*/
proc contents data=&indata
out=tmp(keep=memname name type label);
/*Check frequency for Character variable, mean for numerical variable*/
/*Get the number of character and numerical variable*/
proc freq data=tmp;
ods output Freq.Table1.OneWayFreqs=tmp1(keep=type frequency);
06-28-2017 11:04 AM
proc FREQ and proc MEANS are very good and proc UNIVARIATE can also be usefull providing different statistics in the same way.
PROC UNIVARIATE DATA=sashelp.class; VAR _numeric_; OUTPUT OUT=WORK.output N=N NMISS=NMiss MEAN=Mean MIN=Min MAX=Max MEDIAN=Median STDMEAN=StdMean SUM=Total; RUN;