Hi masters,
Now I have a dataset, including about 50 character variables and about another 50 numeric variables and millions of observations. I would like to have a SAS report, which will include the counts and percentages of each category under each character variable (the count, the category, the percentage and also the missing values), and the statistical property for each numerical variables (like count, mean, standard deviation, and etc).
Could anyone tell me, what SAS procedures I should use to get this kind of report? Or, is there any SAS code can count those character and numeric variables separately?
Moreover, could anyone tell me, how could I export missing values for each variable to a new dataset?
Many appreciation.
It would help if you could provide a small data set with maybe 5 of each type of variable and 10 or so rows of data and what you would expect the output to look like.
One thing to note is that SAS will not know which variables are "categories" and which should have summary statistics calculated.
I would start with:
proc freq data=have;
tables _character_;
run;
This will provide a single table for each character variable, the count and percentage of each occurence, a cumulative count and percentage and a note after the table how many records had a missing value, if any for each, character variable.
Proc means data=have n mean min max std;
var _numeric_;
run;
Will provide requested statistics for each numeric variable.
The _character_ and _numeric_ are special variable list names maintained by SAS for each type of variable.
Thank you.
I just read one paper, it showed that SAS macro can solved this problem. However, it is not working good in my pc. (This is the link. https://www.mwsug.org/proceedings/2011/pharma/MWSUG-2011-PH03.pdf)
I also copied the beginning part of the codeing in the following. Is there any possible, you could explain me in detail how to inuput the xlsfile, indata, outfile path to my macro? I tried to create new excels in my D drive, and did like this :
%macro onedata (xlsfile=D:\Scorecard Jinbo\input, indata=training_dataset, outfile=D:\Scorecard Jinbo\, trim=0.1);
but not success.
Could you help me about this?
The coding:
/******User guide******
xlsfile: Input the path and excel file name which the missing and outlier is output to.
indata: Input the data name which you want to check.
outfile: Input the path and file name for rtf report.
trim: Input the percentage you want to trim when calculate mean and SD.
***********************************************************/;
/********** Check for one dataset**********/
ods listing close;
%macro onedata (xlsfile=, indata=, outfile=, trim=);
/*Create content table*/
proc contents data=&indata
out=tmp(keep=memname name type label);
run;
/*Check frequency for Character variable, mean for numerical variable*/
/*Get the number of character and numerical variable*/
proc freq data=tmp;
table type;
ods output Freq.Table1.OneWayFreqs=tmp1(keep=type frequency);
run;
proc FREQ and proc MEANS are very good and proc UNIVARIATE can also be usefull providing different statistics in the same way.
PROC UNIVARIATE DATA=sashelp.class;
VAR _numeric_;
OUTPUT
OUT=WORK.output
N=N
NMISS=NMiss
MEAN=Mean
MIN=Min
MAX=Max
MEDIAN=Median
STDMEAN=StdMean
SUM=Total;
RUN;
If you using Enterprise Guide (or Add-in for MS Office) try the Characterize Data Task.
Or you might find Explorer data useful.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.