Hello
I want to calculate the following statistics for each numeric variable:
nr,nr_POS,nr_Neg,nr_Nulls,nr_0,nr_distinct_values,AVG,STD,UCL,LCL,Sum,Min,Max,Q1,Median,
Q3,P1,P5,P95,P99,P75,P25,CV,Range
I want to identify automatically the numeric variables.
In this code I perform wide to Long change of data structure.
My question- Is there a better more efficient way to calculate the required statistics? May you show code?
In real life the data set has 2 million rows and 200 numeric variables .
I am not sure if this method of work (Long to wide) is efficient in this case
proc sql;
create table numeric_Vars as
select libname, memname, name, type, length,
format, informat, label
from dictionary.columns
where libname = 'SASHELP' and memname='CLASS' and upcase(type)='NUM'
;
quit;
proc sql noprint;
select name into : Numeric_Vars_List SEPARATED by ' '
from numeric_Vars
;
quit;
%put &Numeric_Vars_List.; /**Age Height Weight**/
proc sql noprint;
select count(*) as nr_numeric_Vars into :nr_numeric_Vars
from numeric_Vars
;
quit;
%put &nr_numeric_Vars.;
data Long_Structure_Numeric_Vars_Data(DROP=&Numeric_Vars_List. J);
set sashelp.class(KEEP=&Numeric_Vars_List.);
array vv{&nr_numeric_Vars.} &Numeric_Vars_List.; /* _NUMERIC_ */
do j=1 to dim(vv);
Var_Name = vname(vv(j));
Var_Value =vv(j);
output;
end;
run;
proc sql;
create table Want1 as
select Var_Name,
count(*) as nr,
sum(case when Var_Value>0 then 1 else 0 end ) as nr_POS,
sum(case when Var_Value<0 AND Var_Value=. then 1 else 0 end ) as nr_Neg,
sum(case when Var_Value=. then 1 else 0 end ) as nr_Nulls,
sum(case when Var_Value=0 then 1 else 0 end ) as nr_0,
count(distinct Var_Value) as nr_distinct_values,
mean(Var_Value) as AVG format=8.2,
STD(Var_Value) as STD format=8.2,
calculated AVG+calculated STD *3 as UCL,
calculated AVG-calculated STD *3 as LCL
from Long_Structure_Numeric_Vars_Data
group by Var_Name
;
quit;
proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range /**N mean std**/ STACKODSOUTPUT maxdec=2;
var _numeric_;
ods output summary=Want2(Rename=(Variable=Var_Name));
run;
data want_summary_Report;
merge Want1 Want2;
by Var_Name;
Run;
The only quick and easy way I know to get number of distinct values is to use Proc Freq with the NLEVELS option.
I see three steps, get your counts of ,nr_POS,nr_Neg,nr_Nulls,nr_0 which is one of your other questions,
the distinct values
Proc freq data=sashelp.class nlevels; tables _numeric_; ods output nlevels= distinctcount; run;
And then proc means for the rest
proc means data=sashelp.class stackods Mean STD UCLm LCLm Sum Min Max Q1 Median Q3 P1 P5 P95 P99 P75 P25 CV Range; var _numeric_; ods output summary=mysummary; run;
merge the three sets together on the variable name.
SASHELP.CLASS is not a wide data set. It is a long data set. There is no benefit to doing a wide to long conversion. Don't waste your time.
Step 1: Use PROC FREQ and the formats you showed in your other thread to do the counting.
Step 2: Use the information already given to you by @Reeza here: https://communities.sas.com/t5/SAS-Programming/proc-freq-one-way-table-for-multiple-vars-export-to-d... to convert the PROC FREQ output to usable data set
Step 3: Use PROC SUMMARY/PROC UNIVARIATE to compute the rest of the statistics
Step 4: Merge step 2 and SUMMARY output together
The only quick and easy way I know to get number of distinct values is to use Proc Freq with the NLEVELS option.
I see three steps, get your counts of ,nr_POS,nr_Neg,nr_Nulls,nr_0 which is one of your other questions,
the distinct values
Proc freq data=sashelp.class nlevels; tables _numeric_; ods output nlevels= distinctcount; run;
And then proc means for the rest
proc means data=sashelp.class stackods Mean STD UCLm LCLm Sum Min Max Q1 Median Q3 P1 P5 P95 P99 P75 P25 CV Range; var _numeric_; ods output summary=mysummary; run;
merge the three sets together on the variable name.
Sounds like you need three passes through the data to generate all of those stats.
proc format;
value sign
low-<0 = 'Negative'
0 = 'Zero'
0<-high = 'Positive'
other = 'Missing'
;
run;
ods select none;
proc freq nlevels data=have;
tables _numeric_ / noprint ;
ods output nlevels=nlevels;
run;
proc means data=have stackods
N Nmiss Mean STD UCLm LCLm Sum Min Max Q1 Median
Q3 P1 P5 P95 P99 P75 P25 CV Range
;
var _numeric_;
ods output summary=means;
run;
proc freq data=have ;
tables _numeric_ / list nopercent nocum;
format _numeric_ sign.;
ods output OneWayFreqs=outfreq;
run;
ods select all;
data freqs;
length Variable _name_ $32;
keep Variable _name_ Frequency ;
set outfreq;
Variable = substr(table,7);
_name_ =vvaluex('F_'||variable);
run;
proc transpose data=freqs out=Neg0Pos(drop=_:);
by Variable notsorted;
id _name_;
var Frequency;
run;
data want;
merge Neg0Pos nlevels(rename=(TableVar=Variable)) means ;
by Variable;
Negative=sum(0,Negative);
Positive=sum(0,Positive);
Zero=sum(0,Zero);
run;
Example results:
NNon NMiss Miss Obs Variable Negative Zero Positive NLevels Levels Levels N NMiss Mean 1 var1 1 1 7 5 1 4 9 1 0.888889 2 var2 0 4 4 4 1 3 8 2 0.750000 3 var3 3 3 2 4 1 3 8 2 0.125000 4 var4 4 3 2 4 1 3 9 1 -0.222222 Obs StdDev UCLM LCLM Sum Min Max 1 0.927961 1.602183 0.175595 8.000000 -1.000000 2.000000 2 0.886405 1.491053 0.008947 6.000000 0 2.000000 3 1.246423 1.167036 -0.917036 1.000000 -1.000000 2.000000 4 0.833333 0.418334 -0.862779 -2.000000 -1.000000 1.000000 Obs Q1 Median Q3 P1 P5 P95 1 1.000000 1.000000 1.000000 -1.000000 -1.000000 2.000000 2 0 0.500000 1.500000 0 0 2.000000 3 -1.000000 0 1.000000 -1.000000 -1.000000 2.000000 4 -1.000000 0 0 -1.000000 -1.000000 1.000000 Obs P99 P75 P25 CV Range 1 2.000000 1.000000 1.000000 104.395582 3.000000 2 2.000000 1.500000 0 118.187368 2.000000 3 2.000000 1.000000 -1.000000 997.138764 3.000000 4 1.000000 0 -1.000000 -375.000000 2.000000
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.