Hello
I want to calculate the following statistics for each numeric variable:
nr,nr_POS,nr_Neg,nr_Nulls,nr_0,nr_distinct_values,AVG,STD,UCL,LCL,Sum,Min,Max,Q1,Median,
Q3,P1,P5,P95,P99,P75,P25,CV,Range
I want to identify automatically the numeric variables.
In this code I perform wide to Long change of data structure.
My question- Is there a better more efficient way to calculate the required statistics? May you show code?
In real life the data set has 2 million rows and 200 numeric variables .
I am not sure if this method of work (Long to wide) is efficient in this case
proc sql;
create table numeric_Vars as
select libname, memname, name, type, length,
format, informat, label
from dictionary.columns
where libname = 'SASHELP' and memname='CLASS' and upcase(type)='NUM'
;
quit;
proc sql noprint;
select name into : Numeric_Vars_List SEPARATED by ' '
from numeric_Vars
;
quit;
%put &Numeric_Vars_List.; /**Age Height Weight**/
proc sql noprint;
select count(*) as nr_numeric_Vars into :nr_numeric_Vars
from numeric_Vars
;
quit;
%put &nr_numeric_Vars.;
data Long_Structure_Numeric_Vars_Data(DROP=&Numeric_Vars_List. J);
set sashelp.class(KEEP=&Numeric_Vars_List.);
array vv{&nr_numeric_Vars.} &Numeric_Vars_List.; /* _NUMERIC_ */
do j=1 to dim(vv);
Var_Name = vname(vv(j));
Var_Value =vv(j);
output;
end;
run;
proc sql;
create table Want1 as
select Var_Name,
count(*) as nr,
sum(case when Var_Value>0 then 1 else 0 end ) as nr_POS,
sum(case when Var_Value<0 AND Var_Value=. then 1 else 0 end ) as nr_Neg,
sum(case when Var_Value=. then 1 else 0 end ) as nr_Nulls,
sum(case when Var_Value=0 then 1 else 0 end ) as nr_0,
count(distinct Var_Value) as nr_distinct_values,
mean(Var_Value) as AVG format=8.2,
STD(Var_Value) as STD format=8.2,
calculated AVG+calculated STD *3 as UCL,
calculated AVG-calculated STD *3 as LCL
from Long_Structure_Numeric_Vars_Data
group by Var_Name
;
quit;
proc means data=sashelp.class sum min max q1 median q3 P1 p5 p95 P99 P75 P25 cv SUM Range /**N mean std**/ STACKODSOUTPUT maxdec=2;
var _numeric_;
ods output summary=Want2(Rename=(Variable=Var_Name));
run;
data want_summary_Report;
merge Want1 Want2;
by Var_Name;
Run;
The only quick and easy way I know to get number of distinct values is to use Proc Freq with the NLEVELS option.
I see three steps, get your counts of ,nr_POS,nr_Neg,nr_Nulls,nr_0 which is one of your other questions,
the distinct values
Proc freq data=sashelp.class nlevels; tables _numeric_; ods output nlevels= distinctcount; run;
And then proc means for the rest
proc means data=sashelp.class stackods Mean STD UCLm LCLm Sum Min Max Q1 Median Q3 P1 P5 P95 P99 P75 P25 CV Range; var _numeric_; ods output summary=mysummary; run;
merge the three sets together on the variable name.
SASHELP.CLASS is not a wide data set. It is a long data set. There is no benefit to doing a wide to long conversion. Don't waste your time.
Step 1: Use PROC FREQ and the formats you showed in your other thread to do the counting.
Step 2: Use the information already given to you by @Reeza here: https://communities.sas.com/t5/SAS-Programming/proc-freq-one-way-table-for-multiple-vars-export-to-d... to convert the PROC FREQ output to usable data set
Step 3: Use PROC SUMMARY/PROC UNIVARIATE to compute the rest of the statistics
Step 4: Merge step 2 and SUMMARY output together
The only quick and easy way I know to get number of distinct values is to use Proc Freq with the NLEVELS option.
I see three steps, get your counts of ,nr_POS,nr_Neg,nr_Nulls,nr_0 which is one of your other questions,
the distinct values
Proc freq data=sashelp.class nlevels; tables _numeric_; ods output nlevels= distinctcount; run;
And then proc means for the rest
proc means data=sashelp.class stackods Mean STD UCLm LCLm Sum Min Max Q1 Median Q3 P1 P5 P95 P99 P75 P25 CV Range; var _numeric_; ods output summary=mysummary; run;
merge the three sets together on the variable name.
Sounds like you need three passes through the data to generate all of those stats.
proc format;
value sign
low-<0 = 'Negative'
0 = 'Zero'
0<-high = 'Positive'
other = 'Missing'
;
run;
ods select none;
proc freq nlevels data=have;
tables _numeric_ / noprint ;
ods output nlevels=nlevels;
run;
proc means data=have stackods
N Nmiss Mean STD UCLm LCLm Sum Min Max Q1 Median
Q3 P1 P5 P95 P99 P75 P25 CV Range
;
var _numeric_;
ods output summary=means;
run;
proc freq data=have ;
tables _numeric_ / list nopercent nocum;
format _numeric_ sign.;
ods output OneWayFreqs=outfreq;
run;
ods select all;
data freqs;
length Variable _name_ $32;
keep Variable _name_ Frequency ;
set outfreq;
Variable = substr(table,7);
_name_ =vvaluex('F_'||variable);
run;
proc transpose data=freqs out=Neg0Pos(drop=_:);
by Variable notsorted;
id _name_;
var Frequency;
run;
data want;
merge Neg0Pos nlevels(rename=(TableVar=Variable)) means ;
by Variable;
Negative=sum(0,Negative);
Positive=sum(0,Positive);
Zero=sum(0,Zero);
run;
Example results:
NNon NMiss Miss Obs Variable Negative Zero Positive NLevels Levels Levels N NMiss Mean 1 var1 1 1 7 5 1 4 9 1 0.888889 2 var2 0 4 4 4 1 3 8 2 0.750000 3 var3 3 3 2 4 1 3 8 2 0.125000 4 var4 4 3 2 4 1 3 9 1 -0.222222 Obs StdDev UCLM LCLM Sum Min Max 1 0.927961 1.602183 0.175595 8.000000 -1.000000 2.000000 2 0.886405 1.491053 0.008947 6.000000 0 2.000000 3 1.246423 1.167036 -0.917036 1.000000 -1.000000 2.000000 4 0.833333 0.418334 -0.862779 -2.000000 -1.000000 1.000000 Obs Q1 Median Q3 P1 P5 P95 1 1.000000 1.000000 1.000000 -1.000000 -1.000000 2.000000 2 0 0.500000 1.500000 0 0 2.000000 3 -1.000000 0 1.000000 -1.000000 -1.000000 2.000000 4 -1.000000 0 0 -1.000000 -1.000000 1.000000 Obs P99 P75 P25 CV Range 1 2.000000 1.000000 1.000000 104.395582 3.000000 2 2.000000 1.500000 0 118.187368 2.000000 3 2.000000 1.000000 -1.000000 997.138764 3.000000 4 1.000000 0 -1.000000 -375.000000 2.000000
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.