BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I want to calculate the following statistics for each numeric variable:

nr,nr_POS,nr_Neg,nr_Nulls,nr_0,nr_distinct_values,AVG,STD,UCL,LCL,Sum,Min,Max,Q1,Median,

Q3,P1,P5,P95,P99,P75,P25,CV,Range

I want to identify automatically the numeric variables.

In this code I perform wide to Long change of data structure.

My question- Is there a better more efficient way to calculate the required statistics? May you show code?

In real life the data set has 2 million rows and 200 numeric variables .

I am not sure if this method of work (Long to wide) is  efficient in this case

 

 

proc sql;
create table numeric_Vars as
select libname, memname, name, type, length,
format, informat, label
from dictionary.columns
where libname = 'SASHELP' and memname='CLASS' and upcase(type)='NUM'
;
quit;


proc sql noprint;
select name into : Numeric_Vars_List SEPARATED by ' '
from numeric_Vars
;
quit;
%put &Numeric_Vars_List.; /**Age Height Weight**/


proc sql noprint;
select count(*) as nr_numeric_Vars into :nr_numeric_Vars
from numeric_Vars
;
quit;
%put &nr_numeric_Vars.;


data Long_Structure_Numeric_Vars_Data(DROP=&Numeric_Vars_List. J);
set sashelp.class(KEEP=&Numeric_Vars_List.);
array vv{&nr_numeric_Vars.}   &Numeric_Vars_List.; /* _NUMERIC_   */
do j=1 to dim(vv);
Var_Name  = vname(vv(j));
Var_Value =vv(j);
output;
end;
run;


proc sql;
create table Want1 as
select Var_Name,
       count(*) as nr,
       sum(case when Var_Value>0 then 1 else 0 end ) as nr_POS,
       sum(case when Var_Value<0 AND Var_Value=. then 1 else 0 end ) as nr_Neg,
	   sum(case when Var_Value=. then 1 else 0 end ) as nr_Nulls,
       sum(case when Var_Value=0 then 1 else 0 end ) as nr_0,
	   count(distinct Var_Value) as nr_distinct_values,
	   mean(Var_Value) as AVG format=8.2,
	   STD(Var_Value) as STD format=8.2,
	   calculated AVG+calculated STD *3 as UCL,
	   calculated AVG-calculated STD *3 as LCL
from Long_Structure_Numeric_Vars_Data
group by Var_Name
;
quit;

proc means data=sashelp.class sum min max q1 median q3  P1 p5 p95 P99 P75 P25 cv SUM Range /**N mean std**/ STACKODSOUTPUT maxdec=2;
var _numeric_;
ods output summary=Want2(Rename=(Variable=Var_Name));
run;


data want_summary_Report;
merge Want1 Want2;
by Var_Name;
Run;


 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The only  quick and easy way I know to get number of distinct values is to use Proc Freq with the NLEVELS option.

 

I see three steps, get your counts of ,nr_POS,nr_Neg,nr_Nulls,nr_0 which is one of your other questions,

the distinct values

      
Proc freq data=sashelp.class nlevels;
tables _numeric_;
ods output nlevels= distinctcount;
run;

And then proc means for the rest

proc means data=sashelp.class stackods 
  Mean STD UCLm LCLm Sum Min Max Q1 Median 
  Q3 P1 P5 P95 P99 P75 P25 CV Range;
  var _numeric_;
  ods output summary=mysummary;
run;

merge the three sets together on the variable name.

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

SASHELP.CLASS is not a wide data set. It is a long data set. There is no benefit to doing a wide to long conversion. Don't waste your time.

 

Step 1: Use PROC FREQ and the formats you showed in your other thread to do the counting.

Step 2: Use the information already given to you by @Reeza here: https://communities.sas.com/t5/SAS-Programming/proc-freq-one-way-table-for-multiple-vars-export-to-d... to convert the PROC FREQ output to usable data set

Step 3: Use PROC SUMMARY/PROC UNIVARIATE to compute the rest of the statistics

Step 4: Merge step 2 and SUMMARY output together

--
Paige Miller
ballardw
Super User

The only  quick and easy way I know to get number of distinct values is to use Proc Freq with the NLEVELS option.

 

I see three steps, get your counts of ,nr_POS,nr_Neg,nr_Nulls,nr_0 which is one of your other questions,

the distinct values

      
Proc freq data=sashelp.class nlevels;
tables _numeric_;
ods output nlevels= distinctcount;
run;

And then proc means for the rest

proc means data=sashelp.class stackods 
  Mean STD UCLm LCLm Sum Min Max Q1 Median 
  Q3 P1 P5 P95 P99 P75 P25 CV Range;
  var _numeric_;
  ods output summary=mysummary;
run;

merge the three sets together on the variable name.

Tom
Super User Tom
Super User

Sounds like you need three passes through the data to generate all of those stats.

proc format;
 value sign 
   low-<0 = 'Negative'
   0 = 'Zero'
   0<-high = 'Positive'
   other = 'Missing'
 ;
run;
ods select none;
proc freq nlevels data=have;
  tables _numeric_ / noprint ;
  ods output nlevels=nlevels;
run;
proc means data=have stackods 
  N Nmiss Mean STD UCLm LCLm Sum Min Max Q1 Median 
  Q3 P1 P5 P95 P99 P75 P25 CV Range
;
  var _numeric_;
  ods output summary=means;
run;
proc freq data=have ;
  tables _numeric_ /  list nopercent nocum;
  format _numeric_ sign.;
  ods output OneWayFreqs=outfreq; 
run;
ods select all;
data freqs;
  length Variable _name_ $32;
  keep Variable _name_ Frequency ;
  set outfreq;
  Variable = substr(table,7);
  _name_ =vvaluex('F_'||variable);
run;
proc transpose data=freqs out=Neg0Pos(drop=_:);
  by Variable notsorted;
  id _name_;
  var Frequency;
run;

data want;
  merge Neg0Pos nlevels(rename=(TableVar=Variable)) means ;
  by Variable;
  Negative=sum(0,Negative);
  Positive=sum(0,Positive);
  Zero=sum(0,Zero);
run;

Example results:

                                                              NNon
                                                    NMiss     Miss
Obs Variable Negative     Zero Positive  NLevels   Levels   Levels  N NMiss         Mean

 1    var1          1        1        7        5        1        4  9   1       0.888889
 2    var2          0        4        4        4        1        3  8   2       0.750000
 3    var3          3        3        2        4        1        3  8   2       0.125000
 4    var4          4        3        2        4        1        3  9   1      -0.222222



Obs       StdDev          UCLM          LCLM           Sum           Min           Max

 1      0.927961      1.602183      0.175595      8.000000     -1.000000      2.000000
 2      0.886405      1.491053      0.008947      6.000000             0      2.000000
 3      1.246423      1.167036     -0.917036      1.000000     -1.000000      2.000000
 4      0.833333      0.418334     -0.862779     -2.000000     -1.000000      1.000000



Obs           Q1        Median            Q3            P1            P5           P95

 1      1.000000      1.000000      1.000000     -1.000000     -1.000000      2.000000
 2             0      0.500000      1.500000             0             0      2.000000
 3     -1.000000             0      1.000000     -1.000000     -1.000000      2.000000
 4     -1.000000             0             0     -1.000000     -1.000000      1.000000



Obs          P99             P75             P25              CV           Range

 1      2.000000        1.000000        1.000000      104.395582        3.000000
 2      2.000000        1.500000               0      118.187368        2.000000
 3      2.000000        1.000000       -1.000000      997.138764        3.000000
 4      1.000000               0       -1.000000     -375.000000        2.000000

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 531 views
  • 2 likes
  • 4 in conversation