Vogel
Fluorite | Level 6

Hi all,

 

I have a dataset with approximately 5.5 million observations of 3 numeric variables.

 

All I need are the means and a few quantiles for all 3 variables stored in a table.

 

I am currently running the following code and getting the desired result:

 

proc univariate data=MYDATA outtable=UNIV (keep=_var_ _min_ _p5_ _q1_ _median_ _mean_ _q3_ _p95_ _max_) noprint;
run;

This, however, seems like a waste of resources as it computes a number of statistics which I then drop immediately, and it also generates a warning about the number of observations being too large to calculate Qn, which I do not need here.

 

The number of nonmissing observations for variable X is too large to compute the robust measure of scale Qn. The statistic Qn is set to missing.

In sum, the code does what I want, but makes a number of unnecessary computations. In order to save time and resources, and also just out of interest, I was wondering whether there was any way to restrict the computations to a list of explicitly requested statistics.

 

Thanks in advance for your expertise.

1 ACCEPTED SOLUTION

Ksharp
Super User
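/* Suppress printed output; ODS OUTPUT still captures the Summary table. */
ods select none;
ods output summary=want;
/* Request only the needed statistics; STACKODSOUTPUT gives one row per variable. */
proc means data=sashelp.heart min p5 p25 median mean p75 p95 max stackodsoutput;
   var _numeric_;
run;
/* Restore normal output. */
ods select all;

proc print data=want;
run;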


PaigeMiller
Diamond | Level 26

Then use PROC MEANS or PROC SUMMARY and you can control which statistics are computed.

--
Paige Miller
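A minimal sketch of that suggestion, assuming the poster's dataset MYDATA and a hypothetical output name STATS_WIDE (neither the output name nor the AUTONAME choice comes from the thread). Only the statistics listed in the OUTPUT statement are requested, so Qn is never attempted:

```sas
/* Hypothetical sketch: STATS_WIDE is an illustrative dataset name. */
proc summary data=MYDATA;
   var _numeric_;                          /* analyze all numeric variables */
   output out=STATS_WIDE (drop=_type_ _freq_)
          min= p5= q1= median= mean= q3= p95= max=
          / autoname;                      /* names such as X_Min, X_P5, ... */
run;
```

Note that this produces a single wide observation with one column per variable/statistic pair, not one row per variable.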
Vogel
Fluorite | Level 6

Hi again,

 

First of all thanks, the layout of the output of your example code is exactly what I want!

 

The only remaining issue is that I find it hard to reproduce the same layout in an output dataset, i.e. one variable that contains the name of the input variable, plus one variable for each of the computed statistics.

 

Is there any way to achieve this directly from a proc means / summary, or do I basically need to transform the output data myself to replicate that layout?

 

Thanks in advance.

PaigeMiller
Diamond | Level 26

This is not clear. What do you mean by "same layout"? Please explain further, or better yet, show us an example of what you want.

--
Paige Miller
Vogel
Fluorite | Level 6

OK, sorry about that.

 

I'm working in Enterprise Guide 7.15 HF3. When I run the following:

 

proc means data=sashelp.heart min p5 p25 median mean p75 p95 max;
	var _numeric_;
run;

The report shows the output I'm attaching to this post in an Excel file.

 

My question is whether, using an output statement to create an output dataset, I can get the results in a similar layout directly from proc means.

 

So the resulting dataset would have one observation per numeric variable in SASHELP.HEART, with character variables for the variable names and labels, and numeric variables for the requested statistics (which are the same for all variables).
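For comparison, here is a rough sketch of the "transform it myself" route, starting from an OUTPUT statement. The dataset names WIDE, LONG and TALL are illustrative, and the code assumes that AUTONAME builds names of the form Variable_Stat and that the input names are short enough not to be truncated. Labels are not carried over in this version. The ODS OUTPUT approach in the accepted solution yields a one-row-per-variable table without any of these steps.

```sas
/* Step 1: one wide row, with columns such as Height_Min, Height_P5, ... */
proc summary data=sashelp.heart;
   var _numeric_;
   output out=work.wide (drop=_type_ _freq_)
          min= p5= q1= median= mean= q3= p95= max= / autoname;
run;

/* Step 2: flip to one row per variable/statistic pair. */
proc transpose data=work.wide out=work.long (rename=(col1=value));
   var _numeric_;
run;

/* Step 3: split Variable_Stat into its two parts (assumed naming pattern). */
data work.long;
   set work.long;
   length variable $32 stat $12;
   stat     = scan(_name_, -1, '_');
   variable = substr(_name_, 1, length(_name_) - length(stat) - 1);
run;

/* Step 4: group the rows for each variable together. */
proc sort data=work.long;
   by variable;
run;

/* Step 5: one row per variable, one column per statistic. */
proc transpose data=work.long out=work.tall (drop=_name_);
   by variable;
   id stat;
   var value;
run;
```

The rows in TALL come out sorted alphabetically by variable name rather than in the original variable order.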

 

 

PaigeMiller
Diamond | Level 26

Some of us will not (or cannot) open Microsoft Office documents because they are a security risk.

 

Paste a portion of the output from PROC MEANS into the window that appears when you click on the {i} icon.

 

Please show us what you want.

--
Paige Miller
Ksharp
Super User

Did you open the table WANT? Is that what you want?

Vogel
Fluorite | Level 6
Yes! I don't know how on earth I missed that. Thanks again and sorry for the confusion.
