Vogel
Fluorite | Level 6

Hi all,

 

I have a dataset with approximately 5.5 million observations of 3 numeric variables.

 

All I need are the means and a few quantiles for all 3 variables stored in a table.

 

I am currently running the following code and getting the desired result:

 

proc univariate data=MYDATA outtable=UNIV (keep=_var_ _min_ _p5_ _q1_ _median_ _mean_ _q3_ _p95_ _max_) noprint;
run;

This, however, seems like a waste of resources as it computes a number of statistics which I then drop immediately, and it also generates a warning about the number of observations being too large to calculate Qn, which I do not need here.

 

The number of nonmissing observations for variable X is too large to compute the robust measure of scale Qn. The statistic Qn is set to missing.

In sum, the code does what I want, but makes a number of unnecessary computations. In order to save time and resources, and also just out of interest, I was wondering whether there was any way to restrict the computations to a list of explicitly requested statistics.

 

Thanks in advance for your expertise.

1 ACCEPTED SOLUTION

Ksharp
Super User
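/* Suppress printed output; only the ODS output data set is needed */
ods select none;
/* Capture the Summary table of PROC MEANS in data set WANT */
ods output summary=want;
proc means data=sashelp.heart min p5 p25 median mean p75 p95 max stackodsoutput;
  var _numeric_;
run;
ods select all;

proc print data=want;
run;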


8 REPLIES 8
PaigeMiller
Diamond | Level 26

@Vogel wrote:

This, however, seems like a waste of resources as it computes a number of statistics which I then drop immediately, and it also generates a warning about the number of observations being too large to calculate Qn, which I do not need here.

Then use PROC MEANS or PROC SUMMARY and you can control which statistics are computed.

--
Paige Miller
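
A minimal sketch of that suggestion, assuming the input data set is called MYDATA as in the question. X, Y and Z are placeholder names for the three numeric variables (the thread does not give the real ones), and STATS is an arbitrary output name. Only the statistics named in the OUTPUT statement are requested, so the Qn note from PROC UNIVARIATE should not appear:

```sas
/* MYDATA comes from the question; X, Y, Z and STATS are hypothetical names */
proc summary data=MYDATA;
  var X Y Z;
  /* AUTONAME builds output names such as X_Min, X_P5, X_Mean */
  output out=STATS(drop=_type_ _freq_)
         min= p5= q1= median= mean= q3= p95= max= / autoname;
run;
```

This returns a single observation with one column per variable/statistic pair, which is a different layout from the one-row-per-variable table that OUTTABLE= produces in PROC UNIVARIATE.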
Vogel
Fluorite | Level 6

Hi again,

 

First of all thanks, the layout of the output of your example code is exactly what I want!

 

The only remaining issue is that it's hard to reproduce the same layout when using an output dataset, i.e. one variable that contains the name of the input variable, and further variables for each of the computed statistics.

 

Is there any way to achieve this directly from a proc means / summary, or do I basically need to transform the output data myself to replicate that layout?

 

Thanks in advance.
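
For comparison, going through the OUTPUT statement would need a reshaping step to reach that layout. The following is only a rough sketch of one way to do it, not something posted in the thread: the data set names WIDE, LONG and WANT2 and the columns variable, stat and value are made up for illustration, and it assumes AUTONAME produces names of the form Variable_Statistic without truncation (true for the short names in SASHELP.HEART, but not guaranteed for long variable names). Variable labels are not carried over.

```sas
/* Step 1: request only the needed statistics; one wide row comes back */
proc summary data=sashelp.heart;
  var _numeric_;
  output out=wide(drop=_type_ _freq_)
         min= p5= q1= median= mean= q3= p95= max= / autoname;
run;

/* Step 2: one row per variable/statistic pair (_NAME_ holds e.g. Height_Mean) */
proc transpose data=wide out=long(rename=(col1=value));
run;

/* Step 3: split _NAME_ into the variable name and the statistic suffix */
data long;
  set long;
  length variable $32 stat $8;
  stat     = scan(_name_, -1, '_');
  variable = substr(_name_, 1, length(_name_) - length(stat) - 1);
  keep variable stat value;
run;

/* BY-group processing in the next step needs the rows grouped by variable */
proc sort data=long;
  by variable;
run;

/* Step 4: one row per variable, one column per statistic */
proc transpose data=long out=want2(drop=_name_);
  by variable;
  id stat;
  var value;
run;
```

The four steps show how much extra work the OUTPUT route involves compared with capturing the ODS Summary table with STACKODSOUTPUT, which already has one row per variable.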

PaigeMiller
Diamond | Level 26

This is not clear. What do you mean by "same layout"? Please explain further, or better yet, show us an example of what you want.

--
Paige Miller
Vogel
Fluorite | Level 6

OK, sorry about that.

 

I'm working in Enterprise Guide 7.15 HF3. When I run the following:

 

proc means data=sashelp.heart min p5 p25 median mean p75 p95 max;
	var _numeric_;
run;

The report shows the output I'm attaching to this post in an Excel file.

 

My question is whether, using an output statement to create an output dataset, I can get the results in a similar layout directly from proc means.

 

So the resulting dataset would have one observation per numeric variable in SASHELP.HEART, with character variables for the variable names and labels, and numeric variables for the requested statistics (the same set of statistics for every variable).

 

 

PaigeMiller
Diamond | Level 26

Some of us will not (or cannot) open Microsoft Office documents because they are a security risk.

 

Paste a portion of the output from PROC MEANS into the window that appears when you click on the {i} icon.

 

@Vogel wrote:

My question is whether, using an output statement to create an output dataset, I can get the results in a similar layout directly from proc means.

 

Please show us what you want.

--
Paige Miller
Ksharp
Super User

Did you open the table WANT? Is that what you want?

Vogel
Fluorite | Level 6
Yes! I don't know how on earth I missed that. Thanks again and sorry for the confusion.
