Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Proc Means or other procedure for mean/quartiles

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-24-2012 11:57 AM

Hi

I have a few questions on how to calculate a mean/quartile:

1. I have a data set that has two variables that I want to calculate the mean and other quartiles for. The below statement returns what appears to be the means and the first and third quartiles. From a code perspective, is this correct?

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

2. In the columns for the above variables, I have many null values that I want to factor into the median/quart. In other words, I do not want to calculate a mean or quartile if 50% (or 75% for 3rd quartile) of the observations have not occurred yet. I am assuming the above procedure just calculates the mean/quart for the numbers that are there, so it doesn't care if the appropriate number of obs have occurred?

3. I need to export this to a spreadsheet and am unfamiliar with how to do this. I know proc print prints out to screen. Is there another way to generate a .csv, .txt, or .xls?

Paul

Accepted Solutions

Solution

09-06-2017
02:09 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-24-2012 12:31 PM - last edited on 09-06-2017 02:09 PM by Kathryn_SAS

*Editor's Note: Thanks to @Reeza for providing the answer that addressed the original question and for pointing to an alternative using PROC LIFETEST. Most Base procedures, including PROC MEANS and PROC RANK, will exclude missing values before calculating statistics. Changing missing values to 0 in advance would not help in this case. If you don't want a statistic to be calculated if there are not enough non-missing values, you would need to check in advance for the number of non-missing values and then use macro logic to conditionally execute the code based on that number.*

1. This is correct, but why would you calculate stddev but quartiles? Stdev is usually used with the mean?

2. Does this mean you want to exclude your null values? If they're coded as missing, SAS will exclude them, if you're looking at doing something else, like censoring you might want to consider proc lifetest instead but it depends on your data and your question.

3.

ods tagsets.excelxp file='Temp.xml';

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

ods tagsets.excelxp close;

All Replies

Solution

09-06-2017
02:09 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-24-2012 12:31 PM - last edited on 09-06-2017 02:09 PM by Kathryn_SAS

*Editor's Note: Thanks to @Reeza for providing the answer that addressed the original question and for pointing to an alternative using PROC LIFETEST. Most Base procedures, including PROC MEANS and PROC RANK, will exclude missing values before calculating statistics. Changing missing values to 0 in advance would not help in this case. If you don't want a statistic to be calculated if there are not enough non-missing values, you would need to check in advance for the number of non-missing values and then use macro logic to conditionally execute the code based on that number.*

1. This is correct, but why would you calculate stddev but quartiles? Stdev is usually used with the mean?

2. Does this mean you want to exclude your null values? If they're coded as missing, SAS will exclude them, if you're looking at doing something else, like censoring you might want to consider proc lifetest instead but it depends on your data and your question.

3.

ods tagsets.excelxp file='Temp.xml';

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

ods tagsets.excelxp close;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

10-24-2012 12:49 PM

Thanks Reeza.

1. I was experimenting with proc means, so stdev was just part of that.

2. No, I want the null values to be factored in. What I want to do is to be sure the median and quartile calculations are done when 50% or 75% of the observations have actually occurred. I don't want to calculate the either based on SAS just calculating it for the values that are present.

So if I have 10 potential observations and only 4 of them have occurred as of a certain date, I should not be able to calculate a median for that variable. The way proc means seems to work is that it will calculate a median for the 4 present values and exclude the 6 that are missing.

Now if there were 5 observations (out of 10), I should be able to calculate a median because half of the observations have occurred and it doesn't matter what the remaining values are, they are already past the median point.

So I am wondering if this is something that must be coded or whether there is a configuration or another procedure that accomplishes this?

3. Thank you. I will try this.

Paul

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-24-2012 12:54 PM

I think you need to look into survival analysis and the Kaplan Meier Curve and see if its what you're looking for.

The related SAS procedure is proc lifetest, if you provide a bit about the subject matter I can help direct you to a relevant resource.

There are slight difference in terms when looking at churn models for sales, survival of patients and failure of products in a manufacturing industry.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

10-24-2012 01:02 PM

Yes, can you please point me to the lifetest resource?

Also, on the function to export results of this into a file, the below code is part of a much larger set of code. Is there parameter that just points to a specific PROC or DATA section of code and not the whole thing?

Paul

ods tagsets.excelxp file='Temp.xml';

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

ods tagsets.excelxp close;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-24-2012 01:53 PM

Survival/Failure Time Analysis

Introduction to Survival Analysis with SAS Seminar

You can create an output data set from proc means and then export that out instead using proc means.

Here's a paper on that. If you have 9.3 you may want to look into the stackODS option as well.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-24-2012 10:43 PM

Or Maybe you can check PROC RANKS , which might be your demand .

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

11-02-2012 10:57 AM

PROC RANK with the percent option, and then use if logic to identify the quartiles in a data step.

Ex.

Proc rank data=data1 out=data2 percent;

var quartile-variable;

ranks percentile;

run;

data data3;

set data2;

if percentile < .25 then quartile=1;

else if percentile < .5 then quartile = 2;

else if percentile < .75 then quartile = 3;

else quartile=4;

run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to cau83

11-05-2012 07:43 AM

Proc rank will not necessarily work because the percent option only calculates percents for non-missing values: "Percent: divides each rank by the number of observations that have nonmissing values of the variable and multiplies the result by 100 to get a percentage...."

While this makes sense, I need to factor in missing values in determining a median.

Paul

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

11-05-2012 08:52 AM

Ok, I see what you're saying. Does it matter where those missing ones are ranked? Could you set all missing values to the minimum value so that they are all put into the lowest quartile? Then at the 25%, 50%, etc. you will be at a point where you have that same % of the observations... just skewed up above their actual relative position when only compared to other non-missing.