Help using Base SAS procedures

Proc Means or other procedure for mean/quartiles

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Proc Means or other procedure for mean/quartiles

Hi

I have a few questions on how to calculate a mean/quartile:

1. I have a data set that has two variables that I want to calculate the mean and other quartiles for. The below statement returns what appears to be the means and the first and third quartiles. From a code perspective, is this correct?

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

2. In the columns for the above variables, I have many null values that I want to factor into the median/quart. In other words, I do not want to calculate a mean or quartile if 50% (or 75% for 3rd quartile) of the observations have not occurred yet. I am assuming the above procedure just calculates the mean/quart for the numbers that are there, so it doesn't care if the appropriate number of obs have occurred?

3. I need to export this to a spreadsheet and am unfamiliar with how to do this. I know proc print prints out to screen. Is there another way to generate a .csv, .txt, or .xls?

Paul


Accepted Solutions
Solution
‎09-06-2017 02:09 PM
Super User
Posts: 19,815

Re: Proc Means or other procedure for mean/quartiles

[ Edited ]

Editor's Note: Thanks to @Reeza for providing the answer that addressed the original question and for pointing to an alternative using PROC LIFETEST. Most Base procedures, including PROC MEANS and PROC RANK, will exclude missing values before calculating statistics. Changing missing values to 0 in advance would not help in this case. If you don't want a statistic to be calculated if there are not enough non-missing values, you would need to check in advance for the number of non-missing values and then use macro logic to conditionally execute the code based on that number.

 

1. This is correct, but why would you calculate stddev but quartiles? Stdev is usually used with the mean?

2.  Does this mean you want to exclude your null values?  If they're coded as missing, SAS will exclude them, if you're looking at doing something else, like censoring you might want to consider proc lifetest instead but it depends on your data and your question.

3.

 

ods tagsets.excelxp file='Temp.xml';

 

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

 

ods tagsets.excelxp close;

View solution in original post


All Replies
Solution
‎09-06-2017 02:09 PM
Super User
Posts: 19,815

Re: Proc Means or other procedure for mean/quartiles

[ Edited ]

Editor's Note: Thanks to @Reeza for providing the answer that addressed the original question and for pointing to an alternative using PROC LIFETEST. Most Base procedures, including PROC MEANS and PROC RANK, will exclude missing values before calculating statistics. Changing missing values to 0 in advance would not help in this case. If you don't want a statistic to be calculated if there are not enough non-missing values, you would need to check in advance for the number of non-missing values and then use macro logic to conditionally execute the code based on that number.

 

1. This is correct, but why would you calculate stddev but quartiles? Stdev is usually used with the mean?

2.  Does this mean you want to exclude your null values?  If they're coded as missing, SAS will exclude them, if you're looking at doing something else, like censoring you might want to consider proc lifetest instead but it depends on your data and your question.

3.

 

ods tagsets.excelxp file='Temp.xml';

 

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

 

ods tagsets.excelxp close;

Regular Contributor
Posts: 216

Re: Proc Means or other procedure for mean/quartiles

Thanks Reeza.

1. I was experimenting with proc means, so stdev was just part of that.

2. No, I want the null values to be factored in. What I want to do is to be sure the median and quartile calculations are done when 50% or 75% of the observations have actually occurred. I don't want to calculate the either based on SAS just calculating it for the values that are present.

So if I have 10 potential observations and only 4 of them have occurred as of a certain date, I should not be able to calculate a median for that variable. The way proc means seems to work is that it will calculate a median for the 4 present values and exclude the 6 that are missing.

Now if there were 5 observations (out of 10), I should be able to calculate a median because half of the observations have occurred and it doesn't matter what the remaining values are, they are already past the median point.

So I am wondering if this is something that must be coded or whether there is a configuration or another procedure that accomplishes this?

3. Thank you. I will try this.

Paul

Super User
Posts: 19,815

Re: Proc Means or other procedure for mean/quartiles

I think you need to look into survival analysis and the Kaplan Meier Curve and see if its what you're looking for.

The related SAS procedure is proc lifetest, if you provide a bit about the subject matter I can help direct you to a relevant resource.

There are slight difference in terms when looking at churn models for sales, survival of patients and failure of products in a manufacturing industry.

Regular Contributor
Posts: 216

Re: Proc Means or other procedure for mean/quartiles

Yes, can you please point me to the lifetest resource?

Also, on the function to export results of this into a file, the below code is part of a much larger set of code. Is there parameter that just points to a specific PROC or DATA section of code and not the whole thing?

Paul

ods tagsets.excelxp file='Temp.xml';

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

ods tagsets.excelxp close;

Super User
Posts: 19,815

Re: Proc Means or other procedure for mean/quartiles

Survival/Failure Time Analysis

Introduction to Survival Analysis with SAS Seminar

You can create an output data set from proc means and then export that out instead using proc means.

Here's a paper on that. If you have 9.3 you may want to look into the stackODS option as well.

http://www.lexjansen.com/wuss/2008/ess/ess09.pdf

Super User
Posts: 10,028

Re: Proc Means or other procedure for mean/quartiles

Or Maybe you can check PROC RANKS , which might be your demand .

Frequent Contributor
Posts: 80

Re: Proc Means or other procedure for mean/quartiles

PROC RANK with the percent option, and then use if logic to identify the quartiles in a data step.

Ex.

Proc rank data=data1 out=data2 percent;

     var quartile-variable;

     ranks percentile;

run;

data data3;

     set data2;

     if percentile < .25 then quartile=1;

     else if percentile < .5 then quartile = 2;

     else if percentile < .75 then quartile = 3;

     else quartile=4;

run;

Regular Contributor
Posts: 216

Re: Proc Means or other procedure for mean/quartiles

Proc rank will not necessarily work because the percent option only calculates percents for non-missing values: "Percent: divides each rank by the number of observations that have nonmissing values of the variable and multiplies the result by 100 to get a percentage...."

While this makes sense, I need to factor in missing values in determining a median.

Paul

Frequent Contributor
Posts: 80

Re: Proc Means or other procedure for mean/quartiles

Ok, I see what you're saying. Does it matter where those missing ones are ranked? Could you set all missing values to the minimum value so that they are all put into the lowest quartile? Then at the 25%, 50%, etc. you will be at a point where you have that same % of the observations... just skewed up above their actual relative position when only compared to other non-missing.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 12748 views
  • 3 likes
  • 4 in conversation