BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi

I have a few questions on how to calculate a mean/quartile:

1. I have a data set that has two variables that I want to calculate the mean and other quartiles for. The below statement returns what appears to be the means and the first and third quartiles. From a code perspective, is this correct?

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

2. In the columns for the above variables, I have many null values that I want to factor into the median/quart. In other words, I do not want to calculate a mean or quartile if 50% (or 75% for 3rd quartile) of the observations have not occurred yet. I am assuming the above procedure just calculates the mean/quart for the numbers that are there, so it doesn't care if the appropriate number of obs have occurred?

3. I need to export this to a spreadsheet and am unfamiliar with how to do this. I know proc print prints out to screen. Is there another way to generate a .csv, .txt, or .xls?

Paul

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Editor's Note: Thanks to @Reeza for providing the answer that addressed the original question and for pointing to an alternative using PROC LIFETEST. Most Base procedures, including PROC MEANS and PROC RANK, will exclude missing values before calculating statistics. Changing missing values to 0 in advance would not help in this case. If you don't want a statistic to be calculated if there are not enough non-missing values, you would need to check in advance for the number of non-missing values and then use macro logic to conditionally execute the code based on that number.

 

1. This is correct, but why would you calculate stddev but quartiles? Stdev is usually used with the mean?

2.  Does this mean you want to exclude your null values?  If they're coded as missing, SAS will exclude them, if you're looking at doing something else, like censoring you might want to consider proc lifetest instead but it depends on your data and your question.

3.

 

ods tagsets.excelxp file='Temp.xml';

 

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

 

ods tagsets.excelxp close;

View solution in original post

9 REPLIES 9
Reeza
Super User

Editor's Note: Thanks to @Reeza for providing the answer that addressed the original question and for pointing to an alternative using PROC LIFETEST. Most Base procedures, including PROC MEANS and PROC RANK, will exclude missing values before calculating statistics. Changing missing values to 0 in advance would not help in this case. If you don't want a statistic to be calculated if there are not enough non-missing values, you would need to check in advance for the number of non-missing values and then use macro logic to conditionally execute the code based on that number.

 

1. This is correct, but why would you calculate stddev but quartiles? Stdev is usually used with the mean?

2.  Does this mean you want to exclude your null values?  If they're coded as missing, SAS will exclude them, if you're looking at doing something else, like censoring you might want to consider proc lifetest instead but it depends on your data and your question.

3.

 

ods tagsets.excelxp file='Temp.xml';

 

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

 

ods tagsets.excelxp close;

Paul_NYS
Obsidian | Level 7

Thanks Reeza.

1. I was experimenting with proc means, so stdev was just part of that.

2. No, I want the null values to be factored in. What I want to do is to be sure the median and quartile calculations are done when 50% or 75% of the observations have actually occurred. I don't want to calculate the either based on SAS just calculating it for the values that are present.

So if I have 10 potential observations and only 4 of them have occurred as of a certain date, I should not be able to calculate a median for that variable. The way proc means seems to work is that it will calculate a median for the 4 present values and exclude the 6 that are missing.

Now if there were 5 observations (out of 10), I should be able to calculate a median because half of the observations have occurred and it doesn't matter what the remaining values are, they are already past the median point.

So I am wondering if this is something that must be coded or whether there is a configuration or another procedure that accomplishes this?

3. Thank you. I will try this.

Paul

Reeza
Super User

I think you need to look into survival analysis and the Kaplan Meier Curve and see if its what you're looking for.

The related SAS procedure is proc lifetest, if you provide a bit about the subject matter I can help direct you to a relevant resource.

There are slight difference in terms when looking at churn models for sales, survival of patients and failure of products in a manufacturing industry.

Paul_NYS
Obsidian | Level 7

Yes, can you please point me to the lifetest resource?

Also, on the function to export results of this into a file, the below code is part of a much larger set of code. Is there parameter that just points to a specific PROC or DATA section of code and not the whole thing?

Paul

ods tagsets.excelxp file='Temp.xml';

proc means data=s10sorted median q1 q3 stddev;

var dur_filing_issuejoin dur_issuejoin_dispo;

run;

ods tagsets.excelxp close;

Reeza
Super User

Survival/Failure Time Analysis

Introduction to Survival Analysis with SAS Seminar

You can create an output data set from proc means and then export that out instead using proc means.

Here's a paper on that. If you have 9.3 you may want to look into the stackODS option as well.

http://www.lexjansen.com/wuss/2008/ess/ess09.pdf

Ksharp
Super User

Or Maybe you can check PROC RANKS , which might be your demand .

cau83
Pyrite | Level 9

PROC RANK with the percent option, and then use if logic to identify the quartiles in a data step.

Ex.

Proc rank data=data1 out=data2 percent;

     var quartile-variable;

     ranks percentile;

run;

data data3;

     set data2;

     if percentile < .25 then quartile=1;

     else if percentile < .5 then quartile = 2;

     else if percentile < .75 then quartile = 3;

     else quartile=4;

run;

Paul_NYS
Obsidian | Level 7

Proc rank will not necessarily work because the percent option only calculates percents for non-missing values: "Percent: divides each rank by the number of observations that have nonmissing values of the variable and multiplies the result by 100 to get a percentage...."

While this makes sense, I need to factor in missing values in determining a median.

Paul

cau83
Pyrite | Level 9

Ok, I see what you're saying. Does it matter where those missing ones are ranked? Could you set all missing values to the minimum value so that they are all put into the lowest quartile? Then at the 25%, 50%, etc. you will be at a point where you have that same % of the observations... just skewed up above their actual relative position when only compared to other non-missing.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 46896 views
  • 3 likes
  • 4 in conversation