BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi

This is driving me up the wall and I am not sure how to describe it. When I have a query result that I want to create a median for, I used something under the "Describe" drop down. I thought it was either "Summary Statistics Wizard" or "Summary Tables Wizard". Regardless, when this ran, it generated an SAS report output in the "Results" tab, however, I believe there was a configuration that also generated an output that could be exported into a .csv or .xls format--this was in a tab prior to the Results tab and was in a text-like format. This allowed you to export it into .xls or .csv. You cannot export the SAS report output in the Result tab to .xls or .csv.

I cannot seem to find how I did this. This probably does not make sense, but I am trying to export the result of a running a median over a data set into Excel and there was a way to do this.

Does anyone know how to do this?

Paul

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Paul,

I think you are referring to the SummaryStatisticsWizard.  It gives you an option, on its third screen, to check off whether you want the statstistics saved to a data set.

If you have that checked, the created file can be exported.

View solution in original post

14 REPLIES 14
art297
Opal | Level 21

Paul,

I think you are referring to the SummaryStatisticsWizard.  It gives you an option, on its third screen, to check off whether you want the statstistics saved to a data set.

If you have that checked, the created file can be exported.

TomKari
Onyx | Level 15

Art is correct, and there are a few more options. From the Describe tab, you can use either Summary Statistics or Summary Tables, depending on the type of results you need (using either the Wizard or just the plain old task on either is a matter of preference...I find the Wizard better for new users, as it has more handholding, but if you already know how to get there, you'll get your request done faster with the non-wizard task).

If you're saving results to a dataset, in Summary Statistics you can untick "Show statistics" and you won't get report output. Summary Tables doesn't have the equivalent capability.

To be complete, you can also get medians from List Report and Distribution Analysis, but they're somewhat specialized. I agree with Art that your best bet is Summary Statistics.

Tom

Paul_NYS
Obsidian | Level 7

Yes, that is it. Thank you both.

One follow up, I see that for generating the medians in Summary Statistics, the Proc means procedure is used. I believe this does not include missing values in determining the median? Only populated values? If so, is there a way to use proc means to factor in missing values into median calculations?

Paul

art297
Opal | Level 21

For an even number of records the median is the average of the middle two values.  It is possible, then, for that value to be the average of two missing values or one missing and one non-missing value.  You would have to decide what value the missing data should represent.

TomKari
Onyx | Level 15

I'd suggest posting that question to the Statistical Procedures category. My feeling is that you're on very thin statistical ice trying to use missing data to calculate a median, but this discussion forum has people with much stronger statistical chops than me.

Tom

SteveDenham
Jade | Level 19

Not so bad if the missing values are all either less than the lowest value, due to some sort of truncation.  Poor if the missing values are truly missing at random, as they will all sort into the lowest percentiles, resulting in a bias of the median to a value lower than the true measure of central tendency. And absolutely wrong if the missing values are all more than the greatest value, as these will now transform high unobserved values to low ranks.

Steve Denham

Paul_NYS
Obsidian | Level 7

The situation I have is getting a median for a group of kids who experience a specific court event. We know what and who the overall population of the kids is--that number is fixed. What we want to do is to present the median value for the overall population for the time it takes 50% of these kids to reach this specific court event when 50% of the kids have reached it.

We know that we obviously don't know how long it will take the other 50% of these kids to reach the event, but that is for another statistic--maybe the 3rd percentile.

Regardless, I am looking for a way to take a median for a population where the median incorporates missing values in its calculation--if 50% of observations have occurred, then the median can be presented. If 50% have not occurred, then the median cannot be calculated.

I understand the lifetest works for this, but I am having trouble with this and was wondering if the median calc. can be configured to accommodate missing values?

Paul

SteveDenham
Jade | Level 19

That sounds like a survival analysis, with censoring (drop outs).  You might want to look at PROC LIFETEST.  A log-rank test of equality of medians is included in the Kaplan-Meier analysis, and as I recall you get estimates of median time to event as well as first and third quartiles.

Might be useful.

Steve Denham

art297
Opal | Level 21

Paul,

I know you are looking for an EG-specific solution, which is definitely not my forte, but I either misunderstand what you are trying to do or the problem is easier than requiring lifetest.

Don't you simply want to test whether a criterion was met by your population and then, if it has, obtain the median of those who had met the criterion?

If so, here is a way to do that using a macro:

data have;

  input id event weeks;

  cards;

1 1 5

2 . 2

3 1 7

4 . 4

5 1 8

6 1 9

7 . .

8 . .

;

run;

%macro run_if(infile=,var=,timevar=);

  proc freq data=&infile. noprint;

    tables &var./missing out=result (where=(event eq 1));

  run;

  proc sql noprint;

    select percent into :result

      from result

    ;

  quit;

  %if &result. ge 50 %then %do;

    proc means data=&infile. median;

      var &timevar.;

    run;

    %put Criterion Reached .. see Results;

  %end;

  %else %do;

    %put Criterion Not Reached;

  %end;

%mend run_if;

%run_if(infile=have,var=event,timevar=weeks)

Paul_NYS
Obsidian | Level 7

Hi Art

I believe you have identified the issue, but I don't completely understand all your code yet. The issue is that I have a population of kids in court that I need to find a median for. The total number of kids is known ahead of time and I want to capture a median when 50% of the kids have reached a certain court event--I can't capture the median based on just the number of kids who have reached the event prior 50% reaching it. I need a median for the entire population. So, when 50% have achieved the event, I can run the median.

I am new to SAS so I don't understand all your code above, but from what I understand, the second marco will run only when result set is ge 50%?

Paul

art297
Opal | Level 21

Paul, You will have to decide what it is you want to see.

The code I proposed only shows the result if at least 50% of the population to meet the criterion.  The result it shows, in turn, is the median time the population has been in the program .. regardless of whether they have or haven't met the criterion.

Is that what you are looking for or are you looking for something else?

Reeza
Super User

As mentioned before you need PROC LIFETEST, Median in Proc Means/Summary/Univariate cannot be modified to account for the missing observations.

SteveDenham
Jade | Level 19

I just did a quick test using PROC LIFETEST, where more than 50% of the observations were censored.  The estimates of the quartiles and of the median were of the non-censored values, so that seems to knock it out as a possibility.

Steve Denham

Reeza
Super User

If you limit it to where censoring is over 50% I think proc lifetest is still the better method, because it accounts for people who have been waiting longer than the median and not yet experienced the 'event' whereas proc means will ignore all of that.

Using the VALUNG sample dataset I checked the medians if you use proc lifetest and proc means:

The Median with Proc Lifetest is 122.0 Months vs Proc Means of 85.0 Months.

The results will vary with the randomization of course.

data valung2;

    set valung;

    rand=rand('normal');

    if rand<0 then censor=1;

run;

proc lifetest data=valung2 plots=s;

    time SurvTime*censor(0);

run;

proc sort data=valung2; by censor survtime; run;

proc means data=valung2 n mean median;

where censor=1;

var survtime;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1423 views
  • 0 likes
  • 5 in conversation