Desktop productivity for business analysts and programmers

EG "Describe" output

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

EG "Describe" output

Hi

This is driving me up the wall and I am not sure how to describe it. When I have a query result that I want to create a median for, I used something under the "Describe" drop down. I thought it was either "Summary Statistics Wizard" or "Summary Tables Wizard". Regardless, when this ran, it generated an SAS report output in the "Results" tab, however, I believe there was a configuration that also generated an output that could be exported into a .csv or .xls format--this was in a tab prior to the Results tab and was in a text-like format. This allowed you to export it into .xls or .csv. You cannot export the SAS report output in the Result tab to .xls or .csv.

I cannot seem to find how I did this. This probably does not make sense, but I am trying to export the result of a running a median over a data set into Excel and there was a way to do this.

Does anyone know how to do this?

Paul


Accepted Solutions
Solution
‎10-26-2012 10:56 AM
Esteemed Advisor
Posts: 7,294

Re: EG "Describe" output

Paul,

I think you are referring to the SummaryStatisticsWizard.  It gives you an option, on its third screen, to check off whether you want the statstistics saved to a data set.

If you have that checked, the created file can be exported.

View solution in original post


All Replies
Solution
‎10-26-2012 10:56 AM
Esteemed Advisor
Posts: 7,294

Re: EG "Describe" output

Paul,

I think you are referring to the SummaryStatisticsWizard.  It gives you an option, on its third screen, to check off whether you want the statstistics saved to a data set.

If you have that checked, the created file can be exported.

Trusted Advisor
Posts: 1,056

Re: EG "Describe" output

Art is correct, and there are a few more options. From the Describe tab, you can use either Summary Statistics or Summary Tables, depending on the type of results you need (using either the Wizard or just the plain old task on either is a matter of preference...I find the Wizard better for new users, as it has more handholding, but if you already know how to get there, you'll get your request done faster with the non-wizard task).

If you're saving results to a dataset, in Summary Statistics you can untick "Show statistics" and you won't get report output. Summary Tables doesn't have the equivalent capability.

To be complete, you can also get medians from List Report and Distribution Analysis, but they're somewhat specialized. I agree with Art that your best bet is Summary Statistics.

Tom

Regular Contributor
Posts: 216

Re: EG "Describe" output

Yes, that is it. Thank you both.

One follow up, I see that for generating the medians in Summary Statistics, the Proc means procedure is used. I believe this does not include missing values in determining the median? Only populated values? If so, is there a way to use proc means to factor in missing values into median calculations?

Paul

Esteemed Advisor
Posts: 7,294

Re: EG "Describe" output

For an even number of records the median is the average of the middle two values.  It is possible, then, for that value to be the average of two missing values or one missing and one non-missing value.  You would have to decide what value the missing data should represent.

Trusted Advisor
Posts: 1,056

Re: EG "Describe" output

I'd suggest posting that question to the Statistical Procedures category. My feeling is that you're on very thin statistical ice trying to use missing data to calculate a median, but this discussion forum has people with much stronger statistical chops than me.

Tom

Respected Advisor
Posts: 2,655

Re: EG "Describe" output

Not so bad if the missing values are all either less than the lowest value, due to some sort of truncation.  Poor if the missing values are truly missing at random, as they will all sort into the lowest percentiles, resulting in a bias of the median to a value lower than the true measure of central tendency. And absolutely wrong if the missing values are all more than the greatest value, as these will now transform high unobserved values to low ranks.

Steve Denham

Regular Contributor
Posts: 216

Re: EG "Describe" output

The situation I have is getting a median for a group of kids who experience a specific court event. We know what and who the overall population of the kids is--that number is fixed. What we want to do is to present the median value for the overall population for the time it takes 50% of these kids to reach this specific court event when 50% of the kids have reached it.

We know that we obviously don't know how long it will take the other 50% of these kids to reach the event, but that is for another statistic--maybe the 3rd percentile.

Regardless, I am looking for a way to take a median for a population where the median incorporates missing values in its calculation--if 50% of observations have occurred, then the median can be presented. If 50% have not occurred, then the median cannot be calculated.

I understand the lifetest works for this, but I am having trouble with this and was wondering if the median calc. can be configured to accommodate missing values?

Paul

Respected Advisor
Posts: 2,655

Re: EG "Describe" output

That sounds like a survival analysis, with censoring (drop outs).  You might want to look at PROC LIFETEST.  A log-rank test of equality of medians is included in the Kaplan-Meier analysis, and as I recall you get estimates of median time to event as well as first and third quartiles.

Might be useful.

Steve Denham

Esteemed Advisor
Posts: 7,294

Re: EG "Describe" output

Paul,

I know you are looking for an EG-specific solution, which is definitely not my forte, but I either misunderstand what you are trying to do or the problem is easier than requiring lifetest.

Don't you simply want to test whether a criterion was met by your population and then, if it has, obtain the median of those who had met the criterion?

If so, here is a way to do that using a macro:

data have;

  input id event weeks;

  cards;

1 1 5

2 . 2

3 1 7

4 . 4

5 1 8

6 1 9

7 . .

8 . .

;

run;

%macro run_if(infile=,var=,timevar=);

  proc freq data=&infile. noprint;

    tables &var./missing out=result (where=(event eq 1));

  run;

  proc sql noprint;

    select percent into :result

      from result

    ;

  quit;

  %if &result. ge 50 %then %do;

    proc means data=&infile. median;

      var &timevar.;

    run;

    %put Criterion Reached .. see Results;

  %end;

  %else %do;

    %put Criterion Not Reached;

  %end;

%mend run_if;

%run_if(infile=have,var=event,timevar=weeks)

Regular Contributor
Posts: 216

Re: EG "Describe" output

Hi Art

I believe you have identified the issue, but I don't completely understand all your code yet. The issue is that I have a population of kids in court that I need to find a median for. The total number of kids is known ahead of time and I want to capture a median when 50% of the kids have reached a certain court event--I can't capture the median based on just the number of kids who have reached the event prior 50% reaching it. I need a median for the entire population. So, when 50% have achieved the event, I can run the median.

I am new to SAS so I don't understand all your code above, but from what I understand, the second marco will run only when result set is ge 50%?

Paul

Esteemed Advisor
Posts: 7,294

Re: EG "Describe" output

Paul, You will have to decide what it is you want to see.

The code I proposed only shows the result if at least 50% of the population to meet the criterion.  The result it shows, in turn, is the median time the population has been in the program .. regardless of whether they have or haven't met the criterion.

Is that what you are looking for or are you looking for something else?

Grand Advisor
Posts: 17,358

Re: EG "Describe" output

As mentioned before you need PROC LIFETEST, Median in Proc Means/Summary/Univariate cannot be modified to account for the missing observations.

Respected Advisor
Posts: 2,655

Re: EG "Describe" output

I just did a quick test using PROC LIFETEST, where more than 50% of the observations were censored.  The estimates of the quartiles and of the median were of the non-censored values, so that seems to knock it out as a possibility.

Steve Denham

Grand Advisor
Posts: 17,358

Re: EG "Describe" output

If you limit it to where censoring is over 50% I think proc lifetest is still the better method, because it accounts for people who have been waiting longer than the median and not yet experienced the 'event' whereas proc means will ignore all of that.

Using the VALUNG sample dataset I checked the medians if you use proc lifetest and proc means:

The Median with Proc Lifetest is 122.0 Months vs Proc Means of 85.0 Months.

The results will vary with the randomization of course.

data valung2;

    set valung;

    rand=rand('normal');

    if rand<0 then censor=1;

run;

proc lifetest data=valung2 plots=s;

    time SurvTime*censor(0);

run;

proc sort data=valung2; by censor survtime; run;

proc means data=valung2 n mean median;

where censor=1;

var survtime;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 433 views
  • 0 likes
  • 5 in conversation