turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- BI
- /
- Enterprise Guide
- /
- EG "Describe" output

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-26-2012 10:39 AM

Hi

This is driving me up the wall and I am not sure how to describe it. When I have a query result that I want to create a median for, I used something under the "Describe" drop down. I thought it was either "Summary Statistics Wizard" or "Summary Tables Wizard". Regardless, when this ran, it generated an SAS report output in the "Results" tab, however, I believe there was a configuration that also generated an output that could be exported into a .csv or .xls format--this was in a tab prior to the Results tab and was in a text-like format. This allowed you to export it into .xls or .csv. You cannot export the SAS report output in the Result tab to .xls or .csv.

I cannot seem to find how I did this. This probably does not make sense, but I am trying to export the result of a running a median over a data set into Excel and there was a way to do this.

Does anyone know how to do this?

Paul

Accepted Solutions

Solution

10-26-2012
10:56 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-26-2012 10:56 AM

Paul,

I think you are referring to the SummaryStatisticsWizard. It gives you an option, on its third screen, to check off whether you want the statstistics saved to a data set.

If you have that checked, the created file can be exported.

All Replies

Solution

10-26-2012
10:56 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-26-2012 10:56 AM

Paul,

I think you are referring to the SummaryStatisticsWizard. It gives you an option, on its third screen, to check off whether you want the statstistics saved to a data set.

If you have that checked, the created file can be exported.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to art297

10-27-2012 02:12 PM

Art is correct, and there are a few more options. From the Describe tab, you can use either Summary Statistics or Summary Tables, depending on the type of results you need (using either the Wizard or just the plain old task on either is a matter of preference...I find the Wizard better for new users, as it has more handholding, but if you already know how to get there, you'll get your request done faster with the non-wizard task).

If you're saving results to a dataset, in Summary Statistics you can untick "Show statistics" and you won't get report output. Summary Tables doesn't have the equivalent capability.

To be complete, you can also get medians from List Report and Distribution Analysis, but they're somewhat specialized. I agree with Art that your best bet is Summary Statistics.

Tom

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomKari

10-29-2012 10:59 AM

Yes, that is it. Thank you both.

One follow up, I see that for generating the medians in Summary Statistics, the Proc means procedure is used. I believe this does not include missing values in determining the median? Only populated values? If so, is there a way to use proc means to factor in missing values into median calculations?

Paul

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-29-2012 11:34 AM

For an even number of records the median is the average of the middle two values. It is possible, then, for that value to be the average of two missing values or one missing and one non-missing value. You would have to decide what value the missing data should represent.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-30-2012 10:34 AM

I'd suggest posting that question to the Statistical Procedures category. My feeling is that you're on very thin statistical ice trying to use missing data to calculate a median, but this discussion forum has people with much stronger statistical chops than me.

Tom

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomKari

10-30-2012 12:52 PM

Not so bad if the missing values are all either less than the lowest value, due to some sort of truncation. Poor if the missing values are truly missing at random, as they will all sort into the lowest percentiles, resulting in a bias of the median to a value lower than the true measure of central tendency. And absolutely wrong if the missing values are all more than the greatest value, as these will now transform high unobserved values to low ranks.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SteveDenham

10-30-2012 01:13 PM

The situation I have is getting a median for a group of kids who experience a specific court event. We know what and who the overall population of the kids is--that number is fixed. What we want to do is to present the median value for the overall population for the time it takes 50% of these kids to reach this specific court event when 50% of the kids have reached it.

We know that we obviously don't know how long it will take the other 50% of these kids to reach the event, but that is for another statistic--maybe the 3rd percentile.

Regardless, I am looking for a way to take a median for a population where the median incorporates missing values in its calculation--if 50% of observations have occurred, then the median can be presented. If 50% have not occurred, then the median cannot be calculated.

I understand the lifetest works for this, but I am having trouble with this and was wondering if the median calc. can be configured to accommodate missing values?

Paul

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-31-2012 08:47 AM

That sounds like a survival analysis, with censoring (drop outs). You might want to look at PROC LIFETEST. A log-rank test of equality of medians is included in the Kaplan-Meier analysis, and as I recall you get estimates of median time to event as well as first and third quartiles.

Might be useful.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

10-31-2012 10:46 AM

Paul,

I know you are looking for an EG-specific solution, which is definitely not my forte, but I either misunderstand what you are trying to do or the problem is easier than requiring lifetest.

Don't you simply want to test whether a criterion was met by your population and then, if it has, obtain the median of those who had met the criterion?

If so, here is a way to do that using a macro:

data have;

input id event weeks;

cards;

1 1 5

2 . 2

3 1 7

4 . 4

5 1 8

6 1 9

7 . .

8 . .

;

run;

%macro run_if(infile=,var=,timevar=);

proc freq data=&infile. noprint;

tables &var./missing out=result (where=(event eq 1));

run;

proc sql noprint;

select percent into :result

from result

;

quit;

%if &result. ge 50 %then %do;

proc means data=&infile. median;

var &timevar.;

run;

%put Criterion Reached .. see Results;

%end;

%else %do;

%put Criterion Not Reached;

%end;

%mend run_if;

%run_if(infile=have,var=event,timevar=weeks)

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to art297

11-05-2012 07:56 AM

Hi Art

I believe you have identified the issue, but I don't completely understand all your code yet. The issue is that I have a population of kids in court that I need to find a median for. The total number of kids is known ahead of time and I want to capture a median when 50% of the kids have reached a certain court event--I can't capture the median based on just the number of kids who have reached the event prior 50% reaching it. I need a median for the entire population. So, when 50% have achieved the event, I can run the median.

I am new to SAS so I don't understand all your code above, but from what I understand, the second marco will run only when result set is ge 50%?

Paul

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

11-05-2012 09:57 AM

Paul, You will have to decide what it is you want to see.

The code I proposed only shows the result if at least 50% of the population to meet the criterion. The result it shows, in turn, is the median time the population has been in the program .. regardless of whether they have or haven't met the criterion.

Is that what you are looking for or are you looking for something else?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Paul_NYS

11-05-2012 10:48 AM

As mentioned before you need PROC LIFETEST, Median in Proc Means/Summary/Univariate cannot be modified to account for the missing observations.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

11-05-2012 01:05 PM

I just did a quick test using PROC LIFETEST, where more than 50% of the observations were censored. The estimates of the quartiles and of the median were of the non-censored values, so that seems to knock it out as a possibility.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SteveDenham

11-05-2012 01:51 PM

If you limit it to where censoring is over 50% I think proc lifetest is still the better method, because it accounts for people who have been waiting longer than the median and not yet experienced the 'event' whereas proc means will ignore all of that.

Using the VALUNG sample dataset I checked the medians if you use proc lifetest and proc means:

The Median with Proc Lifetest is 122.0 Months vs Proc Means of 85.0 Months.

The results will vary with the randomization of course.

data valung2;

set valung;

rand=rand('normal');

if rand<0 then censor=1;

run;

proc lifetest data=valung2 plots=s;

time SurvTime*censor(0);

run;

proc sort data=valung2; by censor survtime; run;

proc means data=valung2 n mean median;

where censor=1;

var survtime;

run;