Solved: Procedure or data step to create a data set based on reviewing an exis...

Paul_NYS · Posted 12-21-2012 09:05 AM

Hi

I have a data set that contains the number of days it takes to reach an event (Time to Disposition) along with county and year values, and many other things. Not all records have a value populated for reaching Time to Disposition, as it has not happened yet.

What I want to do is to create a new data set based on this one that only has records for the years/counties where 50% of records have a value populated for Time to Disposition for the year/county combination. So if a year/county combination have at least 50% of their records with a value for Time to Disposition, then all its records would be included (even those that do have a Time to Disposition value). The records that would be excluded are those year/county combinations where 50% of their records have not reached Time to Disposition (all their records would be excluded).

I would rather not use Proc SQL for this, so I was wondering if there is another Proc or data step that might do this?

Paul

Astounding · Posted 12-21-2012 04:10 PM

You can add to the list of statistics, for example:

output out=test11 median= q1= q3= nmiss= /autoname;

The NMISS statistic is number of observations having a missing value. Also part of the output data set, _FREQ_ is the total number of observations for the county/year. So you would have the information needed to compute % missing.

View solution in original post

PaigeMiller · Posted 12-21-2012 09:14 AM

PROC MEANS/PROC SUMMARY can determine the number of missing and non-missing values in each year/country. Then you merge the output of the PROC in with the original data, and then select any record where there are more non-missing than missing.

--
Paige Miller

Paul_NYS · Posted 12-21-2012 09:20 AM

How do I output from PROC Means to a data set? There is no 'out' option.

Paul

PaigeMiller · Posted 12-21-2012 10:05 AM

Paul-NYS wrote:

How do I output from PROC Means to a data set? There is no 'out' option.

Paul

There is an output statement that you would want to use.

--
Paige Miller

Paul_NYS · Posted 12-21-2012 12:53 PM

Thanks Paige. I ran the below and did not get either the median or the quartiles in the output and am not getting an error either. I am getting the mean, st dev, min,. max, N.

Paul

proc means data=issuejoinedNNNAsort median q1 q3;

var dur_filing_issuejoin;

by cnty_name filingYear;

output out=test11;

run;

ballardw · Posted 12-21-2012 01:13 PM

output out=test11 median= q1= q3= /autoname;

Reeza · Posted 12-21-2012 01:14 PM

This doesn't handle the 50% issue. And haven't you already asked this? Or someone asked something very similar on here recently.

proc means data=issuejoinedNNNAsort noprint;

var dur_filing_issuejoin;

by cnty_name filingYear;

output out=test11 median= q1= q3= /autoname;

run;

PaigeMiller · Posted 12-21-2012 01:25 PM

Yes, it sure looks like Paul has changed problems in the middle of the thread, the issue involving 50% of the data being non-missing has disappeared.

--
Paige Miller

Astounding · Posted 12-21-2012 04:10 PM

You can add to the list of statistics, for example:

output out=test11 median= q1= q3= nmiss= /autoname;

The NMISS statistic is number of observations having a missing value. Also part of the output data set, _FREQ_ is the total number of observations for the county/year. So you would have the information needed to compute % missing.

Paul_NYS · Posted 12-24-2012 12:38 PM

Thanks Astounding. The above is very easy and does exactly what is needed.

Paul

Paul_NYS · Posted 12-21-2012 05:10 PM

Paige-I haven't changed problems mid-thread, I haven't gotten to the 50% issue yet. I was still working on getting the proc means to work. Once that is done, then I am going to look at what you indicated above and Hai's response actually.

Reeza- I had asked a month or so ago about proc lifetest and its ability to calculate medians based on entire populations-missing or not. I don't know if this is what you mean.

Haikuo · Posted 12-21-2012 09:52 AM

IMHO, Proc SQL would be the most straightforward approach to tackle it, although may not be the most efficient. Else, datastep 2XDOW will get what want, but requiring 2X passes; Array() will do with 1 pass, but not as robust, as it needs to predefine the dimension which is unknown in your case. Following is to show using Hash() as a possible approach with 1X pass (vs. 2XDOW) and true dynamic (vs. Array):

data sample;

set sashelp.class (keep=name age sex);

if ranuni(1)>0.5 then call missing (age);

run;

proc sort data=sample;

by sex age;run;

data want;

declare hash h(multidata:'y');

h.definekey('sex');

h.definedata('name','sex','age');

h.definedone();

declare hiter hi('h');

_i=0;

do _n=1 by 1 until (last.sex);

set sample;

by sex;

if missing(age) then _i+1;

h.add();

end;

if _i/_n<=0.5 then do;

_rc=hi.first();

do _rc=0 by 0 while (_rc=0);

output;

_rc=hi.next();

end;

drop _:;

run;

Haikuo

Paul_NYS · Posted 12-21-2012 05:14 PM

Hi Haikuo

Thanks for the response and I have not used Hash () objects before and your code above is somewhat beyond where I am, but will work through it and post back.

PaigeMiller · Posted 12-21-2012 01:02 PM

The statistics you want in the output data set are specified in the OUTPUT statement.

--
Paige Miller

Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Re: Procedure or data step to create a data set based on reviewing an existing one

Registration is open

SAS Training: Just a Click Away