## Procedure or data step to create a data set based on reviewing an existing one

Solved
Regular Contributor
Posts: 216

# Procedure or data step to create a data set based on reviewing an existing one

Hi

I have a data set that contains the number of days it takes to reach an event (Time to Disposition) along with county and year values, and many other things. Not all records have a value populated for reaching Time to Disposition, as it has not happened yet.

What I want to do is to create a new data set based on this one that only has records for the years/counties where 50% of records have a value populated for Time to Disposition for the year/county combination. So if a year/county combination have at least 50% of their records with a value for Time to Disposition, then all its records would be included (even those that do have a Time to Disposition value). The records that would be excluded are those year/county combinations where 50% of their records have not reached Time to Disposition (all their records would be excluded).

I would rather not use Proc SQL for this, so I was wondering if there is another Proc or data step that might do this?

Paul

Accepted Solutions
Solution
‎12-21-2012 04:10 PM
Super User
Posts: 6,754

## Re: Procedure or data step to create a data set based on reviewing an existing one

Posted in reply to PaigeMiller

You can add to the list of statistics, for example:

output out=test11 median= q1= q3= nmiss= /autoname;

The NMISS statistic is number of observations having a missing value.  Also part of the output data set, _FREQ_ is the total number of observations for the county/year.  So you would have the information needed to compute % missing.

All Replies
Respected Advisor
Posts: 2,989

## Re: Procedure or data step to create a data set based on reviewing an existing one

PROC MEANS/PROC SUMMARY can determine the number of missing and non-missing values in each year/country. Then you merge the output of the PROC in with the original data, and then select any record where there are more non-missing than missing.

--
Paige Miller
Regular Contributor
Posts: 216

## Re: Procedure or data step to create a data set based on reviewing an existing one

Posted in reply to PaigeMiller

How do I output from PROC Means to a data set? There is no 'out' option.

Paul

Respected Advisor
Posts: 2,989

## Re: Procedure or data step to create a data set based on reviewing an existing one

```Paul-NYS wrote:

How do I output from PROC Means to a data set? There is no 'out' option.

Paul
```

There is an output statement that you would want to use.

--
Paige Miller
Regular Contributor
Posts: 216

## Re: Procedure or data step to create a data set based on reviewing an existing one

Posted in reply to PaigeMiller

Thanks Paige. I ran the below and did not get either the median or the quartiles in the output and am not getting an error either. I am getting the mean, st dev, min,. max, N.

Paul

proc means data=issuejoinedNNNAsort median q1 q3;

var dur_filing_issuejoin;

by cnty_name filingYear;

output out=test11;

run;

Super User
Posts: 13,508

## Re: Procedure or data step to create a data set based on reviewing an existing one

output out=test11 median= q1= q3= /autoname;

Super User
Posts: 23,683

## Re: Procedure or data step to create a data set based on reviewing an existing one

This doesn't handle the 50% issue. And haven't you already asked this? Or someone asked something very similar on here recently.

proc means data=issuejoinedNNNAsort noprint;

var dur_filing_issuejoin;

by cnty_name filingYear;

output out=test11 median= q1= q3= /autoname;

run;

Respected Advisor
Posts: 2,989

## Re: Procedure or data step to create a data set based on reviewing an existing one

Yes, it sure looks like Paul has changed problems in the middle of the thread, the issue involving 50% of the data being non-missing has disappeared.

--
Paige Miller
Solution
‎12-21-2012 04:10 PM
Super User
Posts: 6,754

## Re: Procedure or data step to create a data set based on reviewing an existing one

Posted in reply to PaigeMiller

You can add to the list of statistics, for example:

output out=test11 median= q1= q3= nmiss= /autoname;

The NMISS statistic is number of observations having a missing value.  Also part of the output data set, _FREQ_ is the total number of observations for the county/year.  So you would have the information needed to compute % missing.

Regular Contributor
Posts: 216

## Re: Procedure or data step to create a data set based on reviewing an existing one

Posted in reply to Astounding

Thanks Astounding. The above is very easy and does exactly what is needed.

Paul

Regular Contributor
Posts: 216

## Re: Procedure or data step to create a data set based on reviewing an existing one

Posted in reply to PaigeMiller

Paige-I haven't changed problems mid-thread, I haven't gotten to the 50% issue yet. I was still working on getting the proc means to work. Once that is done, then I am going to look at what you indicated above and Hai's response actually.

Reeza- I had asked a month or so ago about proc lifetest and its ability to calculate medians based on entire populations-missing or not. I don't know if this is what you mean.

Respected Advisor
Posts: 3,167

## Re: Procedure or data step to create a data set based on reviewing an existing one

IMHO, Proc SQL would be the most straightforward approach to tackle it, although may not be the most efficient. Else, datastep 2XDOW will get what want, but requiring 2X passes; Array() will do with 1 pass, but not as robust, as it needs to predefine the dimension which is unknown in your case. Following is to show using Hash() as a possible approach with 1X pass (vs. 2XDOW) and true dynamic (vs. Array):

data sample;

set sashelp.class (keep=name age sex);

if ranuni(1)>0.5 then call missing (age);

run;

proc sort data=sample;

by sex age;run;

data want;

declare hash h(multidata:'y');

h.definekey('sex');

h.definedata('name','sex','age');

h.definedone();

declare hiter hi('h');

_i=0;

do _n=1 by 1 until (last.sex);

set sample;

by sex;

if missing(age) then _i+1;

h.add();

end;

if _i/_n<=0.5 then do;

_rc=hi.first();

do _rc=0 by 0 while (_rc=0);

output;

_rc=hi.next();

end;

end;

drop _:;

run;

Haikuo

Regular Contributor
Posts: 216

## Re: Procedure or data step to create a data set based on reviewing an existing one

Hi Haikuo

Thanks for the response and I have not used Hash () objects before and your code above is somewhat beyond where I am, but will work through it and post back.

Respected Advisor
Posts: 2,989

## Re: Procedure or data step to create a data set based on reviewing an existing one

The statistics you want in the output data set are specified in the OUTPUT statement.

--
Paige Miller
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 13 replies
• 352 views
• 0 likes
• 6 in conversation