BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi

I have a data set that contains the number of days it takes to reach an event (Time to Disposition) along with county and year values, and many other things. Not all records have a value populated for reaching Time to Disposition, as it has not happened yet.

What I want to do is to create a new data set based on this one that only has records for the years/counties where 50% of records have a value populated for Time to Disposition for the year/county combination. So if a year/county combination have at least 50% of their records with a value for Time to Disposition, then all its records would be included (even those that do have a Time to Disposition value). The records that would be excluded are those year/county combinations where 50% of their records have not reached Time to Disposition (all their records would be excluded).

I would rather not use Proc SQL for this, so I was wondering if there is another Proc or data step that might do this?

Paul

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

You can add to the list of statistics, for example:

output out=test11 median= q1= q3= nmiss= /autoname;

The NMISS statistic is number of observations having a missing value.  Also part of the output data set, _FREQ_ is the total number of observations for the county/year.  So you would have the information needed to compute % missing.

View solution in original post

13 REPLIES 13
PaigeMiller
Diamond | Level 26

PROC MEANS/PROC SUMMARY can determine the number of missing and non-missing values in each year/country. Then you merge the output of the PROC in with the original data, and then select any record where there are more non-missing than missing.

--
Paige Miller
Paul_NYS
Obsidian | Level 7

How do I output from PROC Means to a data set? There is no 'out' option.

Paul

PaigeMiller
Diamond | Level 26

Paul-NYS wrote:

How do I output from PROC Means to a data set? There is no 'out' option.

Paul

There is an output statement that you would want to use.

--
Paige Miller
Paul_NYS
Obsidian | Level 7

Thanks Paige. I ran the below and did not get either the median or the quartiles in the output and am not getting an error either. I am getting the mean, st dev, min,. max, N.

Paul

proc means data=issuejoinedNNNAsort median q1 q3;

var dur_filing_issuejoin;

by cnty_name filingYear;

output out=test11;

run;

ballardw
Super User

output out=test11 median= q1= q3= /autoname;

Reeza
Super User

This doesn't handle the 50% issue. And haven't you already asked this? Or someone asked something very similar on here recently.

proc means data=issuejoinedNNNAsort noprint;

var dur_filing_issuejoin;

by cnty_name filingYear;

output out=test11 median= q1= q3= /autoname;

run;

PaigeMiller
Diamond | Level 26

Yes, it sure looks like Paul has changed problems in the middle of the thread, the issue involving 50% of the data being non-missing has disappeared.

--
Paige Miller
Astounding
PROC Star

You can add to the list of statistics, for example:

output out=test11 median= q1= q3= nmiss= /autoname;

The NMISS statistic is number of observations having a missing value.  Also part of the output data set, _FREQ_ is the total number of observations for the county/year.  So you would have the information needed to compute % missing.

Paul_NYS
Obsidian | Level 7

Thanks Astounding. The above is very easy and does exactly what is needed.

Paul

Paul_NYS
Obsidian | Level 7

Paige-I haven't changed problems mid-thread, I haven't gotten to the 50% issue yet. I was still working on getting the proc means to work. Once that is done, then I am going to look at what you indicated above and Hai's response actually.

Reeza- I had asked a month or so ago about proc lifetest and its ability to calculate medians based on entire populations-missing or not. I don't know if this is what you mean.

Haikuo
Onyx | Level 15

IMHO, Proc SQL would be the most straightforward approach to tackle it, although may not be the most efficient. Else, datastep 2XDOW will get what want, but requiring 2X passes; Array() will do with 1 pass, but not as robust, as it needs to predefine the dimension which is unknown in your case. Following is to show using Hash() as a possible approach with 1X pass (vs. 2XDOW) and true dynamic (vs. Array):

data sample;

set sashelp.class (keep=name age sex);

  if ranuni(1)>0.5 then call missing (age);

run;

proc sort data=sample;

by sex age;run;

data want;

  declare hash h(multidata:'y');

  h.definekey('sex');

  h.definedata('name','sex','age');

  h.definedone();

  declare hiter hi('h');

  _i=0;

do _n=1 by 1 until (last.sex);

  set sample;

by sex;

if missing(age) then _i+1;

  h.add();

end;

if _i/_n<=0.5 then do;

  _rc=hi.first();

do _rc=0 by 0 while (_rc=0);

output;

_rc=hi.next();

end;

  end;

  drop _:;

  run;

Haikuo

Paul_NYS
Obsidian | Level 7

Hi Haikuo

Thanks for the response and I have not used Hash () objects before and your code above is somewhat beyond where I am, but will work through it and post back.

PaigeMiller
Diamond | Level 26

The statistics you want in the output data set are specified in the OUTPUT statement.

--
Paige Miller

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1040 views
  • 0 likes
  • 6 in conversation