BookmarkSubscribeRSS Feed
braam
Quartz | Level 8

Is there a way to check whether there is a new person in my grouped data?

 

HospitalID PersonID Year

1 1 2005

1 2 2005

1 3 2005

1 1 2006

1 2 2006

1 3 2006

1 4 2006

2 5 2005

2 6 2005

2 7 2005

2 5 2006

2 6 2006

2 8 2006

3 9 2005 

3 10 2005

3 9 2006 

3 10 2006

 As shown above, there were 1, 2, and 3 in hospital A in year 2005. There is a new person, 4, in year 2006. In this case, I would like to give a value of 1 to observations of hospital 1 and year 2006 (i.e., 1 for the observations of groups where there is a new person).

In hospital 2, there are 5, 6, and 7 in year 2005. But next year, 7 left while 8 joined. In this case, I would like to assign missing to the observations of this group (i.e., hospital 2 - year 2006).

In hospital 3, there are 9 and 10 in years 2005 and 2006. For the observations of this group, I would like to assign 0 to the observations of this group.

 

The resulting dataset would be either at hospital-year level or at hospital-person-yearl level, whichever is OK for me. Is there a simple way to make this dataset? Thanks in advance!

 

9 REPLIES 9
novinosrin
Tourmaline | Level 20

HI @braam  In addition to the sentences that explains the logic, can you please post the expected output for the input sample posted. Thank you!

ballardw
Super User

An important question is how do you know which are the "old" observations?

Do you have a data set with the old groups in it somewhere?

braam
Quartz | Level 8
I meant, old observations are those in the previous year.
Let's say, observations in year 2005 serve as "old" observations for those in year 2006.
ballardw
Super User

@braam wrote:
I meant, old observations are those in the previous year.
Let's say, observations in year 2005 serve as "old" observations for those in year 2006.

Previous to what? What in the data tells what is a "new" obs?

Listing an example does not provide a rule. You know your data, presumably, and so can tell. But we don't and to program a process we need to know the rule(s) involved that tell us what is new.

Since all of the example years you have shown so far are more than 10 years old compared to the current date everything looks "old".

So, are  you talking about the latest year in the data for a hospital is "new"? or Hospital PersonID the latest year is the "new"?

Or is a specific year, one that you designate in the code for example, to be considered the "new" year?

braam
Quartz | Level 8

Thanks for your reply. I added some blank lines between different hospitals to read it easily.

 

HospitalID PersonID Year

1 1 2005

1 2 2005

1 3 2005

1 1 2006

1 2 2006

1 3 2006

1 4 2006

 

2 5 2005

2 6 2005

2 7 2005

2 5 2006

2 6 2006

2 8 2006

 

3 9 2005 

3 10 2005

3 9 2006 

3 10 2006

 

My expected output is:

HopsitalID Year NewPerson

1 2005 .

1 2006 1

2 2005 .

2 2006 .

3 2005 .

3 2006 0

 

For the first hospital, there is a clearly new person in year 2006. So 1 would be assigned to NewPerson in year 2006.

For the second hospital, there is a new person coming in and another outgoing. So missing would be assigned to NewPerson in year 2006.

For the last hospital, there is no change. So 0 would be assigned to NewPerson in 2006.

 

Just so you know, I have a much bigger panel data. Thanks again.

 

novinosrin
Tourmaline | Level 20

So can we assume the following the rules-

1. If there's a change i.e. an addition of 1 or more patients in subsequent years without any drop in existing patients, then NewPerson flag is 1

2. If there's a change whereby the situation reflects drop in some existing patients and concurrently there are some additional new patients, then NewPerson flag is missing

3. If there's NO change, then NewPerson flag is 0

 

Can you please review and let us know all scenarios to account for?

braam
Quartz | Level 8
Thanks for your clarification. Your description is what I'm looking for.
novinosrin
Tourmaline | Level 20

Hi @braam  Your clarification helps. Thanks! Please try the below and see if it works-


data have;
input HospitalID PersonID Year;
cards;
1 1 2005

1 2 2005

1 3 2005

1 1 2006

1 2 2006

1 3 2006

1 4 2006

2 5 2005

2 6 2005

2 7 2005

2 5 2006

2 6 2006

2 8 2006

3 9 2005 

3 10 2005

3 9 2006 

3 10 2006
;

data want;
 if _n_=1 then do;
  dcl hash H () ;
  h.definekey  ("PersonID") ;
  h.definedone () ;
 end;
 do _n_=1 by 1 until(last.HospitalID);
  h_items=h.num_items;
  do until(last.year);
   set have;
   by HospitalID year;
   if _n_>1 then if  h.check()=0 then h_items=h_items-1;
   else NewPerson=1;
   h.replace();
  end;
  if _n_>1 then if  h_items then NewPerson=.;
  else if NewPerson=. then NewPerson=0; 
  output;
 end;
 h.clear();
 keep HospitalID year NewPerson;
run;
mkeintz
PROC Star

Let's restate the objective, to:  For each hospital create 3 variables N_KEPT (number of persons in both 2005 and 2006), N_ADDED (Number in 2006 and not in 2005), and N_DROPPED (you can guess).

 

And it just so happens I have suggested code for my suggested redefinition of the task:

 

data want (keep=hospitalid n_: );
  merge have (where=(year=2005) in=in2005)
        have (where=(year=2006) in=in2006);
  by hospitalid personid;

  if in2005=in2006 then n_kept+1;
  else if in2006=0 then n_dropped+1;
  else if in2005=0 then n_added+1;

  if last.hospitalid;
  output;
  call missing(of n_:);
run;

This program assumes you data are sorted by hospitalid/year/personid. 

 

The reason there is a call missing is to clear out the accumulations of N_dropped, N_added, N_kept at the end of one hospital, before starting the next hospital.  And because there is an explicit call missing of these news variables, it has to be preceded by an explicit OUTPUT statement.  Otherwise the implicit output action just before the RUN statement would be outputting missing values instead of the calculated totals.

 

Question: do you really just have the same two years for all hospitals? 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 902 views
  • 0 likes
  • 4 in conversation