DATA Step, Macro, Functions and more

Merge/left join onto certain rows

Reply
Occasional Contributor
Posts: 6

Merge/left join onto certain rows

 

Here is the scenario: I have a dataset with hospital identifiers, years, and a cost to charge ratio. For the year 2003, no cost to charge ratio is available. I need to take the ratio from 2004, multipy it by 1.04, and then merge by hospital ID to get cost to charge ratios for 2003. It might make more sense if you look at the code--I want to go from old to new.

 

data old;

infile datalines;

input Hospid $ Year Ratio;

datalines;

A 2003 .

B 2003 .

C 2003 .

C 2003 .

A 2004 .2

B 2004 .24

C 2004 .3

;

run;

 

and I want to get new by taking the ratio from 2004, multiplying it by 1.04 and then matching it by hospid:

 

data new;

infile datalines;

input hospid $ year ratio;

datalines;

A 2003 .208

B 2003 .2496

C 2003 .312

C 2003 .312

A 2004 .2

B 2004 .24

C 2004 .3

;

run;

 

I have tried two methods:

1. a proc sql left join, in which I made a separate dataset of 2004 observations and then tried to join by hospid. that still gave me missing values

2. also making a separate dataset of 2004 observations, sorting that and the original dataset, then merging by hospid. that mostly worked, except when hospital id appeared more than once in which case i only got a ratio for one of the occurances.

 

thanks for any advice! 

 

 

 

 

PROC Star
Posts: 253

Re: Merge/left join onto certain rows

[ Edited ]

something like this hsould work for your scenario

proc sql;
create table new as select a.hospid, a.year, case when a.hospid=b.hospid and a.ratio is missing then b.ratio *1.04 else a.ratio end as ratio from (select hospid, year, ratio from old)a left join (select distinct hospid, ratio from old where ratio is not missing)b on a.hospid =b.hospid; ;

 

Occasional Contributor
Posts: 6

Re: Merge/left join onto certain rows

When I tried this I ended up with more observations than in the original dataset

Super Contributor
Posts: 268

Re: Merge/left join onto certain rows

Your only matching criteria is ID but ID is not unique to each observation. Am I reading your data set correctly?

Occasional Contributor
Posts: 6

Re: Merge/left join onto certain rows

I think so, there can be more than one of the same hospital ID within a given year.
Super User
Posts: 5,084

Re: Merge/left join onto certain rows

There are too many cases that you haven't covered here.  What should the results be for these combinations?

 

data old;

infile datalines;

input Hospid $ Year Ratio;

datalines;

A 2003 .

A 2003 .

B 2003 .

C 2003 .

C 2003 .

A 2004 .2

A 2004 .3

B 2004 .24

B 2004 .26

C 2004 .3

;

 

Anything is possible.  But the rules are incomplete at this point.

Occasional Contributor
Posts: 6

Re: Merge/left join onto certain rows

Sorry, should have been more clear. There would not be a case in which the ratio would be different for two instances of the same hospital.

So this:
A 2004 .3
A 2004 .2

Would never happen. However this could happen:
A 2004 .3
A 2005 .2

But I don't care about years besides 2003 and 2004, I want those years to be unaffected
Super User
Posts: 5,084

Re: Merge/left join onto certain rows

[ Edited ]

I'm not sure if this would be the fastest way, but it's probably the clearest way.

 

data just2003 just2004 other_years;

   set old;

   if year=2003 then output just2003;

   else if year = 2004 then output just2004;

   else output other_years;

run;

 

proc sort data=just2003;

   by HospID;

run;

 

proc sort data=just2004 out=ratio2004 nodupkey;

   by HospID;

run;

 

data replace2003;

   set ratio2004 (in=in1)   just2003 (in=in2);

   by HospID;

   if first.HospID then do;

      if in1 then replacement_ratio = ratio * 1.04;

      else replacement_ratio = .;

   end;

   if in2;

   if ratio = . then ratio = replacement_ratio;

   drop replacement_ratio;

   retain replacement_ratio;

run;

 

data want;

   set replace2003 just2004 other_years;

run;

PROC Star
Posts: 1,562

Re: Merge/left join onto certain rows

Like this?


proc sql;
   create table WANT as  
     select HOSPID, 2003 as YEAR, RATIO*1.04 as RATIO
     from HAVE
     where YEAR eq 2004
   union all
     select * 
     from HAVE
     where YEAR ne 2003;
quit;
HOSPID YEAR RATIO
A 2003 0.208
B 2003 0.2496
C 2003 0.312
A 2004 0.2
B 2004 0.24
C 2004 0.3
Occasional Contributor
Posts: 6

Re: Merge/left join onto certain rows

I tried this code but if there is a hospital in 2003 which is not in the other years, it gets overwritten. see this dataset;

 

data old;

infile datalines;

input hospid $ year ratio;

datalines;

A 2003 .

A 2003 .

B 2003 .

C 2003 .

A 2004 .2

B 2004 .3

B 2004 .3

D 2004 .4

A 2005 .3

;

run;

 

output should be this but instead C gets overwriten:

A 2003 .208

A 2003 .208

B 2003 .312

C 2003 .

A 2004 .2

B 2004 .3

B 2004 .3

D 2004 .4

A 2005 .3

Ask a Question
Discussion stats
  • 9 replies
  • 124 views
  • 0 likes
  • 5 in conversation