BookmarkSubscribeRSS Feed
lnicholl9
Calcite | Level 5

 

Here is the scenario: I have a dataset with hospital identifiers, years, and a cost to charge ratio. For the year 2003, no cost to charge ratio is available. I need to take the ratio from 2004, multipy it by 1.04, and then merge by hospital ID to get cost to charge ratios for 2003. It might make more sense if you look at the code--I want to go from old to new.

 

data old;

infile datalines;

input Hospid $ Year Ratio;

datalines;

A 2003 .

B 2003 .

C 2003 .

C 2003 .

A 2004 .2

B 2004 .24

C 2004 .3

;

run;

 

and I want to get new by taking the ratio from 2004, multiplying it by 1.04 and then matching it by hospid:

 

data new;

infile datalines;

input hospid $ year ratio;

datalines;

A 2003 .208

B 2003 .2496

C 2003 .312

C 2003 .312

A 2004 .2

B 2004 .24

C 2004 .3

;

run;

 

I have tried two methods:

1. a proc sql left join, in which I made a separate dataset of 2004 observations and then tried to join by hospid. that still gave me missing values

2. also making a separate dataset of 2004 observations, sorting that and the original dataset, then merging by hospid. that mostly worked, except when hospital id appeared more than once in which case i only got a ratio for one of the occurances.

 

thanks for any advice! 

 

 

 

 

9 REPLIES 9
kiranv_
Rhodochrosite | Level 12

something like this hsould work for your scenario

proc sql;
create table new as select a.hospid, a.year, case when a.hospid=b.hospid and a.ratio is missing then b.ratio *1.04 else a.ratio end as ratio from (select hospid, year, ratio from old)a left join (select distinct hospid, ratio from old where ratio is not missing)b on a.hospid =b.hospid; ;

 

lnicholl9
Calcite | Level 5

When I tried this I ended up with more observations than in the original dataset

GreggB
Pyrite | Level 9

Your only matching criteria is ID but ID is not unique to each observation. Am I reading your data set correctly?

lnicholl9
Calcite | Level 5
I think so, there can be more than one of the same hospital ID within a given year.
Astounding
PROC Star

There are too many cases that you haven't covered here.  What should the results be for these combinations?

 

data old;

infile datalines;

input Hospid $ Year Ratio;

datalines;

A 2003 .

A 2003 .

B 2003 .

C 2003 .

C 2003 .

A 2004 .2

A 2004 .3

B 2004 .24

B 2004 .26

C 2004 .3

;

 

Anything is possible.  But the rules are incomplete at this point.

lnicholl9
Calcite | Level 5
Sorry, should have been more clear. There would not be a case in which the ratio would be different for two instances of the same hospital.

So this:
A 2004 .3
A 2004 .2

Would never happen. However this could happen:
A 2004 .3
A 2005 .2

But I don't care about years besides 2003 and 2004, I want those years to be unaffected
Astounding
PROC Star

I'm not sure if this would be the fastest way, but it's probably the clearest way.

 

data just2003 just2004 other_years;

   set old;

   if year=2003 then output just2003;

   else if year = 2004 then output just2004;

   else output other_years;

run;

 

proc sort data=just2003;

   by HospID;

run;

 

proc sort data=just2004 out=ratio2004 nodupkey;

   by HospID;

run;

 

data replace2003;

   set ratio2004 (in=in1)   just2003 (in=in2);

   by HospID;

   if first.HospID then do;

      if in1 then replacement_ratio = ratio * 1.04;

      else replacement_ratio = .;

   end;

   if in2;

   if ratio = . then ratio = replacement_ratio;

   drop replacement_ratio;

   retain replacement_ratio;

run;

 

data want;

   set replace2003 just2004 other_years;

run;

ChrisNZ
Tourmaline | Level 20

Like this?


proc sql;
   create table WANT as  
     select HOSPID, 2003 as YEAR, RATIO*1.04 as RATIO
     from HAVE
     where YEAR eq 2004
   union all
     select * 
     from HAVE
     where YEAR ne 2003;
quit;
HOSPID YEAR RATIO
A 2003 0.208
B 2003 0.2496
C 2003 0.312
A 2004 0.2
B 2004 0.24
C 2004 0.3
lnicholl9
Calcite | Level 5

I tried this code but if there is a hospital in 2003 which is not in the other years, it gets overwritten. see this dataset;

 

data old;

infile datalines;

input hospid $ year ratio;

datalines;

A 2003 .

A 2003 .

B 2003 .

C 2003 .

A 2004 .2

B 2004 .3

B 2004 .3

D 2004 .4

A 2005 .3

;

run;

 

output should be this but instead C gets overwriten:

A 2003 .208

A 2003 .208

B 2003 .312

C 2003 .

A 2004 .2

B 2004 .3

B 2004 .3

D 2004 .4

A 2005 .3

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3107 views
  • 0 likes
  • 5 in conversation