Here is the scenario: I have a dataset with hospital identifiers, years, and a cost to charge ratio. For the year 2003, no cost to charge ratio is available. I need to take the ratio from 2004, multipy it by 1.04, and then merge by hospital ID to get cost to charge ratios for 2003. It might make more sense if you look at the code--I want to go from old to new.
data old;
infile datalines;
input Hospid $ Year Ratio;
datalines;
A 2003 .
B 2003 .
C 2003 .
C 2003 .
A 2004 .2
B 2004 .24
C 2004 .3
;
run;
and I want to get new by taking the ratio from 2004, multiplying it by 1.04 and then matching it by hospid:
data new;
infile datalines;
input hospid $ year ratio;
datalines;
A 2003 .208
B 2003 .2496
C 2003 .312
C 2003 .312
A 2004 .2
B 2004 .24
C 2004 .3
;
run;
I have tried two methods:
1. a proc sql left join, in which I made a separate dataset of 2004 observations and then tried to join by hospid. that still gave me missing values
2. also making a separate dataset of 2004 observations, sorting that and the original dataset, then merging by hospid. that mostly worked, except when hospital id appeared more than once in which case i only got a ratio for one of the occurances.
thanks for any advice!
something like this hsould work for your scenario
proc sql;
create table new as
select a.hospid, a.year, case when a.hospid=b.hospid and a.ratio is missing
then b.ratio *1.04
else a.ratio
end as ratio
from
(select hospid, year, ratio
from old)a
left join
(select distinct hospid, ratio from old
where ratio is not missing)b
on a.hospid =b.hospid;
;
When I tried this I ended up with more observations than in the original dataset
Your only matching criteria is ID but ID is not unique to each observation. Am I reading your data set correctly?
There are too many cases that you haven't covered here. What should the results be for these combinations?
data old;
infile datalines;
input Hospid $ Year Ratio;
datalines;
A 2003 .
A 2003 .
B 2003 .
C 2003 .
C 2003 .
A 2004 .2
A 2004 .3
B 2004 .24
B 2004 .26
C 2004 .3
;
Anything is possible. But the rules are incomplete at this point.
I'm not sure if this would be the fastest way, but it's probably the clearest way.
data just2003 just2004 other_years;
set old;
if year=2003 then output just2003;
else if year = 2004 then output just2004;
else output other_years;
run;
proc sort data=just2003;
by HospID;
run;
proc sort data=just2004 out=ratio2004 nodupkey;
by HospID;
run;
data replace2003;
set ratio2004 (in=in1) just2003 (in=in2);
by HospID;
if first.HospID then do;
if in1 then replacement_ratio = ratio * 1.04;
else replacement_ratio = .;
end;
if in2;
if ratio = . then ratio = replacement_ratio;
drop replacement_ratio;
retain replacement_ratio;
run;
data want;
set replace2003 just2004 other_years;
run;
Like this?
proc sql;
create table WANT as
select HOSPID, 2003 as YEAR, RATIO*1.04 as RATIO
from HAVE
where YEAR eq 2004
union all
select *
from HAVE
where YEAR ne 2003;
quit;
HOSPID | YEAR | RATIO |
---|---|---|
A | 2003 | 0.208 |
B | 2003 | 0.2496 |
C | 2003 | 0.312 |
A | 2004 | 0.2 |
B | 2004 | 0.24 |
C | 2004 | 0.3 |
I tried this code but if there is a hospital in 2003 which is not in the other years, it gets overwritten. see this dataset;
data old;
infile datalines;
input hospid $ year ratio;
datalines;
A 2003 .
A 2003 .
B 2003 .
C 2003 .
A 2004 .2
B 2004 .3
B 2004 .3
D 2004 .4
A 2005 .3
;
run;
output should be this but instead C gets overwriten:
A 2003 .208
A 2003 .208
B 2003 .312
C 2003 .
A 2004 .2
B 2004 .3
B 2004 .3
D 2004 .4
A 2005 .3
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.