I have two variables a and b indexed by firm and time as follows.
data a;
input firm time a;
cards;
1 1 99
1 2 98
1 4 97
1 5 95
2 1 89
2 3 90
2 4 88
2 5 94
;
I want to append the following b to the a above—for firm 1 (2), b will be updated at time 1 and 4 (2 and 5).
data b;
input firm time b;
cards;
1 0.9 11
1 3.1 15
2 1.8 42
2 4.2 29
;
I tried to JOIN them using A.TIME=CEIL(B.TIME) but failed because firm 2 has no observation at time 2. In addition, the following way appends no information at time 2 and 5 (3 and 4) for firm 1 (2).
proc sql;
create table ab as select a.firm,a.time,a,b from a left join b on a.firm=b.firm and a.time=ceil(b.time) order by firm,time;
quit;
I am expecting a slow-moving b in addition to the original a after the JOINing. How can I execute this kind of one-to-many JOINing conveniently?
Have look at:
data want;
set aaaa;
by firm;
if first.firm then do;
b_value = .;
end;
do _p = 1 to _count;
set bbbb(rename= (firm = b_firm time = b_time)) point=_p nobs=_count;
if firm = b_firm and time >= b_time then do;
b_value = b;
end;
end;
b = b_value;
drop b_: _:;
run;
Performance will be disgusting, but could be optimized, maybe @PeterClemmensen has an idea how point= could be avoided.
Thanks for posting data in usable form, unfortunately you forgot to add the expected result to your message, leaving me puzzled whether you want to join (=merge) or append those two tables. From the code it seems you want to merge, but in the text you are talking about appending.
My apologies. Here I add the results I was expecting.
data ab;
input firm time a b;
cards;
1 1 99 11
1 2 98 11
1 4 97 15
1 5 95 15
2 1 89 .
2 3 90 42
2 4 88 42
2 5 94 29
;
Thank you!
Sorry, but i don't understand the logic you want implemented to return the first three lines for firm=2.
b should be missing at the beginning for each firm and will be updated accordingly after each arrival in table b (and continued until the next arrival).
For firm 1, for example, the value of b is revised twice at time=0.9 and 3.1. Therefore, at time=1 and 2, the value of b is 11, and at time=4 and 5, the value of b is 15 since the new value arrived at time=3.1. For firm 2, similarly, the value of b is missing before time=1.8. The value of b becomes 42 at time=3 and 4. And finally, it becomes 29 at time=5 because of the update at time=4.2.
Have look at:
data want;
set aaaa;
by firm;
if first.firm then do;
b_value = .;
end;
do _p = 1 to _count;
set bbbb(rename= (firm = b_firm time = b_time)) point=_p nobs=_count;
if firm = b_firm and time >= b_time then do;
b_value = b;
end;
end;
b = b_value;
drop b_: _:;
run;
Performance will be disgusting, but could be optimized, maybe @PeterClemmensen has an idea how point= could be avoided.
Haven't tested thoroughly, but I thinks this will work
data ab(keep=firm time a b);
if _N_ = 1 then do;
dcl hash h(dataset : "b", ordered : "a", multidata : "y");
h.definekey("firm");
h.definedata("b", "time");
h.definedone();
end;
set a;
if 0 then set b;
t = time;
do rc = h.find() by 0 while (rc = 0 & time <= t);
_b = b;
rc = h.find_next();
end;
b = _b;
time = t;
run;
Result:
firm time a b 1 1 99 11 1 2 98 11 1 4 97 15 1 5 95 15 2 1 89 . 2 3 90 42 2 4 88 42 2 5 94 29
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.