I have two variables a and b indexed by firm and time as follows.
data a;
input firm time a;
cards;
1 1 99
1 2 98
1 4 97
1 5 95
2 1 89
2 3 90
2 4 88
2 5 94
;
I want to append the following b to the a above—for firm 1 (2), b will be updated at time 1 and 4 (2 and 5).
data b;
input firm time b;
cards;
1 0.9 11
1 3.1 15
2 1.8 42
2 4.2 29
;
I tried to JOIN them using A.TIME=CEIL(B.TIME) but failed because firm 2 has no observation at time 2. In addition, the following way appends no information at time 2 and 5 (3 and 4) for firm 1 (2).
proc sql;
create table ab as select a.firm,a.time,a,b from a left join b on a.firm=b.firm and a.time=ceil(b.time) order by firm,time;
quit;
I am expecting a slow-moving b in addition to the original a after the JOINing. How can I execute this kind of one-to-many JOINing conveniently?
Have look at:
data want;
set aaaa;
by firm;
if first.firm then do;
b_value = .;
end;
do _p = 1 to _count;
set bbbb(rename= (firm = b_firm time = b_time)) point=_p nobs=_count;
if firm = b_firm and time >= b_time then do;
b_value = b;
end;
end;
b = b_value;
drop b_: _:;
run;
Performance will be disgusting, but could be optimized, maybe @PeterClemmensen has an idea how point= could be avoided.
Thanks for posting data in usable form, unfortunately you forgot to add the expected result to your message, leaving me puzzled whether you want to join (=merge) or append those two tables. From the code it seems you want to merge, but in the text you are talking about appending.
My apologies. Here I add the results I was expecting.
data ab;
input firm time a b;
cards;
1 1 99 11
1 2 98 11
1 4 97 15
1 5 95 15
2 1 89 .
2 3 90 42
2 4 88 42
2 5 94 29
;
Thank you!
Sorry, but i don't understand the logic you want implemented to return the first three lines for firm=2.
b should be missing at the beginning for each firm and will be updated accordingly after each arrival in table b (and continued until the next arrival).
For firm 1, for example, the value of b is revised twice at time=0.9 and 3.1. Therefore, at time=1 and 2, the value of b is 11, and at time=4 and 5, the value of b is 15 since the new value arrived at time=3.1. For firm 2, similarly, the value of b is missing before time=1.8. The value of b becomes 42 at time=3 and 4. And finally, it becomes 29 at time=5 because of the update at time=4.2.
Have look at:
data want;
set aaaa;
by firm;
if first.firm then do;
b_value = .;
end;
do _p = 1 to _count;
set bbbb(rename= (firm = b_firm time = b_time)) point=_p nobs=_count;
if firm = b_firm and time >= b_time then do;
b_value = b;
end;
end;
b = b_value;
drop b_: _:;
run;
Performance will be disgusting, but could be optimized, maybe @PeterClemmensen has an idea how point= could be avoided.
Haven't tested thoroughly, but I thinks this will work
data ab(keep=firm time a b);
if _N_ = 1 then do;
dcl hash h(dataset : "b", ordered : "a", multidata : "y");
h.definekey("firm");
h.definedata("b", "time");
h.definedone();
end;
set a;
if 0 then set b;
t = time;
do rc = h.find() by 0 while (rc = 0 & time <= t);
_b = b;
rc = h.find_next();
end;
b = _b;
time = t;
run;
Result:
firm time a b 1 1 99 11 1 2 98 11 1 4 97 15 1 5 95 15 2 1 89 . 2 3 90 42 2 4 88 42 2 5 94 29
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.