Solved: Re: JOINing Two Panels with Different Time Indices

Junyong · Posted 08-12-2020 11:01 PM

I have two variables a and b indexed by firm and time as follows.

data a;
input firm time a;
cards;
1 1 99
1 2 98
1 4 97
1 5 95
2 1 89
2 3 90
2 4 88
2 5 94
;

I want to append the following b to the a above—for firm 1 (2), b will be updated at time 1 and 4 (2 and 5).

data b;
input firm time b;
cards;
1 0.9 11
1 3.1 15
2 1.8 42
2 4.2 29
;

I tried to JOIN them using A.TIME=CEIL(B.TIME) but failed because firm 2 has no observation at time 2. In addition, the following way appends no information at time 2 and 5 (3 and 4) for firm 1 (2).

proc sql;
create table ab as select a.firm,a.time,a,b from a left join b on a.firm=b.firm and a.time=ceil(b.time) order by firm,time;
quit;

I am expecting a slow-moving b in addition to the original a after the JOINing. How can I execute this kind of one-to-many JOINing conveniently?

andreas_lds · Posted 08-13-2020 05:21 AM

Have look at:

data want;
   set aaaa;
   by firm;

   if first.firm then do;
      b_value = .;
   end;

   do _p = 1 to _count;
      set bbbb(rename= (firm = b_firm time = b_time)) point=_p nobs=_count;
      
      if firm = b_firm and time >= b_time then do;
         b_value = b;
      end;
   end;

   b = b_value;

   drop b_: _:;
run;

Performance will be disgusting, but could be optimized, maybe @PeterClemmensen has an idea how point= could be avoided.

View solution in original post

andreas_lds · Posted 08-13-2020 12:49 AM

Thanks for posting data in usable form, unfortunately you forgot to add the expected result to your message, leaving me puzzled whether you want to join (=merge) or append those two tables. From the code it seems you want to merge, but in the text you are talking about appending.

Junyong · Posted 08-13-2020 01:35 AM

My apologies. Here I add the results I was expecting.

data ab;
input firm time a b;
cards;
1 1 99 11
1 2 98 11
1 4 97 15
1 5 95 15
2 1 89 .
2 3 90 42
2 4 88 42
2 5 94 29
;

Thank you!

andreas_lds · Posted 08-13-2020 02:24 AM

Sorry, but i don't understand the logic you want implemented to return the first three lines for firm=2.

Junyong · Posted 08-13-2020 03:22 AM

b should be missing at the beginning for each firm and will be updated accordingly after each arrival in table b (and continued until the next arrival).

For firm 1, for example, the value of b is revised twice at time=0.9 and 3.1. Therefore, at time=1 and 2, the value of b is 11, and at time=4 and 5, the value of b is 15 since the new value arrived at time=3.1. For firm 2, similarly, the value of b is missing before time=1.8. The value of b becomes 42 at time=3 and 4. And finally, it becomes 29 at time=5 because of the update at time=4.2.

andreas_lds · Posted 08-13-2020 05:21 AM

Have look at:

data want;
   set aaaa;
   by firm;

   if first.firm then do;
      b_value = .;
   end;

   do _p = 1 to _count;
      set bbbb(rename= (firm = b_firm time = b_time)) point=_p nobs=_count;
      
      if firm = b_firm and time >= b_time then do;
         b_value = b;
      end;
   end;

   b = b_value;

   drop b_: _:;
run;

Performance will be disgusting, but could be optimized, maybe @PeterClemmensen has an idea how point= could be avoided.

PeterClemmensen · Posted 08-14-2020 05:59 AM

Haven't tested thoroughly, but I thinks this will work

data ab(keep=firm time a b);
   if _N_ = 1 then do;
      dcl hash h(dataset : "b", ordered : "a", multidata : "y");
      h.definekey("firm");
      h.definedata("b", "time");
      h.definedone();
   end;

   set a;
   if 0 then set b;
   t = time;

   do rc = h.find() by 0 while (rc = 0 & time <= t);
      _b = b;
      rc = h.find_next();
   end;

   b = _b;
   time = t;
run;

Result:

firm time a   b 
1    1    99  11 
1    2    98  11 
1    4    97  15 
1    5    95  15 
2    1    89  . 
2    3    90  42 
2    4    88  42 
2    5    94  29

The DATA to DATA Step Macro
Blog: SASnrd

Register Today!

SAS Training: Just a Click Away