Hey guys,
I have a problem regarding a Join of two tables.
I have a table A which looks like this:
data a;
infile datalines dlm='/';
input (Date cusip fundid value) ($);
datalines;
Jan2012/ 123 / A / 5$
Jan2012/ 123 / B / 6$
Feb2012 / 124 / A / 4$
Feb2012 / 125 / C / 4$
;
And a Table b which looks like this:
data b;
infile datalines dlm='/';
input (Date cusip relValue) ($);
datalines;
Jan2012 / 123 / 4$
Jan2012 / 124 / 5$
Jan2012 / 125 / 5$
Feb2012 / 123 / 6$
Feb2012 / 124 / 2$
Feb2012 / 125 / 4$
;
Now I want to join these tables as followed:
data want;
infile datalines dlm='/';
input (Date cusip relvalue fundid value) ($);
datalines;
Jan2012 / 123 / 4$ / A / 5$
Jan2012 / 124 / 5$ / A / .
Jan2012 / 125 / 5$ / A / .
Feb2012 / 123 / 6$ / A / .
Feb2012 / 124 / 2$ / A / 4$
Feb2012 / 125 / 4$ / A / .
Jan2012 / 123 / 4$ / B / 6$
Jan2012 / 124 / 5$ / B / .
Jan2012 / 125 / 5$ / B / .
Feb2012 / 123 / 6$ / B / .
Feb2012 / 124 / 2$ / B / .
Feb2012 / 125 / 4$ / B / .
Jan2012 / 123 / 4$ / C / .
Jan2012 / 124 / 5$ / C / .
Jan2012 / 125 / 5$ / C / .
Feb2012 / 123 / 6$ / C / .
Feb2012 / 124 / 2$ / C / .
Feb2012 / 125 / 4$ / C / 4$
;
This means: For each fundid in table A, I want to add all date, cusip Observations of Table B.
The joined table should then show the value for this date/cusip/fundid observation if available,
otherwise it should show a missing value.
Furthermore, there is only one entry for each date, cusip observation in table b
and only one entry for each date, cusip, fundid value in table b.
Sorry, Ive got the feeling that this question is stupid but Im not sure how to handle this issue.
Is this the Cartesian Product?
Assuming i understand:
data a;
infile datalines dlm='/';
input Date :monyy7. (cusip fundid value) ($);
format date monyy7.;
datalines;
Jan2012/ 123 / A / 5$
Jan2012/ 123 / B / 6$
Feb2012 / 124 / A / 4$
Feb2012 / 125 / C / 4$
;
data b;
infile datalines dlm='/';
input Date : monyy7. (cusip relValue) ($);
format date monyy7.;
datalines;
Jan2012 / 123 / 4$
Jan2012 / 124 / 5$
Jan2012 / 125 / 5$
Feb2012 / 123 / 6$
Feb2012 / 124 / 2$
Feb2012 / 125 / 4$
;
proc sql;
create table want as
select x1.*,x2.value
from
(select * from b , (select distinct fundid from a)) x1
left join a x2
on x1.fundid=x2.fundid and x1.date=x2.date and x1.cusip=x2.cusip;
quit;
Btw, If the above works, I still don't think sql is a robust approach and should switch to datastep, much much faster. But I love sql for fun atleast here and that me being cheeky as the onus is on OP to be concerned about performance 🙂
Assuming i understand:
data a;
infile datalines dlm='/';
input Date :monyy7. (cusip fundid value) ($);
format date monyy7.;
datalines;
Jan2012/ 123 / A / 5$
Jan2012/ 123 / B / 6$
Feb2012 / 124 / A / 4$
Feb2012 / 125 / C / 4$
;
data b;
infile datalines dlm='/';
input Date : monyy7. (cusip relValue) ($);
format date monyy7.;
datalines;
Jan2012 / 123 / 4$
Jan2012 / 124 / 5$
Jan2012 / 125 / 5$
Feb2012 / 123 / 6$
Feb2012 / 124 / 2$
Feb2012 / 125 / 4$
;
proc sql;
create table want as
select x1.*,x2.value
from
(select * from b , (select distinct fundid from a)) x1
left join a x2
on x1.fundid=x2.fundid and x1.date=x2.date and x1.cusip=x2.cusip;
quit;
Btw, If the above works, I still don't think sql is a robust approach and should switch to datastep, much much faster. But I love sql for fun atleast here and that me being cheeky as the onus is on OP to be concerned about performance 🙂
Mate, thank you so much. This looks exactly what I was looking for.
Deep in my heart, I hoped that you will find my post 😄
Since I only need to apply this code once, I don´t really have performance issues with it 🙂
The pleasure is all mine. You're most welcome!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.