I have a situation in which I have one table with one row per primary key and I need to left join onto multiple tables that can have more than one row per primary key. I would like the resulting table (a view, actually) to be one row per value of the primary key, which means the PROC SQL procedure needs to derive new variables to transpose the "long" tables to wide data sets. In this example, I do not get the output I'm looking for. I've looked in the documentation and don't see a simple solution. I need a workable solution in PROC SQL because I am building SQL views off tables in a MySQL database. This code results in the wrong output table, so I need a modification in the statement somewhere.
data TableA;
input x $ y z;
cards;
A01 55 66
A02 11 22
A03 44 88
;
run;
data Long;
input ID $ Key1;
cards;
A01 8
A01 7
A03 9
A03 3
;
run;
data Long2;
input ID $ Key2;
cards;
A01 8
A01 7
A01 4
A03 9
A03 3
;
run;
proc sql;
create view merged as
select distinct a.x, a.y, a.z,
case when b.key1 = 8 then 1 else 0 end as L8,
case when b.key1 = 7 then 1 else 0 end as L7,
case when b.key1 = 9 then 1 else 0 end as L9,
case when c.key2 = 8 then 1 else 0 end as M8,
case when c.key2 = 7 then 1 else 0 end as M7,
case when c.key2 = 4 then 1 else 0 end as M4,
case when c.key2 = 9 then 1 else 0 end as M9,
case when c.key2 = 3 then 1 else 0 end as M3
from TableA a
left join Long b on (a.x = b.ID)
left join Long2 c on (a.x = c.ID)
group by a.x;
quit;
You could be right, but I need a SQL view nonetheless. Turns out this is a case for a FULL JOIN. It's not pretty but it works.
proc sql;
create view merged_full_sum as
select distinct a.x, a.y, a.z,
sum(case when b.key1 = 8 then 1 else 0 end) as L8,
sum(case when b.key1 = 7 then 1 else 0 end) as L7,
sum(case when b.key1 = 9 then 1 else 0 end) as L9,
sum(case when c.key2 = 8 then 1 else 0 end) as M8,
sum(case when c.key2 = 7 then 1 else 0 end) as M7,
sum(case when c.key2 = 4 then 1 else 0 end) as M4,
sum(case when c.key2 = 9 then 1 else 0 end) as M9,
sum(case when c.key2 = 3 then 1 else 0 end) as M3,
(case when calculated L8 > 1 then 1 else calculated L8 end) as L8,
(case when calculated L7 > 1 then 1 else calculated L7 end) as L7_new,
(case when calculated L9 > 1 then 1 else calculated L9 end) as L9_new,
(case when calculated M8 > 1 then 1 else calculated M8 end) as M8_new,
(case when calculated M7 > 1 then 1 else calculated M7 end) as M7_new,
(case when calculated M4 > 1 then 1 else calculated M4 end) as M4_new,
(case when calculated M9 > 1 then 1 else calculated M9 end) as M9_new,
(case when calculated M3 > 1 then 1 else calculated M3 end) as M3_new
from TableA a
full join Long b on (a.x = b.ID)
full join Long2 c on (a.x = c.ID)
group by a.x;
quit;
One suspects that a LOT of information is left out of your problem description.
If you have two matches for the primary key in ONE set that has 10 other variables then you need to create 10 additional variables for the second rows values, if your have 4 of the same key value you need to add 30 varibles (for a total of 4 of each one).
SQL is basically the last tool I would grab for that because every SQL step has to list every single variable by name and this sounds like you may not even know how many variables you need to create for the transpose.
Maybe someone with MySQL knowledge knows something that is MySQL specific that will help.
I might attempt to transpose each of those sets before joining them. Then the join might be relatively simple otherwise you have . Make intermediate VIEWS of that SQL, if you can get it to work for each table.
I really do wonder how well thought out the next steps that use this result are. I predict really nasty headaches to come.
You could be right, but I need a SQL view nonetheless. Turns out this is a case for a FULL JOIN. It's not pretty but it works.
proc sql;
create view merged_full_sum as
select distinct a.x, a.y, a.z,
sum(case when b.key1 = 8 then 1 else 0 end) as L8,
sum(case when b.key1 = 7 then 1 else 0 end) as L7,
sum(case when b.key1 = 9 then 1 else 0 end) as L9,
sum(case when c.key2 = 8 then 1 else 0 end) as M8,
sum(case when c.key2 = 7 then 1 else 0 end) as M7,
sum(case when c.key2 = 4 then 1 else 0 end) as M4,
sum(case when c.key2 = 9 then 1 else 0 end) as M9,
sum(case when c.key2 = 3 then 1 else 0 end) as M3,
(case when calculated L8 > 1 then 1 else calculated L8 end) as L8,
(case when calculated L7 > 1 then 1 else calculated L7 end) as L7_new,
(case when calculated L9 > 1 then 1 else calculated L9 end) as L9_new,
(case when calculated M8 > 1 then 1 else calculated M8 end) as M8_new,
(case when calculated M7 > 1 then 1 else calculated M7 end) as M7_new,
(case when calculated M4 > 1 then 1 else calculated M4 end) as M4_new,
(case when calculated M9 > 1 then 1 else calculated M9 end) as M9_new,
(case when calculated M3 > 1 then 1 else calculated M3 end) as M3_new
from TableA a
full join Long b on (a.x = b.ID)
full join Long2 c on (a.x = c.ID)
group by a.x;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.