Hello,
here is my code. Basically I have 2 datasets, and I want to get some variables from one (a, b) but the primary key's values have to be the ones that are in another dataset (c).
Dataset C contains some ID's that have a particular status, so I'm interested in getting the variables in A for those ID's only.
The code I pasted below gives me duplicated rows on some (not all) ID's. Is there an easy way to fix this?
For context, loans.data_apps has 1 row per ID, and data_collections can have multiple rows per ID.
Please if possible keep the proc sql notation, this is part of a larger code made by a colleague.
proc sql;
create table outputs_table as
select a.id,
b.*
from (select id from loans.data_apps) a
left join
(select id, 'apps' as var1, var2, var3, var4
from loans.data_apps
) b
on a.id = b.id
inner join
loans.data_collections c
on a.id= c.id
;
quit;
Let us look at this part of your code. (I am looking at what you have shown here )
select a.id,
b.*
from (select id from loans.data_apps) a
left join
(select id, 'apps' as var1, var2, var3, var4
from loans.data_apps
) b
on a.id = b.id
I don't think you are achieving anything by this join.
Basically my understanding is that you want records from loans.data_apps that have their id's present in loans.data_collections.
With this understanding I would write a code something like this
proc sql;
select id, 'apps' as var1, var2, var3 ,var4 from load.data_apps a
where a.id in (select id from loans.data_collections);
quit;
Their will duplicate id's if loans.data_apps has duplicates.
In that case use select distinct in place of select.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.