Hello,
here is my code. Basically I have 2 datasets, and I want to get some variables from one (a, b) but the primary key's values have to be the ones that are in another dataset (c).
Dataset C contains some ID's that have a particular status, so I'm interested in getting the variables in A for those ID's only.
The code I pasted below gives me duplicated rows on some (not all) ID's. Is there an easy way to fix this?
For context, loans.data_apps has 1 row per ID, and data_collections can have multiple rows per ID.
Please if possible keep the proc sql notation, this is part of a larger code made by a colleague.
proc sql;
create table outputs_table as
select a.id,
b.*
from (select id from loans.data_apps) a
left join
(select id, 'apps' as var1, var2, var3, var4
from loans.data_apps
) b
on a.id = b.id
inner join
loans.data_collections c
on a.id= c.id
;
quit;
Let us look at this part of your code. (I am looking at what you have shown here )
select a.id,
b.*
from (select id from loans.data_apps) a
left join
(select id, 'apps' as var1, var2, var3, var4
from loans.data_apps
) b
on a.id = b.id
I don't think you are achieving anything by this join.
Basically my understanding is that you want records from loans.data_apps that have their id's present in loans.data_collections.
With this understanding I would write a code something like this
proc sql;
select id, 'apps' as var1, var2, var3 ,var4 from load.data_apps a
where a.id in (select id from loans.data_collections);
quit;
Their will duplicate id's if loans.data_apps has duplicates.
In that case use select distinct in place of select.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.