Hello,
here is my code. Basically I have 2 datasets, and I want to get some variables from one (a, b) but the primary key's values have to be the ones that are in another dataset (c).
Dataset C contains some ID's that have a particular status, so I'm interested in getting the variables in A for those ID's only.
The code I pasted below gives me duplicated rows on some (not all) ID's. Is there an easy way to fix this?
For context, loans.data_apps has 1 row per ID, and data_collections can have multiple rows per ID.
Please if possible keep the proc sql notation, this is part of a larger code made by a colleague.
proc sql;
create table outputs_table as
select a.id,
b.*
from (select id from loans.data_apps) a
left join
(select id, 'apps' as var1, var2, var3, var4
from loans.data_apps
) b
on a.id = b.id
inner join
loans.data_collections c
on a.id= c.id
;
quit;
Let us look at this part of your code. (I am looking at what you have shown here )
select a.id,
b.*
from (select id from loans.data_apps) a
left join
(select id, 'apps' as var1, var2, var3, var4
from loans.data_apps
) b
on a.id = b.id
I don't think you are achieving anything by this join.
Basically my understanding is that you want records from loans.data_apps that have their id's present in loans.data_collections.
With this understanding I would write a code something like this
proc sql;
select id, 'apps' as var1, var2, var3 ,var4 from load.data_apps a
where a.id in (select id from loans.data_collections);
quit;
Their will duplicate id's if loans.data_apps has duplicates.
In that case use select distinct in place of select.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.