Hi All,
I want to merge 2 dataset using proc sql. Before merging I want to remove duplicates in one dataset and then merge . Can this be done under single proc statement in proc sql.
What do you mean by duplicates? Rows where all columns' values are the same or where certain key columns have the same values? SQL isn't very good at removing duplicate rows / key columns and there are better methods to do this.
Duplicates here I meant is certain key columns have the same values. Thanks for the reply
The blueprint is
proc sql;
create table want as
select
t1.*,
t2.var,
t2.varx
from have1 t1 left join (
select distinct id, var, varx
from have2
) t2
on t1.id = t2.id
;
quit;
To really be of help here, we need to see examples for your data and what you expect as a resulting table.
Post your incoming datasets as data steps with datalines, so we do not have to make guesses about your data.
Thanks a lot, I think this will be helping out
Keep in mind that, as always, the task defines the tool. Depending on the layout, size and contents of your data, other means than SQL might be more efficient, or it might even be that SQL can't do what you want (SQL only knows DISTINCT, but PROC SORT makes a difference between NODUPREC and NODUPKEY).
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.