How to use proc sql to merge 2 datasets with similar variables (4 different variables in one dataset) but want to match it by a Unique_ID ?
I have 2 datasets dataset1 has about 20 variables including a variabled called Unique_ID.
dataset2 has about 24 variables also including Unique_ID but has 4 other variables that I want to merge into a table with.
I want to merge these tables and match it by Unique_ID.
I've tried the code below but when I do, it will multiply my observations which juristically increases my rows and I do not want that. What am I doing wrong?
proc sql; create table test as select * from dataset1, dataset2 where dataset1.Unique_ID=dataset2.Unique_ID; quit;
If you are expecting your row IDs to be unique, but they are not then you need to investigate why and then decide on a process to make them unique. This is a data understanding issue. You may find PROC FREQ useful to understanding where your duplicates are:
proc freq data = test; table uniqueID1 uniqueID2 / missing; run;
Count will be greater than 1 for any duplicates.
To define the appropriate join you need to know two things:
Below two links are hopefully useful to you.
This is helpful! Question though. I checked my data freq by the unique_id on a specific id #. For example dateset1 had 8 observations for that id#. Dataset2 had 16 for that specific id#. Joineddataset had 128 observations for that specific id#. Therefor it seemed like it multiplied? Why is that? Do I just deduplicate it?
That means you've got a many:many relationship between your tables and unique_id is not unique.
Even a SQL inner join will return 8X16 rows which is what you observe.
How would you deduplicate? You would need for at least one of the tables all variable values of interest to be identical for a specific "unique" id. Is that the case? If not then which values would you pick in order to "dedup"?
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.