How to use proc sql to merge 2 datasets with similar variables (4 different variables in one dataset) but want to match it by a Unique_ID ?
I have 2 datasets dataset1 has about 20 variables including a variabled called Unique_ID.
dataset2 has about 24 variables also including Unique_ID but has 4 other variables that I want to merge into a table with.
I want to merge these tables and match it by Unique_ID.
I've tried the code below but when I do, it will multiply my observations which juristically increases my rows and I do not want that. What am I doing wrong?
proc sql;
create table test as
select *
from dataset1, dataset2
where dataset1.Unique_ID=dataset2.Unique_ID;
quit;
If the number of rows increase in TEST, that means that either the column dataset1.Unique_ID or column dataset2.Unique_ID or both do NOT contain unique values for each row. Do you expect them to be unique?
If you are expecting your row IDs to be unique, but they are not then you need to investigate why and then decide on a process to make them unique. This is a data understanding issue. You may find PROC FREQ useful to understanding where your duplicates are:
proc freq data = test;
table uniqueID1 uniqueID2 / missing;
run;
Count will be greater than 1 for any duplicates.
To define the appropriate join you need to know two things:
Below two links are hopefully useful to you.
https://documentation.sas.com/doc/en/vdmmlcdc/8.1/sqlproc/p1bk7i6jqseje7n1lifcip8kzhpp.htm
@rebelde52 wrote:
This is helpful! Question though. I checked my data freq by the unique_id on a specific id #. For example dateset1 had 8 observations for that id#. Dataset2 had 16 for that specific id#. Joineddataset had 128 observations for that specific id#. Therefor it seemed like it multiplied? Why is that? Do I just deduplicate it?
That means you've got a many:many relationship between your tables and unique_id is not unique.
Even a SQL inner join will return 8X16 rows which is what you observe.
How would you deduplicate? You would need for at least one of the tables all variable values of interest to be identical for a specific "unique" id. Is that the case? If not then which values would you pick in order to "dedup"?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.