Hello. I am trying to do a left join using proc sql. I have two datasets, Data1 and Data2. I want to merge or join into Data1, both datasets are large, I included sample data below. When I use the code below, I get blank fields for Data2 variables.
PROC SORT DATA=Data2; BY COUNTY_2 MUN_2 ; RUN;
PROC SORT DATA=Data1; BY COUNTY MUNICIPALITY ; RUN;
PROC SQL;
CREATE TABLE finaldata AS
SELECT * FROM data1 AS x LEFT JOIN data2 AS y
ON x.County = y.County_2 and x.Municipality= y.Mun_2;
QUIT;
EXPECTED OUTPUT
UniqID | Municipality | County | STATE | Mun_2 | County_2 | STATE_2 | Total |
1 | ATLANTIC CITY | ATLANTIC | NJ | ATLANTIC CITY | ATLANTIC | NJ | 10 |
2 | FORT LEE BOROUGH | BERGEN | NJ | FORT LEE BOROUGH | BERGEN | NJ | 20 |
3 | ATLANTIC CITY | ATLANTIC | NJ | ATLANTIC CITY | ATLANTIC | NJ | 10 |
4 | CAMDEN CITY | CAMDEN | NJ | . | . | . | . |
5 | . | UNKNOWN | NJ | . | . | . | . |
6 | . | MERCER | NJ | . | . | . | . |
DATA 2
Mun_2 | County_2 | STATE_2 | Total |
ATLANTIC CITY | ATLANTIC | NJ | 10 |
FORT LEE BOROUGH | BERGEN | NJ | 20 |
TRENTON CITY | MERCER | NJ | 30 |
DATA 1
UniqID | Municipality | County | STATE |
1 | ATLANTIC CITY | ATLANTIC | NJ |
2 | FORT LEE BOROUGH | BERGEN | NJ |
3 | ATLANTIC CITY | ATLANTIC | NJ |
4 | CAMDEN CITY | CAMDEN | NJ |
5 | . | UNKNOWN | NJ |
6 | . | MERCER | NJ |
You have not told us what the desired output is.
Expected Output=Desired Output has already been included. Thanks!
Oops, my apologies, I scrolled down to the data and didn't see the Expected Output. I'll try to have answer in a few minutes.
Ok, please provide data as SAS data step code, as you did here: https://communities.sas.com/t5/SAS-Programming/How-to-assign-a-unique-ID-number-to-each-row-with-ide...
DATA data1;
INPUT UniqID 1-2 Municipality $3-20 County $21-30 State $31-33 ;
DATALINES;
1 ATLANTIC CITY ATLANTIC NJ
2 FORT LEE BOROUGH BERGEN NJ
3 ATLANTIC CITY ATLANTIC NJ
4 CAMDEN CITY CAMDEN NJ
5 . UNKNOWN NJ
6 . MERCER NJ
run;
DATA data2;
INPUT Mun_2 $1-18 County_2 $19-27 State_2 $28-31 Total 32-35;
DATALINES;
ATLANTIC CITY ATLANTIC NJ 10
FORT LEE BOROUGH BERGEN NJ 20
TRENTON CITY MERCER NJ 30
run;
Ok, I'm sorry, but right now I just don't have the time to fix your code so that it works. I'm happy to help, but you have to provide working code that produces the proper data sets.
What is the problem? Other than not including STATE in join criteria you code looks fine and works for the data as given.
data data1;
infile cards dsd dlm='|' truncover;
input UniqID Municipality :$30. County :$20. STATE :$2. ;
cards;
1|ATLANTIC CITY| ATLANTIC|NJ
2|FORT LEE BOROUGH| BERGEN|NJ
3|ATLANTIC CITY| ATLANTIC|NJ
4|CAMDEN CITY| CAMDEN|NJ
5|.|UNKNOWN|NJ
6|.|MERCER|NJ
;
data data2 ;
infile cards dsd dlm='|' truncover;
input Mun_2 :$30. County_2 :$20. STATE_2 :$2. Total ;
cards;
ATLANTIC CITY| ATLANTIC|NJ|10
FORT LEE BOROUGH| BERGEN|NJ|20
TRENTON CITY|MERCER|NJ|30
;
proc sql;
create table want as
select *
from data1 left join data2
on data1.Municipality = data2.mun_2
and data1.county=data2.county_2
and data1.state=data2.state_2
order by data1.uniqid
;
quit;
proc print;
run;
Thanks! I had been trying to figure out the reason, too. After looking at the raw data (not the sample data provided), I found an extra space before counties. After removing the space, the problem is solved. Sorry for taking your time.
One quick way to find cases of intermittent values having a leading space is to run Proc Freq. The output table will appear to have multiple rows with the same value because the table output will left justify all the text. So if you see a repeated value that is a likely cause.
Compare the output of proc freq on these two data steps:
Data example; length word $ 10.; word='word';output; word=' word';output; word=' word';output; run; proc freq data=example; run; data example2; length word $ 10.; word='word';output; word='word';output; word='word';output; run; proc freq data=example2; run;
If you send the results to ODS Listing the values with the leading spaces will show the leading spaces as well as showing each count on a separate row.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.