Not sure if "merge" is the function I ultimately want, but it's what I tried....and flopped lol
So I have two data sets:
1. Base (goes on much longer than this)
A | B | C | ZIPCODE | YEAR | D |
a | b | c | 12345 | 2015 | j |
d | e | f | 11011 | 2015 | k |
g | h | i | 11102 | 2013 | l |
ch | cs | fm | 11011 | 2015 | kk |
2. References:
Z | Y | ZIPCODE | YEAR |
m | p | 12345 | 2015 |
n | q | 11011 | 2015 |
o | r | 11102 | 2013 |
s | t | 90384 | 2013 |
u | v | 10090 | 2016 |
Both of them have say two variables in common: Name is correct, format is the same, information is the same. (Zipcode and Year)
For each observation in Base dataset, I want to look up on the References table and find the matching Zipcode and Year. If and only if there is a match, then add the variables from References table to the Base dataset:
A | B | C | ZIPCODE | YEAR | D | Z | Y |
a | b | c | 12345 | 2015 | j | m | p |
d | e | f | 11011 | 2015 | k | n | q |
g | h | i | 11102 | 2013 | l | o | r |
ch | cs | fm | 11011 | 2015 | kk | o | r |
However, there are
(1) repeated Zipcode and Year in Base dataset (11011, 2015) even though other variables have different information, so I want to be able to append the information while retaining that information in the References dataset for later observations that have the same zipcode/year.
and
(2) some Zipcode and Year in Reference dataset may not be present in Base dataset (e.g. 90384, 2013 and 10090, 2016), so I don't want these to merge into Base dataset with empty cells for other variables (since there are no observations in Base dataset with said zip code and year).
These are two issues I've been running into, and not sure how to solve...
Thanks in advance!!
You mentioned "Both of them have say two variables in common:". The only common variables I see are zipcode and year.
You mentioned "name is correct" - there is no variable named NAME.
In general you can use sql to join data from the two tables:
proc sql;
create table want as
select b.* /* all BASE variables */
,r.<var1>
.r.<var2>
/* add as many REFERENCE variables as need */
from BASE as b
left join REFERENCE as r
on (b.<common var-1> = r.<common var-1> and
b.<common var-2> = r.<common var-2>
/* compete any more conditions if need */
)
sort by b.<common var-1> , b.<common var-2>;
quit;
Please post the exact code you run, or even better the full log of the run showing the code and the messages.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.