BookmarkSubscribeRSS Feed
mkimmi
Obsidian | Level 7

Not sure if "merge" is the function I ultimately want, but it's what I tried....and flopped lol

 

So I have two data sets:

1. Base (goes on much longer than this)

ABCZIPCODEYEARD
abc123452015j
def110112015k
ghi111022013l
chcsfm110112015kk

2. References: 

ZYZIPCODEYEAR
mp123452015
nq110112015
or111022013
st903842013
uv100902016

 

Both of them have say two variables in common: Name is correct, format is the same, information is the same. (Zipcode and Year)

 

For each observation in Base dataset, I want to look up on the References table and find the matching Zipcode and Year. If and only if there is a match, then add the variables from References table to the Base dataset:

ABCZIPCODEYEARDZY
abc123452015jmp
def110112015knq
ghi111022013lor
chcsfm110112015kkor

 

However, there are

(1) repeated Zipcode and Year in Base dataset (11011, 2015) even though other variables have different information, so I want to be able to append the information while retaining that information in the References dataset for later observations that have the same zipcode/year.

and

(2) some Zipcode and Year in Reference dataset may not be present in Base dataset (e.g. 90384, 2013 and 10090, 2016), so I don't want these to merge into Base dataset with empty cells for other variables (since there are no observations in Base dataset with said zip code and year).

These are two issues I've been running into, and not sure how to solve...

 

Thanks in advance!!

4 REPLIES 4
Shmuel
Garnet | Level 18

You mentioned "Both of them have say two variables in common:". The only common variables I see are zipcode and year.

You mentioned "name is correct" - there is no variable named NAME.

 

In general you can use sql to join data from the two tables:

proc sql;
   create table want as
   select b.*   /* all BASE variables */
             ,r.<var1>
             .r.<var2>
            /* add as many REFERENCE variables as need */
   from BASE as b
   left join REFERENCE as r
   on (b.<common var-1> = r.<common var-1> and
         b.<common var-2> = r.<common var-2>
       /* compete any more conditions if need */
       )
      sort by b.<common var-1> , b.<common var-2>;
quit;
mkimmi
Obsidian | Level 7
Quick clarification: "Both of them" -- both datasets of two common variables, whose names are correct -- as in ZIPCODE is ZIPCODE and YEAR is YEAR in both datasets

Thanks so much--will try this!! I'm finally getting a hang of proc sql and macros
mkimmi
Obsidian | Level 7
I'm not sure why, but my BASE file is still the same when I proc print it, there are no new variables attached....
Shmuel
Garnet | Level 18

Please post the exact code you run, or even better the full log of the run showing the code and the messages.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1124 views
  • 0 likes
  • 2 in conversation