Hi all,
I'm trying to merge zip code and time zone into one dataset and am wondering of the best way to do it. I have two datasets - one that has a zip code variable and one that has the time zone for each zip code and the zip code associated with it. The number of observations are different between the two, but I'd like to merge them so that the time zone is put into the first dataset based on the zip code for each observation and not add more observations in the process. What is the best way to do this?
Thanks in advance!
First concept "put into" is not really a SAS data concept. Once you have a data set you can add (append) observation but once you start adding variables you are REPLACING the base data with a new set.
One basic approach could look like this with some warnings:
proc sql;
   create table want as
   select a.*,  b.timezone
   from basedata as a
        left join
        zip_and_timezonedata as b
        on a.zipcode=b.zipcode
   ;
run;
First warning: this assumes that your zipcode with timezone does not duplicate any values of zipcode. If they do then you will get duplicates. If you have duplicates it may actually be needed. I am not sure there is any rule for creating Zip codes that they must be all in a single timezone. So you may have to consider and tell us more.
Second, this assumed the variables have the same name and type (zip numeric or character in BOTH sets). If not you need to do a conversion so that one will equal the other values. Name can be different just need to use the correct name from the A or B referenced set.
Third, Zip codes used to be a simple a 5-digit value. Now you get those and Zip+4 that would look like 12345-4321 (pretty much have to be character). If you have a mix of Zip and Zip+4 you need to make sure that only the parts that will match are used for any equal comparison.
Please show example data.
First concept "put into" is not really a SAS data concept. Once you have a data set you can add (append) observation but once you start adding variables you are REPLACING the base data with a new set.
One basic approach could look like this with some warnings:
proc sql;
   create table want as
   select a.*,  b.timezone
   from basedata as a
        left join
        zip_and_timezonedata as b
        on a.zipcode=b.zipcode
   ;
run;
First warning: this assumes that your zipcode with timezone does not duplicate any values of zipcode. If they do then you will get duplicates. If you have duplicates it may actually be needed. I am not sure there is any rule for creating Zip codes that they must be all in a single timezone. So you may have to consider and tell us more.
Second, this assumed the variables have the same name and type (zip numeric or character in BOTH sets). If not you need to do a conversion so that one will equal the other values. Name can be different just need to use the correct name from the A or B referenced set.
Third, Zip codes used to be a simple a 5-digit value. Now you get those and Zip+4 that would look like 12345-4321 (pretty much have to be character). If you have a mix of Zip and Zip+4 you need to make sure that only the parts that will match are used for any equal comparison.
Thank you! I figured proc sql would be the way to go but I wasnt sure if there was a better way. Appreciate your warnings as well, I know it does not have duplicates and I converted the zip code value in the first dataset to numeric so they should work together. Thank you for the help!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
