I am new to SAS and I using SAS 9.4. I am trying to convert data using data from a different data set. The first data set is called data and contains a variable for the first 3 digits of a zip code plus 14 other variables. The second data is called zip and contains the first three digits of a zipcodes along with the corresponding county and city names. For example:
Data Zip
zip_3 (variable name) zip_3 County State
005 005 Holtsville NY
005 006 Adjuntas PR
006 006 Aquada PR
I need to find some way to merge the data so that the county and state name are in the same observation as the matching zipcodes and if there are any other counties it could be, it is also in the observation. Example:
Data
zip_3 County State County State
005 Holtsville NY
005 Holtsville NY
006 Adjuntas PR Aquada PR
I've tried merging the data, but I have had no luck. I haven't used SAS much outside of merging data with identical variables, so I am at a loss.
There are two problems. First you are trying to do a many to many merge. There can be more than one loan per ZIP3 and there are more than one "lookup" value per ZIP3. The second problem is you want transpose you many rows of detailed zipcode into a single observation. If you solve the second one first then the first one disappears.
First convert your ZIP dataset to have one observation per ZIP3 value. For example using PROC TRANSPOSE.
proc transpose data=zip out=zip_name prefix=name ;
by zip_3 ;
var name ;
run;
proc tranpose data=zip out=zip_code prefix=zipcode ;
by zip_3;
var zipcode ;
run;
data zip_single;
merge zip_name zip_code ;
by zip_3;
run;
Then you can merge this with your loan data and you will be doing a many to one merge.
data want ;
merge loan_data (in=in1) zip_single ;
by zip_3;
if in1 ;
run;
Your approach of merging is correct, can you explain how it didn't work?
Feel free to include your exact code and error messages.
Otherwise, the approach I'd recommend would be either a merge (data step or SQL) and/or a format. I'm partial to formats because I find them reusable versus merges and keep a library of formats that I can commonly pull from when needed.
@Bennettr99 wrote:
I am new to SAS and I using SAS 9.4. I am trying to convert data using data from a different data set. The first data set is called data and contains a variable for the first 3 digits of a zip code plus 14 other variables. The second data is called zip and contains the first three digits of a zipcodes along with the corresponding county and city names. For example:
Data Zip
zip_3 (variable name) zip_3 County State
005 005 Holtsville NY
005 006 Adjuntas PR
006 006 Aquada PR
I need to find some way to merge the data so that the county and state name are in the same observation as the matching zipcodes and if there are any other counties it could be, it is also in the observation. Example:
Data
zip_3 County State County State
005 Holtsville NY
005 Holtsville NY
006 Adjuntas PR Aquada PR
I've tried merging the data, but I have had no luck. I haven't used SAS much outside of merging data with identical variables, so I am at a loss.
I just realized I made a mistake with the merging and I should have been more specific about my problem. This was the code I was using.
DATA home.merge;
MERGE home.data home.zip;
BY zip_3;
RUN;
This is the result (with some variables dropped):
The loan_id is from home.data. I need to merge the data without duplicating the data from home.data like it does in this data. I need to find a way to have one ID correspond to multiple Counties or States if the 3 digits of the zip code correspond to multiple zip codes. I also need to remove any observation that does not include data from home.data.
So if the zip_3 in home.data is 025, all of the five digit zip codes and county names would be listed in the same observation.
There are two problems. First you are trying to do a many to many merge. There can be more than one loan per ZIP3 and there are more than one "lookup" value per ZIP3. The second problem is you want transpose you many rows of detailed zipcode into a single observation. If you solve the second one first then the first one disappears.
First convert your ZIP dataset to have one observation per ZIP3 value. For example using PROC TRANSPOSE.
proc transpose data=zip out=zip_name prefix=name ;
by zip_3 ;
var name ;
run;
proc tranpose data=zip out=zip_code prefix=zipcode ;
by zip_3;
var zipcode ;
run;
data zip_single;
merge zip_name zip_code ;
by zip_3;
run;
Then you can merge this with your loan data and you will be doing a many to one merge.
data want ;
merge loan_data (in=in1) zip_single ;
by zip_3;
if in1 ;
run;
This works! Thanks for the help.
You may also want to be aware that in many parts of the US the first three digits of a Zip code will not determine county.
For instance in my state we have 44 counties and only 7 different values of first three digits in the Zip, the county counts ranging from 1 to 11 per 3-digit code.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.