Solved: Re: Converting Data Using Another Data Set

Bennettr99 · Posted 01-25-2018 09:55 PM

I am new to SAS and I using SAS 9.4. I am trying to convert data using data from a different data set. The first data set is called data and contains a variable for the first 3 digits of a zip code plus 14 other variables. The second data is called zip and contains the first three digits of a zipcodes along with the corresponding county and city names. For example:

Data Zip

zip_3 (variable name) zip_3 County State

005 005 Holtsville NY

005 006 Adjuntas PR

006 006 Aquada PR

I need to find some way to merge the data so that the county and state name are in the same observation as the matching zipcodes and if there are any other counties it could be, it is also in the observation. Example:

Data

zip_3 County State County State

005 Holtsville NY

006 Adjuntas PR Aquada PR

I've tried merging the data, but I have had no luck. I haven't used SAS much outside of merging data with identical variables, so I am at a loss.

Tom · Posted 01-25-2018 11:38 PM

There are two problems. First you are trying to do a many to many merge. There can be more than one loan per ZIP3 and there are more than one "lookup" value per ZIP3. The second problem is you want transpose you many rows of detailed zipcode into a single observation. If you solve the second one first then the first one disappears.

First convert your ZIP dataset to have one observation per ZIP3 value. For example using PROC TRANSPOSE.

proc transpose data=zip out=zip_name prefix=name ;
  by zip_3 ;
  var name ;
run;
proc tranpose data=zip out=zip_code prefix=zipcode ;
  by zip_3;
  var zipcode ;
run;
data zip_single;
  merge zip_name zip_code ;
  by zip_3;
run;

Then you can merge this with your loan data and you will be doing a many to one merge.

data want ;
   merge loan_data (in=in1) zip_single ;
   by zip_3;
   if in1 ;
run;

View solution in original post

Reeza · Posted 01-25-2018 10:05 PM

Your approach of merging is correct, can you explain how it didn't work?

Feel free to include your exact code and error messages.

Otherwise, the approach I'd recommend would be either a merge (data step or SQL) and/or a format. I'm partial to formats because I find them reusable versus merges and keep a library of formats that I can commonly pull from when needed.

@Bennettr99 wrote:

I am new to SAS and I using SAS 9.4. I am trying to convert data using data from a different data set. The first data set is called data and contains a variable for the first 3 digits of a zip code plus 14 other variables. The second data is called zip and contains the first three digits of a zipcodes along with the corresponding county and city names. For example:

Data Zip

zip_3 (variable name) zip_3 County State

005 005 Holtsville NY

005 006 Adjuntas PR

006 006 Aquada PR

I need to find some way to merge the data so that the county and state name are in the same observation as the matching zipcodes and if there are any other counties it could be, it is also in the observation. Example:

Data

zip_3 County State County State

005 Holtsville NY

005 Holtsville NY

006 Adjuntas PR Aquada PR

I've tried merging the data, but I have had no luck. I haven't used SAS much outside of merging data with identical variables, so I am at a loss.

Bennettr99 · Posted 01-25-2018 11:02 PM

I just realized I made a mistake with the merging and I should have been more specific about my problem. This was the code I was using.

DATA home.merge;
MERGE home.data home.zip;
BY zip_3;
RUN;

This is the result (with some variables dropped):

The loan_id is from home.data. I need to merge the data without duplicating the data from home.data like it does in this data. I need to find a way to have one ID correspond to multiple Counties or States if the 3 digits of the zip code correspond to multiple zip codes. I also need to remove any observation that does not include data from home.data.

So if the zip_3 in home.data is 025, all of the five digit zip codes and county names would be listed in the same observation.

Tom · Posted 01-25-2018 11:38 PM

There are two problems. First you are trying to do a many to many merge. There can be more than one loan per ZIP3 and there are more than one "lookup" value per ZIP3. The second problem is you want transpose you many rows of detailed zipcode into a single observation. If you solve the second one first then the first one disappears.

First convert your ZIP dataset to have one observation per ZIP3 value. For example using PROC TRANSPOSE.

proc transpose data=zip out=zip_name prefix=name ;
  by zip_3 ;
  var name ;
run;
proc tranpose data=zip out=zip_code prefix=zipcode ;
  by zip_3;
  var zipcode ;
run;
data zip_single;
  merge zip_name zip_code ;
  by zip_3;
run;

Then you can merge this with your loan data and you will be doing a many to one merge.

data want ;
   merge loan_data (in=in1) zip_single ;
   by zip_3;
   if in1 ;
run;

Bennettr99 · Posted 01-26-2018 12:31 AM

This works! Thanks for the help.

ballardw · Posted 01-26-2018 12:14 PM

You may also want to be aware that in many parts of the US the first three digits of a Zip code will not determine county.

For instance in my state we have 44 counties and only 7 different values of first three digits in the Zip, the county counts ranging from 1 to 11 per 3-digit code.

Reeza · Posted 01-25-2018 10:06 PM

And a small FYI, though not helpful if you only have the 3 digit zip, SAS has a bunch of functions that convert ZIP codes to state names and such. http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p1en5dzvubdabmn1czrqdcq9ozv3.htm&...

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away