BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dr2014
Quartz | Level 8

Hi,

I am fairly new to data cleaning. I want to update the missing values in the master file for a variable (county code) with values from another dataset. In both files , a corresponding zip code has a corresponding county code. I do not have a unique identifier for the files , so I cannot merge them. The master file has a unique identifier and has about 85,000 unique records for a state . There could be multiple records in the master file who have the same zip code and same county. I want to replace the " missing county value" for a "non-missing zip code". The transaction file has a corresponding county for each unique zip code. Two different zip code could lie in the same county. I need your suggestions for the code to help me do this. Please advice.

Thanks,

DR

1 ACCEPTED SOLUTION

Accepted Solutions
dr2014
Quartz | Level 8

Thanks for the suggestion Scott. I am going to try this. Also, I should have mentioned this initially, my master and transaction datasets are SAS datasets. The zip code and county code are not labeled sequentially. But, I will just rename them and use your code.

View solution in original post

4 REPLIES 4
pronabesh
Fluorite | Level 6

Can you share part of the two datafiles and an example of how the final output should look like?

dr2014
Quartz | Level 8

Hi , Thanks for your reply. I have attached an excel sheet with a example of the layout of master file. transaction file and output file. The master file is a huge data set with unique ID. It has three different fields each for zip code and county labeled zip1 and county1, zip2 and county2 and zip3 and county3 for three different addresses for the individual. The transaction file has information for the county corresponding to a respective zip code. The transaction file is smaller than the master file. It has two fields zip and county. With the help of the transaction file , I want to assign a county value to missing county1, county2 county3 fields in the master file. Each of the fields zip1, zip2, zip3 and zip have $3 format and county1, county2, county3 and county field have a $5 format. I hope my explanation helps. Please let me know if you have any questions.

Scott_Mitchell
Quartz | Level 8

How about a HASH TABLE?

DATA MASTER;

INFILE DATALINES DLM=",";

INPUT ID $ ZIP1 COUNTY1 ZIP2 COUNTY2 ZIP3 COUNTY3;

DATALINES;

ER001,30345,145,36751,.,39768,165

ER002,30546,162,37654,.,38765,177

ER003,30357,176,39768,165,35505,154

ER004,30345,.,38765,177,35674,122

ER005,30578,189,35505,154,35555,134

ER006,39820,089,35674,122,34545,59

ER007,39145,121,35555,134,36751,

ER008,36102,063,34545,059,35654,

ER009,36102,063,34545,059,35654,

;

RUN;

DATA TRANSACTION;

INFILE DATALINES DLM="," MISSOVER;

INPUT ZIP COUNTY;

DATALINES;

30345,244

36751,255

37654,233

30345,145

30546,162

30357,176

30578,189

39820,089

39145,121

36102,063

39768,165

38765,177

35505,154

35674,122

35555,134

34545,059

35654,192

;

RUN;

DATA WANT;

IF 0 THEN SET TRANSACTION;;

IF _N_ = 1 THEN DO;

DECLARE HASH H(DATASET:"TRANSACTION");

H.DEFINEKEY("ZIP");

H.DEFINEDATA("ZIP","COUNTY");

H.DEFINEDONE();

END;

DO UNTIL (EOF);

SET MASTER END=EOF;

ARRAY _COUNTY  {*} COUNTY1-COUNTY3;

ARRAY _ZIP     {*} ZIP1-ZIP3;

DO I = 1 TO 3;

IF _COUNTY{I} = . THEN DO;

  RC = H.FIND(KEY:_ZIP{I});

  _COUNTY{I} = COUNTY;

END;

END;

OUTPUT;

END;

RUN;

dr2014
Quartz | Level 8

Thanks for the suggestion Scott. I am going to try this. Also, I should have mentioned this initially, my master and transaction datasets are SAS datasets. The zip code and county code are not labeled sequentially. But, I will just rename them and use your code.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 2304 views
  • 0 likes
  • 3 in conversation