SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Handling misssing values

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 124
Accepted Solution

Handling misssing values

Hi,

I am fairly new to data cleaning. I want to update the missing values in the master file for a variable (county code) with values from another dataset. In both files , a corresponding zip code has a corresponding county code. I do not have a unique identifier for the files , so I cannot merge them. The master file has a unique identifier and has about 85,000 unique records for a state . There could be multiple records in the master file who have the same zip code and same county. I want to replace the " missing county value" for a "non-missing zip code". The transaction file has a corresponding county for each unique zip code. Two different zip code could lie in the same county. I need your suggestions for the code to help me do this. Please advice.

Thanks,

DR


Accepted Solutions
Solution
‎07-24-2014 04:15 PM
Frequent Contributor
Posts: 124

Re: Handling misssing values

Posted in reply to Scott_Mitchell

Thanks for the suggestion Scott. I am going to try this. Also, I should have mentioned this initially, my master and transaction datasets are SAS datasets. The zip code and county code are not labeled sequentially. But, I will just rename them and use your code.

View solution in original post


All Replies
Contributor
Posts: 58

Re: Handling misssing values

Can you share part of the two datafiles and an example of how the final output should look like?

Frequent Contributor
Posts: 124

Re: Handling misssing values

Posted in reply to pronabesh

Hi , Thanks for your reply. I have attached an excel sheet with a example of the layout of master file. transaction file and output file. The master file is a huge data set with unique ID. It has three different fields each for zip code and county labeled zip1 and county1, zip2 and county2 and zip3 and county3 for three different addresses for the individual. The transaction file has information for the county corresponding to a respective zip code. The transaction file is smaller than the master file. It has two fields zip and county. With the help of the transaction file , I want to assign a county value to missing county1, county2 county3 fields in the master file. Each of the fields zip1, zip2, zip3 and zip have $3 format and county1, county2, county3 and county field have a $5 format. I hope my explanation helps. Please let me know if you have any questions.

Super Contributor
Posts: 297

Re: Handling misssing values

How about a HASH TABLE?

DATA MASTER;

INFILE DATALINES DLM=",";

INPUT ID $ ZIP1 COUNTY1 ZIP2 COUNTY2 ZIP3 COUNTY3;

DATALINES;

ER001,30345,145,36751,.,39768,165

ER002,30546,162,37654,.,38765,177

ER003,30357,176,39768,165,35505,154

ER004,30345,.,38765,177,35674,122

ER005,30578,189,35505,154,35555,134

ER006,39820,089,35674,122,34545,59

ER007,39145,121,35555,134,36751,

ER008,36102,063,34545,059,35654,

ER009,36102,063,34545,059,35654,

;

RUN;

DATA TRANSACTION;

INFILE DATALINES DLM="," MISSOVER;

INPUT ZIP COUNTY;

DATALINES;

30345,244

36751,255

37654,233

30345,145

30546,162

30357,176

30578,189

39820,089

39145,121

36102,063

39768,165

38765,177

35505,154

35674,122

35555,134

34545,059

35654,192

;

RUN;

DATA WANT;

IF 0 THEN SET TRANSACTION;;

IF _N_ = 1 THEN DO;

DECLARE HASH H(DATASET:"TRANSACTION");

H.DEFINEKEY("ZIP");

H.DEFINEDATA("ZIP","COUNTY");

H.DEFINEDONE();

END;

DO UNTIL (EOF);

SET MASTER END=EOF;

ARRAY _COUNTY  {*} COUNTY1-COUNTY3;

ARRAY _ZIP     {*} ZIP1-ZIP3;

DO I = 1 TO 3;

IF _COUNTY{I} = . THEN DO;

  RC = H.FIND(KEY:_ZIP{I});

  _COUNTY{I} = COUNTY;

END;

END;

OUTPUT;

END;

RUN;

Solution
‎07-24-2014 04:15 PM
Frequent Contributor
Posts: 124

Re: Handling misssing values

Posted in reply to Scott_Mitchell

Thanks for the suggestion Scott. I am going to try this. Also, I should have mentioned this initially, my master and transaction datasets are SAS datasets. The zip code and county code are not labeled sequentially. But, I will just rename them and use your code.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 442 views
  • 0 likes
  • 3 in conversation