Solved: Re: Update the missing variables

mspak · Posted 09-15-2015 11:14 AM

Dear all,

The attached dataset contains the missing "state" and "stateab". These two variables can be matched with the "STCOD".

Could anyone suggest a program that can help me to update the missing variables?

Thank you.

MSPAK

hbi · Posted 09-15-2015 03:25 PM

Hi,

Your dataset is pretty small, so a PROC SQL update statement would work nicely.

The first step involves creating a lookup table; I included a frequency column (informational only and not needed). But notice that for our nation's capital, there are two spelling variations ("District of Columbia" and "District Of Columbia"). The PROC SORT gets rid of the duplicate.

Next, a series of two PROC SQL update statement updates rows for which state is missing and then updates rows for which stateab is missing. Since your dataset is relatively small, an "in-place" PROC SQL update works just fine. If your dataset were much larger with millions of observations, then a PROC SQL update may be suboptimal.

Enjoy!

/* first, create a lookup table */
PROC SQL;
  CREATE TABLE distinct_states AS 
  SELECT STCOD, state, stateab, COUNT(*) as state_frequency
  FROM full
  WHERE NOT MISSING(state)
  GROUP BY STCOD, state, stateab
  ORDER BY 2, 1, 4 DESC;
QUIT;


/* note that your dataset has District of Columbia spelled two different 
   ways, thus get rid of one of them */
PROC SORT DATA=distinct_states nodupkey;
  BY STCOD;
RUN;


PROC SQL;
/* only update where state is missing */
  UPDATE full
  SET state = (SELECT MAX(state)
               FROM distinct_states
               WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(state);
QUIT;


PROC SQL;
/* only update where abbreviated state is missing */
  UPDATE full
  SET stateab = (SELECT MAX(stateab)
                 FROM distinct_states
                 WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(stateab);
QUIT;

View solution in original post

Reeza · Posted 09-15-2015 11:54 AM

Have you tried a merge?

mspak · Posted 09-16-2015 12:06 AM

Dear Reeza,

Yes, I tried. Thank you.

MikeZdeb · Posted 09-15-2015 12:17 PM

Hi ... you can use functions ...

data fixed;
set z.full;
if missing(state) then do;
state = fipname(stcod);
stateab = fipstate(stcod);
end;
run;

Also, the variable COUNTY has three different forms in the data set: just the county name, county + state, county name followed by the word "County". . Why not make it consistent, just the county name ...

data fixed;
set z.full;
if missing(state) then do;
state = fipname(stcod);
stateab = fipstate(stcod);
end;
* get rid of the STATE, get rid of text 'COUNTY';
if find(county,',') then county=scan(county,1,',');
else
if findw(county,'County') then county = tranwrd(county,'County',' ');
run;

hbi · Posted 09-15-2015 03:25 PM

Hi,

Your dataset is pretty small, so a PROC SQL update statement would work nicely.

The first step involves creating a lookup table; I included a frequency column (informational only and not needed). But notice that for our nation's capital, there are two spelling variations ("District of Columbia" and "District Of Columbia"). The PROC SORT gets rid of the duplicate.

Next, a series of two PROC SQL update statement updates rows for which state is missing and then updates rows for which stateab is missing. Since your dataset is relatively small, an "in-place" PROC SQL update works just fine. If your dataset were much larger with millions of observations, then a PROC SQL update may be suboptimal.

Enjoy!

/* first, create a lookup table */
PROC SQL;
  CREATE TABLE distinct_states AS 
  SELECT STCOD, state, stateab, COUNT(*) as state_frequency
  FROM full
  WHERE NOT MISSING(state)
  GROUP BY STCOD, state, stateab
  ORDER BY 2, 1, 4 DESC;
QUIT;


/* note that your dataset has District of Columbia spelled two different 
   ways, thus get rid of one of them */
PROC SORT DATA=distinct_states nodupkey;
  BY STCOD;
RUN;


PROC SQL;
/* only update where state is missing */
  UPDATE full
  SET state = (SELECT MAX(state)
               FROM distinct_states
               WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(state);
QUIT;


PROC SQL;
/* only update where abbreviated state is missing */
  UPDATE full
  SET stateab = (SELECT MAX(stateab)
                 FROM distinct_states
                 WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(stateab);
QUIT;

MikeZdeb · Posted 09-15-2015 03:48 PM

Hi. There are functions to convert state codes (fips numbers) to state names and two character abbreviations. Every observation in the data set has a state code.

mspak · Posted 09-16-2015 12:10 AM

Dear MikeZded,

Did you mean that SAS system contains a database of state codes, with their names and abbreviation? May I know which SAS function can perform this procedure?

Thank you.

Regards,

MSPAK

mspak · Posted 09-16-2015 12:13 AM

Hi again,

Thanks. I read your suggested program. I think this only works for U.S. data. Do you think this is applicable to other countries as well?

Thank you very much.

MSPAK

mspak · Posted 09-16-2015 12:19 AM

Thanks MikeZded and hbi for your programs. Both are the correct solution.

MikeZdeb · Posted 09-16-2015 09:23 AM

Correct, the various functions that deal with state and zip codes are US only. There are a lot of such functions (see below from SAS online HELP). Also the SQL solution for US state codes worked fine, but given that there are two functions that did the same thing,in two statements, it seems like a lot of code for US-based data. However , it would be a solution for an instance when functions would not work, e.g. non-US data.

State and Zip Code Functions

FIPNAME

converts FIPS codes to uppercase state names

FIPNAMEL

converts FIPS codes to mixed-case state names

FIPSTATE
converts FIPS codes to two-character postal codes

GEODISTANCE
returns the geodetic distance between two latitude and longitude coordinates

STFIPS
converts state postal codes to FIPS state codes

STNAME
converts state postal codes to uppercase state names

STNAMEL
converts state postal codes to mixed-case state names

ZIPCITY
returns a city name and the two-character postal code that corresponds to a zip code

ZIPCITYDISTANCE
returns the geodetic distance between two zip code locations

ZIPFIPS
converts zip codes to FIPS state codes

ZIPNAME
converts zip codes to uppercase state names

ZIPNAMEL
converts zip codes to mixed-case state names

ZIPSTATE
converts zip codes to state postal codes

Classroom Training Available!