BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mspak
Quartz | Level 8

Dear all,

 

The attached dataset contains the missing "state" and "stateab". These two variables can be matched with the "STCOD".

 

Could anyone suggest a program that can help me to update the missing variables?

 

Thank you.

 

MSPAK

1 ACCEPTED SOLUTION

Accepted Solutions
hbi
Quartz | Level 8 hbi
Quartz | Level 8

Hi,

 

Your dataset is pretty small, so a PROC SQL update statement would work nicely.

 

The first step involves creating a lookup table; I included a frequency column (informational only and not needed). But notice that for our nation's capital, there are two spelling variations ("District of Columbia" and "District Of Columbia"). The PROC SORT gets rid of the duplicate.

 

Next, a series of two PROC SQL update statement updates rows for which state is missing and then updates rows for which stateab is missing. Since your dataset is relatively small, an "in-place" PROC SQL update works just fine. If your dataset were much larger with millions of observations, then a PROC SQL update may be suboptimal.

 

Enjoy!

 

 

/* first, create a lookup table */
PROC SQL;
  CREATE TABLE distinct_states AS 
  SELECT STCOD, state, stateab, COUNT(*) as state_frequency
  FROM full
  WHERE NOT MISSING(state)
  GROUP BY STCOD, state, stateab
  ORDER BY 2, 1, 4 DESC;
QUIT;


/* note that your dataset has District of Columbia spelled two different 
   ways, thus get rid of one of them */
PROC SORT DATA=distinct_states nodupkey;
  BY STCOD;
RUN;


PROC SQL;
/* only update where state is missing */
  UPDATE full
  SET state = (SELECT MAX(state)
               FROM distinct_states
               WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(state);
QUIT;


PROC SQL;
/* only update where abbreviated state is missing */
  UPDATE full
  SET stateab = (SELECT MAX(stateab)
                 FROM distinct_states
                 WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(stateab);
QUIT;

 

View solution in original post

9 REPLIES 9
Reeza
Super User
Have you tried a merge?
mspak
Quartz | Level 8

Dear Reeza,

 

Yes, I tried. Thank you.

MikeZdeb
Rhodochrosite | Level 12

Hi ... you can use functions ...

 

data fixed;
set z.full;
if missing(state) then do;
   state = fipname(stcod);
   stateab = fipstate(stcod);
end;
run;

 

Also, the variable COUNTY has three different forms in the data set:  just the county name, county + state, county name followed by the word "County".  . Why not make it consistent, just the county name ...

 


data fixed;
set z.full;
if missing(state) then do;
   state = fipname(stcod);
   stateab = fipstate(stcod);
end;
* get rid of the STATE, get rid of text 'COUNTY';
if find(county,',') then county=scan(county,1,',');
else
if findw(county,'County') then county = tranwrd(county,'County',' ');
run;

hbi
Quartz | Level 8 hbi
Quartz | Level 8

Hi,

 

Your dataset is pretty small, so a PROC SQL update statement would work nicely.

 

The first step involves creating a lookup table; I included a frequency column (informational only and not needed). But notice that for our nation's capital, there are two spelling variations ("District of Columbia" and "District Of Columbia"). The PROC SORT gets rid of the duplicate.

 

Next, a series of two PROC SQL update statement updates rows for which state is missing and then updates rows for which stateab is missing. Since your dataset is relatively small, an "in-place" PROC SQL update works just fine. If your dataset were much larger with millions of observations, then a PROC SQL update may be suboptimal.

 

Enjoy!

 

 

/* first, create a lookup table */
PROC SQL;
  CREATE TABLE distinct_states AS 
  SELECT STCOD, state, stateab, COUNT(*) as state_frequency
  FROM full
  WHERE NOT MISSING(state)
  GROUP BY STCOD, state, stateab
  ORDER BY 2, 1, 4 DESC;
QUIT;


/* note that your dataset has District of Columbia spelled two different 
   ways, thus get rid of one of them */
PROC SORT DATA=distinct_states nodupkey;
  BY STCOD;
RUN;


PROC SQL;
/* only update where state is missing */
  UPDATE full
  SET state = (SELECT MAX(state)
               FROM distinct_states
               WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(state);
QUIT;


PROC SQL;
/* only update where abbreviated state is missing */
  UPDATE full
  SET stateab = (SELECT MAX(stateab)
                 FROM distinct_states
                 WHERE full.STCOD = distinct_states.STCOD)
  WHERE MISSING(stateab);
QUIT;

 

MikeZdeb
Rhodochrosite | Level 12

Hi.  There are functions to convert state codes (fips numbers) to state names and two character abbreviations.  Every observation in the data set has a state code.

mspak
Quartz | Level 8

Dear MikeZded,

 

Did you mean that SAS system contains a database of state codes, with their names and abbreviation? May I know which SAS function can perform this procedure?

 

Thank you.

 

Regards,

MSPAK

mspak
Quartz | Level 8

Hi again,

 

Thanks. I read your suggested program. I think this only works for U.S. data. Do you think this is applicable to other countries as well?

 

Thank you very much.

 

MSPAK

mspak
Quartz | Level 8
Thanks MikeZded and hbi for your programs. Both are the correct solution.
MikeZdeb
Rhodochrosite | Level 12

Correct, the various functions that deal with state and zip codes are US only.  There are a lot of such functions (see below from SAS online HELP).  Also the SQL solution for US state codes worked fine, but given that there are two functions that did the same thing,in two statements, it seems like a lot of code for US-based data.  However , it would be a solution for an instance when functions would not work, e.g. non-US data.

 


State and Zip Code Functions

 

FIPNAME 

converts FIPS codes to uppercase state names

FIPNAMEL 

converts FIPS codes to mixed-case state names

FIPSTATE
converts FIPS codes to two-character postal codes

GEODISTANCE
returns the geodetic distance between two latitude and longitude coordinates

STFIPS
converts state postal codes to FIPS state codes

STNAME
converts state postal codes to uppercase state names

STNAMEL
converts state postal codes to mixed-case state names

ZIPCITY
returns a city name and the two-character postal code that corresponds to a zip code

ZIPCITYDISTANCE
returns the geodetic distance between two zip code locations

ZIPFIPS
converts zip codes to FIPS state codes

ZIPNAME
converts zip codes to uppercase state names

ZIPNAMEL
converts zip codes to mixed-case state names

ZIPSTATE
converts zip codes to state postal codes 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1111 views
  • 1 like
  • 4 in conversation