Dear all,
The attached dataset contains the missing "state" and "stateab". These two variables can be matched with the "STCOD".
Could anyone suggest a program that can help me to update the missing variables?
Thank you.
MSPAK
Hi,
Your dataset is pretty small, so a PROC SQL update statement would work nicely.
The first step involves creating a lookup table; I included a frequency column (informational only and not needed). But notice that for our nation's capital, there are two spelling variations ("District of Columbia" and "District Of Columbia"). The PROC SORT gets rid of the duplicate.
Next, a series of two PROC SQL update statement updates rows for which state is missing and then updates rows for which stateab is missing. Since your dataset is relatively small, an "in-place" PROC SQL update works just fine. If your dataset were much larger with millions of observations, then a PROC SQL update may be suboptimal.
Enjoy!
/* first, create a lookup table */
PROC SQL;
CREATE TABLE distinct_states AS
SELECT STCOD, state, stateab, COUNT(*) as state_frequency
FROM full
WHERE NOT MISSING(state)
GROUP BY STCOD, state, stateab
ORDER BY 2, 1, 4 DESC;
QUIT;
/* note that your dataset has District of Columbia spelled two different
ways, thus get rid of one of them */
PROC SORT DATA=distinct_states nodupkey;
BY STCOD;
RUN;
PROC SQL;
/* only update where state is missing */
UPDATE full
SET state = (SELECT MAX(state)
FROM distinct_states
WHERE full.STCOD = distinct_states.STCOD)
WHERE MISSING(state);
QUIT;
PROC SQL;
/* only update where abbreviated state is missing */
UPDATE full
SET stateab = (SELECT MAX(stateab)
FROM distinct_states
WHERE full.STCOD = distinct_states.STCOD)
WHERE MISSING(stateab);
QUIT;
Dear Reeza,
Yes, I tried. Thank you.
Hi ... you can use functions ...
data fixed;
set z.full;
if missing(state) then do;
state = fipname(stcod);
stateab = fipstate(stcod);
end;
run;
Also, the variable COUNTY has three different forms in the data set: just the county name, county + state, county name followed by the word "County". . Why not make it consistent, just the county name ...
data fixed;
set z.full;
if missing(state) then do;
state = fipname(stcod);
stateab = fipstate(stcod);
end;
* get rid of the STATE, get rid of text 'COUNTY';
if find(county,',') then county=scan(county,1,',');
else
if findw(county,'County') then county = tranwrd(county,'County',' ');
run;
Hi,
Your dataset is pretty small, so a PROC SQL update statement would work nicely.
The first step involves creating a lookup table; I included a frequency column (informational only and not needed). But notice that for our nation's capital, there are two spelling variations ("District of Columbia" and "District Of Columbia"). The PROC SORT gets rid of the duplicate.
Next, a series of two PROC SQL update statement updates rows for which state is missing and then updates rows for which stateab is missing. Since your dataset is relatively small, an "in-place" PROC SQL update works just fine. If your dataset were much larger with millions of observations, then a PROC SQL update may be suboptimal.
Enjoy!
/* first, create a lookup table */
PROC SQL;
CREATE TABLE distinct_states AS
SELECT STCOD, state, stateab, COUNT(*) as state_frequency
FROM full
WHERE NOT MISSING(state)
GROUP BY STCOD, state, stateab
ORDER BY 2, 1, 4 DESC;
QUIT;
/* note that your dataset has District of Columbia spelled two different
ways, thus get rid of one of them */
PROC SORT DATA=distinct_states nodupkey;
BY STCOD;
RUN;
PROC SQL;
/* only update where state is missing */
UPDATE full
SET state = (SELECT MAX(state)
FROM distinct_states
WHERE full.STCOD = distinct_states.STCOD)
WHERE MISSING(state);
QUIT;
PROC SQL;
/* only update where abbreviated state is missing */
UPDATE full
SET stateab = (SELECT MAX(stateab)
FROM distinct_states
WHERE full.STCOD = distinct_states.STCOD)
WHERE MISSING(stateab);
QUIT;
Hi. There are functions to convert state codes (fips numbers) to state names and two character abbreviations. Every observation in the data set has a state code.
Dear MikeZded,
Did you mean that SAS system contains a database of state codes, with their names and abbreviation? May I know which SAS function can perform this procedure?
Thank you.
Regards,
MSPAK
Hi again,
Thanks. I read your suggested program. I think this only works for U.S. data. Do you think this is applicable to other countries as well?
Thank you very much.
MSPAK
Correct, the various functions that deal with state and zip codes are US only. There are a lot of such functions (see below from SAS online HELP). Also the SQL solution for US state codes worked fine, but given that there are two functions that did the same thing,in two statements, it seems like a lot of code for US-based data. However , it would be a solution for an instance when functions would not work, e.g. non-US data.
State and Zip Code Functions
FIPNAME
converts FIPS codes to uppercase state names
FIPNAMEL
converts FIPS codes to mixed-case state names
FIPSTATE
converts FIPS codes to two-character postal codes
GEODISTANCE
returns the geodetic distance between two latitude and longitude coordinates
STFIPS
converts state postal codes to FIPS state codes
STNAME
converts state postal codes to uppercase state names
STNAMEL
converts state postal codes to mixed-case state names
ZIPCITY
returns a city name and the two-character postal code that corresponds to a zip code
ZIPCITYDISTANCE
returns the geodetic distance between two zip code locations
ZIPFIPS
converts zip codes to FIPS state codes
ZIPNAME
converts zip codes to uppercase state names
ZIPNAMEL
converts zip codes to mixed-case state names
ZIPSTATE
converts zip codes to state postal codes
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.