In LOCATION dataset, a STATES column has two letter coded values like 'CT, NY, NJ, IL, AZ, CA' and some of the values in the same column are displayed with full names of the states like 'COLORADO, DELAWARE, FLORIDA, INDIANA'
what is the code to make all these full name states into their respective two lettered codes.
Do you have a list of what country should decode to which 2 letter abbreviation? If so then just put some if statements in a datastep or merge the information on. For example - and you have not provided any of the necessary information such as test data in the form of a datastep to write this acurately:
data codes; country="Canada"; code="CA"; output; run; proc sql; create table WANT as select A.COUNTRY, B.CODE from HAVE A left join CODES B on A.COUNTRY=B.COUNTRY; quit;
hi,
the LOCATION dataset is huge dataset in which thousands of records are with full names of the states instead of two lettered codes.
how can we write if statement for soo many records with full names?
@anushagandla wrote:
hi,
the LOCATION dataset is huge dataset in which thousands of records are with full names of the states instead of two lettered codes.
how can we write if statement for soo many records with full names?
That's why cleaning data sucks.
Run a proc freq on the column and get all distinct entries.
Map each one you don't want to the new code using IF/THEN statements or a custom format. A Format is easier because you can create a format from a data set. SASMAPS will have a mapping of state names to abbreviations.
You only the write the statement once, but if you have 50 states, it could be up to 50 linesm, a format would definitely be less code.
There's some examples in here on how to create formats from a data set and using PUT to recode variables.
http://www2.sas.com/proceedings/sugi30/001-30.pdf
By any chance does this data set have a ZIPcode variable? Then the easiest may be to use the ZIPSTATE function.
If you have Zip+4 codes , such as 90049-1392 you only want to use the first 5 characters, or the digits corresponding to the base 5 zipcode. A numeric zip of 1040 would be treated as 01040.
data want;
set have;
ST2Char= zipstate(zipcodevariable);
run;
would be as start. If you have the Zip+4 then zipstate(substr(zipcodevariable,1,5))
If these are US states and you don't have to deal with typos as well then below code should do the job.
data have;
infile datalines dlm=',';
input states:$40. @@;
output;
datalines;
CT, NY, NJ, IL, AZ, CA,COLORADO, DELAWARE, FLORIDA, INDIANA
;
run;
proc sql;
create table cntlin_source as
select distinct
'StateToCode' as fmtname
, 'c' as type
, statename2 as start
, statecode as label
from
sashelp.zipcode
;
quit;
proc format cntlin=cntlin_source;
run;
data want;
set have;
state_codes=put(upcase(compress(states)),$StateToCode.);
run;
Another way to save some typing is to leverage SASHELP.ZIPCODE, where you will find all the states' full name as well as the two letter state code. Then you can choose either Proc SQL / Hash/Format to standardize your variable.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.