BookmarkSubscribeRSS Feed
archibald
Obsidian | Level 7

I have a character variable  representing various US states that I need to convert  to numberical.

data have;

State

Kentucky

Louisiana

Maine

Maryland

Massachusetts

Michigan

Minnesota

Mississippi

Missouri

Montana

Nebraska

Nevada

New Hampshire

New Jersey

New Mexico

New York

North Carolina

North Dakota

Ohio

Oklahoma

Oregon

Pennsylvania

Rhode Island

South Carolina

 

data want;

State_num

1

2

3

so one so forth

 

I know I can achieve this  using :

If state=”Kentucky” then state_num=1;

If state=”Louisiana” then state_num=2;

and so one so forth.

 

But this would be lenghty, does anyone know what would be the optimal way of converting from character to numeric.

Thanks.

 

7 REPLIES 7
KachiM
Rhodochrosite | Level 12

You can create a format that gives a unique number when the Statename is given.

 

Other easy way is to use _N_, the obsevation number, to be attached to each state. 

 

The former is better.

archibald
Obsidian | Level 7
@ datasp, would creating a format the same as recoding the variable as I shown in my posting?
Thanks
KachiM
Rhodochrosite | Level 12

There is no need to list all Names to make the format. We can create format using the data set.

 

I show how it can be made.

 

Your interest is not known as how you will be using the crime rates. If it is known,  an approprite solution can be given.

Besides the Format, Hash Objects and Array with PIBw. can be used.

 

data have;
input state &$15.;
datalines;
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
;
run;

data cnt;
retain fmtname 's2num' type 'I';
      set have end = last;
      start = state;
      label = _N_;
      output;
   if last then do;
      hlo = 'O';
      label = 0;
      output;
   end;
run;

proc sort data = cnt nodupkey;
by start;
run;

proc format  cntlin = cnt;
run;


We can verify the Format by cycling the original data set. Note the use of INPUT(STATE, s2num.) to get the unique number for a given State.

data want;
   set have;
   state_id = input(state, s2num.);
run;

 

 

Reeza
Super User

There are also the FIPS code that you can use. Look into the functions. I know you can go from state abbreviation to code.  

archibald
Obsidian | Level 7
Reeza, there are no FIPS Codes in the dataset. Plus the states have to be ordered based on crime level.
Reeza
Super User

I think we're missing some information, please clarify your requirements. 

 

I was suggesting mapping the State Names to FIP codes, I think there's a SAS function that maps this.

If order matters then please clarify your requirements. 

Tom
Super User Tom
Super User

If your data is one record per state then you can define your own code.

data want ;
  set have ;
  state_num + 1;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1250 views
  • 0 likes
  • 4 in conversation