DATA Step, Macro, Functions and more

character variables to numerical variables

Reply
Contributor
Posts: 62

character variables to numerical variables

I have a character variable  representing various US states that I need to convert  to numberical.

data have;

State

Kentucky

Louisiana

Maine

Maryland

Massachusetts

Michigan

Minnesota

Mississippi

Missouri

Montana

Nebraska

Nevada

New Hampshire

New Jersey

New Mexico

New York

North Carolina

North Dakota

Ohio

Oklahoma

Oregon

Pennsylvania

Rhode Island

South Carolina

 

data want;

State_num

1

2

3

so one so forth

 

I know I can achieve this  using :

If state=”Kentucky” then state_num=1;

If state=”Louisiana” then state_num=2;

and so one so forth.

 

But this would be lenghty, does anyone know what would be the optimal way of converting from character to numeric.

Thanks.

 

Super Contributor
Posts: 254

Re: character variables to numerical variables

You can create a format that gives a unique number when the Statename is given.

 

Other easy way is to use _N_, the obsevation number, to be attached to each state. 

 

The former is better.

Contributor
Posts: 62

Re: character variables to numerical variables

@ datasp, would creating a format the same as recoding the variable as I shown in my posting?
Thanks
Super Contributor
Posts: 254

Re: character variables to numerical variables

There is no need to list all Names to make the format. We can create format using the data set.

 

I show how it can be made.

 

Your interest is not known as how you will be using the crime rates. If it is known,  an approprite solution can be given.

Besides the Format, Hash Objects and Array with PIBw. can be used.

 

data have;
input state &$15.;
datalines;
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
;
run;

data cnt;
retain fmtname 's2num' type 'I';
      set have end = last;
      start = state;
      label = _N_;
      output;
   if last then do;
      hlo = 'O';
      label = 0;
      output;
   end;
run;

proc sort data = cnt nodupkey;
by start;
run;

proc format  cntlin = cnt;
run;


We can verify the Format by cycling the original data set. Note the use of INPUT(STATE, s2num.) to get the unique number for a given State.

data want;
   set have;
   state_id = input(state, s2num.);
run;

 

 

Super User
Posts: 17,730

Re: character variables to numerical variables

There are also the FIPS code that you can use. Look into the functions. I know you can go from state abbreviation to code.  

Contributor
Posts: 62

Re: character variables to numerical variables

Reeza, there are no FIPS Codes in the dataset. Plus the states have to be ordered based on crime level.
Super User
Posts: 17,730

Re: character variables to numerical variables

I think we're missing some information, please clarify your requirements. 

 

I was suggesting mapping the State Names to FIP codes, I think there's a SAS function that maps this.

If order matters then please clarify your requirements. 

Super User
Super User
Posts: 6,495

Re: character variables to numerical variables

If your data is one record per state then you can define your own code.

data want ;
  set have ;
  state_num + 1;
run;
Ask a Question
Discussion stats
  • 7 replies
  • 370 views
  • 0 likes
  • 4 in conversation