Question about coding Categorical variable

Posts: 27

I have a dataset imported from SPSS. Patients' smoking status is stored in the variable "smoke", which has 3 categories: "Current smoker", "Ex smoker" and "Non-smoker". I wanted to recode this categorical variable as 0, 1, 2, so I wrote the code below. However, it did not work: smoke1 shows "." for every observation.


if smoke='Current smoker' then smoke1=2;
else if smoke='Ex smoker' then smoke1=1;
else if smoke='Non-smoker' then smoke1=0;
else if smoke='No Answer' then smoke1=.;

Super User
Posts: 6,785

Re: Question about coding Categorical variable

This section of the program looks OK.  The problem might lie in another section of the program, or it might lie in the data. 


Example:  The data actually contains all uppercase values for SMOKE.


Example:  The length of SMOKE is actually less than 10 characters (for whatever reason), so a value such as 'Non-smoker' is stored truncated and never matches.


Example:  The DATA step that contains this code forgot to use a SET statement to read in the data source.


The steps you can take to help:


Run a PROC FREQ on the variable SMOKE to verify the actual values it contains.


Post the log from your DATA step (not the program).  That will contain key results to help diagnose the source of the problem.
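
A minimal sketch of those checks, assuming the data set is called work.patients (a placeholder name; substitute your own) and that SMOKE is a character variable. The UPCASE/STRIP comparison is just one way to guard against case and leading-blank differences; it is not necessarily the cause of the problem here.

```sas
/* List the values actually stored in SMOKE, including missing ones */
proc freq data=work.patients;      /* work.patients is a placeholder name */
   tables smoke / missing;
run;

/* Recode using a comparison that ignores case and surrounding blanks */
data work.patients2;               /* hypothetical output data set name */
   set work.patients;              /* the SET statement reads the source data */
   smoke_uc = upcase(strip(smoke));
   if smoke_uc = 'CURRENT SMOKER' then smoke1 = 2;
   else if smoke_uc = 'EX SMOKER' then smoke1 = 1;
   else if smoke_uc = 'NON-SMOKER' then smoke1 = 0;
   else smoke1 = .;                /* 'No Answer' and anything unexpected */
   drop smoke_uc;
run;
```

If the log shows a note about numeric values being converted to character, SMOKE is not a character variable and the comparison strings will never match.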

Super User
Posts: 13,583

Re: Question about coding Categorical variable

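Please run PROC CONTENTS on that data set and share the results for the smoke variable.


When you say you get ".", it may mean that smoke is already numeric and what you are seeing is a format applied to the underlying numeric values. In that case none of the comparisons against text such as 'Current smoker' can match, so smoke1 is never assigned.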

Or try:


proc freq data=<your data set name>;
   tables smoke;
   format smoke best.;
run;
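
To check the variable type directly, something like the following may help. It assumes a data set named work.patients (a placeholder). If SMOKE turns out to be numeric with a value-label format attached, one possible approach is to compare against the formatted text via the VVALUE function; this is only a sketch, and it works only if the format is available in your session.

```sas
/* Show type, length, and any format attached to each variable */
proc contents data=work.patients;      /* placeholder data set name */
run;

/* Only relevant if SMOKE is numeric with a value-label format:
   compare the formatted text instead of the raw numeric code */
data work.patients2;                   /* hypothetical output data set name */
   set work.patients;
   length smoke_label $ 40;
   smoke_label = strip(vvalue(smoke)); /* formatted value as text */
   if smoke_label = 'Current smoker' then smoke1 = 2;
   else if smoke_label = 'Ex smoker' then smoke1 = 1;
   else if smoke_label = 'Non-smoker' then smoke1 = 0;
   else smoke1 = .;                    /* 'No Answer' and anything unexpected */
   drop smoke_label;
run;
```

Once the PROC FREQ output with the BEST. format shows the underlying numeric codes, comparing on those codes directly would be simpler than going through the labels.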


