leackell13
Fluorite | Level 6

Hi,

 

I have a character variable, SmokingStatus, with these values: "Never Smoker", "Former Smoker", "Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them more easily in a regression:

DATA SubQ.dataclean3;

LENGTH SmokingStatus $22.
Smoking 8.;

SET SubQ.combined;

IF SmokingStatus="*Unknown" THEN Smoking_Status=' '; /* Highlight missing values */
IF SmokingStatus="Never Smoker" THEN Smoking=0; /* Change smoking group to easier variable to use in regression */
IF SmokingStatus="Former Smoker" THEN Smoking=1;
IF Smoking_Status="Light Smoker" THEN Smoking=2;
IF Smoking_Status="Heavy Smoker" THEN Smoking=3;

RUN;

The problem is that 0 and 1 are assigned correctly but 2 and 3 are not; for light and heavy smokers, Smoking is shown as missing. Does anyone know why this is? I have also tried making Smoking a character variable with a length of $22. and quotation marks around the number values.

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

@leackell13 wrote:
That is a good point; I think I was afraid that the values would be too large. So under the CLASS statement, I can just put SmokingStatus (REF='Never Smoker')?

Yes, that is how you'd specify a reference level.

 

I would also add PARAM=REF to indicate that you're using reference (dummy variable) parameterization rather than relying on the procedure's default, which is GLM parameterization in many SAS procedures.
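As a rough illustration of the CLASS statement described above, here is a minimal sketch. The thread does not say which procedure or outcome is being modeled, so PROC LOGISTIC, the binary response Outcome, and the covariate Age are all hypothetical stand-ins; only SmokingStatus, the reference level, and PARAM=REF come from the discussion.

```sas
/* Sketch only: PROC LOGISTIC, Outcome, and Age are assumptions,   */
/* not taken from the thread. Substitute your own procedure/model. */
PROC LOGISTIC DATA=SubQ.dataclean3;
  /* Character variable used directly; 'Never Smoker' is the baseline */
  CLASS SmokingStatus (REF='Never Smoker') / PARAM=REF;
  MODEL Outcome (EVENT='1') = SmokingStatus Age;
RUN;
```

With this setup the estimates are labeled by level name (for example "Heavy Smoker" vs "Never Smoker"). In PROC LOGISTIC specifically, the CLASS default is effect coding, so PARAM=REF changes how the estimates are read there.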

 

 

 

 


5 REPLIES
yabwon
Onyx | Level 15
IF SmokingStatus="*Unknown" THEN Smoking_Status=.;

Numeric missing is . (a dot)

 

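{EDIT:} Sorry, I didn't spot the real issue at first. Your variable names don't match; some statements are missing the _ and others have it:

SmokingStatus is not the same as Smoking_Status. The first two IF statements test SmokingStatus, which exists in the input data, so 0 and 1 are assigned. The last two test Smoking_Status, which (unless it already exists in SubQ.combined) is a new variable that is always blank, so those conditions are never true and Smoking stays missing.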

 

{EDIT2:} The length statement should be like:

LENGTH SmokingStatus $ 22 Smoking 8;
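Putting the two fixes together, the DATA step might look like the sketch below. It assumes the variable in SubQ.combined is named SmokingStatus (no underscore), as the first two IF statements in the question suggest; the IF / ELSE IF chain is a style choice, not something required for the fix.

```sas
/* Sketch: same recode as in the question, with one consistent */
/* variable name. Assumes the input variable is SmokingStatus. */
DATA SubQ.dataclean3;
  LENGTH SmokingStatus $ 22 Smoking 8;
  SET SubQ.combined;

  /* Character missing is a blank, numeric missing is a dot */
  IF SmokingStatus = "*Unknown" THEN SmokingStatus = ' ';

  IF      SmokingStatus = "Never Smoker"  THEN Smoking = 0;
  ELSE IF SmokingStatus = "Former Smoker" THEN Smoking = 1;
  ELSE IF SmokingStatus = "Light Smoker"  THEN Smoking = 2;
  ELSE IF SmokingStatus = "Heavy Smoker"  THEN Smoking = 3;
  /* Any other value, including blank, leaves Smoking as missing (.) */
RUN;
```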

 

Bart



Kurt_Bremser
Super User

The spelling must match exactly, including any non-printable characters that might be contained in the string. Print the string with a $HEX format to reveal such characters.
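One way to do that check is sketched below, assuming the values are at most 22 characters long (hence $HEX44., two hex digits per character). The column name SmokingStatus_hex is just an illustrative label.

```sas
/* Sketch: list each distinct value next to its hex representation */
PROC SQL;
  SELECT DISTINCT
         SmokingStatus,
         PUT(SmokingStatus, $HEX44.) AS SmokingStatus_hex
  FROM SubQ.combined;
QUIT;
```

In the hex column an ordinary space appears as 20; codes such as 09 (tab) or A0 (non-breaking space) would point to hidden characters that make an otherwise identical-looking string fail the comparison.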

Reeza
Super User

@leackell13 wrote:

Hi,

 

I have a character variable, SmokingStatus, with these values: "Never Smoker","Former Smoker","Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them easier in a regression:


How would coding these as numbers make your results easier to interpret? This is a categorical variable, so it needs to go in a CLASS statement either way. With the character values, the output is labeled "Never Smoker" vs "Heavy Smoker"; with the numeric codes, it is labeled 0 vs 3. The first version would be much easier to read.

 

Are you planning to treat these as a continuous ordinal variable instead of categorical?

 




 

leackell13
Fluorite | Level 6
That is a good point; I think I was afraid that the values would be too large. So under the CLASS statement, I can just put SmokingStatus (REF='Never Smoker')?

 

 

 

 
