Hi,
I have a character variable, SmokingStatus, with these values: "Never Smoker", "Former Smoker", "Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them more easily in a regression:
DATA SubQ.dataclean3;
LENGTH SmokingStatus $22.
Smoking 8.;
SET SubQ.combined;
IF SmokingStatus="*Unknown" THEN Smoking_Status=' '; /* Highlight missing values */
IF SmokingStatus="Never Smoker" THEN Smoking=0; /* Change smoking group to easier variable to use in regression */
IF SmokingStatus="Former Smoker" THEN Smoking=1;
IF Smoking_Status="Light Smoker" THEN Smoking=2;
IF Smoking_Status="Heavy Smoker" THEN Smoking=3;
RUN;
The problem is they assign 0 and 1 correctly but not 2 and 3; for light and heavy smokers, Smoking is shown as missing. Does anyone know why this is? I have also tried this making Smoking a character variable with $22. length and quotations around the number values.
Thanks.
@leackell13 wrote:
That is a good point; I think I was afraid that the values would be too large. So under the CLASS statement, I can just put SmokingStatus (REF='Never Smoker')?
Yes, that is how you'd specify a reference level.
I would also add PARAM=REF so the CLASS variable uses reference (dummy variable) coding instead of the procedure's default, which varies by procedure (for example, PROC LOGISTIC defaults to effect coding).
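For illustration, here is a minimal sketch of what that could look like. It assumes a logistic regression with PROC LOGISTIC and a binary response named Outcome; both are hypothetical, since the thread does not say which procedure or response variable is being used.

```sas
/* Sketch only: Outcome is a hypothetical 0/1 response variable */
proc logistic data=SubQ.dataclean3;
  /* Keep the character variable and name the reference level directly */
  class SmokingStatus (ref='Never Smoker') / param=ref;
  model Outcome(event='1') = SmokingStatus;
run;
```

With this setup the estimates are labeled with the actual category names, each compared against "Never Smoker".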
IF SmokingStatus="*Unknown" THEN Smoking_Status=.;
Numeric missing is . (a dot)
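{EDIT:} Sorry, I didn't spot the issue at first. The variable names don't match: the first statements test SmokingStatus, but the last two test Smoking_Status.
SmokingStatus is not the same as Smoking_Status. Smoking_Status is a new variable created by the first IF statement, and it is always blank, so the comparisons for "Light Smoker" and "Heavy Smoker" never match.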
{EDIT2:} The length statement should be like:
LENGTH SmokingStatus $ 22 Smoking 8;
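Putting the fixes together, a corrected step might look like the sketch below. It assumes SmokingStatus is the name actually stored in SubQ.combined and that "*Unknown" should end up as numeric missing. The ELSE chain is a stylistic choice, not something the original code had.

```sas
data SubQ.dataclean3;
  set SubQ.combined;
  length Smoking 8;
  /* Use the same variable name, SmokingStatus, in every comparison */
  if SmokingStatus = "Never Smoker" then Smoking = 0;
  else if SmokingStatus = "Former Smoker" then Smoking = 1;
  else if SmokingStatus = "Light Smoker" then Smoking = 2;
  else if SmokingStatus = "Heavy Smoker" then Smoking = 3;
  /* "*Unknown" and any unexpected value become numeric missing */
  else Smoking = .;
run;
```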
Bart
The spelling must match exactly, including any non-printable characters that might be contained in the string. Print the string with a $HEX format to reveal such characters.
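One way to do that, as a sketch, is to apply $HEX in PROC FREQ so each distinct stored value is listed in hexadecimal. The width of 44 assumes SmokingStatus has a length of 22 (two hex digits per byte); adjust it to the real length.

```sas
/* List every distinct stored value of SmokingStatus as hex digits */
proc freq data=SubQ.combined;
  tables SmokingStatus / missing;
  format SmokingStatus $hex44.;
run;
```

In an ASCII-based session, 20 is an ordinary blank and 09 is a tab, so unexpected codes inside or before the text point to hidden characters.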
@leackell13 wrote:
Hi,
I have a character variable, SmokingStatus, with these values: "Never Smoker", "Former Smoker", "Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them more easily in a regression:
How do you think coding these as numbers will make it easier to interpret your results? It's a categorical variable, so it needs to go in a CLASS statement either way, and the output would then be labeled "Never Smoker" vs "Heavy Smoker" rather than 0 vs 3. The character version would be much easier to read.
Are you planning to treat these as a continuous ordinal variable instead of categorical?