Hi,
I have a character variable, SmokingStatus, with these values: "Never Smoker", "Former Smoker", "Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them more easily in a regression:
DATA SubQ.dataclean3;
LENGTH SmokingStatus $22.
Smoking 8.;
SET SubQ.combined;
IF SmokingStatus="*Unknown" THEN Smoking_Status=' '; /* Highlight missing values */
IF SmokingStatus="Never Smoker" THEN Smoking=0; /* Change smoking group to easier variable to use in regression */
IF SmokingStatus="Former Smoker" THEN Smoking=1;
IF Smoking_Status="Light Smoker" THEN Smoking=2;
IF Smoking_Status="Heavy Smoker" THEN Smoking=3;
RUN;
The problem is that 0 and 1 are assigned correctly but 2 and 3 are not; for light and heavy smokers, Smoking is shown as missing. Does anyone know why this is? I have also tried making Smoking a character variable with a length of $22 and putting quotation marks around the number values.
Thanks.
Accepted Solutions
@leackell13 wrote:
That is a good point; I think I was afraid that the values would be too large. So under the CLASS statement, I can just put SmokingStatus (REF='Never Smoker')?
Yes, that is how you'd specify a reference level.
I would also add PARAM=REF to request reference (dummy-variable, one-hot style) parameterization explicitly. The default coding differs by procedure (GLM coding in many procedures, effect coding in PROC LOGISTIC), so it is safer to state it.
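A minimal sketch of what that CLASS statement could look like. The thread does not say which procedure or outcome is being modeled, so PROC LOGISTIC and the 0/1 outcome variable Outcome are assumptions made purely for illustration:

```sas
/* Sketch only: PROC LOGISTIC and the variable Outcome are hypothetical. */
PROC LOGISTIC DATA=SubQ.dataclean3;
  /* Keep the character variable; "Never Smoker" becomes the reference level. */
  CLASS SmokingStatus (REF='Never Smoker') / PARAM=REF;
  /* EVENT='1' assumes Outcome is coded 0/1 and models the probability of 1. */
  MODEL Outcome(EVENT='1') = SmokingStatus;
RUN;
```

With reference coding, each estimate is labeled with the text value of the level and compares that level against "Never Smoker".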
IF SmokingStatus="*Unknown" THEN Smoking_Status=.;
Numeric missing is . (a dot).
{EDIT:} Sorry, I didn't spot the real issue at first. Your variable names are inconsistent; some statements have an underscore and some don't:
SmokingStatus is not the same as Smoking_Status
{EDIT2:} The LENGTH statement should look like this:
LENGTH SmokingStatus $ 22 Smoking 8;
Bart
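Putting the points above together, a corrected version of the original DATA step might look like the sketch below. It assumes SmokingStatus in SubQ.combined contains exactly the values listed in the question plus "*Unknown"; the ELSE chain and the PROC FREQ check are additions for illustration, not part of the original code:

```sas
DATA SubQ.dataclean3;
  LENGTH SmokingStatus $ 22 Smoking 8;
  SET SubQ.combined;

  /* Character missing is a blank; use the same variable name throughout. */
  IF SmokingStatus = "*Unknown" THEN SmokingStatus = ' ';

  /* Map each category to a number; anything unmatched stays numeric missing. */
  IF SmokingStatus = "Never Smoker" THEN Smoking = 0;
  ELSE IF SmokingStatus = "Former Smoker" THEN Smoking = 1;
  ELSE IF SmokingStatus = "Light Smoker" THEN Smoking = 2;
  ELSE IF SmokingStatus = "Heavy Smoker" THEN Smoking = 3;
  ELSE Smoking = .;
RUN;

/* Cross-check the mapping, including missing values. */
PROC FREQ DATA=SubQ.dataclean3;
  TABLES SmokingStatus * Smoking / LIST MISSING;
RUN;
```

In the original code, the last two IF statements tested Smoking_Status, a variable that was always blank, so those conditions were never true and Smoking stayed missing for light and heavy smokers.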
The spelling must match exactly, including any non-printable characters that might be contained in the string. Print the string with a $HEX format to reveal such characters.
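One way to do that check, sketched under the assumption that SmokingStatus is at most 22 characters long (as the LENGTH statement in the question suggests), so that $HEX44. shows two hex digits per character:

```sas
/* List each distinct value next to its hexadecimal representation. */
PROC SQL;
  SELECT DISTINCT
         SmokingStatus,
         PUT(SmokingStatus, $HEX44.) AS SmokingStatus_hex
  FROM SubQ.combined;
QUIT;
```

An ordinary blank shows up as 20. Codes such as 09 (tab) or A0 (non-breaking space in some encodings) would point to hidden characters that make an otherwise identical-looking string fail the comparison.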
@leackell13 wrote:
Hi,
I have a character variable, SmokingStatus, with these values: "Never Smoker", "Former Smoker", "Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them more easily in a regression:
How do you think coding these as numbers will make your results easier to interpret? It's a categorical variable, so it needs to go in a CLASS statement either way, and then the output would be labeled "Never Smoker" vs "Heavy Smoker" rather than 0 vs 3. The first version would be much easier to read.
Are you planning to treat these as a continuous ordinal variable instead of categorical?
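For contrast, here is a sketch of what the continuous treatment would look like: the numeric Smoking variable goes into the model without a CLASS statement. As before, PROC LOGISTIC and the 0/1 outcome variable Outcome are hypothetical, since the thread does not name the procedure or the outcome:

```sas
/* Sketch only: PROC LOGISTIC and the variable Outcome are hypothetical. */
PROC LOGISTIC DATA=SubQ.dataclean3;
  /* No CLASS statement, so Smoking (0-3) enters as a single linear term. */
  MODEL Outcome(EVENT='1') = Smoking;
RUN;
```

This fits one slope for Smoking, which assumes that each step from 0 to 1, 1 to 2, and 2 to 3 changes the log-odds by the same amount. Whether that assumption is reasonable is a modeling decision, not something the recoding settles.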