SAS Programming

DATA Step, Macro, Functions and more
leackell13
Fluorite | Level 6

Hi,

 

I have a character variable, SmokingStatus, with these values: "Never Smoker", "Former Smoker", "Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them more easily in a regression:

DATA SubQ.dataclean3;

LENGTH SmokingStatus $22.
Smoking 8.;

SET SubQ.combined;

IF SmokingStatus="*Unknown" THEN Smoking_Status=' '; /* Highlight missing values */
IF SmokingStatus="Never Smoker" THEN Smoking=0; /* Change smoking group to easier variable to use in regression */
IF SmokingStatus="Former Smoker" THEN Smoking=1;
IF Smoking_Status="Light Smoker" THEN Smoking=2;
IF Smoking_Status="Heavy Smoker" THEN Smoking=3;

RUN;

The problem is that 0 and 1 are assigned correctly but 2 and 3 are not; for light and heavy smokers, Smoking is shown as missing. Does anyone know why this is? I have also tried making Smoking a character variable with length $22 and quotation marks around the number values.

 

Thanks.

ACCEPTED SOLUTION
Reeza
Super User

@leackell13 wrote:
That is a good point; I think I was afraid that the values would be too large. So under the CLASS statement, I can just put SmokingStatus (REF='Never Smoker')?

Yes, that is how you'd specify a reference level.

 

I would also add PARAM=REF to request reference (dummy-variable) parameterization instead of the procedure's default, which is GLM parameterization in many SAS procedures (PROC LOGISTIC defaults to effect coding).
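
As a rough illustration of the CLASS statement described above, here is a minimal sketch using PROC LOGISTIC. The thread does not say which procedure or outcome is being modeled, so PROC LOGISTIC, the response variable Outcome, and its event level '1' are all hypothetical placeholders.

```sas
/* Sketch only: Outcome and EVENT='1' are hypothetical; substitute your own response */
PROC LOGISTIC DATA=SubQ.combined;
  /* Keep the readable character values and name the reference level explicitly */
  CLASS SmokingStatus (REF='Never Smoker') / PARAM=REF;
  MODEL Outcome (EVENT='1') = SmokingStatus;
RUN;
```

With PARAM=REF, each estimate compares one level (for example "Heavy Smoker") against "Never Smoker". Note that rows where SmokingStatus is "*Unknown" would be treated as a level of their own unless that value is first set to blank.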

 

 

 

 


5 REPLIES
yabwon
Onyx | Level 15
IF SmokingStatus="*Unknown" THEN Smoking_Status=.;

Numeric missing is . (a dot)

 

{EDIT:} Sorry, I didn't spot the issue at first. Your variable names are inconsistent: some statements use SmokingStatus and others use Smoking_Status (with an underscore), and SAS treats these as two different variables.

 

{EDIT2:} The length statement should be like:

LENGTH SmokingStatus $ 22 Smoking 8;
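
Putting the two corrections together, the step from the question might look like the sketch below. It assumes Smoking_Status does not exist in SubQ.combined; in that case the original code creates it as a new one-character variable that is always blank, so the comparisons against "Light Smoker" and "Heavy Smoker" can never be true. It also assumes the intent of the "*Unknown" line was to blank out SmokingStatus itself, and that Smoking is not already a variable in the input data.

```sas
/* Sketch: same step as in the question, with one variable name used throughout */
DATA SubQ.dataclean3;
  LENGTH SmokingStatus $ 22 Smoking 8;
  SET SubQ.combined;

  /* Assumed intent: treat "*Unknown" as a missing character value */
  IF SmokingStatus = "*Unknown" THEN SmokingStatus = ' ';

  /* Map each category to a number; anything else becomes numeric missing (.) */
  IF      SmokingStatus = "Never Smoker"  THEN Smoking = 0;
  ELSE IF SmokingStatus = "Former Smoker" THEN Smoking = 1;
  ELSE IF SmokingStatus = "Light Smoker"  THEN Smoking = 2;
  ELSE IF SmokingStatus = "Heavy Smoker"  THEN Smoking = 3;
  ELSE Smoking = .;
RUN;
```

The ELSE IF chain stops at the first match, and the final ELSE makes the handling of unmatched values explicit.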

 

Bart



Kurt_Bremser
Super User

The spelling must match exactly, including any non-printable characters that might be contained in the string. Print the string with a $HEX format to reveal such characters.
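
One possible way to do this, sketched under the assumption that SmokingStatus is at most 22 characters long (so 44 hex digits are enough to show the whole value):

```sas
/* List each distinct value next to its hexadecimal representation */
PROC SQL;
  SELECT DISTINCT
         SmokingStatus,
         SmokingStatus AS HexValue FORMAT=$HEX44.
    FROM SubQ.combined;
QUIT;
```

On an ASCII system an ordinary blank appears as 20, including the trailing padding. Other codes in the positions where you expect blanks, such as 09 (tab) or A0 (non-breaking space in Latin-1 encodings), would explain why a value that looks right on screen fails an equality test.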

Reeza
Super User

@leackell13 wrote:

Hi,

 

I have a character variable, SmokingStatus, with these values: "Never Smoker","Former Smoker","Light Smoker", and "Heavy Smoker". I am trying to assign them numbers so I can interpret them easier in a regression:


How do you think coding these as numbers will make your results easier to interpret? They're categorical variables, so they need to be included in a CLASS statement, and the output would then be labeled "Heavy Smoker" vs "Never Smoker" rather than 3 vs 0. The first version would be much easier to read.

 

Are you planning to treat these as a continuous ordinal variable instead of categorical?

 




 

leackell13
Fluorite | Level 6
That is a good point; I think I was afraid that the values would be too large. So under the CLASS statement, I can just put SmokingStatus (REF='Never Smoker')?
