Logistic Regression/GAM Modeling

Posts: 54

In my dataset, some variables are recorded as ranges. For example, Female_Age_Band takes values such as 15-20, 20-25, 25-30, and so on. The problem is that wherever the data is unavailable, the observation is labelled "Unavailable", which makes SAS read the field as character. I believe this will make it difficult to use the variable in a logistic regression. There are also categorical fields with, say, three distinct codes (0, 1 and 2), and even these fields carry the "Unavailable" label. I cannot simply replace the label with zero, because zero might be a valid value.

Can someone help with a solution?
Trusted Advisor
Posts: 1,848

Re: Logistic Regression/GAM Modeling

"Unavailable" is a character value, so the field is read as a character variable.

You need to create a new numeric variable using the INPUT function:

  data temp;
    set have;
    /* ?? suppresses the invalid-data notes for non-numeric text */
    new_variable = input(old_variable, ?? best.);
  run;


In this case any non-numeric value is assigned a missing value, which is what you need for statistics and analysis.
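To see what the conversion above does, here is a minimal sketch on made-up data. The dataset `have` and the four sample values are assumptions for illustration only; they stand in for one of the coded fields (0, 1, 2) that also carries the "Unavailable" label.

```sas
/* Hypothetical sample data: a coded field that SAS reads as character */
data have;
  input old_variable :$12.;
  datalines;
0
1
2
Unavailable
;
run;

data temp;
  set have;
  /* Numeric text is converted; anything else becomes missing (.) */
  /* ?? keeps the log free of invalid-data notes and does not set _ERROR_ */
  new_variable = input(old_variable, ?? best.);
run;

proc print data=temp;
run;
```

Note that zero stays zero here: only text that cannot be read as a number ends up missing, so a valid 0 is not confused with "Unavailable".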

Posts: 54

Re: Logistic Regression/GAM Modeling

Using this also converts the ranges to missing values, so values such as 10-20, 20-30, ... are replaced with missing values as well.
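One possible way around this, sketched under assumptions rather than taken from the thread, is to leave the band as a character category and blank out only the "Unavailable" label, since a blank is how SAS represents a missing character value. The sample rows and the new names `age_band` and `age_lower` are hypothetical; the optional `age_lower` line pulls the lower bound of the band with SCAN in case a numeric version is wanted.

```sas
/* Hypothetical sample data for the banded field */
data have;
  input Female_Age_Band :$12.;
  datalines;
15-20
20-25
Unavailable
25-30
;
run;

data want;
  set have;
  length age_band $12;
  /* Comparison is case-sensitive; adjust if the label varies in the real data */
  if strip(Female_Age_Band) = 'Unavailable' then age_band = ' ';
  else age_band = strip(Female_Age_Band);
  /* Optional: numeric lower bound of the band, missing for "Unavailable" */
  age_lower = input(scan(Female_Age_Band, 1, '-'), ?? best.);
run;
```

The ranges survive as categories in `age_band`, and only the "Unavailable" rows are treated as missing.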

Super User
Posts: 13,941

Re: Logistic Regression/GAM Modeling

Setting them to missing means they are removed from the analysis: SAS discards any record in which any of the model variables is missing. If you don't assign them a "valid" category, you may as well use a WHERE clause to subset the data, as they would be excluded anyway.
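The two choices described above could look roughly like this. It is only a sketch: the simulated dataset `have`, the binary outcome `response`, and the event level '1' are assumptions made up so that the example runs, not details from the original question.

```sas
/* Made-up data: a banded predictor with an "Unavailable" level */
data have;
  call streaminit(1);
  length Female_Age_Band $12;
  do i = 1 to 300;
    k = ceil(4 * rand('uniform'));  /* integer from 1 to 4 */
    Female_Age_Band = choosec(k, '15-20', '20-25', '25-30', 'Unavailable');
    response = rand('bernoulli', 0.2 + 0.1 * k);
    output;
  end;
  drop i k;
run;

/* Option 1: treat "Unavailable" as a valid category of a CLASS variable */
proc logistic data=have;
  class Female_Age_Band / param=ref;
  model response(event='1') = Female_Age_Band;
run;

/* Option 2: subset explicitly, which has the same effect as setting them to missing */
proc logistic data=have;
  where Female_Age_Band ne 'Unavailable';
  class Female_Age_Band / param=ref;
  model response(event='1') = Female_Age_Band;
run;
```

Because the CLASS statement accepts character variables, the bands do not need to be converted to numbers for either option.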


Or take a pass at imputing, but I wouldn't go that route unless the number of "missing" or "unavailable" values is large relative to the sample size. How large? That depends on the actual data.
