chimukah
Obsidian | Level 7

I have a CLASS variable with 9 levels, and I included it in the CLASS statement in PROC LOGISTIC. However, during model selection with the stepwise method, PROC LOGISTIC displayed only 5 levels vs. 1 (reference) and omitted the other 3 levels of that CLASS variable.

 

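proc logistic data=gene1;
   class studyID gender Agecat;
   /* sls= (SLSTAY) is the p-value to stay, sle= (SLENTRY) the p-value to enter;
      include=2 forces the first two effects (StudyID, gender) into every model */
   model response(event='1') = StudyID gender Age Agecat /
         selection=stepwise
         sls=0.05
         sle=0.2
         include=2;
run;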

PaigeMiller
Diamond | Level 26

Can you please show us the output?

 

Also, which CLASS variable is the one with 9 levels?

--
Paige Miller
chimukah
Obsidian | Level 7

How do I say this: this program is confidential. In order not to breach confidentiality, I just gave a dummy program. However, that's the problem I am facing. Although there are some missing data somewhere, I don't want to conclude that it happened as a result of missing data; rather, I am seeking the opinion of anyone who might have had a similar problem.

PaigeMiller
Diamond | Level 26

I'm afraid that if we can't see the output, it's hard to say what happened and what to do about it.

 

Also, in most cases, confidentiality issues can be overcome by re-running the code on a copy of the data in which the actual identifying information has been replaced with random information.
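For example, here is a minimal sketch of one way to do this, assuming the identifying variable is studyID (the dummy name from the original post). Each distinct value is replaced by a neutral code S1, S2, and so on. Note that this is a deterministic relabeling rather than random values; it keeps the level structure and the missing values intact, so the PROC LOGISTIC problem should reproduce on the copy. The data set name gene1_anon and the variable anon_id are hypothetical.

```sas
/* Sort so that BY-group processing sees each studyID value as one group */
proc sort data=gene1 out=gene1_anon;
   by studyID;
run;

data gene1_anon;
   set gene1_anon;
   by studyID;
   length anon_id $8;
   /* sum statement: level is retained and increases once per new non-missing studyID */
   if first.studyID and not missing(studyID) then level + 1;
   /* a missing studyID stays missing (anon_id is left blank) */
   if not missing(studyID) then anon_id = cats('S', level);
   drop studyID level;
run;
```

anon_id would then be used in place of studyID in the CLASS and MODEL statements. Any other identifying variables in the real data would need similar treatment.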

FreelanceReinh
Jade | Level 19

@chimukah wrote:

(...) Although there are some missing data somewhere, I don't want to conclude that it happened as a result of missing data, ...


Hi @chimukah,

 

I understand that the exclusion of observations with missing values is often annoying, but you may want to check if that is the reason.

proc freq data=gene1;
   /* keep only records with no missing value in any analysis variable */
   where cmiss(response, studyID, gender, Agecat, Age)=0;
   tables studyID;
run;

After adapting the above step to your real dataset and variable names, do you get all nine categories of studyID in the PROC FREQ output or only six?

 

Note that PROC LOGISTIC excludes observations right from the start (of the variable selection process) due to missing values even in variables which fail to meet the SLENTRY criterion (i.e., which never make it into a model). It also excludes observations with missing values in variables (accidentally) listed in the CLASS statement which do not even occur in the MODEL statement.
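As a possible follow-up (a sketch, not part of the original reply), the records that would be dropped can be broken down by studyID level to see which variable causes the exclusion. It assumes the dummy names from the original post (gene1, response, studyID, gender, Agecat, Age); in the real program every variable listed in either the CLASS or the MODEL statement should go into CMISS and get a flag. The data set name dropped and the miss_ flags are hypothetical.

```sas
/* Keep only records that PROC LOGISTIC would drop:
   at least one CLASS or MODEL variable is missing */
data dropped;
   set gene1;
   where cmiss(response, studyID, gender, Agecat, Age) > 0;
   /* one flag per variable: 1 = missing, 0 = present */
   miss_response = missing(response);
   miss_gender   = missing(gender);
   miss_agecat   = missing(Agecat);
   miss_age      = missing(Age);
run;

/* SUM of each flag = number of dropped records missing that variable;
   MISSING keeps records with a missing studyID as their own row */
proc means data=dropped n sum missing;
   class studyID;
   var miss_response miss_gender miss_agecat miss_age;
run;
```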

chimukah
Obsidian | Level 7
@FreelanceReinhard,
Thank you for your candid advice. However, after adapting the code, the missing levels did not appear among the categories of StudyID. I assume that SAS uses complete cases. My worry is that the omitted ... The log didn't mention any separation involving the class variable during model selection.
FreelanceReinh
Jade | Level 19

If the missing categories don't appear in the PROC FREQ output, then you have the answer: Missing values in other variables lead to the exclusion of the respective observations. To find out which variables these are, you can adapt and run the following step:

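ods select nlevels;   /* show only the "Number of Variable Levels" table */
proc freq data=gene1 nlevels;
   /* replace with the actual values of the three missing studyID levels */
   where studyID in ('first', 'second', 'third missing level');
   tables response gender Agecat Age;
run;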

(I assumed that studyID is a character variable. Use numeric values without quotes in the WHERE condition, of course, if it's numeric.)

chimukah
Obsidian | Level 7
The output said no observations were selected from the data set. I believe those levels of the class ID were omitted as a result of missing values, because PROC FREQ shows that no observations were obtained when listing all the class variables in the model.
FreelanceReinh
Jade | Level 19

@chimukah wrote:
The output said no observations were selected from the data set. I believe those levels of the class ID were omitted as a result of missing values, because PROC FREQ shows that no observations were obtained when listing all the class variables in the model.

The NLEVELS output does not exclude anything because of missing values. If "no observations were selected," then the three values specified in the WHERE condition were not found in the input dataset (e.g., because they were specified incorrectly or simply because they really don't exist).
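One thing worth checking in that case (an assumption, since the real data aren't shown) is whether the values typed into the WHERE condition match the stored values. A WHERE condition compares against the internal, unformatted values, so if studyID carries a format, the labels seen in printed output may not be what is actually stored. A sketch that lists the raw values:

```sas
/* List every stored value of studyID, including missing, without any format,
   so the WHERE list can be copied exactly (case and leading blanks matter) */
proc freq data=gene1;
   tables studyID / missing;
   format studyID;   /* FORMAT with no format name removes the format for this step */
run;
```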

chimukah
Obsidian | Level 7

StudyID is the class variable with 9 levels.

ballardw
Super User

@chimukah wrote:

StudyID is the class variable with 9 levels.


It doesn't matter. If any of the variables in the MODEL statement are missing, then the record is excluded from modeling by default. So, for instance, if the records with study_id level=7 have combinations of missing values for the other variables, even if only a few for each other variable, then level=7 would not appear in the output. In other words, a level drops out entirely when every one of its records is missing at least one of the model variables.
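To check this directly, a sketch like the following cross-tabulates each studyID level against a complete-case flag. It uses the dummy names from the original post (substitute the real ones, including any variable that appears only in the CLASS statement); the data set check and the variable complete are hypothetical. A level whose count in the complete=1 column is 0 is one that PROC LOGISTIC cannot use at all.

```sas
/* Flag complete cases: 1 if no CLASS/MODEL variable is missing, else 0 */
data check;
   set gene1;
   complete = (cmiss(response, studyID, gender, Agecat, Age) = 0);
run;

/* Rows = studyID levels, columns = complete (0/1);
   MISSING also shows records whose studyID itself is missing */
proc freq data=check;
   tables studyID*complete / missing norow nocol nopercent;
run;
```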

 

Show us the LOG from running the code. Include all the messages, notes, and warnings.

chimukah
Obsidian | Level 7
@ballardw,
"So for instance if the records with study_id level=7 have combinations of missing values for the other variables, even if only a few for each other variable, then level=7 would not appear in the output."

I think this is exactly what is happening. Because the levels have missing observations, they were excluded.
ballardw
Super User

@chimukah wrote:
@ballardw,
"So for instance if the records with study_id level=7 have combinations of missing values for the other variables, even if only a few for each other variable, then level=7 would not appear in the output."

I think this is exactly what is happening. Because the levels have missing observations, they were excluded.

For the other class variables, this might be mitigated by using the / MISSING option on the CLASS statement. That way, the missing level is treated as a valid level. Your Age variable, however, is not a class variable, and so that can't be addressed this way. I did kind of wonder about including both an Age and an Age_group variable, as I would expect that in most uses Age_group would be built from Age, unless they reflect entirely different "ages" of some sort. If they both reflect the subject's age, I might drop the Age variable and keep Age_group with the MISSING option. That does treat all missing ages as if they are basically the same, though.
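A minimal sketch of that suggestion, assuming the dummy names from the original post and that Agecat is derived from Age: the MISSING option after the slash in the CLASS statement makes a missing value a level of its own for studyID, gender and Agecat, and Age is left out of the MODEL statement. Records with a missing response are still excluded.

```sas
proc logistic data=gene1;
   /* MISSING: missing values of these CLASS variables form their own level */
   class studyID gender Agecat / missing;
   /* Age dropped; Agecat carries the age information.
      INCLUDE=2 still forces studyID and gender into every model */
   model response(event='1') = studyID gender Agecat
         / selection=stepwise slentry=0.2 slstay=0.05 include=2;
run;
```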

chimukah
Obsidian | Level 7
Age_group was placed in the CLASS statement; the variable names are just dummies for reference. However, I have noted your suggestion about including both Age and Agegrp in the same model. I believe the exclusion (omission) was a result of missing values in one or two of the covariates in the model.

@all Thanks for your candid advice.


Discussion stats
  • 13 replies
  • 2361 views
  • 4 likes
  • 4 in conversation