I have a class variable with 9 levels, and I included it in the CLASS statement in PROC LOGISTIC. However, during model selection using the stepwise method, PROC LOGISTIC displayed only 5 levels vs. 1 reference level and omitted the other 3 levels of that class variable.
proc logistic data=gene1;
  class studyID gender Agecat;
  model response(event='1') = studyID gender Age Agecat /
        selection=stepwise
        sls=0.05
        sle=0.2
        include=2;
run;
Can you please show us the output?
Also, which CLASS variable is the one with 9 levels?
How do I say this: the program is confidential. To avoid breaching confidentiality, I just gave a dummy program. However, that's the problem I am facing. Although there are some missing data somewhere, I don't want to come to the conclusion that it happened as a result of missing data; rather, I seek the opinion of those who might have had a similar problem.
I'm afraid that if we can't see the output, it's hard to say what has happened and what to do about it.
Also, in most cases, confidentiality issues can be overcome by re-running the data after replacing actual identifying information with random information.
@chimukah wrote:
(...) Although there are some missing data somewhere, I don't want to come to the conclusion that it happened as a result of missing data, ...
Hi @chimukah,
I understand that the exclusion of observations with missing values is often annoying, but you may want to check if that is the reason.
proc freq data=gene1;
where cmiss(response, studyID, gender, Agecat, Age)=0;
tables studyID;
run;
After adapting the above step to your real dataset and variable names, do you get all nine categories of studyID in the PROC FREQ output or only six?
Note that PROC LOGISTIC excludes observations right from the start (of the variable selection process) due to missing values even in variables which fail to meet the SLENTRY criterion (i.e., which never make it into a model). It also excludes observations with missing values in variables (accidentally) listed in the CLASS statement which do not even occur in the MODEL statement.
If the missing categories don't appear in the PROC FREQ output, then you have the answer: Missing values in other variables lead to the exclusion of the respective observations. To find out which variables these are, you can adapt and run the following step:
ods select nlevels;
proc freq data=gene1 nlevels;
where studyID in ('first', 'second', 'third missing level');
tables response gender Agecat Age;
run;
(I assumed that studyID is a character variable. Use numeric values without quotes in the WHERE condition, of course, if it's numeric.)
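As a complementary check, here is a minimal sketch that shows, for each studyID level, how many observations are complete in all analysis variables. The dataset name check and the flag variable complete are hypothetical names introduced only for this illustration; the other names come from the dummy program above and need adapting to the real data.

```sas
/* Flag observations that have no missing value in any analysis variable. */
/* CMISS accepts both numeric and character arguments.                    */
data check;
  set gene1;
  complete = (cmiss(response, studyID, gender, Agecat, Age) = 0);
run;

/* Cross-tabulate studyID by the flag. A level whose observations all  */
/* have complete=0 contributes nothing to the PROC LOGISTIC analysis.  */
proc freq data=check;
  tables studyID*complete / missing nocol nopercent;
run;
```

If a studyID level has no observations with complete=1, that would be consistent with the level being absent from the PROC LOGISTIC output.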
@chimukah wrote:
The output said no observations were selected from the data set. I believe those levels of the class variable were omitted as a result of missing values, because PROC FREQ shows that no observations were obtained when listing all the class variables in the model.
The NLEVELS output does not exclude anything because of missing values. If "no observations were selected," then the three values specified in the WHERE condition were not found in the input dataset (e.g., because they were specified incorrectly or simply because they really don't exist).
StudyID is the class variable with 9 levels.
@chimukah wrote:
StudyID is the class variable with 9 levels.
Doesn't matter. If any of the variables on the model statement are missing then the record will be excluded by default from modeling. So for instance if the records with study_id level=7 have combinations of missing values for the other variables, even if only a few for each other variable, then level=7 would not appear in the output.
Show us the LOG from running the code. Include all the messages, notes, and warnings.
@chimukah wrote:
@ballardw,
"So for instance if the records with study_id level=7 have combinations of missing values for the other variables, even if only a few for each other variable, then level=7 would not appear in the output."
I think this is exactly what is happening. Because the levels have missing observations, they were excluded.
For the other class variables this might be mitigated by using the / missing option on the CLASS statement. That way the missing level is treated as a valid level. Your Age variable, however, is not a class variable, and so that can't be addressed. I did kind of wonder about including both an Age and an Age_group variable, as I would expect that in many uses the Age_group would be built from Age, unless they reflect entirely different "ages" of some sort. If they both reflect the subject's age, I might drop the Age variable and leave the Age_group with the missing option. That does treat all missing ages as if they are basically the same, though.
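A minimal sketch of that suggestion, assuming Agecat in the dummy program plays the role of Age_group and that Age is dropped from the model; the selection settings are simply copied from the original post:

```sas
/* MISSING on the CLASS statement treats missing values of the class    */
/* variables as a valid level instead of excluding those observations.  */
/* Observations with a missing response are still excluded.             */
proc logistic data=gene1;
  class studyID gender Agecat / missing;
  model response(event='1') = studyID gender Agecat /
        selection=stepwise
        slstay=0.05
        slentry=0.2
        include=2;
run;
```

Note that the option applies to every variable on the CLASS statement, so missing values of studyID and gender would also become levels of their own.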