proc logistic data=BTS201506 ;
class Carrier ;
model DepDelayInd(Descending) = CRSDepTime seqnum DepDelayLagInd DepDelayLag DepDelayLagCum ArrDelayLagInd ArrDelayLag ArrDelayLagCum DepDelayLag2 ArrDelayLag2;
where cancelled=0 ;
run ;
Carrier is my categorical variable. I want to partition the dataset by the values of Carrier and then fit a separate logistic regression for each value.
But the code above does not work.
Help
Instead of Class Carrier use By Carrier. This will do a complete separate analysis for each level of Carrier. The data must be sorted by Carrier first though.
But Carrier is a character variable.
When I use By Carrier, this is the error message:
ERROR: Variable Carrier should be either numeric or specified in the CLASS statement
First sort your data set by Carrier, then use the sorted data set in the logistic regression with
By Carrier;
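A minimal sketch of the two steps together, reusing the model from the original post. The output data set name BTS201506_sorted is a hypothetical name chosen for illustration, and Carrier is assumed to appear only on the BY statement (not on CLASS or MODEL):

```sas
/* Step 1: sort by the BY variable.
   BTS201506_sorted is a hypothetical name for the sorted copy. */
proc sort data=BTS201506 out=BTS201506_sorted;
    by Carrier;
run;

/* Step 2: fit one logistic regression per value of Carrier.
   Carrier is used only on the BY statement here. */
proc logistic data=BTS201506_sorted;
    by Carrier;
    where cancelled = 0;
    model DepDelayInd(descending) = CRSDepTime seqnum DepDelayLagInd DepDelayLag
          DepDelayLagCum ArrDelayLagInd ArrDelayLag ArrDelayLagCum
          DepDelayLag2 ArrDelayLag2;
run;
```

A character variable is fine on a BY statement; the data only need to be sorted by it.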
Actually it worked.
My final question is,
It seems that the "where cancelled = 0" statement has no effect.
I'm still getting notes like this in the output window:
"Note: 2898 observations were deleted due to missing values for the response or explanatory variables."
But that shouldn't be the case because all the missing values are where cancelled = 1.
By stating where cancelled = 1, I am selecting those observations without missing values.
Try running this code and see what you get:
Proc freq data=BTS201506;
tables cancelled* (DepDelayInd CRSDepTime seqnum DepDelayLagInd DepDelayLag DepDelayLagCum ArrDelayLagInd ArrDelayLag ArrDelayLagCum DepDelayLag2 ArrDelayLag2) / list missing;
run;
See if you have any rows with cancelled=0 and something else missing. Or possibly you really meant to keep cancelled=1??
There was a typo in my previous post.
I meant "where cancelled = 0", not 1.
However, this was a non-issue and you are right. There are indeed still missing values in rows where cancelled is 0.
PROC LOGISTIC, like most regression procedures, excludes any observation with a missing value for any variable on the MODEL statement. You may have some mis-coded variables, for example missing values that were meant to be recoded to 0 in a step that was skipped.
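To see which variables account for the deleted observations, one option is to count missing values among the non-cancelled rows. This is a sketch, assuming every variable listed (including DepDelayInd) is numeric, since PROC MEANS only analyzes numeric variables:

```sas
/* Count non-missing (N) and missing (NMISS) values for each model
   variable, restricted to the rows the regression actually uses.
   Assumption: all variables on the VAR statement are numeric. */
proc means data=BTS201506 n nmiss;
    where cancelled = 0;
    var DepDelayInd CRSDepTime seqnum DepDelayLagInd DepDelayLag
        DepDelayLagCum ArrDelayLagInd ArrDelayLag ArrDelayLagCum
        DepDelayLag2 ArrDelayLag2;
run;
```

Any variable with a nonzero NMISS contributes to the "observations were deleted" note. Adding a CLASS Carrier statement would break the counts down by carrier.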
@junlue it sounds like you have your question answered. Please mark the appropriate solution as the correct answer.
If it doesn't work, please post your code AND log.
You're likely putting something in the wrong place; using a BY statement is the correct answer.