Hi Everyone!
I am working on crash data. My dependent variable is dichotomous, and my independent variables are categorical. I created a new 0/1 variable for each category within a variable. For example, if vehicle type is an independent variable with 3 levels (CAR, BUS, TRUCK), I created three variables named car, bus and truck. If the vehicle involved in the crash is a car, I assigned a numeric 1 to car in that row and 0 to the other two variables. After converting all the categories into variables, I have 85 variables in total.
I used PROC QLIM to fit a binary logit model between my dependent variable (whether or not there was a fatality) and all my dichotomous independent variables. I always get missing standard errors and t-statistics in the output, along with an error message saying that the Hessian matrix is singular. I could not figure this out. Please help me. Thanks in advance. I am attaching a few rows and columns for reference.
fatnot | ascfatal | ssr | ls | pva | pr | rpa | rma | rmac | rmic | rl | upa | uma | uc | ul | ol | tl | thl |
0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
You could make this a lot easier on yourself by employing the CLASS statement. To carry your example further, suppose the variable VEHICLE had three types (CAR BUS TRUCK). Some skeleton code would look like:
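proc qlim data=crashes;   /* crashes is a placeholder data set name */
   class vehicle;         /* let QLIM build the dummy variables for VEHICLE */
   model fatnot = vehicle / discrete(d=logit);   /* binary logit */
run;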
This should help immensely with the problem of singular matrices and missing standard errors.
Steve Denham
If you stick with your binary variable coding (instead of using the CLASS statement as per Steve's suggestion), then you need to make sure that you always omit one category of each categorical variable from your regression. So for your example, you would omit the variable TRUCK from your regression (or one of CAR or BUS; it doesn't matter which). Assuming you then have a model with a constant, CAR and BUS as predictors, the predicted value for TRUCK is (a transformation of) the constant, the predicted value for CAR is based on the constant plus the CAR parameter, and so on.
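As a rough sketch of what omitting one category looks like for the vehicle example, assuming the data set is called crashes (a placeholder name, not taken from the original post) and using the car, bus and fatnot variables described in the question:

```sas
/* Sketch only: "crashes" is a placeholder data set name.           */
/* TRUCK is left out on purpose, so it becomes the reference level. */
proc qlim data=crashes;
   model fatnot = car bus / discrete(d=logit);
run;
```

With this coding the intercept is the log-odds of a fatality for a TRUCK crash, and the car and bus coefficients are differences in log-odds relative to TRUCK. The same rule would apply to every other group of 0/1 variables among the 85: one variable per group stays out of the MODEL statement.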
The technical term for this problem is parameter over-specification. Put more simply, if a variable has 3 levels you only need two 0/1 variables to identify which level an observation belongs to. The third variable is redundant, and the redundant variables are what produce the singular Hessian and the missing estimates and standard errors.
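To see why the redundant variable breaks the estimation, here is a short sketch using the vehicle example. The symbols $\beta_0,\dots,\beta_3$ and $c$ are introduced here only for illustration. Every crash falls into exactly one vehicle type, so the three 0/1 variables always sum to the intercept column:

```latex
\[
\text{car}_i + \text{bus}_i + \text{truck}_i = 1 \quad \text{for every crash } i .
\]
% Consequently, shifting the coefficients by any constant c leaves the
% linear predictor (and therefore the likelihood) unchanged:
\[
\beta_0 + \beta_1\,\text{car}_i + \beta_2\,\text{bus}_i + \beta_3\,\text{truck}_i
= (\beta_0 + c) + (\beta_1 - c)\,\text{car}_i + (\beta_2 - c)\,\text{bus}_i + (\beta_3 - c)\,\text{truck}_i .
\]
```

Because infinitely many coefficient sets give exactly the same fit, the likelihood is flat in that direction and the Hessian cannot be inverted. Dropping one variable (equivalently, fixing $\beta_3 = 0$) removes the ambiguity.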