Hello Everyone,
I am trying to fit a GEE model with PROC GENMOD on data that I transposed from wide to long format. The procedure runs and I get results, but I think something is wrong.
/* Explanation of variables:
I have data from 2009, 2011, 2014, and 2018 measuring whether a person is unemployed. earlystart has three categories (0, 1, 2). First, I recoded the unemployment variables so that they all use the same coding. */
data chomage;
set cannabis;
keep earlystart NTT j09_EVEN12M10 J11_CHOMAGE J14_CHOMAGE J18_CHOMAGE;
run;
data chomage2;
set chomage;
if j09_EVEN12M10=2 then chomj09=0;
else if j09_EVEN12M10=1 then chomj09=1;
if j11_CHOMAGE = 2 then chomj11=0;
else if j11_CHOMAGE = 1 then chomj11=1;
chomj14 = j14_CHOMAGE;
chomj18= J18_CHOMAGE;
DROP
j09_EVEN12M10
j11_CHOMAGE
j14_chomage
J18_CHOMAGE;
run;
/*then I transposed my data. NTT is my I.D. variable*/
proc transpose data=chomage2 out=transpose3;
by ntt earlystart; /*earlystart is not changing by each wave of data*/
var chomj09 chomj11 chomj14 chomj18;
run;
/*renamed the column*/
data chomagetranspose;
set transpose3 (rename=(col1=chomj));
drop _name_;
run;
/* I do not know whether this is necessary, but I removed observations where my independent or dependent variable was missing */
data finaldeleted;
set chomagetranspose;
if earlystart =. or chomj=. then delete;
run;
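A possible variation on the reshaping steps above, offered as a sketch rather than a required change: the rename step drops _NAME_, which is the only record of which wave each row came from. An exchangeable working correlation does not depend on the order of the measurements, but order-dependent structures such as type=ar(1) do, so it can help to keep a wave variable. The names chomage_long and wave are illustrative assumptions, and the substr positions assume the transposed names are exactly chomj09, chomj11, chomj14, and chomj18.

```sas
/* Sketch: keep a numeric wave identifier instead of discarding _NAME_.   */
/* Assumes _NAME_ holds chomj09, chomj11, chomj14, chomj18 (the VAR list  */
/* of the PROC TRANSPOSE step). chomage_long and wave are hypothetical.   */
data chomage_long;
  set transpose3 (rename=(col1=chomj));
  wave = input(substr(_name_, 6, 2), 2.);  /* characters 6-7 = year digits */
  drop _name_;
run;

/* WITHIN= tells the REPEATED statement how to order measurements within  */
/* a subject; the WITHIN= variable must also be listed in CLASS.          */
proc genmod data=chomage_long descending;
  class ntt earlystart wave;
  model chomj = earlystart / dist=bin link=logit;
  repeated subject=ntt / within=wave type=exch corrw;
run;
```

On the missing-value step: as far as I know, PROC GENMOD leaves out observations with a missing response or explanatory variable on its own, so the explicit delete mainly makes "Observations Read" and "Observations Used" agree.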
/*running the final model*/
proc genmod data=finaldeleted descending;
class ntt earlystart;
model chomj = earlystart/ dist=bin link=logit;
repeated subject=ntt / type=exch covb corrw;
run;
My results are below:
Model Information | |
Data Set | WORK.FINALDELETED |
Distribution | Binomial |
Link Function | Logit |
Dependent Variable | chomj |
Number of Observations Read | 3483 |
Number of Observations Used | 3483 |
Number of Events | 1999 |
Number of Trials | 3483 |
Class Level Information | ||
Class | Levels | Values |
NTT | 1476 | 142 149 162 165 169 170 177 178 179 180 182 188 192 200 207 215 217 218 224 228 238 246 258 264 270 274 284 285 286 289 292 295 296 297 298 303 306 307 310 322 326 333 343 349 351 355 356 358 360 361 362 363 364 373 374 387 393 394 398 409 413 414 415 ... |
earlystart | 3 | 0 1 2 |
Response Profile | ||
Ordered | chomj | Total |
1 | 1 | 1999 |
2 | 0 | 1484 |
PROC GENMOD is modeling the probability that chomj='1'. |
/* I want to model chomj=1, i.e. the probability that a person is unemployed */
Parameter Information | ||
Parameter | Effect | earlystart |
Prm1 | Intercept | |
Prm2 | earlystart | 0 |
Prm3 | earlystart | 1 |
Prm4 | earlystart | 2 |
Algorithm converged. |
GEE Model Information | |
Correlation Structure | Exchangeable |
Subject Effect | NTT (1476 levels) |
Number of Clusters | 1476 |
Correlation Matrix Dimension | 4 |
Maximum Cluster Size | 4 |
Minimum Cluster Size | 1 |
Covariance Matrix (Model-Based) | |||
| Prm1 | Prm2 | Prm3 |
Prm1 | 0.004538 | -0.004538 | -0.004538 |
Prm2 | -0.004538 | 0.009496 | 0.004538 |
Prm3 | -0.004538 | 0.004538 | 0.01291 |
Covariance Matrix (Empirical) | |||
| Prm1 | Prm2 | Prm3 |
Prm1 | 0.004691 | -0.004691 | -0.004691 |
Prm2 | -0.004691 | 0.009653 | 0.004691 |
Prm3 | -0.004691 | 0.004691 | 0.01225 |
Algorithm converged. |
Working Correlation Matrix | ||||
| Col1 | Col2 | Col3 | Col4 |
Row1 | 1.0000 | 0.3325 | 0.3325 | 0.3325 |
Row2 | 0.3325 | 1.0000 | 0.3325 | 0.3325 |
Row3 | 0.3325 | 0.3325 | 1.0000 | 0.3325 |
Row4 | 0.3325 | 0.3325 | 0.3325 | 1.0000 |
Exchangeable Working Correlation | |
Correlation | 0.3324594755 |
GEE Fit Criteria | |
QIC | 4764.3766 |
QICu | 4761.2962 |
Analysis Of GEE Parameter Estimates |
Empirical Standard Error Estimates |
Parameter | Level | Estimate | Standard Error | 95% Lower Limit | 95% Upper Limit | Z | Pr > |Z| |
Intercept | | 0.4582 | 0.0685 | 0.3239 | 0.5924 | 6.69 | <.0001 |
earlystart | 0 | -0.1410 | 0.0983 | -0.3336 | 0.0516 | -1.43 | 0.1513 |
earlystart | 1 | -0.0495 | 0.1107 | -0.2665 | 0.1674 | -0.45 | 0.6546 |
earlystart | 2 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
/* This is what I find strange: why is the earlystart category (2) blank? */
Hello @katy-barry and welcome to the SAS Support Communities!
I've run your PROC GENMOD step on simple, randomly generated data and -- as expected -- the last row in the "Analysis Of GEE Parameter Estimates" table is identical to what you've shown. The reason is: By default, PROC GENMOD uses GLM coding as the parameterization method for classification variables: see documentation of the CLASS statement (PARAM= option). This is explained in more detail in GLM Parameterization of Classification Variables and Effects and Other Parameterizations, where it says: "Parameter estimates of CLASS main effects that use the GLM coding scheme estimate the difference in the effects of each level compared to the last level."
The "last level" of earlystart is 2 and, of course, the difference in the effect of the last level compared to itself is trivially zero.
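Written out, assuming the default GLM coding with the levels ordered 0, 1, 2 (the symbols are notation introduced here for illustration, not part of the SAS output), the fitted linear predictor is:

```latex
\operatorname{logit}\Pr(\text{chomj}=1)
  = \beta_0
  + \beta_{(0)}\,\mathbf{1}[\text{earlystart}=0]
  + \beta_{(1)}\,\mathbf{1}[\text{earlystart}=1]
  + \beta_{(2)}\,\mathbf{1}[\text{earlystart}=2],
\qquad \beta_{(2)} \equiv 0 .
```

With the estimates from your output, the linear predictor is 0.4582 for earlystart=2, 0.4582 - 0.1410 = 0.3172 for earlystart=0, and 0.4582 - 0.0495 = 0.4087 for earlystart=1.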
If you had specified reference cell coding
class ntt earlystart(param=ref ref='2');
the reference level (here: 2) would have been just omitted, without changing the other parameters and statistics.
Regardless of the parameterization method, there's one more parameter to be estimated than there are degrees of freedom. Hence, it is correct that one parameter is set to zero or omitted.
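Because only the choice of baseline changes, comparisons between any two levels can still be recovered under the default coding. A minimal sketch, assuming the data set and variables from the question: the ESTIMATE statement takes one coefficient per earlystart level in CLASS order (0, 1, 2), and the labels are arbitrary text chosen here.

```sas
/* Sketch: level-to-level contrasts under the default GLM coding.         */
/* Coefficients follow the CLASS level order 0, 1, 2. The EXP option      */
/* additionally reports the exponentiated estimate.                       */
proc genmod data=finaldeleted descending;
  class ntt earlystart;
  model chomj = earlystart / dist=bin link=logit;
  repeated subject=ntt / type=exch;
  estimate 'earlystart 1 vs 0' earlystart -1 1 0 / exp;
  estimate 'earlystart 2 vs 0' earlystart -1 0 1 / exp;
run;
```

Going by the estimates in the question, these two contrasts should come out near -0.0495 - (-0.1410) = 0.0915 and 0 - (-0.1410) = 0.1410.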
Similarly, adding the NOINT option to the MODEL statement would have displayed in row "earlystart 2" what is shown in row "Intercept" in your output (and the intercept parameter would be zero instead).
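The NOINT variant could be sketched as follows, again assuming the data set and variables from the question. Only the MODEL statement differs from the original step.

```sas
/* Sketch: same model without an intercept. With GLM coding, each         */
/* earlystart row should then be the log-odds for that level itself       */
/* rather than a difference from the last level.                          */
proc genmod data=finaldeleted descending;
  class ntt earlystart;
  model chomj = earlystart / dist=bin link=logit noint;
  repeated subject=ntt / type=exch;
run;
```

The fitted probabilities are unchanged by this choice; it only moves the baseline log-odds from the "Intercept" row into the earlystart rows.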
Thank you! I just tried (param=ref ref='0') and it worked brilliantly. Thank you for your help 🙂
Analysis Of GEE Parameter Estimates |
Empirical Standard Error Estimates |
Parameter | Level | Estimate | Standard Error | 95% Lower Limit | 95% Upper Limit | Z | Pr > |Z| |
Intercept | | 0.3172 | 0.0704 | 0.1791 | 0.4552 | 4.50 | <.0001 |
earlystart | 1 | 0.0915 | 0.1119 | -0.1279 | 0.3108 | 0.82 | 0.4138 |
earlystart | 2 | 0.1410 | 0.0983 | -0.0516 | 0.3336 | 1.43 | 0.1513 |
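As a quick consistency check, the estimates with reference level 0 follow from the earlier estimates with reference level 2 by simple arithmetic (the subscripts are notation introduced here):

```latex
\begin{aligned}
\text{Intercept}_{\text{ref}=0} &= 0.4582 + (-0.1410) = 0.3172,\\
\beta_{1\,\text{vs}\,0} &= -0.0495 - (-0.1410) = 0.0915,\\
\beta_{2\,\text{vs}\,0} &= 0 - (-0.1410) = 0.1410 .
\end{aligned}
```

So both runs describe the same fitted model. On the odds scale, the last row corresponds to an odds ratio of about exp(0.1410) ≈ 1.15 for earlystart=2 versus earlystart=0, with a confidence interval that includes 1.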