katy-barry
Calcite | Level 5

Hello everyone,

I am trying to fit a GEE model with PROC GENMOD on data that I transposed. I get results, but I think something is wrong.

 

/* Explanation of variables:

I have data from 2009, 2011, 2014, and 2018 measuring whether a person is unemployed or not. earlystart has three categories (0, 1, 2). First, I recoded each of the unemployment variables so that they would all be coded the same way. */

 

/* Keep only the ID, the exposure, and the four unemployment variables. */
data chomage;
   set cannabis;
   keep earlystart NTT j09_EVEN12M10 J11_CHOMAGE J14_CHOMAGE J18_CHOMAGE;
run;

/* Recode the 2009 and 2011 variables from 1/2 to 1/0; 2014 and 2018 are copied unchanged. */
data chomage2;
   set chomage;
   if j09_EVEN12M10 = 2 then chomj09 = 0;
   else if j09_EVEN12M10 = 1 then chomj09 = 1;
   if j11_CHOMAGE = 2 then chomj11 = 0;
   else if j11_CHOMAGE = 1 then chomj11 = 1;
   chomj14 = j14_CHOMAGE;
   chomj18 = J18_CHOMAGE;
   drop j09_EVEN12M10 j11_CHOMAGE j14_CHOMAGE J18_CHOMAGE;
run;

 

/* Then I transposed my data. NTT is my ID variable. */
/* Note: BY processing assumes chomage2 is already sorted by ntt. */
proc transpose data=chomage2 out=transpose3;
   by ntt earlystart;   /* earlystart does not change between waves */
   var chomj09 chomj11 chomj14 chomj18;
run;

/* Renamed the column */
data chomagetranspose;
   set transpose3 (rename=(col1=chomj));
   drop _name_;
run;

/* I do not know if this is necessary, but I removed the rows where my independent or dependent variable was missing. */
data finaldeleted;
   set chomagetranspose;
   if earlystart = . or chomj = . then delete;
run;

 

/* Running the final model */
proc genmod data=finaldeleted descending;
   class ntt earlystart;
   model chomj = earlystart / dist=bin link=logit;
   repeated subject=ntt / type=exch covb corrw;
run;

 

My results are below:

 

 

Model Information

Data Set              WORK.FINALDELETED
Distribution          Binomial
Link Function         Logit
Dependent Variable    chomj

 

 

Number of Observations Read    3483
Number of Observations Used    3483
Number of Events               1999
Number of Trials               3483

 

 

Class Level Information

Class         Levels    Values
NTT           1476      142 149 162 165 169 170 177 178 179 180 182 188 192 200 207 215 217 218 224 228 238 246 258 264 270 274 284 285 286 289 292 295 296 297 298 303 306 307 310 322 326 333 343 349 351 355 356 358 360 361 362 363 364 373 374 387 393 394 398 409 413 414 415 ...
earlystart    3         0 1 2

 

 

Response Profile

Ordered Value    chomj    Total Frequency
1                1        1999
2                0        1484

 

PROC GENMOD is modeling the probability that chomj='1'.

 

/* I want to model chomj=1 because it gives the probability that a person is unemployed. */

 

Parameter Information

Parameter    Effect        earlystart
Prm1         Intercept
Prm2         earlystart    0
Prm3         earlystart    1
Prm4         earlystart    2

 

 

Algorithm converged.

 

 

GEE Model Information

Correlation Structure           Exchangeable
Subject Effect                  NTT (1476 levels)
Number of Clusters              1476
Correlation Matrix Dimension    4
Maximum Cluster Size            4
Minimum Cluster Size            1

 

 

Covariance Matrix (Model-Based)

         Prm1         Prm2         Prm3
Prm1     0.004538    -0.004538    -0.004538
Prm2    -0.004538     0.009496     0.004538
Prm3    -0.004538     0.004538     0.01291

 

 

Covariance Matrix (Empirical)

         Prm1         Prm2         Prm3
Prm1     0.004691    -0.004691    -0.004691
Prm2    -0.004691     0.009653     0.004691
Prm3    -0.004691     0.004691     0.01225

 

 

Algorithm converged.

 

 

Working Correlation Matrix

        Col1      Col2      Col3      Col4
Row1    1.0000    0.3325    0.3325    0.3325
Row2    0.3325    1.0000    0.3325    0.3325
Row3    0.3325    0.3325    1.0000    0.3325
Row4    0.3325    0.3325    0.3325    1.0000

 

 

Exchangeable Working Correlation

Correlation    0.3324594755

 

 

GEE Fit Criteria

QIC     4764.3766
QICu    4761.2962

 

 

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter        Estimate    Standard Error    95% Confidence Limits    Z        Pr > |Z|
Intercept         0.4582     0.0685             0.3239     0.5924       6.69    <.0001
earlystart  0    -0.1410     0.0983            -0.3336     0.0516      -1.43    0.1513
earlystart  1    -0.0495     0.1107            -0.2665     0.1674      -0.45    0.6546
earlystart  2     0.0000     0.0000             0.0000     0.0000        .       .

 

/* This is what I find strange: why is the row for earlystart category 2 blank (all zeros and missing values)? */

2 REPLIES
FreelanceReinh
Jade | Level 19

Hello @katy-barry and welcome to the SAS Support Communities!

 

I've run your PROC GENMOD step on simple, randomly generated data and, as expected, the last row in the "Analysis Of GEE Parameter Estimates" table is identical to what you've shown. The reason is that, by default, PROC GENMOD uses GLM coding as the parameterization method for classification variables: see the documentation of the CLASS statement (PARAM= option). This is explained in more detail in the documentation sections "GLM Parameterization of Classification Variables and Effects" and "Other Parameterizations", where it says: "Parameter estimates of CLASS main effects that use the GLM coding scheme estimate the difference in the effects of each level compared to the last level."

 

The "last level" of earlystart is 2 and, of course, the difference in the effect of the last level compared to itself is trivially zero.

 

If you had specified reference cell coding

class ntt earlystart(param=ref ref='2');

the reference level (here: 2) would have been just omitted, without changing the other parameters and statistics.
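For completeness, here is a sketch of the full step with reference cell coding. It reuses the data set and variable names from the question; only the CLASS statement differs from the original code.

```sas
/* Sketch: same GEE model as in the question, with reference cell coding. */
proc genmod data=finaldeleted descending;
   class ntt earlystart(param=ref ref='2');   /* level 2 is the reference */
   model chomj = earlystart / dist=bin link=logit;
   repeated subject=ntt / type=exch covb corrw;
run;
```

Changing ref='2' to another level, for example ref='0', makes that level the baseline against which the remaining levels are compared.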

 

Regardless of the parameterization method, there's one more parameter to be estimated than there are degrees of freedom. Hence, it is correct that one parameter is set to zero or omitted.
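The counting argument can be written out as follows. This is a sketch in notation introduced here for illustration ($\beta_0$ for the intercept, $\alpha_k$ for the effect of earlystart level $k$); these symbols do not appear in the SAS output.

```latex
\begin{align*}
\operatorname{logit} P(\text{chomj}=1 \mid \text{earlystart}=k) &= \beta_0 + \alpha_k, \qquad k \in \{0, 1, 2\} \\
% Four parameters (\beta_0, \alpha_0, \alpha_1, \alpha_2) describe only three group log odds,
% so one constraint is needed. GLM coding fixes the last level:
\alpha_2 &= 0 \\
% With that constraint the remaining parameters are identified:
\beta_0 &= \text{log odds at level } 2, \qquad
\alpha_k = \text{log odds at level } k - \text{log odds at level } 2
\end{align*}
```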

 

Similarly, adding the NOINT option to the MODEL statement would have displayed in row "earlystart 2" what is shown in row "Intercept" in your output (and the intercept parameter would be zero instead).
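A minimal sketch of that NOINT variant, again reusing the data set and variable names from the question; the only change is the NOINT option in the MODEL statement. Under the default GLM coding, each earlystart row should then report the log odds for that level itself rather than a difference from level 2.

```sas
/* Sketch: GEE model without an intercept (default GLM coding kept). */
proc genmod data=finaldeleted descending;
   class ntt earlystart;
   model chomj = earlystart / dist=bin link=logit noint;
   repeated subject=ntt / type=exch covb corrw;
run;
```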

katy-barry
Calcite | Level 5

Thank you, I just tried (param=ref ref='0') and it worked brilliantly. Thank you for your help 🙂

 

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

Parameter        Estimate    Standard Error    95% Confidence Limits    Z       Pr > |Z|
Intercept         0.3172     0.0704             0.1791     0.4552       4.50    <.0001
earlystart  1     0.0915     0.1119            -0.1279     0.3108       0.82    0.4138
earlystart  2     0.1410     0.0983            -0.0516     0.3336       1.43    0.1513
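As a consistency check, both outputs describe the same fitted model. Assuming the usual relationship between reference levels (the new intercept is the old intercept plus the old effect of the new reference level, and every effect shifts by that same amount), the ref='0' estimates follow from the first output by simple arithmetic:

```latex
\begin{align*}
\text{Intercept}_{\text{ref}=0} &= 0.4582 + (-0.1410) = 0.3172 \\
\text{earlystart 1}_{\text{ref}=0} &= -0.0495 - (-0.1410) = 0.0915 \\
\text{earlystart 2}_{\text{ref}=0} &= 0.0000 - (-0.1410) = 0.1410
\end{align*}
```

The earlystart 2 row here has the same standard error and p-value (0.0983 and 0.1513) as the earlystart 0 row in the first output, with the sign of the estimate and of Z flipped, because both rows describe the same contrast between levels 0 and 2.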
