Good evening,
I have two questions when translating stata codes to sas codes.
First, for poisson regression with longitudinal panel data, I tried proc glimmix in sas but get different coefficient estimates from xtpoisson command from stata.
Can anyone give me some suggestions?
Stata code:
xtset patient visit
#delimit ;
xtpoisson y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10, exposure(visit) normal;
#delimit cr
SAS code:
PROC GLIMMIX METHOD=QUAD(QPOINTS=50) DATA=data_a;
CLASS patient;
MODEL y = x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 / DIST=POISSON OFFSET=logvisit LINK=LOG S;
RANDOM INTERCEPT / SUBJECT=patient TYPE=UN;
RUN;
Second, I also fit a cross-sectional simple poisson regression to get predicted counts.
However, stata and sas seems to produce the predcited value differently.
Based on the code below, can anyone give me some hints how to get the same predicted "lnlm" in sas?
Stata code:
#delimit ;
xi: poisson y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10, exposure(visit);
#delimit cr
predict lnlm, xb nooffset
SAS code:
PROC GENMOD DATA=data_b;
CLASS patient;
MODEL y = x1 x2 x3 x4 x5 x6 x7 x8 x9 x10/ DIST=POISSON OFFSET=logvisit LINK=LOG TYPE3 WALD;
output out=out_b p=pred;
RUN;
Thanks!!
It is likely that the difference is due to the covariance structure. You used a unstructured covariance matrix, but I would guess that STATA use the varicance-component structure as default, as SAS also do.
Can you write your model in mathematical terms, not all here understand STATA code.
Hi Jacob,
I tried different covariance structure options, but the coefficients are still different.
I was only given the stata codes, but I assume the original coder wants to fit a random effect poisson regression with an exposure variable, and the random effect follows a normal distribution based on the code.
Thanks!
alphabeta
Hi alphabeta,
You might want to try this. SAS/ETS(R) 14.1 User's Guide
The COUNTREG procedure in ETS has panel data poisson regression which is likely similar to the Stata xtpoisson command.
try
proc countreg data=a groupid=id;
model y = x1-x10 / errorcomp=fixed dist=poisson; run;
Let me know if this helps. -Ken
Hi Ken,
Thanks for your method.
I tried this method with both fixed and random effect, but the coefficients are still not close either.
Thanks!
alphabeta
Hi all,
For the predict part, I figured out lnlm=log(pred/visit), pred=exp(lnlm)*visit
I didn't divide back the exposure variable, that is why stata and sas were different.
For the coefficients, somehow fitting simple poisson regression was not a problem, but fitting a random effect poisson regression using stata and sas gives me very different coefficient estimates.
Still not having a clue yet.
alphabeta
I think the coefficients are different because of your use of the exposure() options. If you run the following example you will find that the SAS and Stata coefficients are exactly the same.
data ships;
input ship time yr_con yr_op service accident op co_65_69 co_70_74 co_75_79
weight;
int=1;
cards;
1 1 1 1 127 0 0 0 0 0 3
1 2 1 2 63 0 1 0 0 0 4
1 3 2 1 1095 3 0 1 0 0 4
1 4 2 2 1095 4 1 1 0 0 3
1 5 3 1 1512 6 0 0 1 0 6
1 6 3 2 3353 18 1 0 1 0 7
1 7 4 1 . . 0 0 0 1 8
1 8 4 2 2244 11 1 0 0 1 7
2 1 1 1 44882 39 0 0 0 0 4
2 2 1 2 17176 29 1 0 0 0 5
2 3 2 1 28609 58 0 1 0 0 3
2 4 2 2 20370 53 1 1 0 0 2
2 5 3 1 7064 12 0 0 1 0 3
2 6 3 2 13099 44 1 0 1 0 4
2 7 4 1 . . 0 0 0 1 6
2 8 4 2 7117 18 1 0 0 1 5
3 1 1 1 1179 1 0 0 0 0 4
3 2 1 2 552 1 1 0 0 0 3
3 3 2 1 781 0 0 1 0 0 2
3 4 2 2 676 1 1 1 0 0 1
3 5 3 1 783 6 0 0 1 0 3
3 6 3 2 1948 2 1 0 1 0 4
3 7 4 1 . . 0 0 0 1 5
3 8 4 2 274 1 1 0 0 1 3
4 1 1 1 251 0 0 0 0 0 2
4 2 1 2 105 0 1 0 0 0 1
4 3 2 1 288 0 0 1 0 0 5
4 4 2 2 192 0 1 1 0 0 5
4 5 3 1 349 2 0 0 1 0 4
4 6 3 2 1208 11 1 0 1 0 3
4 7 4 1 . . 0 0 0 1 2
4 8 4 2 2051 4 1 0 0 1 4
5 1 1 1 45 0 0 0 0 0 5
5 2 1 2 . . 1 0 0 0 6
5 3 2 1 789 7 0 1 0 0 7
5 4 2 2 437 7 1 1 0 0 8
5 5 3 1 1157 5 0 0 1 0 9
5 6 3 2 2161 12 1 0 1 0 6
5 7 4 1 . . 0 0 0 1 4
5 8 4 2 542 1 1 0 0 1 3
;
data two;
set ships;
lnservice = log(service);
run;
proc countreg data=two dist=poisson groupid=ship;
model accident=op co_65_69 co_70_74 co_75_79/ errorcomp=fixed offset=lnservice;
run;
proc countreg data=two dist=negbin groupid=ship;
model accident=op co_65_69 co_70_74 co_75_79/ errorcomp=fixed offset=lnservice;
run;
*poisson with RE and no exposure(service) option;
proc countreg data=two dist=poisson groupid=ship;
model accident=op co_65_69 co_70_74 co_75_79/ errorcomp=random /*offset=lnservice*/;
run;
*Poisson with RE and exposure(service);
proc countreg data=two dist=poisson groupid=ship ;
model accident=op co_65_69 co_70_74 co_75_79/ errorcomp=random offset=lnservice;
run;
proc export data=two outfile="C:\Public\two.dta"; run;
/*open stata and run this
use "C:\Public\two.dta", clear
xtset ship
*coefficients match the "Poisson with RE and no exposure(service) option" exactly
xtpoisson accident op co_65_69 co_70_74 co_75_79, re
estimates store re
*coefficients matches the "Poisson with RE and exposure(service)" exactly
xtpoisson accident op co_65_69 co_70_74 co_75_79, exposure(service) re
estimates store re_wos
outreg2 [re re_wos] using compare, addstat(log-like, `e(ll)')
Hi Ken,
Somehow running sas 9.3 gives me error message as follows:
271 proc countreg data=two dist=poisson groupid=ship ;
-------
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, ABSFCONV, ABSGCONV, CORRB, COVB,
COVEST, COVOUT, DATA, DIST, FCONV, GCONV, ITPRINT, MAXITER, METHOD, NOPRINT,
OUTEST, PRINTALL, SEED, TYPE.
ERROR 76-322: Syntax error, statement will be ignored.
If change the codes as
proc countreg data=two dist=poisson;
by ship ;
then sas gives me another error message
284 proc countreg data=two dist=poisson;
285 by ship ;
286 model accident=op co_65_69 co_70_74 co_75_79/ errorcomp=random offset=lnservice;
---------
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, CORRB, COVB, COVEST, COVOUT,
DIST, ITPRINT, METHOD, NOINT, NOPRINT, OFFSET, OUTEST, PRINTALL, TYPE.
ERROR 76-322: Syntax error, statement will be ignored.
Can you take a look for me? Thanks!!
alphabeta,
Our apologies. I was using syntax that we introduced in a more recent version. Please try the following modifications.
remove the groupid=
include a id statement
change countreg to TCOUNTREG
For example
*poisson with RE and no exposure(service) option;
proc tcountreg data=two dist=poisson ;
id ship;
model accident=op co_65_69 co_70_74 co_75_79/ errorcomp=random /*offset=lnservice*/;
run;
*Poisson with RE and exposure(service);
proc tcountreg data=two dist=poisson;
id=ship;
model accident=op co_65_69 co_70_74 co_75_79/ errorcomp=random offset=lnservice;
run;
I have run this vs old syntax and get the same estimates from older to newer versions. ==> it will still match Stata's coefficients.
Good luck-Ken
Hi Ken,
I ran both sas 9.3 and sas 9.4 today, but the coefficients are still different from the stata command given.
The codes and results are as follows
Stata codes:
xtset patient visit
#delimit ;
xtpoisson y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16, exposure(visit) normal;
#delimit cr
Stata results
Fitting comparison Poisson model:
Iteration 0: log likelihood = -13626.58
Iteration 1: log likelihood = -13626.554
Iteration 2: log likelihood = -13626.554
Fitting full model:
tau = 0.0 log likelihood = -13626.554
tau = 0.1 log likelihood = -13563.763
tau = 0.2 log likelihood = -13549.87
tau = 0.3 log likelihood = -13562.89
Iteration 0: log likelihood = -13549.935
Iteration 1: log likelihood = -13547.007
Iteration 2: log likelihood = -13546.996
Iteration 3: log likelihood = -13546.996
Random-effects Poisson regression Number of obs
> =
> 100000
Group variable: patient Number of groups
> =
> 2391
Random effects u_i ~ Gaussian Obs per group: min
> =
> 1
avg
> =
> 41.8
max
> =
> 231
Integration method: mvaghermite Integration points
> =
> 12
Wald chi2(16)
> =
> 308.08
Log likelihood = -13546.996 Prob > chi2
> =
> 0.0000
-------------------------------------------------------------------
> -----------
y | Coef. Std. Err. z P>|z| [95% Conf
> . Interval]
-------------+-----------------------------------------------------
> -----------
x1 | -.0023401 .0019663 -1.19 0.234 -.0061939
> .0015137
x2 | .0791925 .0509703 1.55 0.120 -.0207075
> .1790925
x3 | -.0556415 .0857818 -0.65 0.517 -.2237707
> .1124877
x4 | .2604183 .1530857 1.70 0.089 -.0396241
> .5604608
x5| .0260236 .1988315 0.13 0.896 -.363679
> .4157261
x6 | .0096076 .0558143 0.17 0.863 -.0997864
> .1190016
x7 | .8417932 .0595242 14.14 0.000 .7251279
> .9584586
x8 | -.0860519 .0591174 -1.46 0.146 -.2019199
> .029816
x9 | -.3834076 .0742257 -5.17 0.000 -.5288874
> -.2379278
x10| .0169214 .0610793 0.28 0.782 -.1027917
> .1366346
x11 | -.2420708 .0549503 -4.41 0.000 -.3497714
> -.1343701
x12 | -.3106347 .0593382 -5.23 0.000 -.4269354
> -.194334
x13 | -.1306361 .0632131 -2.07 0.039 -.2545315
> -.0067407
x14| -.1735819 .0566482 -3.06 0.002 -.2846104
> -.0625535
x15 | -.2393673 .0561851 -4.26 0.000 -.3494881
> -.1292465
x16 | -.0680811 .0809386 -0.84 0.400 -.2267179
> .0905557
_cons | -7.036347 .1595935 -44.09 0.000 -7.349144
> -6.723549
ln(visit) | 1 (exposure)
-------------+-----------------------------------------------------
> -----------
/lnsig2u | -1.364564 .1253317 -10.89 0.000 -1.610209
> -1.118918
-------------+-----------------------------------------------------
> -----------
sigma_u | .5054623 .0316752 .4470412
> .5715182
-------------------------------------------------------------------
> -----------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 159.11 Pr>=chib
> ar2 = 0.000
SAS 9.4 codes
proc countreg data=out_sample1 dist=poisson groupid=patient ;
model y = x1 x2 x3 x4 x5x 6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 / errorcomp=random offset=logvisit;
run;
SAS 9.4 results
The COUNTREG Procedure
Model Fit Summary
Dependent Variable y
Number of Observations 100000
Data Set WORK.OUT_SAMPLE1
Model Poisson
Error Component Random
Number of Cross Sections 2391
Offset Variable logvisit
Log Likelihood -13554
Maximum Absolute Gradient 9.56066E-7
Number of Iterations 15
Optimization Method Newton-Raphson
AIC 27143
SBC 27315
Algorithm converged.
Parameter Estimates
Parameter DF Estimate Standard Error t Value Pr > |t|
Intercept 1 -6.899390 0.156789 -44.00 <.0001
x1 1 -0.002426 0.001927 -1.26 0.2079
x2 1 0.091417 0.050119 1.82 0.0682
x3 1 -0.067597 0.084548 -0.80 0.4240
x4 1 0.267802 0.154460 1.73 0.0830
x5 1 0.039801 0.197891 0.20 0.8406
x6 1 0.003591 0.054764 0.07 0.9477
x7 1 0.848595 0.058601 14.48 <.0001
x8 1 -0.085057 0.058143 -1.46 0.1435
x9 1 -0.384157 0.072340 -5.31 <.0001
x10 1 0.022466 0.060081 0.37 0.7085
x11 1 -0.253048 0.054064 -4.68 <.0001
x12 1 -0.320980 0.058543 -5.48 <.0001
x13 1 -0.146113 0.062766 -2.33 0.0199
x14 1 -0.172344 0.055604 -3.10 0.0019
x15 -0.241250 0.055214 -4.37 <.0001
x16 1 -0.074001 0.079395 -0.93 0.3513
_Alpha 1 0.229979 0.028888 7.96 <.0001
SAS 9.3 codes
proc tcountreg data=out_sample1 dist=poisson;
id patient;
model y= x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 / errorcomp=random offset=logvisit;
run;
SAS 9.3 results
The TCOUNTREG Procedure
Model Fit Summary
Dependent Variable y
Number of Observations 100000
Data Set WORK.OUT_SAMPLE1
Model Poisson
Error Component Random
Offset Variable logvisit
Log Likelihood -13557
Maximum Absolute Gradient 1.12475E-6
Number of Iterations 14
Optimization Method Newton-Raphson
AIC 27151
SBC 27322
Algorithm converged.
Parameter Estimates
Parameter DF Estimate Standard Error t Value Pr > |t|
Intercept 1 -6.899832 0.153968 -44.81 <.0001
x1 1 -0.002457 0.001891 -1.30 0.1938
x2 1 0.086361 0.049271 1.75 0.0796
x3 1 -0.070813 0.083061 -0.85 0.3939
x4 1 0.263515 0.152089 1.73 0.0832
x5 1 0.030784 0.195286 0.16 0.8747
x6 1 -0.000162 0.053809 -0.00 0.9976
x7 1 0.837899 0.057722 14.52 <.0001
x8 1 -0.084471 0.057015 -1.48 0.1385
x9 1 -0.373889 0.071050 -5.26 <.0001
x10 1 0.019545 0.058936 0.33 0.7402
x11 1 -0.246067 0.053034 -4.64 <.0001
x12 1 -0.315442 0.057560 -5.48 <.0001
x13 1 -0.141599 0.061752 -2.29 0.0218
x14 1 -0.175507 0.054645 -3.21 0.0013
x15 1 -0.241094 0.054212 -4.45 <.0001
x16 1 -0.080489 0.077757 -1.04 0.3006
_Alpha 1 0.222232 0.028434 7.82 <.0001
Is this difference due to "unbalanced"panel data?
I noticed stata results indicate panel variable: patient (unbalanced).
Thanks!!
alphabeta
It is possible that it is due to an imbalance. You could check this by deleting a couple time obs from a cross section and then checking against my code i provided.
I am afraid that if you need any more assistance, you will have to contact Tech Support and open a formal track.
1-800-727-0025
Good luck-Ken
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.