Has anyone else faced this situation?
I'm using **proc gee** to get a pooled estimate. For the stratified analysis, instead of sorting on the stratification variable with PROC SORT and using a BY statement, I use WHERE sex=0 and WHERE sex=1.
The strange thing is that the coefficient estimates depend on how the data are sorted. If I sort my data by ID before fitting the pooled model, I get one set of coefficient estimates. If I don't sort at all, I get another set (and in that case the pooled estimate doesn't lie between the two stratified-estimate intervals). If I sort by my stratification variable, I get yet another set.
I have never heard that the data need to be sorted before running PROC GEE for pooled estimates. So why do my pooled estimates fall outside the stratified intervals when the data set is in its original order, and why do I get different estimates each time depending on whether I sort by ID or by sex (my stratification variable)?
Pooled estimate:
proc gee data=data;
class x1 sex x2;
model y = x1 sex x2 x3 ;
repeated subject = x1 / type=un;
run;
Stratified estimates:
proc gee data=data;
where sex=1;
class x1 x2;
model y = x1 x2 x3 ;
repeated subject = x1 / type=un;
run;
proc gee data=data;
where sex=0;
class x1 x2;
model y = x1 x2 x3 ;
repeated subject = x1 / type=un;
run;
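For comparison, the BY-group route mentioned above would look roughly like the sketch below. It reuses the placeholder names from the question (DATA, y, x1, x2, x3, sex) and writes the sorted copy to a new data set, DATA_BYSEX (a hypothetical name), so the original row order is left untouched.

```sas
/* Sorted copy so the original data set keeps its row order */
proc sort data=data out=data_bysex;
   by sex;
run;

/* One GEE fit per level of SEX. SEX is constant within each BY group,
   so it is left out of the CLASS and MODEL statements. */
proc gee data=data_bysex;
   by sex;
   class x1 x2;
   model y = x1 x2 x3;
   repeated subject = x1 / type=un;
run;
```

Each BY group contains exactly the same observations as the matching WHERE subset, possibly in a different order. So if these results differ from the two WHERE runs, that difference by itself points to a dependence on row order.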
My expected outcome:
For x2 and x3: beta estimate in one stratum (sex=0 or sex=1) < beta estimate in the pooled analysis < beta estimate in the other stratum.
My remedy: sort the data first, once by ID and once by sex:
proc sort data=data; by ID; run;
proc sort data=data; by sex; run;
Each sort gives a completely different set of estimates, and none of them gives the expected outcome.
I don't see any sort code.
To judge whether a result is "expected", you need to provide 1) the data set and 2) what the expected result should be.
Depending on the underlying algorithms, some change could be expected from a different order of the data, because rounding in internal summary steps can yield different results. The question is how much, and whether there is a practical difference. A difference of $1 is unlikely to matter when the values are in the $1,000,000,000 range, but if all of the values are less than $10 it would likely be a practical difference.
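A minimal illustration of the rounding point: SAS stores numbers as double-precision floating point, so adding the same three values in a different order can give a different sum. This is a generic arithmetic demonstration, not a claim about what PROC GEE does internally.

```sas
data _null_;
   /* 1e16 + 1 is not exactly representable, so the 1 is rounded away */
   a = (1e16 + 1) - 1e16;
   /* Cancelling the two large values first keeps the 1 */
   b = (1e16 - 1e16) + 1;
   put a= b=;
run;
```

On Windows and UNIX hosts, which use IEEE doubles, the log should show `a=0 b=1`. Differences of this size are tiny, which is why the magnitude of the change matters.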
In one shop I worked at, we had some modeling software (not SAS, so different code) where we changed the order of the variables on the equivalent of the MODEL statement. The results could vary quite a bit depending on the order in which the variables appeared. So if we got a "large" difference in results, the model was deemed unusable, even though some orderings of the variables would yield extremely good diagnostic values.
Thank you.
Unfortunately, I can't share the data, and it is possible that the issue I am facing can't be replicated with hypothetical data.
But the differences are quite dramatic, e.g. positive estimates become negative, or I get 0.77 instead of 0.22.
Try creating a simulated data set that replicates the problem. Then please post the code that creates the simulated data set, along with the PROC GEE code that shows the surprising result: sort the data one way, run PROC GEE, sort it a different way, and run PROC GEE again. That way you would be providing a fully reproducible example of the problem, which people can use to test and explore. If sort order matters, I would think you could show it fairly easily.
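A sketch of such a reproducible example is below. Everything in it is hypothetical: the data set names, the number of subjects and visits, the effect sizes, and the seed. The second sort also reverses the visit order within each subject, so that any dependence on within-subject row order has a chance to show up. It assumes the parameter-estimate table from PROC GEE is named GEEEmpPEst, as in PROC GENMOD; running `ods trace on;` first will list the actual table names in the log.

```sas
/* Hypothetical simulated data: 200 subjects, 4 visits each */
data sim;
   call streaminit(2024);
   do id = 1 to 200;
      sex = rand('bernoulli', 0.5);
      u   = rand('normal', 0, 1);            /* subject-level random effect */
      do visit = 1 to 4;
         x2 = rand('bernoulli', 0.4);
         x3 = rand('normal', 0, 1);
         y  = 1 + 0.5*sex + 0.3*x2 + 0.2*x3 + u + rand('normal', 0, 1);
         output;
      end;
   end;
   drop u;
run;

/* Fit 1: rows sorted by subject, visits in ascending order */
proc sort data=sim out=sim_by_id;
   by id visit;
run;

ods output GEEEmpPEst=est_by_id;   /* assumed ODS table name */
proc gee data=sim_by_id;
   class id sex x2;
   model y = sex x2 x3;
   repeated subject=id / type=un;
run;

/* Fit 2: same rows, sorted by sex with visits reversed within subject */
proc sort data=sim out=sim_by_sex;
   by sex descending visit;
run;

ods output GEEEmpPEst=est_by_sex;
proc gee data=sim_by_sex;
   class id sex x2;
   model y = sex x2 x3;
   repeated subject=id / type=un;
run;

/* Lists any differences between the two sets of estimates */
proc compare base=est_by_id compare=est_by_sex;
run;
```

If PROC COMPARE reports no differences beyond rounding, the simulated data do not reproduce the problem, and the next step would be to make the simulation look more like the real data.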
Also, in your real code, do you perhaps have order=data specified somewhere? In that case, the GEE step would use the order of the data to determine which category is used as the reference category for the CLASS variables. But if that's the issue, it should be pretty obvious, as it would affect the parameter estimates but not the overall model statistics. (I assume; I haven't used PROC GEE.)
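One way to rule this out is to name the reference level explicitly, so it cannot depend on which level happens to appear first in the data. This sketch reuses the question's placeholder names and assumes that PROC GEE's CLASS statement accepts the REF= variable option in the same way PROC GENMOD's does; the level '0' is an assumption about how SEX is coded.

```sas
/* Hypothetical: pin the reference level of SEX to the formatted value '0'.
   Add REF= for X1 and X2 as well, using one of their actual levels. */
proc gee data=data;
   class x1 sex(ref='0') x2;
   model y = x1 sex x2 x3;
   repeated subject = x1 / type=un;
run;
```

If the estimates still change with sort order after every CLASS variable has an explicit reference level, the reference category is probably not the explanation.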
I tried an example I stole from the docs ( https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_code_geeex1.htm ) , but couldn't make sort order change the results.
data Resp;
input Center ID Treatment $ Sex $ Age Baseline Visit1-Visit4;
datalines;
1 1 P M 46 0 0 0 0 0
1 2 P M 28 0 0 0 0 0
1 3 A M 23 1 1 1 1 1
1 4 P M 44 1 1 1 1 0
1 5 P F 13 1 1 1 1 1
1 6 A M 34 0 0 0 0 0
1 7 P M 43 0 1 0 1 1
1 8 A M 28 0 0 0 0 0
1 9 A M 31 1 1 1 1 1
1 10 P M 37 1 0 1 1 0
1 11 A M 30 1 1 1 1 1
1 12 A M 14 0 1 1 1 0
1 13 P M 23 1 1 0 0 0
1 14 P M 30 0 0 0 0 0
1 15 P M 20 1 1 1 1 1
1 16 A M 22 0 0 0 0 1
1 17 P M 25 0 0 0 0 0
1 18 A F 47 0 0 1 1 1
1 19 P F 31 0 0 0 0 0
1 20 A M 20 1 1 0 1 0
1 21 A M 26 0 1 0 1 0
1 22 A M 46 1 1 1 1 1
1 23 A M 32 1 1 1 1 1
1 24 A M 48 0 1 0 0 0
1 25 P F 35 0 0 0 0 0
1 26 A M 26 0 0 0 0 0
1 27 P M 23 1 1 0 1 1
1 28 P F 36 0 1 1 0 0
1 29 P M 19 0 1 1 0 0
1 30 A M 28 0 0 0 0 0
1 31 P M 37 0 0 0 0 0
1 32 A M 23 0 1 1 1 1
1 33 A M 30 1 1 1 1 0
1 34 P M 15 0 0 1 1 0
1 35 A M 26 0 0 0 1 0
1 36 P F 45 0 0 0 0 0
1 37 A M 31 0 0 1 0 0
1 38 A M 50 0 0 0 0 0
1 39 P M 28 0 0 0 0 0
1 40 P M 26 0 0 0 0 0
1 41 P M 14 0 0 0 0 1
1 42 A M 31 0 0 1 0 0
1 43 P M 13 1 1 1 1 1
1 44 P M 27 0 0 0 0 0
1 45 P M 26 0 1 0 1 1
1 46 P M 49 0 0 0 0 0
1 47 P M 63 0 0 0 0 0
1 48 A M 57 1 1 1 1 1
1 49 P M 27 1 1 1 1 1
1 50 A M 22 0 0 1 1 1
1 51 A M 15 0 0 1 1 1
1 52 P M 43 0 0 0 1 0
1 53 A F 32 0 0 0 1 0
1 54 A M 11 1 1 1 1 0
1 55 P M 24 1 1 1 1 1
1 56 A M 25 0 1 1 0 1
2 1 P F 39 0 0 0 0 0
2 2 A M 25 0 0 1 1 1
2 3 A M 58 1 1 1 1 1
2 4 P F 51 1 1 0 1 1
2 5 P F 32 1 0 0 1 1
2 6 P M 45 1 1 0 0 0
2 7 P F 44 1 1 1 1 1
2 8 P F 48 0 0 0 0 0
2 9 A M 26 0 1 1 1 1
2 10 A M 14 0 1 1 1 1
2 11 P F 48 0 0 0 0 0
2 12 A M 13 1 1 1 1 1
2 13 P M 20 0 1 1 1 1
2 14 A M 37 1 1 0 0 1
2 15 A M 25 1 1 1 1 1
2 16 A M 20 0 0 0 0 0
2 17 P F 58 0 1 0 0 0
2 18 P M 38 1 1 0 0 0
2 19 A M 55 1 1 1 1 1
2 20 A M 24 1 1 1 1 1
2 21 P F 36 1 1 0 0 1
2 22 P M 36 0 1 1 1 1
2 23 A F 60 1 1 1 1 1
2 24 P M 15 1 0 0 1 1
2 25 A M 25 1 1 1 1 0
2 26 A M 35 1 1 1 1 1
2 27 A M 19 1 1 0 1 1
2 28 P F 31 1 1 1 1 1
2 29 A M 21 1 1 1 1 1
2 30 A F 37 0 1 1 1 1
2 31 P M 52 0 1 1 1 1
2 32 A M 55 0 0 1 1 0
2 33 P M 19 1 0 0 1 1
2 34 P M 20 1 0 1 1 1
2 35 P M 42 1 0 0 0 0
2 36 A M 41 1 1 1 1 1
2 37 A M 52 0 0 0 0 0
2 38 P F 47 0 1 1 0 1
2 39 P M 11 1 1 1 1 1
2 40 P M 14 0 0 0 1 0
2 41 P M 15 1 1 1 1 1
2 42 P M 66 1 1 1 1 1
2 43 A M 34 0 1 1 0 1
2 44 P M 43 0 0 0 0 0
2 45 P M 33 1 1 1 0 1
2 46 P M 48 1 1 0 0 0
2 47 A M 20 0 1 1 1 1
2 48 P F 39 1 0 1 0 0
2 49 A M 28 0 1 0 0 0
2 50 P F 38 0 0 0 0 0
2 51 A M 43 1 1 1 1 0
2 52 A F 39 0 1 1 1 1
2 53 A M 68 0 1 1 1 1
2 54 A F 63 1 1 1 1 1
2 55 A M 31 1 1 1 1 1
;
data Resp;
set Resp;
Visit=1; Outcome=Visit1; output;
Visit=2; Outcome=Visit2; output;
Visit=3; Outcome=Visit3; output;
Visit=4; Outcome=Visit4; output;
run;
proc sort data=Resp ;
by ID Visit ;
run ;
proc gee data=Resp descend;
class ID Treatment Center Sex Baseline;
model Outcome=Treatment Center Sex Age Baseline /
dist=bin link=logit;
repeated subject=ID(Center) / corr=exch corrw;
run;
proc sort data=Resp ;
by age ;
run ;
proc gee data=Resp descend;
class ID Treatment Center Sex Baseline;
model Outcome=Treatment Center Sex Age Baseline /
dist=bin link=logit;
repeated subject=ID(Center) / corr=exch corrw;
run;
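The example above uses an exchangeable working correlation (corr=exch), while the question uses TYPE=UN. An untested variation that may be closer to the question: reverse the visit order within each subject and fit an unstructured correlation, once without and once with WITHIN=Visit. This rests on an assumption worth checking against the REPEATED statement documentation: without WITHIN=, the procedure takes each observation's position within a subject from the row order, which would matter for an unstructured correlation but not for an exchangeable one. RESP_REV is a hypothetical data set name.

```sas
/* Reverse the visit order within each subject */
proc sort data=Resp out=Resp_rev;
   by Center ID descending Visit;
run;

/* Unstructured working correlation; position within subject taken from row order */
proc gee data=Resp_rev descend;
   class ID Treatment Center Sex Baseline;
   model Outcome=Treatment Center Sex Age Baseline /
         dist=bin link=logit;
   repeated subject=ID(Center) / corr=un corrw;
run;

/* Same model; position within subject taken from Visit via WITHIN= */
proc gee data=Resp_rev descend;
   class ID Treatment Center Sex Baseline Visit;
   model Outcome=Treatment Center Sex Age Baseline /
         dist=bin link=logit;
   repeated subject=ID(Center) / within=Visit corr=un corrw;
run;
```

Comparing both fits with the same model run on data sorted by ID and Visit would show whether row order within subject changes the estimates in this setting.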
In each of your stratified estimates, you filter on a single value of SEX. Yet in the corresponding PROC GEE code you include SEX in a CLASS statement and a MODEL statement. Why? Does the inclusion of this unnecessary and unhelpful predictor impact the GEE algorithm? As I have no GEE experience, I can't offer an answer.
Thank you for the comment.
It was a typo in the question, which I have now corrected.
Run
proc freq data=data;
tables sex / missprint;
run;
Does the output show any missing values for the SEX variable? Or any values other than 0/1? If so, the calls to PROC GEE are using different observations.
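The same check can be extended to every variable in the model, since by default an observation with a missing value in any model variable is not used in the fit. A sketch using the question's placeholder names, assuming y and x3 are numeric:

```sas
/* Missing counts for the response and the continuous predictor */
proc means data=data n nmiss;
   var y x3;
run;

/* Levels of the CLASS variables, with missing values shown as a level */
proc freq data=data;
   tables x1 x2 sex / missing;
run;
```

If the missing counts differ between the sexes, the pooled and stratified fits are based on different subsets of the data.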
You can read about how PROC GEE handles missing values in the response in the documentation: "Weighted Generalized Estimating Equations under the MAR Assumption" (SAS Help Center).
If you prefer experiments to theory, you can also run an experiment: Use a DATA step to set about 20-30 values of the response variables to missing and rerun the analysis. Study the output to see how the statistics change.
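A sketch of that experiment, using the question's placeholder names; DATA_MISS and the seed are hypothetical. The probability 25/nobs is chosen so that, on average, about 25 response values are set to missing.

```sas
/* Randomly set about 25 response values to missing */
data data_miss;
   set data nobs=nobs;
   if _n_ = 1 then call streaminit(12345);
   if rand('uniform') < 25/nobs then y = .;
run;

/* Rerun the pooled model on the modified data */
proc gee data=data_miss;
   class x1 sex x2;
   model y = x1 sex x2 x3;
   repeated subject = x1 / type=un;
run;
```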