<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884544#M43811</link>
    <description>&lt;P&gt;I don't see any Sort code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To get "expected" results you need to provide 1) the data set and 2) what the expected result may be.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on the underlying algorithms, some change could be expected from different orderings of the data, because rounding and internal summary steps can yield different results. The question is how large the change is and whether it makes a practical difference. A difference of $1 when discussing values in the $1,000,000,000 range is not likely to matter, but if all of the values are less than $10, it would likely be a practical difference.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In one shop I worked with, we had some modeling software (not SAS, so different code) where we changed the order of the variables on the equivalent of the MODEL statement. The results could vary quite a bit depending on the order in which the variables appeared. So if we got a "large" difference in results, that model was deemed unusable, even though some orderings of the variables would yield extremely good diagnostic values.&lt;/P&gt;</description>
    <pubDate>Wed, 12 Jul 2023 18:11:15 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2023-07-12T18:11:15Z</dc:date>
    <item>
      <title>Getting different estimates when the data sorted based on ID to a randomly sorted data in proc gee</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884524#M43807</link>
      <description>&lt;P&gt;Has anyone else ever faced this situation?&lt;/P&gt;&lt;P&gt;I'm using &lt;STRONG&gt;proc gee&lt;/STRONG&gt; to get a pooled estimate. Then, instead of using the &lt;STRONG&gt;BY statement&lt;/STRONG&gt; for the stratified analysis (with &lt;STRONG&gt;proc sort&lt;/STRONG&gt; to sort by the stratification variable), I use &lt;STRONG&gt;where variable=0&lt;/STRONG&gt; and &lt;STRONG&gt;where variable=1&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;Now, the weird thing is this: if I sort my data by ID before getting the pooled estimates, I get one set of coefficient estimates; if I don't sort at all, I get another set (and in this case the pooled estimate doesn't lie between the two stratified-estimate intervals); and if I sort by my stratification variable, I get yet another set of coefficient estimates.&lt;/P&gt;&lt;P&gt;I have never heard that we need to sort the data before running &lt;STRONG&gt;proc gee&lt;/STRONG&gt; for pooled estimates. So why do my estimates not lie in the interval when my data set is in random order, and why do I get different estimates every time I sort the data set differently, whether by ID or by sex (my stratification variable)?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Pooled estimate:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc gee data=data;
   class x1 sex x2;
   model y = x1 sex x2 x3;
   repeated subject = x1 / type=un;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Stratified estimates:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc gee data=data;
   where sex=1;
   class x1 x2;
   model y = x1 x2 x3;
   repeated subject = x1 / type=un;
run;

proc gee data=data;
   where sex=0;
   class x1 x2;
   model y = x1 x2 x3;
   repeated subject = x1 / type=un;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;My expected outcome:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;beta estimates for x2 and x3 in one stratum (sex=0 or sex=1) &amp;lt; beta estimates for x2 and x3 in the pooled analysis &amp;lt; beta estimates for x2 and x3 in the other stratum&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;My remedy:&lt;/STRONG&gt; &lt;CODE&gt;proc sort data=data; by ID; run;&lt;/CODE&gt; (and once &lt;CODE&gt;by sex&lt;/CODE&gt; instead)&lt;/P&gt;&lt;P&gt;This gives completely different estimates, but still not the expected outcome.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 20:13:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884524#M43807</guid>
      <dc:creator>hagml</dc:creator>
      <dc:date>2023-07-12T20:13:05Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884544#M43811</link>
      <description>&lt;P&gt;I don't see any Sort code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To get "expected" results you need to provide 1) the data set and 2) what the expected result may be.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on the underlying algorithms, some change could be expected from different orderings of the data, because rounding and internal summary steps can yield different results. The question is how large the change is and whether it makes a practical difference. A difference of $1 when discussing values in the $1,000,000,000 range is not likely to matter, but if all of the values are less than $10, it would likely be a practical difference.&lt;/P&gt;
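&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal illustration of the rounding point, added as a sketch rather than taken from the original reply: on platforms that store SAS numbers as IEEE 8-byte doubles, the two expressions below are mathematically equal, yet A and B typically differ, because the +1 is lost when 1e16+1 is rounded to the nearest representable value. For a single operation, a difference like this is tiny relative to the magnitudes involved.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   /* The same three terms, added in two different orders */
   a = (1e16 + 1) - 1e16;   /* with IEEE doubles, 1e16+1 rounds back to 1e16 first */
   b = (1e16 - 1e16) + 1;
   put a= b=;               /* write both results to the log */
run;&lt;/CODE&gt;&lt;/PRE&gt;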
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In one shop I worked with, we had some modeling software (not SAS, so different code) where we changed the order of the variables on the equivalent of the MODEL statement. The results could vary quite a bit depending on the order in which the variables appeared. So if we got a "large" difference in results, that model was deemed unusable, even though some orderings of the variables would yield extremely good diagnostic values.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 18:11:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884544#M43811</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-07-12T18:11:15Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884557#M43813</link>
      <description>&lt;P&gt;In each of your stratified estimates, you filter on a single value of SEX.&amp;nbsp; Yet in the corresponding PROC GEE code you include SEX in a CLASS statement and a MODEL statement.&amp;nbsp; Why?&amp;nbsp; Does the inclusion of this unnecessary and unhelpful predictor impact the GEE algorithm?&amp;nbsp; As I have no GEE experience, I can't offer an answer.&lt;/P&gt;
</description>
      <pubDate>Wed, 12 Jul 2023 19:29:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884557#M43813</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2023-07-12T19:29:02Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884559#M43814</link>
      <description>&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;Unfortunately, I can't share the data, and with hypothetical data it is possible that the issue I am facing cannot be replicated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But the differences are quite dramatic, e.g., positive estimates become negative, or I get 0.77 instead of 0.22.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 19:32:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884559#M43814</guid>
      <dc:creator>hagml</dc:creator>
      <dc:date>2023-07-12T19:32:47Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884561#M43815</link>
      <description>&lt;P&gt;Thank you for the comment.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It was a typo in the body of the question, which I have now corrected.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2023 19:39:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884561#M43815</guid>
      <dc:creator>hagml</dc:creator>
      <dc:date>2023-07-12T19:39:30Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884568#M43817</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/445978"&gt;@hagml&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Thank you.&lt;/P&gt;
&lt;P&gt;Unfortunately, I can't share the data, and with hypothetical data it is possible that the issue I am facing cannot be replicated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But the differences are quite dramatic, e.g., positive estimates become negative, or I get 0.77 instead of 0.22.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Try creating a simulated dataset that replicates the problem.&amp;nbsp; Then please post the code to create the simulated dataset, and the PROC GEE code that shows the surprising result.&amp;nbsp; That is, sort the data one way, run PROC GEE, sort it a different way, and run PROC GEE again.&amp;nbsp; That way you would be providing a fully reproducible example of the problem, which people can use to test and explore.&amp;nbsp; If sort order matters, I would think you could show it fairly easily.&lt;/P&gt;
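&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal sketch of such a simulation, assuming a continuous outcome and reusing the question's names SEX, X3 and Y, with ID as the subject. The sample size, effect sizes, seeds, and the data set names SIM and SIM_SHUFFLED are all made up; the two PROC GEE runs differ only in the order of the rows.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical data: 200 subjects, 4 visits each */
data sim;
   call streaminit(2023);
   do ID = 1 to 200;
      sex = rand('bernoulli', 0.5);
      u   = rand('normal', 0, 1);          /* subject-level random effect */
      do visit = 1 to 4;
         x3 = rand('normal', 0, 1);
         y  = 1 + 0.5*sex + 0.3*x3 + u + rand('normal', 0, 1);
         output;
      end;
   end;
   drop u;
run;

/* Run 1: rows sorted by subject and visit */
proc sort data=sim;
   by ID visit;
run;

proc gee data=sim;
   class ID sex;
   model y = sex x3;
   repeated subject=ID / type=un;
run;

/* Run 2: the same rows in random order */
data sim_shuffled;
   set sim;
   if _n_ = 1 then call streaminit(4321);
   r = rand('uniform');
run;

proc sort data=sim_shuffled;
   by r;
run;

proc gee data=sim_shuffled;
   class ID sex;
   model y = sex x3;
   repeated subject=ID / type=un;
run;&lt;/CODE&gt;&lt;/PRE&gt;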
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, in your real code, do you perhaps have order=data specified somewhere?&amp;nbsp; In that case, the GEE step would use the order of the data to determine which category is used as the reference category for the CLASS variables.&amp;nbsp; But if that's the issue, it should be pretty obvious, as it would affect the parameter estimates but not the overall model statistics.&amp;nbsp; (I assume; I haven't used PROC GEE.)&lt;/P&gt;
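&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If order=data does turn out to be the cause, one way to make the reference level independent of the row order is to name it explicitly. A sketch using the question's code, assuming that PROC GEE's CLASS statement accepts the REF= option as PROC GENMOD's does and that 0 is the wanted reference level for SEX (both are assumptions; check the PROC GEE documentation for your release).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc gee data=data;
   /* REF='0' names the reference level of SEX by its formatted value, */
   /* so it no longer depends on which value appears first in the data */
   class x1 sex(ref='0') x2;
   model y = x1 sex x2 x3;
   repeated subject = x1 / type=un;
run;&lt;/CODE&gt;&lt;/PRE&gt;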
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I tried an example I stole from the docs (&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_code_geeex1.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_code_geeex1.htm&lt;/A&gt;), but couldn't make the sort order change the results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Resp;
   input Center ID Treatment $ Sex $ Age Baseline Visit1-Visit4;
   datalines;
1  1 P M 46 0 0 0 0 0
1  2 P M 28 0 0 0 0 0
1  3 A M 23 1 1 1 1 1
1  4 P M 44 1 1 1 1 0
1  5 P F 13 1 1 1 1 1
1  6 A M 34 0 0 0 0 0
1  7 P M 43 0 1 0 1 1
1  8 A M 28 0 0 0 0 0
1  9 A M 31 1 1 1 1 1
1 10 P M 37 1 0 1 1 0
1 11 A M 30 1 1 1 1 1
1 12 A M 14 0 1 1 1 0
1 13 P M 23 1 1 0 0 0
1 14 P M 30 0 0 0 0 0
1 15 P M 20 1 1 1 1 1
1 16 A M 22 0 0 0 0 1
1 17 P M 25 0 0 0 0 0
1 18 A F 47 0 0 1 1 1
1 19 P F 31 0 0 0 0 0
1 20 A M 20 1 1 0 1 0
1 21 A M 26 0 1 0 1 0
1 22 A M 46 1 1 1 1 1
1 23 A M 32 1 1 1 1 1
1 24 A M 48 0 1 0 0 0
1 25 P F 35 0 0 0 0 0
1 26 A M 26 0 0 0 0 0
1 27 P M 23 1 1 0 1 1
1 28 P F 36 0 1 1 0 0
1 29 P M 19 0 1 1 0 0
1 30 A M 28 0 0 0 0 0
1 31 P M 37 0 0 0 0 0
1 32 A M 23 0 1 1 1 1
1 33 A M 30 1 1 1 1 0
1 34 P M 15 0 0 1 1 0
1 35 A M 26 0 0 0 1 0
1 36 P F 45 0 0 0 0 0
1 37 A M 31 0 0 1 0 0
1 38 A M 50 0 0 0 0 0
1 39 P M 28 0 0 0 0 0
1 40 P M 26 0 0 0 0 0
1 41 P M 14 0 0 0 0 1
1 42 A M 31 0 0 1 0 0
1 43 P M 13 1 1 1 1 1
1 44 P M 27 0 0 0 0 0
1 45 P M 26 0 1 0 1 1
1 46 P M 49 0 0 0 0 0
1 47 P M 63 0 0 0 0 0
1 48 A M 57 1 1 1 1 1
1 49 P M 27 1 1 1 1 1
1 50 A M 22 0 0 1 1 1
1 51 A M 15 0 0 1 1 1
1 52 P M 43 0 0 0 1 0
1 53 A F 32 0 0 0 1 0
1 54 A M 11 1 1 1 1 0
1 55 P M 24 1 1 1 1 1
1 56 A M 25 0 1 1 0 1
2  1 P F 39 0 0 0 0 0
2  2 A M 25 0 0 1 1 1
2  3 A M 58 1 1 1 1 1
2  4 P F 51 1 1 0 1 1
2  5 P F 32 1 0 0 1 1
2  6 P M 45 1 1 0 0 0
2  7 P F 44 1 1 1 1 1
2  8 P F 48 0 0 0 0 0
2  9 A M 26 0 1 1 1 1
2 10 A M 14 0 1 1 1 1
2 11 P F 48 0 0 0 0 0
2 12 A M 13 1 1 1 1 1
2 13 P M 20 0 1 1 1 1
2 14 A M 37 1 1 0 0 1
2 15 A M 25 1 1 1 1 1
2 16 A M 20 0 0 0 0 0
2 17 P F 58 0 1 0 0 0
2 18 P M 38 1 1 0 0 0
2 19 A M 55 1 1 1 1 1
2 20 A M 24 1 1 1 1 1
2 21 P F 36 1 1 0 0 1
2 22 P M 36 0 1 1 1 1
2 23 A F 60 1 1 1 1 1
2 24 P M 15 1 0 0 1 1
2 25 A M 25 1 1 1 1 0
2 26 A M 35 1 1 1 1 1
2 27 A M 19 1 1 0 1 1
2 28 P F 31 1 1 1 1 1
2 29 A M 21 1 1 1 1 1
2 30 A F 37 0 1 1 1 1
2 31 P M 52 0 1 1 1 1
2 32 A M 55 0 0 1 1 0
2 33 P M 19 1 0 0 1 1
2 34 P M 20 1 0 1 1 1
2 35 P M 42 1 0 0 0 0
2 36 A M 41 1 1 1 1 1
2 37 A M 52 0 0 0 0 0
2 38 P F 47 0 1 1 0 1
2 39 P M 11 1 1 1 1 1
2 40 P M 14 0 0 0 1 0
2 41 P M 15 1 1 1 1 1
2 42 P M 66 1 1 1 1 1
2 43 A M 34 0 1 1 0 1
2 44 P M 43 0 0 0 0 0
2 45 P M 33 1 1 1 0 1
2 46 P M 48 1 1 0 0 0
2 47 A M 20 0 1 1 1 1
2 48 P F 39 1 0 1 0 0
2 49 A M 28 0 1 0 0 0
2 50 P F 38 0 0 0 0 0
2 51 A M 43 1 1 1 1 0
2 52 A F 39 0 1 1 1 1
2 53 A M 68 0 1 1 1 1
2 54 A F 63 1 1 1 1 1
2 55 A M 31 1 1 1 1 1
;

data Resp;
   set Resp;
   Visit=1;  Outcome=Visit1;  output;
   Visit=2;  Outcome=Visit2;  output;
   Visit=3;  Outcome=Visit3;  output;
   Visit=4;  Outcome=Visit4;  output;
run;

proc sort data=Resp ;
  by ID Visit ;
run ;

proc gee data=Resp descend;
   class ID Treatment Center Sex Baseline;
   model Outcome=Treatment Center Sex Age Baseline /
         dist=bin link=logit;
   repeated subject=ID(Center) / corr=exch corrw;
run;

proc sort data=Resp ;
  by age ;
run ;

proc gee data=Resp descend;
   class ID Treatment Center Sex Baseline;
   model Outcome=Treatment Center Sex Age Baseline /
         dist=bin link=logit;
   repeated subject=ID(Center) / corr=exch corrw;
run;
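
/* Hypothetical extra check (not part of the original reply).        */
/* The runs above use CORR=EXCH, where every pair of visits shares    */
/* one correlation, so the row order within a subject should not      */
/* matter. The question uses TYPE=UN, where each pair of positions    */
/* within a subject gets its own correlation. Assuming PROC GEE       */
/* behaves like PROC GENMOD here, those positions come from the row   */
/* order within each subject unless WITHIN= names the visit variable, */
/* so shuffling the rows within subjects may change the estimates.    */
/* Resp_shuffled and the seed 2023 are made up for this sketch.       */
data Resp_shuffled;
   set Resp;
   if _n_ = 1 then call streaminit(2023);
   r = rand('uniform');
run;

proc sort data=Resp_shuffled ;
  by Center ID r ;
run ;

/* Positions within a subject taken from the (shuffled) row order */
proc gee data=Resp_shuffled descend;
   class ID Treatment Center Sex Baseline;
   model Outcome=Treatment Center Sex Age Baseline /
         dist=bin link=logit;
   repeated subject=ID(Center) / corr=un corrw;
run;

/* Positions taken from the value of Visit instead of the row order */
proc gee data=Resp_shuffled descend;
   class ID Treatment Center Sex Baseline Visit;
   model Outcome=Treatment Center Sex Age Baseline /
         dist=bin link=logit;
   repeated subject=ID(Center) / within=Visit corr=un corrw;
run;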
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 12 Jul 2023 20:03:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884568#M43817</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2023-07-12T20:03:09Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884819#M43833</link>
      <description>&lt;P&gt;Run&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=data;
tables sex / missprint;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Does the output show any missing values for the SEX variable? Or any values other than 0/1? If so, the calls to PROC GEE are using different observations.&lt;/P&gt;
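&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A related sketch, assuming the variable names from the question (Y, X1, X2, X3 and SEX in data set DATA): flag the rows that have no missing value in any model variable and cross-tabulate the flag by SEX, to see how many observations each PROC GEE call can use. The data set CHECK and the variable COMPLETE are hypothetical names.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data check;
   set data;
   /* CMISS counts missing values among numeric and character arguments */
   complete = (cmiss(y, x1, x2, x3, sex) = 0);
run;

proc freq data=check;
   /* MISSING keeps missing SEX values in the table as their own level */
   tables sex*complete / missing;
run;&lt;/CODE&gt;&lt;/PRE&gt;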
</description>
      <pubDate>Fri, 14 Jul 2023 15:22:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/884819#M43833</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2023-07-14T15:22:09Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/886274#M43856</link>
      <description>Thanks very much, Rick. We do have 2 missing values out of a total sample of 900. But this made me wonder whether missing data in the outcome (which we have a lot of) can also affect the estimates. I also tried PROC GENMOD, because I was getting an error about the Hessian matrix with PROC GEE (the error again pointed to the missing data). With PROC GENMOD I don't get that error, but the estimates are again (!) completely different from those from PROC GEE. However, the issue in my question still remains.</description>
      <pubDate>Tue, 25 Jul 2023 17:31:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/886274#M43856</guid>
      <dc:creator>hagml</dc:creator>
      <dc:date>2023-07-25T17:31:40Z</dc:date>
    </item>
    <item>
      <title>Re: Getting different estimates when the data sorted based on ID to a randomly sorted data in proc g</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/886276#M43857</link>
      <description>&lt;P&gt;You can read about how PROC GEE handles missing values in the response by looking at the doc:&amp;nbsp;&lt;A href="https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_gee_details10.htm" target="_blank"&gt;SAS Help Center: Weighted Generalized Estimating Equations under the MAR Assumption&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you prefer experiments to theory, you can also run an experiment: use a DATA step to set about 20-30 values of the response variable to missing and rerun the analysis. Study the output to see how the statistics change.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Jul 2023 17:40:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Getting-different-estimates-when-the-data-sorted-based-on-ID-to/m-p/886276#M43857</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2023-07-25T17:40:59Z</dc:date>
    </item>
  </channel>
</rss>

