<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Difference-in-difference analysis for rates using group level data in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/764391#M37327</link>
    <description>Thank you very much! I learned a lot, especially the need to model the data with PROC LOGISTIC first. Appreciate your quick response!</description>
    <pubDate>Fri, 27 Aug 2021 05:20:16 GMT</pubDate>
    <dc:creator>cjpsas</dc:creator>
    <dc:date>2021-08-27T05:20:16Z</dc:date>
    <item>
      <title>Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/764357#M37324</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am relatively new to SAS and am trying to conduct a DID analysis to determine how rates of 3 different health insurance policies&amp;nbsp;(ins = 0,1, or 2) changed from time t0 to t1 between states that implemented a policy (s=1) and states that did not implement a policy (s=0).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't have individual-level data. Instead, I have counts/percentages of insurance rates by state and time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I apologize if this is quite basic, but most of the examples I have found involve either differences in means or individual-level data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My data set is below:&lt;/P&gt;&lt;P&gt;ins s t count percent&lt;BR /&gt;0 0 0 281 5.3&lt;BR /&gt;0 0 1 97 5.0&lt;BR /&gt;0 1 0 841 3.4&lt;BR /&gt;0 1 1 154 1.8&lt;BR /&gt;1 0 0 410 7.7&lt;BR /&gt;1 0 1 159 8.3&lt;BR /&gt;1 1 0 2488 10.1&lt;BR /&gt;1 1 1 1193 14.1&lt;BR /&gt;2 0 0 4602 86.9&lt;BR /&gt;2 0 1 1671 86.7&lt;BR /&gt;2 1 0 21350 86.5&lt;BR /&gt;2 1 1 7137 84.1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Appreciate any help or hints!&lt;/P&gt;&lt;P&gt;Edit: In case the data set formatting gets messed up when this posts, I've attached a txt file as well.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Aug 2021 00:08:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/764357#M37324</guid>
      <dc:creator>cjpsas</dc:creator>
      <dc:date>2021-08-27T00:08:35Z</dc:date>
    </item>
    <item>
      <title>Re: Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/764380#M37325</link>
      <description>&lt;P&gt;As always, you should search the SAS Notes and Samples at &lt;A href="http://support.sas.com/notes" target="_blank"&gt;http://support.sas.com/notes&lt;/A&gt;&amp;nbsp;and the list of Frequently Asked Questions for Statistics at &lt;A href="http://support.sas.com/kb/30333" target="_blank"&gt;http://support.sas.com/kb/30333&lt;/A&gt;&amp;nbsp;for relevant notes and sample programs. A search there will find &lt;A href="http://support.sas.com/kb/61830" target="_self"&gt;this note&lt;/A&gt; on estimating and testing the so-called "difference in difference". The second section of that note discusses and illustrates how this is done for a binary response. In your case, with aggregated binary data, you need to obtain the denominators of the percentages and then use the events/trials syntax to model the aggregated data in PROC LOGISTIC. You can then proceed as shown there. The only wrinkle here is that I suspect you want separate estimates for the 3 policies. In that case, you need to include INS in the model and interact it with S and T so that the policies can have different DIDs. Note in the results from the LSMEANS statement below that the "Mean" values are the observed proportions (the percentages divided by 100), since this is a saturated model. You can then specify the DID contrast within each policy as shown in the code below. See the documentation of the &lt;A href="http://support.sas.com/kb/62362" target="_self"&gt;NLMeans macro&lt;/A&gt; for details and many examples of its use.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data x;
input ins s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 281 5.3
0 0 1 97 5.0
0 1 0 841 3.4
0 1 1 154 1.8
1 0 0 410 7.7
1 0 1 159 8.3
1 1 0 2488 10.1
1 1 1 1193 14.1
2 0 0 4602 86.9
2 0 1 1671 86.7
2 1 0 21350 86.5
2 1 1 7137 84.1
;
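      /* Saturated events/trials model; INS|S|T lets each policy have its own DID */
      proc logistic data=x;
        class ins s t / param=glm ref=first;
        model count/n = ins|s|t;
        /* E displays the LSMEANS coefficients (saved as COEFFS for NLMeans); */
        /* ILINK reports the means on the probability scale                  */
        lsmeans ins*s*t / e ilink;
        ods output coef=coeffs;
        /* Save the fitted model in item store LOG for NLMeans               */
        store log;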
        run;
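      /* Contrast coefficients follow the LSMEANS order of INS*S*T; each row */
      /* is the DID (s1t1 - s1t0) - (s0t1 - s0t0) for one policy            */
      data difdif;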
        input k1-k12;
        set=1;
        datalines;
        1 -1 -1 1   0 0 0 0     0 0 0 0
        0 0 0 0     1 -1 -1 1   0 0 0 0
        0 0 0 0     0 0 0 0     1 -1 -1 1   
        ;
      /* The NLMeans macro must already be defined in the session */
      %NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
               title=Difference in Difference of Means)
&lt;/CODE&gt;&lt;/PRE&gt;
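&lt;P&gt;Because the model is saturated, the LSMEANS "Mean" values are just the observed proportions, so the DID for each policy can also be checked by hand. The DATA step below is a minimal sketch of that check. It assumes that data set X from the program above is still in the WORK library and is sorted by INS as entered, with one row per S*T cell. The names RAWDID, P, and DID are illustrative only and are not part of any macro. For each policy, it computes (p{1,1} - p{1,0}) - (p{0,1} - p{0,0}), where p{s,t} = count/n for S=s at T=t. These values should essentially reproduce the NLMeans estimates for the three contrast rows.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Observed DID per policy: (p11 - p10) - (p01 - p00), where p{s,t} = count/n */
/* Assumes X (from above) is sorted by INS, with one row per S*T cell         */
data rawdid;
  set x;
  by ins;
  array p{0:1,0:1} _temporary_;   /* indexed by the 0/1 values of S and T; retained */
  p{s,t} = count/n;
  if last.ins then do;            /* all four cells for this policy are filled      */
    did = (p{1,1} - p{1,0}) - (p{0,1} - p{0,0});
    output;
  end;
  keep ins did;
run;

proc print data=rawdid;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With the data above, this gives roughly -0.013, 0.034, and -0.022 for INS=0, 1, and 2. These are approximately the percentage-point changes visible in the table divided by 100. For example, (1.8 - 3.4) - (5.0 - 5.3) = -1.3 for INS=0.&lt;/P&gt;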
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Aug 2021 01:11:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/764380#M37325</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2021-08-27T01:11:16Z</dc:date>
    </item>
    <item>
      <title>Re: Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/764391#M37327</link>
      <description>Thank you very much! I learned a lot, especially the need to model the data with PROC LOGISTIC first. Appreciate your quick response!</description>
      <pubDate>Fri, 27 Aug 2021 05:20:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/764391#M37327</guid>
      <dc:creator>cjpsas</dc:creator>
      <dc:date>2021-08-27T05:20:16Z</dc:date>
    </item>
    <item>
      <title>Re: Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/770998#M37708</link>
      <description>&lt;P&gt;Hey StatDave_sas,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Nice response. For this example, is there a good way to add covariates like 'Age' to aggregated data to get an adjusted DID? It seems straightforward to add these covariates with case-level data, but with aggregated data, I imagine you'd have to create Age 'groups' (e.g., 0-30y = 0, 31-60y = 1, 61+ = 2) to make it categorical, then recalculate the counts and percentages with the new Age column added. Do you have a better way to do this?&lt;/P&gt;</description>
      <pubDate>Tue, 28 Sep 2021 19:07:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/770998#M37708</guid>
      <dc:creator>sasuser2222</dc:creator>
      <dc:date>2021-09-28T19:07:30Z</dc:date>
    </item>
    <item>
      <title>Re: Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771006#M37710</link>
      <description>&lt;P&gt;This is addressed in &lt;A href="http://support.sas.com/kb/61830" target="_self"&gt;the note&lt;/A&gt; I referred to earlier and can be done by including the AT and OM options in the LSMEANS statement to fix the additional covariates at desired values. But, as mentioned in the note, a simpler approach is to use the Margins macro. While that macro cannot accept data aggregated into&amp;nbsp;&lt;EM&gt;events/trials&lt;/EM&gt; form (as can be used in PROC LOGISTIC), it can be used with data aggregated so that there are separate observations with counts of events and nonevents (as can be used with the FREQ statement in PROC LOGISTIC). If the data are in &lt;EM&gt;events/trials&lt;/EM&gt; form (that is, one observation per population with separate variables containing counts of events and trials), then it is a simple matter to use a DATA step to split each observation into two observations for that population: one containing the count of events and the other containing the count of nonevents (trials minus events). Then use the &lt;STRONG&gt;freq=&lt;/STRONG&gt; parameter of the Margins macro to name the variable containing the counts. Otherwise, the approach is as discussed in the note.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Sep 2021 19:37:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771006#M37710</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2021-09-28T19:37:35Z</dc:date>
    </item>
    <item>
      <title>Re: Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771194#M37723</link>
      <description>&lt;P&gt;Thanks! I'm probably approaching this the wrong way, but if your data set already contains the additional covariates you want to control for, as well as the counts and percentages for each group, I thought it would be a simple matter of incorporating them into the MODEL statement, since the proportions of the covariates are already part of your data set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, based on the original code in this thread, if the data set also included Age (0, 1, 2) and Race (0, 1) (I didn't write out all the datalines), you would add Age and Race to the CLASS and MODEL statements, but then I'm not sure what else is needed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;data x;
input ins age race s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 0 281 5.3
0 0 0 0 1 97 5.0
0 0 0 1 0 841 3.4
0 0 0 1 1 154 1.8
0 0 1 0 0 410 7.7
0 0 1 0 1 159 8.3
etc...
;
      proc logistic data=x;
        class ins age race s t / param=glm ref=first;
        model count/n = ins|s|t age race;
        lsmeans ins*s*t / e ilink;
        ods output coef=coeffs;
        store log;
        run;
      data difdif;
        input k1-k12;
        set=1;
        datalines;
        1 -1 -1 1   0 0 0 0     0 0 0 0
        0 0 0 0     1 -1 -1 1   0 0 0 0
        0 0 0 0     0 0 0 0     1 -1 -1 1   
        ;
      %NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
               title=Difference in Difference of Means)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 29 Sep 2021 18:11:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771194#M37723</guid>
      <dc:creator>sasuser2222</dc:creator>
      <dc:date>2021-09-29T18:11:41Z</dc:date>
    </item>
    <item>
      <title>Re: Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771218#M37724</link>
      <description>&lt;P&gt;Indeed, with categorical covariates nothing else is required in this case, provided you are content with the default LSMEANS behavior of averaging its estimates over balanced (equally weighted) levels of those covariates. Additionally, if there is a continuous covariate, the AT option is needed only if you want to fix its value at something other than its mean, which is the LSMEANS default.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Sep 2021 19:31:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771218#M37724</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2021-09-29T19:31:01Z</dc:date>
    </item>
    <item>
      <title>Re: Difference-in-difference analysis for rates using group level data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771464#M37734</link>
      <description>&lt;P&gt;That makes sense! I'm playing around with the original data set, but I must still be doing something wrong.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I created a variable called Age with levels (0,1) without changing the group totals from the original data set. So the model data set I created has twice as many rows (since each "ins s t" combination is now split into Age = 0 and Age = 1).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I adjusted for this new variable Age as shown in the code below, and it ran just fine. But I must be missing something. When I removed Age as a covariate by simply removing it from the MODEL statement, I expected the results to be identical to those from the original code with the original data set (from your post on 8/26, which did not have any Age data at all), but they did not match. Shouldn't removing Age as a covariate cause the Age=0 and Age=1 rows for a given (ins s t) combination to be treated as one group, so that the two data sets (with and without Age) are handled in the same way? What am I missing here?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;(Finally, I just wanted to express gratitude to&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13633"&gt;@StatDave&lt;/a&gt;&amp;nbsp;for helping self-taught SAS users like me find some clarity in the fog of countless hours of SAS notes, tutorials, and YouTube videos!)&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;data x;
input ins age s t count percent;
n=round(count/(percent/100));
datalines;
0 0 0 0 46 0.9
0 0 0 1 18 0.9
0 0 1 0 172 0.7
0 0 1 1 33 0.4
0 1 0 0 235 4.4
0 1 0 1 79 4.1
0 1 1 0 669 2.7
0 1 1 1 121 1.4
1 0 0 0 60 1.1
1 0 0 1 29 1.5
1 0 1 0 442 1.8
1 0 1 1 222 2.6
1 1 0 0 350 6.6
1 1 0 1 130 6.7
1 1 1 0 2046 8.3
1 1 1 1 971 11.4
2 0 0 0 1019 19.3
2 0 0 1 367 19.0
2 0 1 0 4947 20.0
2 0 1 1 1665 19.6
2 1 0 0 3583 67.7
2 1 0 1 1304 67.7
2 1 1 0 16403 66.5
2 1 1 1 5472 64.5
;
      proc logistic data=x;
        class ins age s t / param=glm ref=first;
        model count/n = ins|s|t age;
        lsmeans ins*s*t / e ilink;
        ods output coef=coeffs;
        store log;
        run;
      data difdif;
        input k1-k12;
        set=1;
        datalines;
        1 -1 -1 1   0 0 0 0     0 0 0 0
        0 0 0 0     1 -1 -1 1   0 0 0 0
        0 0 0 0     0 0 0 0     1 -1 -1 1   
        ;
      %NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
               title=Difference in Difference of Means - Adjusted for age)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Sep 2021 18:58:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Difference-in-difference-analysis-for-rates-using-group-level/m-p/771464#M37734</guid>
      <dc:creator>sasuser2222</dc:creator>
      <dc:date>2021-09-30T18:58:36Z</dc:date>
    </item>
  </channel>
</rss>

