<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Has the distribution of values in categorical variable been stable or changed over time series in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855903#M42311</link>
    <description />
    <pubDate>Fri, 27 Jan 2023 08:07:54 GMT</pubDate>
    <dc:creator>acordes</dc:creator>
    <dc:date>2023-01-27T08:07:54Z</dc:date>
    <item>
      <title>Has the distribution of values in categorical variable been stable or changed over time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855891#M42310</link>
      <description>&lt;P&gt;This is probably more of a statistics question than a programming one, but hopefully that's OK. I use SAS Enterprise Guide 7.15 and am looking for the best method(s) to conduct two hypothesis tests related to the distribution of a categorical variable in a population over time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Using a simulated data set (named SIM, provided at the bottom), my first question is: Is the distribution of the categorical variable “stable” over the course of the first 12 time points? Is there a way to answer that question statistically?&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dist.png" style="width: 751px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/79868iE6FC0986481B53C4/image-dimensions/751x563?v=v2" width="751" height="563" role="button" title="dist.png" alt="dist.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Though I’m using the same simulated data set for the second question, it can be considered independently from the one above.&lt;/P&gt;
&lt;P&gt;Let's say in the study from which this data set was drawn, there was a change applied on Jan. 1, 2022 - after 12 time points/halfway through the time series.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dist2.png" style="width: 748px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/79869i342731DF55A6FEF2/image-dimensions/748x561?v=v2" width="748" height="561" role="button" title="dist2.png" alt="dist2.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;The question is: In what way(s), if any, is the distribution of the categorical variable different, or changing over time, in 2022?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By eyeball, we would maybe guess that the distribution of the four values is stable in the pre-intervention period, but changes over time in the post-intervention period – perhaps A and/or B decrease and C and/or D increase (in terms of proportions of total).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For this, I am not enthusiastic about a single chi-square test of homogeneity in which I aggregate 2021 and 2022 and analyze them as a 2x4 contingency table. I have it in my head that interrupted time series could yield what we want, but I’m unsure because I’m most interested in detecting or describing the change in the distribution as a whole rather than the change in one individual category alone.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data SIM;&lt;BR /&gt;input Month $ CategoricalVar $ Frequency;&lt;BR /&gt;datalines;&lt;BR /&gt;2021-01 A 152 &lt;BR /&gt;2021-01 B 289 &lt;BR /&gt;2021-01 C 193 &lt;BR /&gt;2021-01 D 103 &lt;BR /&gt;2021-02 A 145 &lt;BR /&gt;2021-02 B 250 &lt;BR /&gt;2021-02 C 193 &lt;BR /&gt;2021-02 D 101 &lt;BR /&gt;2021-03 A 178 &lt;BR /&gt;2021-03 B 312 &lt;BR /&gt;2021-03 C 248 &lt;BR /&gt;2021-03 D 117 &lt;BR /&gt;2021-04 A 174 &lt;BR /&gt;2021-04 B 309 &lt;BR /&gt;2021-04 C 238 &lt;BR /&gt;2021-04 D 135 &lt;BR /&gt;2021-05 A 184 &lt;BR /&gt;2021-05 B 339 &lt;BR /&gt;2021-05 C 234 &lt;BR /&gt;2021-05 D 116 &lt;BR /&gt;2021-06 A 180 &lt;BR /&gt;2021-06 B 340 &lt;BR /&gt;2021-06 C 241 &lt;BR /&gt;2021-06 D 113 &lt;BR /&gt;2021-07 A 203 &lt;BR /&gt;2021-07 B 370 &lt;BR /&gt;2021-07 C 241 &lt;BR /&gt;2021-07 D 109 &lt;BR /&gt;2021-08 A 185 &lt;BR /&gt;2021-08 B 345 &lt;BR /&gt;2021-08 C 252 &lt;BR /&gt;2021-08 D 134 &lt;BR /&gt;2021-09 A 198 &lt;BR /&gt;2021-09 B 333 &lt;BR /&gt;2021-09 C 252 &lt;BR /&gt;2021-09 D 130 &lt;BR /&gt;2021-10 A 207 &lt;BR /&gt;2021-10 B 378 &lt;BR /&gt;2021-10 C 233 &lt;BR /&gt;2021-10 D 127 &lt;BR /&gt;2021-11 A 168 &lt;BR /&gt;2021-11 B 298 &lt;BR /&gt;2021-11 C 223 &lt;BR /&gt;2021-11 D 127 &lt;BR /&gt;2021-12 A 172 &lt;BR /&gt;2021-12 B 308 &lt;BR /&gt;2021-12 C 260 &lt;BR /&gt;2021-12 D 127 &lt;BR /&gt;2022-01 A 122 &lt;BR /&gt;2022-01 B 290 &lt;BR /&gt;2022-01 C 247 &lt;BR /&gt;2022-01 D 144 &lt;BR /&gt;2022-02 A 151 &lt;BR /&gt;2022-02 B 287 &lt;BR /&gt;2022-02 C 218 &lt;BR /&gt;2022-02 D 107 &lt;BR /&gt;2022-03 A 170 &lt;BR /&gt;2022-03 B 316 &lt;BR /&gt;2022-03 C 276 &lt;BR /&gt;2022-03 D 162 &lt;BR /&gt;2022-04 A 150 &lt;BR /&gt;2022-04 B 325 &lt;BR /&gt;2022-04 C 277 &lt;BR /&gt;2022-04 D 119 &lt;BR /&gt;2022-05 A 148 &lt;BR /&gt;2022-05 B 289 &lt;BR /&gt;2022-05 C 287 &lt;BR /&gt;2022-05 D 134 &lt;BR /&gt;2022-06 A 148 &lt;BR /&gt;2022-06 B 238 &lt;BR /&gt;2022-06 C 252 &lt;BR /&gt;2022-06 D 154 &lt;BR /&gt;2022-07 A 130 &lt;BR 
/&gt;2022-07 B 258 &lt;BR /&gt;2022-07 C 241 &lt;BR /&gt;2022-07 D 153 &lt;BR /&gt;2022-08 A 135 &lt;BR /&gt;2022-08 B 235 &lt;BR /&gt;2022-08 C 300 &lt;BR /&gt;2022-08 D 140 &lt;BR /&gt;2022-09 A 152 &lt;BR /&gt;2022-09 B 229 &lt;BR /&gt;2022-09 C 280 &lt;BR /&gt;2022-09 D 172 &lt;BR /&gt;2022-10 A 154 &lt;BR /&gt;2022-10 B 330 &lt;BR /&gt;2022-10 C 315 &lt;BR /&gt;2022-10 D 187 &lt;BR /&gt;2022-11 A 130 &lt;BR /&gt;2022-11 B 278 &lt;BR /&gt;2022-11 C 312 &lt;BR /&gt;2022-11 D 179 &lt;BR /&gt;2022-12 A 135 &lt;BR /&gt;2022-12 B 267 &lt;BR /&gt;2022-12 C 299 &lt;BR /&gt;2022-12 D 175 &lt;BR /&gt;;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jan 2023 04:00:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855891#M42310</guid>
      <dc:creator>Rodcjones</dc:creator>
      <dc:date>2023-01-27T04:00:41Z</dc:date>
    </item>
    <item>
      <title>Re: Has the distribution of values in categorical variable been stable or changed over time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855903#M42311</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
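/* Convert the character Month (e.g. "2021-01") to a SAS date on the first of the month */
data sim;
set sim;
format yym monyy.;
yym=input(cats(month, "-01"), yymmdd10.);
run;

proc sort data=sim;
by yym;
run;

/* Weighted one-way frequencies per month; PERCENT in sim2 is the share within each month */
proc freq data=sim noprint;
by yym;
tables CategoricalVar / out=sim2;
weight frequency;
run;

/* Stacked bars: one bar per month, one segment per category */
ods graphics on;
proc sgplot data=sim2 PCTLEVEL=GROUP;
vbarbasic yym / response=percent group=CategoricalVar stat=sum groupdisplay=stack;
run;&lt;/CODE&gt;&lt;/PRE&gt;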
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="pic.png" style="width: 819px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/79872i1E5D41B3BD1FEB46/image-size/large?v=v2&amp;amp;px=999" role="button" title="pic.png" alt="pic.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jan 2023 08:07:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855903#M42311</guid>
      <dc:creator>acordes</dc:creator>
      <dc:date>2023-01-27T08:07:54Z</dc:date>
    </item>
    <item>
      <title>Re: Has the distribution of values in categorical variable been stable or changed over time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855916#M42312</link>
      <description>You could try confidence intervals for multinomial proportions.&lt;BR /&gt;See &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;'s blog post:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2017/02/15/confidence-intervals-multinomial-proportions.html&lt;/A&gt;</description>
      <pubDate>Fri, 27 Jan 2023 11:10:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855916#M42312</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2023-01-27T11:10:33Z</dc:date>
    </item>
    <item>
      <title>Re: Has the distribution of values in categorical variable been stable or changed over time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855925#M42313</link>
      <description>&lt;P&gt;I think you can build on Arne's visualization. If your main interest is whether there is a linear trend for a certain time period, you can analyze the trend of the proportions. For example, if you fit an OLS line for the proportions in each category over time, does any line have a statistically significant slope (not zero)?&amp;nbsp; The test for unequal slopes across linear models (ANCOVA) is explained in&amp;nbsp;&lt;A href="https://support.sas.com/kb/24/177.html" target="_blank"&gt;24177 - Comparing parameters (slopes) from a model fit to two or more groups (sas.com)&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* https://support.sas.com/kb/24/177.html */
proc glm data=sim2;
   class CategoricalVar;
   /* CategoricalVar*yym lets each category have its own slope over time */
   model Percent = CategoricalVar yym CategoricalVar*yym / noint solution;
run;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ANCOVAPlot1.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/79873i3E02D9DC9A6A9F5A/image-size/large?v=v2&amp;amp;px=999" role="button" title="ANCOVAPlot1.png" alt="ANCOVAPlot1.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can look at the Type3 tests and the parameter estimates to conclude whether the slopes of the lines are different. Note that since these are proportions, they can't all increase! If one proportion goes up, at least one other must go down.&lt;/P&gt;
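&lt;P&gt;If the question is whether each individual line has a nonzero slope within one period, a minimal sketch along the following lines may help. It is an alternative parameterization, not the model from the KB note: dropping the common yym term makes each CategoricalVar*yym estimate the slope for that category, so its t test is a test of that slope against zero. The sketch assumes sim2 comes from the PROC FREQ step earlier in the thread, that yym is a SAS date, and that the period of interest is 2022. Because yym counts days, the slopes are in percentage points per day.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: separate intercept and slope per category, 2022 months only */
proc glm data=sim2;
   where yym &amp;gt;= '01JAN2022'd;   /* assumed period of interest */
   class CategoricalVar;
   model Percent = CategoricalVar CategoricalVar*yym / noint solution;
run;
quit;&lt;/CODE&gt;&lt;/PRE&gt;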
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jan 2023 11:43:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855925#M42313</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2023-01-27T11:43:47Z</dc:date>
    </item>
    <item>
      <title>Re: Has the distribution of values in categorical variable been stable or changed over time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855926#M42314</link>
      <description>&lt;P&gt;POPULATION STABILITY INDEX (PSI)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Examining Distributional Shifts by Using Population Stability Index (PSI) for Model Validation and Diagnosis&lt;BR /&gt;Alec Zhixiao Lin, LoanDepot, Foothill Ranch, CA&lt;BR /&gt;&lt;A href="https://www.lexjansen.com/wuss/2017/47_Final_Paper_PDF.pdf" target="_blank"&gt;https://www.lexjansen.com/wuss/2017/47_Final_Paper_PDF.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
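&lt;P&gt;As a rough illustration of the idea (a sketch, not code from the paper), the PSI between a baseline and a comparison distribution is the sum over categories of (p_comp - p_base) * log(p_comp / p_base). The step below assumes the original SIM data set with a character Month such as "2021-01", treats 2021 as the baseline and 2022 as the comparison, and uses the hypothetical table names COUNTS and PROPS. PROC SQL writes a note that it is remerging summary statistics, which is expected here.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* Total frequency per category in each year */
  create table counts as
  select CategoricalVar,
         sum(case when substr(Month,1,4)="2021" then Frequency else 0 end) as n_base,
         sum(case when substr(Month,1,4)="2022" then Frequency else 0 end) as n_comp
  from sim
  group by CategoricalVar;

  /* Convert counts to proportions within each year */
  create table props as
  select CategoricalVar,
         n_base / sum(n_base) as p_base,
         n_comp / sum(n_comp) as p_comp
  from counts;

  /* PSI: sum of (difference in proportions) times (log of their ratio) */
  select sum((p_comp - p_base) * log(p_comp / p_base)) as PSI
  from props;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;An often-quoted rule of thumb reads a PSI below 0.1 as little shift and above 0.25 as a substantial shift, but these cutoffs are conventions rather than a formal test.&lt;/P&gt;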
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jan 2023 11:52:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/855926#M42314</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2023-01-27T11:52:43Z</dc:date>
    </item>
    <item>
      <title>Re: Has the distribution of values in categorical variable been stable or changed over time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/856013#M42320</link>
      <description>&lt;P&gt;You can use a generalized logistic model to test those hypotheses. For the first, this model tests the effect of MONTH in the first-year data. The Type3 test of MONTH is not significant (p=.86), suggesting "stability" in the sense of no change in the proportions over the months. The LSMEANS statement provides the proportions and plots them.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data SIM;
input year 1-4 Month 6-7 CategoricalVar $ Frequency;
datalines;
...
;
proc logistic data=sim;
where year=2021;
freq frequency;
class month / param=glm;
model categoricalvar=month / link=glogit;
lsmeans month / ilink plots=meanplot(ilink);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This next model assesses the change between years. The Type3 test for YEAR is significant (p&amp;lt;.0001). The LSMEANS statement shows the probabilities for each category in each year and confirms your eyeball conclusion.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc logistic data=sim;
freq frequency;
class year month / param=glm;
model categoricalvar=year month year*month / link=glogit;
lsmeans year / ilink plots=meanplot(ilink);
run;
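
/* Sketch, not part of the original reply: treating MONTH as continuous
   (no CLASS statement) within 2022 gives a test for a linear trend in
   the generalized logits over the post-change months. */
proc logistic data=sim;
where year=2022;
freq frequency;
model categoricalvar=month / link=glogit;
run;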
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Both tests could also be done without a model, using simple chi-square tests. This approach might be necessary with sparser data, which could cause numerical problems in the model-based approach.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=sim;
weight frequency;
table year*categoricalvar/chisq;
run;
proc freq data=sim;
where year=2021;
weight frequency;
table month*categoricalvar/chisq;
run;
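
/* Sketch, not part of the original reply: the same chi-square test
   restricted to 2022 checks whether the proportions vary across the
   post-change months. */
proc freq data=sim;
where year=2022;
weight frequency;
table month*categoricalvar/chisq;
run;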
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 27 Jan 2023 16:47:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/856013#M42320</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2023-01-27T16:47:43Z</dc:date>
    </item>
    <item>
      <title>Re: Has the distribution of values in categorical variable been stable or changed over time series</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/856568#M42349</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/127222"&gt;@acordes&lt;/a&gt; &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt; &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt; &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/60547"&gt;@sbxkoenk&lt;/a&gt; &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13633"&gt;@StatDave&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Thank you all for these prompt and thoughtful responses! I've begun investigating each and plan to report back on the results of my learning/experimentation.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Feb 2023 01:23:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Has-the-distribution-of-values-in-categorical-variable-been/m-p/856568#M42349</guid>
      <dc:creator>Rodcjones</dc:creator>
      <dc:date>2023-02-01T01:23:33Z</dc:date>
    </item>
  </channel>
</rss>

