<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Can i use Kolmogorov-Smirnov on categorical variables in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817035#M40325</link>
    <description>&lt;P&gt;I have two datasets, A and B.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I use the KS statistic to understand whether A has come from B.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now, I have the variable c, which is categorical. It takes values between 400 and 744. Each number identifies a type of sub-product. For example, 400 means product x, 552 means product y, and so on.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I wonder if I can use KS with this variable?&lt;/P&gt;</description>
    <pubDate>Wed, 08 Jun 2022 11:52:10 GMT</pubDate>
    <dc:creator>Toni2</dc:creator>
    <dc:date>2022-06-08T11:52:10Z</dc:date>
    <item>
      <title>Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817035#M40325</link>
      <description>&lt;P&gt;I have two datasets, A and B.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I use the KS statistic to understand whether A has come from B.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now, I have the variable c, which is categorical. It takes values between 400 and 744. Each number identifies a type of sub-product. For example, 400 means product x, 552 means product y, and so on.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I wonder if I can use KS with this variable?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 11:52:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817035#M40325</guid>
      <dc:creator>Toni2</dc:creator>
      <dc:date>2022-06-08T11:52:10Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817036#M40326</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;I wonder if I can use KS with this variable?&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Probably not. What is the question you want to answer?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 11:55:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817036#M40326</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-06-08T11:55:09Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817041#M40327</link>
      <description>&lt;P&gt;Calling&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I agree with Paige: you can't get a meaningful cumulative probability for a categorical variable.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 12:01:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817041#M40327</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2022-06-08T12:01:36Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817043#M40328</link>
      <description>Thanks. The question is: if we use KS on variable c, can we safely conclude that A has come from B?&lt;BR /&gt;</description>
      <pubDate>Wed, 08 Jun 2022 12:02:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817043#M40328</guid>
      <dc:creator>Toni2</dc:creator>
      <dc:date>2022-06-08T12:02:36Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817045#M40329</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/372747"&gt;@Toni2&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Thanks. The question is: if we use KS on variable c, can we safely conclude that A has come from B?&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I don't really know what this means. Much more explanation is needed, and we probably need a detailed example as well. Don't write one brief sentence; that's not what I am looking for when I ask for "much more explanation". Don't talk about the KS test; talk about the question you want the data to help you answer.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 12:38:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817045#M40329</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-06-08T12:38:33Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817056#M40330</link>
      <description>No. Do you mean something like a signed-rank test? Unlike a continuous variable, a categorical variable has no natural ordering, so it has no cumulative "distribution" in the sense that the KS test requires.</description>
      <pubDate>Wed, 08 Jun 2022 12:22:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817056#M40330</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2022-06-08T12:22:37Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817066#M40331</link>
      <description>&lt;P&gt;The two-sample KS test is a way to determine whether the distribution of some continuous variable is the same for two groups. For example, the following SAS statements analyze whether the distribution of height is the same for boys and for girls:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc npar1way data=sashelp.class;
   class sex;
   var height;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
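&lt;P&gt;By default, PROC NPAR1WAY runs several analyses at once. As a minimal sketch, assuming you only want the empirical distribution function (EDF) tests, which include the KS test, you can request them explicitly with the EDF option. The PLOTS=EDFPLOT request (which needs ODS Graphics) overlays the empirical distribution functions of the two groups, so you can see the gap that the KS statistic measures:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;ods graphics on;
/* EDF requests the empirical distribution function tests, including KS */
proc npar1way data=sashelp.class edf plots=edfplot;
   class sex;      /* the two groups to compare */
   var height;     /* continuous variable whose distribution is tested */
run;
&lt;/CODE&gt;&lt;/PRE&gt;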
&lt;P&gt;Now let's examine your question. You cannot use the KS test on a categorical variable, but you can use a categorical variable to determine subgroups of the data that you want to test. For example, suppose that you want to analyze the distribution of PRICE at two kinds of stores: convenience stores and grocery stores.&amp;nbsp; You might have several kinds of snacks that you want to analyze, such as potato chips, pretzels, tortilla chips, and so forth. If so, you can use a BY statement to run the analysis for each product.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The following SAS statements analyze the distribution of prices (PRICE) between convenience stores and grocery stores (STORETYPE) for each kind of product (PRODUCT):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=Prices;
   by Product;
run;
proc npar1way data=Prices ks D;
   by Product;       /* repeat analysis for each type of product */
   class StoreType;  /* 'Convenience' or 'Grocery' */
   var Price;        /* analyze the distribution of prices */
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In your original question, A and B would be the prices for convenience stores and grocery stores, respectively. C would be the product type.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Trying to use C (product type) in any other way probably does not give you a correct analysis.&lt;/P&gt;
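&lt;P&gt;To see why, here is a small sketch with made-up data. The data set Toy, its three product codes, and the counts are hypothetical and chosen only for illustration. The variable c2 holds the same products as c but swaps the arbitrary codes of two of them. The frequency table of products within each group is unchanged, yet the KS statistic changes, because KS depends on the ordering of the values:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical toy data: counts of three product codes in groups A and B */
data Toy;
   input Group $ c Count;
   datalines;
A 400 40
A 552 20
A 744 40
B 400 20
B 552 60
B 744 20
;

/* Same products, different arbitrary coding: swap the codes 552 and 744 */
data Toy2;
   set Toy;
   if c = 552 then c2 = 744;
   else if c = 744 then c2 = 552;
   else c2 = c;
run;

/* Run the EDF (KS) analysis on both codings; FREQ weights each row by Count */
proc npar1way data=Toy2 edf;
   class Group;
   var c c2;
   freq Count;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Working by hand from the cumulative proportions, the largest gap between the two groups should be 0.2 under the coding c and 0.4 under the coding c2, even though nothing about the products themselves has changed. A test whose result depends on how you happen to number the categories is not answering a meaningful question.&lt;/P&gt;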
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 13:08:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817066#M40331</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2022-06-08T13:08:21Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817067#M40332</link>
      <description>&lt;P&gt;Well, below is a small extract from dataset A for variable c. Dataset A has approx. 350k observations.&lt;/P&gt;
&lt;P&gt;c&lt;/P&gt;
&lt;P&gt;737&lt;BR /&gt;701&lt;BR /&gt;702&lt;BR /&gt;702&lt;BR /&gt;742&lt;BR /&gt;735&lt;BR /&gt;710&lt;BR /&gt;731&lt;BR /&gt;702&lt;BR /&gt;710&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is a small extract from dataset B for variable c. B has approx. 10m observations.&lt;/P&gt;
&lt;P&gt;c&lt;/P&gt;
&lt;P&gt;407&lt;BR /&gt;737&lt;BR /&gt;724&lt;BR /&gt;701&lt;BR /&gt;702&lt;BR /&gt;702&lt;BR /&gt;724&lt;BR /&gt;742&lt;BR /&gt;701&lt;BR /&gt;710&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As I wrote in my initial post, c is a categorical variable, so each value corresponds to a characteristic. When I use KS to test whether A above has come from B, the test passes for c, but then I think that the values do not correspond to actual numbers.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;On the other hand, if the test takes into consideration the frequency of each value (for example, the value 700 appears 3 times), then KS could answer the question of whether A has come from B.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 13:10:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817067#M40332</guid>
      <dc:creator>Toni2</dc:creator>
      <dc:date>2022-06-08T13:10:29Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817068#M40333</link>
      <description>Thanks. Quick question: is it wrong to use PROC NPAR1WAY with discrete variables? If so, what alternative procedure can I use in SAS?</description>
      <pubDate>Wed, 08 Jun 2022 13:20:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817068#M40333</guid>
      <dc:creator>Toni2</dc:creator>
      <dc:date>2022-06-08T13:20:23Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817076#M40334</link>
      <description>&lt;P&gt;Yes, it is wrong to use PROC NPAR1WAY to analyze a discrete variable. The analyses and tests assume that the data are continuous.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can use PROC FREQ and a chi-square test to test whether the frequency distribution of the products differs between the A group and the B group. For example, the following statements simulate two groups and 10 products. The frequency distribution of the A group is slightly different from the B group. The chi-square test (or a related test) can detect this difference for very large samples, but not for small samples:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data FakeData;
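   call streaminit(1);                      /* fix the seed so the simulation is reproducible */
   /* probability of each of the 10 products within each group */
   array ProbA[10] (.1  .1 .1 .1 .1  .1 .1  .1 .1  .1);
   array ProbB[10] (.08 .1 .1 .1 .11 .1 .09 .1 .12 .1);
   Group = 'A';
   do i = 1 to 1000;
      Product = rand("Table", of ProbA[*]);  /* draw a product (1-10) for Group='A' */
      output;
   end;
   Group = 'B';
   do i = 1 to 600;
      Product = rand("Table", of ProbB[*]);  /* draw a product (1-10) for Group='B' */
      output;
   end;
   keep Group Product;                      /* drop the loop counter and probability arrays */
run;

/* chi-square test of whether the product frequencies differ between groups */
proc freq data=FakeData;
   tables Group*Product / chisq expected nopercent nocol;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>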
      <pubDate>Wed, 08 Jun 2022 13:40:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817076#M40334</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2022-06-08T13:40:55Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817079#M40335</link>
      <description>thank you</description>
      <pubDate>Wed, 08 Jun 2022 13:44:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817079#M40335</guid>
      <dc:creator>Toni2</dc:creator>
      <dc:date>2022-06-08T13:44:37Z</dc:date>
    </item>
    <item>
      <title>Re: Can i use Kolmogorov-Smirnov on categorical variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817080#M40336</link>
      <description>&lt;P&gt;But I can save you the trouble of running the analysis. If&amp;nbsp;one group has 350k observations and the other has 10M observations, then the tests are likely to reject the null hypothesis that the distributions are the same, because with samples this large even tiny, practically unimportant differences become statistically significant. To understand why I say this, read &lt;A href="https://blogs.sas.com/content/iml/2016/11/28/goodness-of-fit-large-small-samples.html" target="_self"&gt;"Goodness-of-fit tests: A cautionary tale for large and small samples."&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 13:45:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Can-i-use-Kolmogorov-Smirnov-on-categorical-variables/m-p/817080#M40336</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2022-06-08T13:45:30Z</dc:date>
    </item>
  </channel>
</rss>

