<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: classify each customer to 1-100 groups based on percentiles without 0 in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954655#M372822</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;"&lt;SPAN&gt;PROC RANK processes the 0 salary values into percentile 1. "&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;and&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"Run the PROC RANK on the salary &amp;gt; 0 observations, then put the groups back together again."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The latter is what my code effectively does, by applying the "where=(salary&amp;gt;0)" filter to the data set submitted to PROC RANK, which of course necessitates the subsequent recovery of the unranked observations.&amp;nbsp; But instead of a separate WHERE statement, it's coded as a data set option following the data set name.&amp;nbsp; I suspect you may have missed that.&amp;nbsp; Welcome to a club I frequently visit.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 26 Dec 2024 22:45:19 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2024-12-26T22:45:19Z</dc:date>
    <item>
      <title>classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954604#M372802</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;I want to create a new variable that classifies each customerID into a group between 1 and 100 by percentiles of salary amount.&lt;/P&gt;
&lt;P&gt;Please note that I want the percentiles to be calculated only on customers with a positive salary (customers with a salary of zero should not influence the percentile calculation). So, for example:&lt;/P&gt;
&lt;P&gt;customers with salary 0 will be in group 0&lt;/P&gt;
&lt;P&gt;customers with salary greater than 0 and less than or equal to percentile 1 will be in group 1&lt;/P&gt;
&lt;P&gt;customers with salary greater than percentile 1 and less than or equal to percentile 2 will be in group 2&lt;/P&gt;
&lt;P&gt;and so on&lt;/P&gt;
&lt;P&gt;customers with salary greater than percentile 99 and less than or equal to percentile 100 (the max) will be in group 100&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What is the way to do this, please?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Dec 2024 17:55:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954604#M372802</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2024-12-25T17:55:44Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954606#M372804</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Nice question.&amp;nbsp; Where is the sample data, to help us help you?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Dec 2024 18:20:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954606#M372804</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-12-25T18:20:41Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954607#M372805</link>
      <description>&lt;P&gt;Without getting into a coding example, why not run PROC RANK (with groups=100) on the subset of salaries &amp;gt; 0, which can yield an output dataset with ranks of 0 through 99?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Merge that data with the original data, increasing rank by 1 for the positive salary subset, and assign a rank of zero to cases with salary=0.&amp;nbsp; You would then have rank values of 0 through 100.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Dec 2024 18:50:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954607#M372805</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-12-25T18:50:23Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954608#M372806</link>
      <description>&lt;P&gt;Here's a question you will need to address.&amp;nbsp; The methods you choose (whether PROC RANK or any other method) will not let you avoid answering this question.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How do you want to handle ties?&amp;nbsp; To illustrate, let's say you have 100,000 observations.&amp;nbsp; But 5,000 of them have the exact same score.&amp;nbsp; How do you distribute them among percentiles?&amp;nbsp; Will all 5,000 go into the same percentile (leaving some other percentile with 0 observations)?&amp;nbsp; Or will you assign some of the 5,000 to one percentile and some of the 5,000 with the exact same score to a different percentile?&lt;/P&gt;</description>
      <pubDate>Wed, 25 Dec 2024 21:06:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954608#M372806</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2024-12-25T21:06:51Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954609#M372807</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Yes-&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;5,000 will go into the same percentile (leaving some other percentile with 0 observations)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;What is the way to do it?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Dec 2024 21:18:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954609#M372807</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2024-12-25T21:18:22Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954622#M372809</link>
      <description>&lt;P&gt;Here's a reasonable way:&lt;/P&gt;
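&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort so that tied salaries are adjacent and BY-group processing works */
proc sort data=have;
   by salary;
run;

/* Count the positive salaries; zeros must not influence the percentiles */
proc summary data=have;
   where salary &amp;gt; 0;
   var salary;
   output out=n_salaries (keep=n_salaries) n=n_salaries;
run;

data want;
   if _n_=1 then set n_salaries;
   set have;
   by salary;
   retain group;
   /* n is a running count of positive salaries (sum statement, so it is retained) */
   if salary &amp;gt; 0 then n + 1;
   /* Assign the group once per distinct salary, so all ties share one group */
   if first.salary then do;
      if salary &amp;lt;= 0 then group = 0;
      else group = ceil(100 * n/n_salaries);
   end;
   drop n n_salaries;
run;&lt;/CODE&gt;&lt;/PRE&gt;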
&lt;P&gt;It's untested, since there's no data.&amp;nbsp; But it should be in the ballpark.&amp;nbsp; It uses GROUP to represent the percentile, and it does allow a largest percentile of 100.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 02:43:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954622#M372809</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2024-12-26T02:43:03Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954628#M372811</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;Yes-&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;5,000 will go into the same percentile (leaving some other percentile with 0 observations)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;What is the way to do it?&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;And to which percentile will the ties be assigned: lower bound?&amp;nbsp; upper bound? mid-point?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let's say you want the mid-point, and data is sorted by ID:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
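&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Rank only the positive salaries; GROUPS=100 yields ranks 0 through 99 */
proc rank data=have (keep=id salary where=(salary&amp;gt;0))
          groups=100 ties=mean out=need;
  var salary;
  ranks salary_pctile;
run;

/* Restore the unranked observations; HAVE and NEED must both be in ID order */
data want;
  merge have need;
  by id;
  if salary&amp;gt;0 then salary_pctile=salary_pctile+1;  /* shift 0-99 to 1-100 */
  else if salary=0 then salary_pctile=0;
run;
&lt;/CODE&gt;&lt;/PRE&gt;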
&lt;P&gt;Untested in the absence of sample data.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 03:51:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954628#M372811</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-12-26T03:51:19Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954643#M372815</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm usually right there with you, trying to use the simplest and most direct tools.&amp;nbsp; But this is a case where I feared PROC RANK would put us on the inexorable path toward 20 posts before hitting a solution.&amp;nbsp; Here's what I expected:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Poster would actually try a PROC RANK solution, then complain that it didn't work.&lt;/LI&gt;
&lt;LI&gt;Someone would post that "didn't work" is awfully vague and would request a copy of the log.&lt;/LI&gt;
&lt;LI&gt;Poster would post the log, but as text so that it is difficult to read.&lt;/LI&gt;
&lt;LI&gt;Someone would post instructions on the right way to post a log.&lt;/LI&gt;
&lt;LI&gt;Poster would actually post the log in a readable form.&lt;/LI&gt;
&lt;LI&gt;Nothing would appear to be wrong, and someone would ask the original poster why s/he insists that it didn't work.&lt;/LI&gt;
&lt;LI&gt;Poster would eventually say that there is nobody assigned to percentile 1 or 2, and that the lowest salary starts with percentile 3.&lt;/LI&gt;
&lt;LI&gt;I would get to post why that happens and say, "That's what you asked for."&lt;/LI&gt;
&lt;LI&gt;Poster would reply that percentiles should be assigned differently...&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;You must get the idea by now.&amp;nbsp; Let me skip some of the process and jump right to the issue.&lt;/P&gt;
&lt;P&gt;PROC RANK processes the 0 salary values into percentile 1.&amp;nbsp; Once this statement runs, nobody is left in the first percentile:&amp;nbsp; if salary = 0 then percentile = 0;&lt;/P&gt;
&lt;P&gt;Percentile 2 might be a little light as well.&lt;/P&gt;
&lt;P&gt;I was imagining an approach where percentiles get assigned based only on the positive salaries (still assigning salary=0 to percentile 0).&amp;nbsp; This could easily be achieved by cleaning the data now:&amp;nbsp; if salary = 0 then salary = .;&amp;nbsp; But for some reason, it seems the original poster is not allowed to do this.&amp;nbsp; Cleaning the data first would solve for:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;percentile assignment using a simple PROC RANK&lt;/LI&gt;
&lt;LI&gt;detecting other bad data.&amp;nbsp; For example if one data entry person used salary=0 for missing values, perhaps another used salary=-999.&lt;/LI&gt;
&lt;LI&gt;duplicate entries for the same person.&amp;nbsp; If the current form of the data is acceptable, I'm not going to try to explain what happens in a many-to-many merge&lt;/LI&gt;
&lt;/UL&gt;
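&lt;P&gt;A minimal, untested sketch of the "clean the data first" idea described above.&amp;nbsp; The data set names CLEAN, RANKED and WANT and the variables SALARY_CLEAN and GROUP are assumptions made for this illustration, not names from the thread.&amp;nbsp; It relies on PROC RANK leaving missing values unranked, so the zero salaries do not influence the percentile boundaries.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Copy salary and turn zeros into missing values (original salary is kept) */
data clean;
   set have;
   salary_clean = salary;
   if salary_clean = 0 then salary_clean = .;
run;

/* Missing values are not ranked, so only positive salaries form the groups */
proc rank data=clean groups=100 out=ranked;
   var salary_clean;
   ranks group;
run;

/* PROC RANK numbers the groups 0-99; shift to 1-100 and use 0 for the rest */
data want;
   set ranked;
   /* Assumption: a genuinely missing salary also lands in group 0 here */
   if missing(group) then group = 0;
   else group = group + 1;
   drop salary_clean;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Because ties are left at the PROC RANK default in this sketch, a TIES= option would need to be added if a specific tie rule is wanted.&lt;/P&gt;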
&lt;P&gt;I'm not claiming that my posted solution is best or even that it works.&amp;nbsp; Like you, I don't have any data to use to test it.&amp;nbsp; Unlike you, I haven't had SAS available for a few years.&amp;nbsp; (No, I'm not in jail, just not motivated to fiddle with my ancient desktop machine.)&amp;nbsp; Once the data is clean, another viable approach (even with 0 representing missing values) might be to separate the data into two sets.&amp;nbsp; One holds salary=0 observations, and one holds salary &amp;gt; 0 observations.&amp;nbsp; Run the PROC RANK on the salary &amp;gt; 0 observations, then put the groups back together again.&lt;/P&gt;
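&lt;P&gt;The split-and-recombine approach mentioned in the previous paragraph might look roughly like the following untested sketch.&amp;nbsp; The data set names ZERO, POSITIVE, RANKED and WANT and the variable GROUP are hypothetical, and sending every non-positive or missing salary to the ZERO set is an assumption.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Split the data: positive salaries get ranked, everything else does not */
data zero positive;
   set have;
   if salary &amp;gt; 0 then output positive;
   else output zero;
run;

/* Rank the positive salaries into 100 groups, numbered 0 through 99 */
proc rank data=positive groups=100 out=ranked;
   var salary;
   ranks group;
run;

/* Put the pieces back together; the zero-salary rows come first */
data want;
   set zero (in=in_zero) ranked;
   if in_zero then group = 0;
   else group = group + 1;   /* shift 0-99 to 1-100 */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Concatenating with SET changes the original row order, so a PROC SORT would be needed afterwards if that order matters.&lt;/P&gt;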
&lt;P&gt;Anyway, we'll see where this journey goes.&amp;nbsp; Best of luck to all of us along the way.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 15:04:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954643#M372815</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2024-12-26T15:04:18Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954655#M372822</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;"&lt;SPAN&gt;PROC RANK processes the 0 salary values into percentile 1. "&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp;and&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"Run the PROC RANK on the salary &amp;gt; 0 observations, then put the groups back together again."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The latter is what my code effectively does, by applying the "where=(salary&amp;gt;0)" filter to the data set submitted to PROC RANK, which of course necessitates the subsequent recovery of the unranked observations.&amp;nbsp; But instead of a separate WHERE statement, it's coded as a data set option following the data set name.&amp;nbsp; I suspect you may have missed that.&amp;nbsp; Welcome to a club I frequently visit.&lt;/SPAN&gt;&lt;/P&gt;
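&lt;P&gt;To make the two spellings of the filter easy to compare, here is an untested side-by-side sketch.&amp;nbsp; The output data set names NEED1 and NEED2 are hypothetical; both steps are expected to pass only the salary &amp;gt; 0 observations to PROC RANK.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Form 1: WHERE= data set option attached to the input data set name */
proc rank data=have (where=(salary&amp;gt;0)) groups=100 out=need1;
   var salary;
   ranks salary_pctile;
run;

/* Form 2: a separate WHERE statement inside the procedure step */
proc rank data=have groups=100 out=need2;
   where salary &amp;gt; 0;
   var salary;
   ranks salary_pctile;
run;&lt;/CODE&gt;&lt;/PRE&gt;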
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Dec 2024 22:45:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954655#M372822</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-12-26T22:45:19Z</dc:date>
    </item>
    <item>
      <title>Re: classify each customer to 1-100 groups based on percentiles without 0</title>
      <link>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954740#M372858</link>
      <description>&lt;P&gt;Of course you're right here.&amp;nbsp; Time to invest on my side:&amp;nbsp; a new pair of bifocals and a new coffee maker.&amp;nbsp; Happy 2025 to all.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Dec 2024 01:58:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/classify-each-customer-to-1-100-groups-based-on-percentiles/m-p/954740#M372858</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2024-12-30T01:58:07Z</dc:date>
    </item>
  </channel>
</rss>

