<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: top n% per group in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/602621#M16845</link>
    <description>&lt;P&gt;Thanks for your reply. After some more thinking I realize I want the first 5% of observations that appear in the dataset, for each of the various sized groups. Which I think takes the cost variable out of the equation.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, for a group that has 100 members, I want to flag the first 5. For the group has 25,643 members I want to flag the first 1,282 which appear in the dataset under that group.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope that clears things up! Thanks.&lt;/P&gt;</description>
    <pubDate>Thu, 07 Nov 2019 20:55:52 GMT</pubDate>
    <dc:creator>rcogan</dc:creator>
    <dc:date>2019-11-07T20:55:52Z</dc:date>
    <item>
      <title>top n% per group</title>
      <link>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/601439#M16721</link>
      <description>&lt;P&gt;Relatively new SAS user (Enterprise Guide 7.1) and programmer in general so thanks in advance for bearing with me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am looking to find and flag the top %n of members in a group. My dataset roughly resembles this (but think hundreds of millions of observations):&lt;/P&gt;&lt;P&gt;&lt;U&gt;member_ID&lt;/U&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;U&gt;Group&lt;/U&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;U&gt;Cost&lt;/U&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp; 1&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; A&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$10&lt;/P&gt;&lt;P&gt;&amp;nbsp; 2&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; A&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$8&lt;/P&gt;&lt;P&gt;&amp;nbsp; 3&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; A&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$1&lt;/P&gt;&lt;P&gt;&amp;nbsp; 4&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; A&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$0&lt;/P&gt;&lt;P&gt;&amp;nbsp; 5&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; B&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$15&lt;/P&gt;&lt;P&gt;&amp;nbsp; 6&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; B&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$10&lt;/P&gt;&lt;P&gt;&amp;nbsp; 7&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; B&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$7&lt;/P&gt;&lt;P&gt;&amp;nbsp; 8&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; C&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$13&lt;/P&gt;&lt;P&gt;&amp;nbsp; 9&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; C&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$5&lt;/P&gt;&lt;P&gt;&amp;nbsp; 10&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; C&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;$0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am trying to find the top 5% of members per group and seeing how much they collectively cost. The data set is ordered so that cost is descending per group, so it should be as easy as slicing off the top and flagging them. My example above is bad because you can't get 5% from only a couple observations, but hopefully you get the idea.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This was the code I used before I added group into the mix and it gave me a new dataset with the top 5% of total members successfully.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data top5;&lt;BR /&gt;do _N_=1 to round(nobs*.05);&lt;BR /&gt;set my_data_set point=_N_ nobs=nobs;&lt;BR /&gt;output top5;&lt;BR /&gt;end;&lt;BR /&gt;stop;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 17:35:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/601439#M16721</guid>
      <dc:creator>rcogan</dc:creator>
      <dc:date>2019-11-04T17:35:27Z</dc:date>
    </item>
    <item>
      <title>Re: top n% per group</title>
      <link>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/601442#M16722</link>
      <description>&lt;P&gt;If you use PROC RANK with the option GROUPS&amp;nbsp; = 100, each observation in each group will get a rank between 0 and 99 (or maybe it 1 to 100, I don't remember exactly). Then if you want the top 23%, you find all observations where the rank is &amp;lt;=22. This way, you also have control over how to handle ties.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.4&amp;amp;docsetId=proc&amp;amp;docsetTarget=p0le3p5ngj1zlbn1mh3tistq9t76.htm&amp;amp;locale=en"&gt;https://documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.4&amp;amp;docsetId=proc&amp;amp;docsetTarget=p0le3p5ngj1zlbn1mh3tistq9t76.htm&amp;amp;locale=en&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 17:45:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/601442#M16722</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-11-04T17:45:05Z</dc:date>
    </item>
    <item>
      <title>Re: top n% per group</title>
      <link>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/601493#M16724</link>
      <description>&lt;P&gt;Your question is not clear.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want the top costs which taken together amount to less than 5% of the total cost then&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
do until(last.member_id);
	set have; by member_id;
 	totalCost = sum(totalCost, cost);
 	end;
do until(last.member_id);
	set have; by member_id;
    cumCost = sum(cumCost, cost);
    if cumCost &amp;lt;= 0.05 * totalCost then output;
    end;
drop totalCost cumCost;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you want all costs that individually amount to less than 5% of the total cost then:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
do until(last.member_id);
	set have; by member_id;
 	totalCost = sum(totalCost, cost);
 	end;
do until(last.member_id);
	set have; by member_id;
    if cost &amp;lt;= 0.05 * totalCost then output;
    end;
drop totalCost;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 04 Nov 2019 19:57:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/601493#M16724</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-11-04T19:57:19Z</dc:date>
    </item>
    <item>
      <title>Re: top n% per group</title>
      <link>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/602621#M16845</link>
      <description>&lt;P&gt;Thanks for your reply. After some more thinking I realize I want the first 5% of observations that appear in the dataset, for each of the various sized groups. Which I think takes the cost variable out of the equation.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, for a group that has 100 members, I want to flag the first 5. For the group has 25,643 members I want to flag the first 1,282 which appear in the dataset under that group.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I hope that clears things up! Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2019 20:55:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/602621#M16845</guid>
      <dc:creator>rcogan</dc:creator>
      <dc:date>2019-11-07T20:55:52Z</dc:date>
    </item>
    <item>
      <title>Re: top n% per group</title>
      <link>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/602639#M16847</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/286669"&gt;@rcogan&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you understand the code&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;&amp;nbsp;posted?&lt;/P&gt;
&lt;P&gt;You need only replace the summing of cost with a count of obs in the first loop and then in the second loop only output the first n% of the group count of obs.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Nov 2019 22:14:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/602639#M16847</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-11-07T22:14:12Z</dc:date>
    </item>
    <item>
      <title>Re: top n% per group</title>
      <link>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/602644#M16848</link>
      <description>Thanks for walking me through that</description>
      <pubDate>Thu, 07 Nov 2019 22:46:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/top-n-per-group/m-p/602644#M16848</guid>
      <dc:creator>rcogan</dc:creator>
      <dc:date>2019-11-07T22:46:17Z</dc:date>
    </item>
  </channel>
</rss>

