<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: choose top 10 percent data based on one variable in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272056#M269513</link>
    <description>&lt;P&gt;Thank you. Yes, I forgot to sort with "By descending Var1;".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My base data has 76790 rows, so each 10% subset will have 7679 rows.&lt;/P&gt;
&lt;P&gt;Yes, I can use flag variables to indicate the top 10% of each variable by adding one new flag variable per variable. That is better than managing multiple data files.&lt;/P&gt;</description>
    <pubDate>Fri, 20 May 2016 17:40:45 GMT</pubDate>
    <dc:creator>fengyuwuzu</dc:creator>
    <dc:date>2016-05-20T17:40:45Z</dc:date>
    <item>
      <title>choose top 10 percent data based on one variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272003#M269511</link>
      <description>&lt;P&gt;I want to select the top 10% of the data for each of several variables (that is, several subsets, each based on one variable). My idea is the code below.&lt;/P&gt;
&lt;P&gt;Is this a good way to do it, or is there a better one?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks in advance!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*first define a macro selecting first 10% data */

%macro top10pct(lib, dataset, var);
proc sql noprint;
select max(ceil(0.1*nlobs)) into :N_top10pct
from &amp;amp;lib..&amp;amp;dataset
;
quit;

data &amp;amp;lib..&amp;amp;var;
set &amp;amp;lib..&amp;amp;dataset;
if _n_ &amp;lt;= &amp;amp;N_top10pct;
run;
%mend top10pct;

/* top 10 % of var1 */
proc sort data=mylib.mydata; 
by var1;
run;
%top10pct(mylib, mydata, var1);

/* top 10 % of var2 */
proc sort data=mylib.mydata; 
by var2;
run;
%top10pct(mylib, mydata, var2);

/* top 10 % of var3 */
proc sort data=mylib.mydata; 
by var3;
run;
%top10pct(mylib, mydata, var3);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 15:02:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272003#M269511</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-05-20T15:02:22Z</dc:date>
    </item>
    <item>
      <title>Re: choose top 10 percent data based on one variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272041#M269512</link>
      <description>&lt;P&gt;What will you be doing with those multiple subsets? Often it is a better idea to add&amp;nbsp;flag variables and selectively process than to keep track of multiple data sets. And if the base data is "large" you can have performance issues with many large datasets floating around.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, your current approach selects the smallest values of each variable, because PROC SORT sorts in ascending order by default. It sounds odd to call that the top 10 percent.&lt;/P&gt;
&lt;P&gt;Or did you mean to have your sort&lt;/P&gt;
&lt;P&gt;By descending Var1;&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 16:49:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272041#M269512</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-05-20T16:49:58Z</dc:date>
    </item>
    <item>
      <title>Re: choose top 10 percent data based on one variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272056#M269513</link>
      <description>&lt;P&gt;Thank you. Yes, I forgot to sort with "By descending Var1;".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My base data has 76790 rows, so each 10% subset will have 7679 rows.&lt;/P&gt;
&lt;P&gt;Yes, I can use flag variables to indicate the top 10% of each variable by adding one new flag variable per variable. That is better than managing multiple data files.&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 17:40:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272056#M269513</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-05-20T17:40:45Z</dc:date>
    </item>
    <item>
      <title>Re: choose top 10 percent data based on one variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272072#M269514</link>
      <description>&lt;P&gt;If the values are numeric, here is a way to add a flag variable for values at or above the 90th percentile:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc summary data=sashelp.cars;
   var  MSRP  Invoice horsepower;
   output out= carsum(drop= _:) p90=/autoname;
run;

Proc sql;
   create table want as
   select a.*, (a.Msrp ge b.Msrp_P90) as MSRP_flag,
     (a.Invoice ge b.Invoice_P90) as Invoice_flag,
     (a.horsepower ge b.horsepower_P90) as horsepower_flag
   from SASHELP.Cars as a,  Carsum as b;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Since a percentile may be an odd construct for character values, something else may be needed for them. Consider whether you really want "ABC" to be greater than "ABBBQ".&lt;/P&gt;
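&lt;P&gt;As a rough sketch of the "selectively process" idea, the flags can be used in a WHERE statement so that no separate subset has to be stored. This assumes the want table and the MSRP_flag column created by the code above; the output name top_msrp is only a placeholder.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Summarize only the rows flagged as at or above the 90th percentile of MSRP. */
/* Assumes WANT and MSRP_flag exist as created by the PROC SQL step above.    */
proc means data=want n min max mean;
   where MSRP_flag = 1;
   var MSRP;
run;

/* If a physical subset is still needed, it can be derived on demand. */
data top_msrp;   /* hypothetical output name */
   set want;
   where MSRP_flag = 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The same pattern should apply to Invoice_flag and horsepower_flag.&lt;/P&gt;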
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 18:09:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272072#M269514</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-05-20T18:09:54Z</dc:date>
    </item>
    <item>
      <title>Re: choose top 10 percent data based on one variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272111#M269515</link>
      <description>&lt;P&gt;Thank you so much!!!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Have a great weekend.&lt;/P&gt;</description>
      <pubDate>Fri, 20 May 2016 19:35:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/272111#M269515</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-05-20T19:35:51Z</dc:date>
    </item>
    <item>
      <title>Re: choose top 10 percent data based on one variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/416632#M269516</link>
      <description>&lt;P&gt;Hello guys,&lt;/P&gt;
&lt;P&gt;How can I adapt this code if I need the top 30 percent and the bottom 30 percent at the same time?&lt;BR /&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Nov 2017 10:46:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/choose-top-10-percent-data-based-on-one-variable/m-p/416632#M269516</guid>
      <dc:creator>raqthesolid</dc:creator>
      <dc:date>2017-11-28T10:46:37Z</dc:date>
    </item>
  </channel>
</rss>

