<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: removing outliers in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/766839#M243068</link>
    <description>&lt;P&gt;I'm also confused because the macro says to use PROC UNIVARIATE, but proceeds to use PROC MEANS to calculate the 25th and 75th percentiles.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a reason you're not just using PROC UNIVARIATE? It can identify these values for you, but I may be misunderstanding something completely.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here are some resources:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/procstat/procstat_univariate_examples03.htm" target="_self"&gt;Identifying Extreme Observations and Extreme Values&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.lexjansen.com/phuse/2011/cc/CC01.pdf" target="_self"&gt;Data cleaning and spotting outliers with UNIVARIATE&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 09 Sep 2021 12:57:33 GMT</pubDate>
    <dc:creator>maguiremq</dc:creator>
    <dc:date>2021-09-09T12:57:33Z</dc:date>
    <item>
      <title>removing outliers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/766833#M243064</link>
      <description>&lt;P&gt;I am wondering if I can use this code (please see below) to remove outliers from my data set. I am not familiar with macros at all. I tried it, and it looks like it would work with some modifications. Here are the variables I used as a test:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Variable a: freq is 6-41 (the only outlier is 112).&lt;/P&gt;&lt;P&gt;Variable b: freq is 3-47 (outliers are 0, 135, 340 and 90750).&lt;/P&gt;&lt;P&gt;The code below removes everything that is &amp;gt; 16 for var a (var a freq is 6-16 in my outfile). For var b it removes everything that is &amp;gt; 47, and it also removes anything that is &amp;lt; 21 (my outfile shows freq 21-47 for var b).&lt;/P&gt;&lt;P&gt;This probably has to do with the P25 and P75 below? How can I change the code to remove outliers like 0 and anything that is higher than the max value for each var? The max will vary considerably per var, and I have a lot more than 2 vars.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, if the max value for var a is 41, anything higher than that would be removed. If the max for var b is 47, anything higher than that would be removed.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Var c and d,&amp;nbsp; if added to the code below, sas tells me it does not work:&amp;nbsp; variable c freq 26.4 - 104, it has a lot of values with decimal places like 26.4 but also numbers without decimals. I would remove anything higher than the 104 though but I need the values with decimals to stay.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Var d freq 11.7 - 400 but the decimals here are outliers...there should be no decimals in this var -&amp;nbsp; freq should be only in the 100's and I have one random 11.7 that would need to be removed.
The code below does not "like" the values with decimals.&lt;/P&gt;&lt;P&gt;Is there a way to make changes in this code for what I need or should I do this in a completely different way? Thank you !&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The code that I found and tried is below:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;%macro outliers(input=, var=, output= );&lt;/P&gt;&lt;P&gt;%let Q1=;&lt;BR /&gt;%let Q3=;&lt;BR /&gt;%let varL=;&lt;BR /&gt;%let varH=;&lt;/P&gt;&lt;P&gt;%let n=%sysfunc(countw(&amp;amp;var));&lt;BR /&gt;%do i= 1 %to &amp;amp;n;&lt;BR /&gt;%let val = %scan(&amp;amp;var,&amp;amp;i);&lt;BR /&gt;%let Q1 = &amp;amp;Q1 &amp;amp;val._P25;&lt;BR /&gt;%let Q3 = &amp;amp;Q3 &amp;amp;val._P75;&lt;BR /&gt;%let varL = &amp;amp;varL &amp;amp;val.L;&lt;BR /&gt;%let varH = &amp;amp;varH &amp;amp;val.H;&lt;BR /&gt;%end;&lt;/P&gt;&lt;P&gt;/* Calculate the quartiles and inter-quartile range using proc univariate */&lt;BR /&gt;proc means data=&amp;amp;input nway noprint;&lt;BR /&gt;var &amp;amp;var;&lt;BR /&gt;output out=temp P25= P75= / autoname;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;/* Extract the upper and lower limits into macro variables */&lt;BR /&gt;data temp;&lt;BR /&gt;set temp;&lt;BR /&gt;ID = 1;&lt;BR /&gt;array varb(&amp;amp;n) &amp;amp;Q1;&lt;BR /&gt;array varc(&amp;amp;n) &amp;amp;Q3;&lt;BR /&gt;array lower(&amp;amp;n) &amp;amp;varL;&lt;BR /&gt;array upper(&amp;amp;n) &amp;amp;varH;&lt;BR /&gt;do i = 1 to dim(varb);&lt;BR /&gt;lower(i) = varb(i) - 3 * (varc(i) - varb(i));&lt;BR /&gt;upper(i) = varc(i) + 3 * (varc(i) - varb(i));&lt;BR /&gt;end;&lt;BR /&gt;drop i _type_ _freq_;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;data temp1;&lt;BR /&gt;set &amp;amp;input;&lt;BR /&gt;ID = 1;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;data &amp;amp;output;&lt;BR /&gt;merge temp1 temp;&lt;BR /&gt;by ID;&lt;BR /&gt;array var(&amp;amp;n) &amp;amp;var;&lt;BR /&gt;array lower(&amp;amp;n) &amp;amp;varL;&lt;BR /&gt;array upper(&amp;amp;n) &amp;amp;varH;&lt;BR /&gt;do i = 1 to dim(var);&lt;BR /&gt;if 
not missing(var(i)) then do;&lt;BR /&gt;if var(i) &amp;gt;= lower(i) and var(i) &amp;lt;= upper(i);&lt;BR /&gt;end;&lt;BR /&gt;end;&lt;BR /&gt;drop &amp;amp;Q1 &amp;amp;Q3 &amp;amp;varL &amp;amp;varH ID i;&lt;BR /&gt;run;&lt;BR /&gt;%mend;&lt;/P&gt;&lt;P&gt;%outliers(input=abfile, var= a&amp;nbsp; b, output= outabfile);&lt;/P&gt;</description>
      <pubDate>Thu, 09 Sep 2021 12:31:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/766833#M243064</guid>
      <dc:creator>Mscarboncopy</dc:creator>
      <dc:date>2021-09-09T12:31:05Z</dc:date>
    </item>
    <item>
      <title>Re: removing outliers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/766834#M243065</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Var c and d,&amp;nbsp; if added to the code below, sas tells me it does not work:&amp;nbsp; variable c freq 26.4 - 104, it has a lot of values with decimal places like 26.4 but also numbers without decimals. I would remove anything higher than the 104 though but I need the values with decimals to stay.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Either I am misunderstanding, or this doesn't make sense. You want 105 to be removed, but 105.1 to not be removed?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Var d freq 11.7 - 400 but the decimals here are outliers...there should be no decimals in this var -&amp;nbsp; freq should be only in the 100's and I have one random 11.7 that would need to be removed. The code below does not "like" the values with decimals.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Again, I don't understand. And I don't know what you mean by "code below does not 'like' the values with decimals". The word "like" means nothing in a mathematical/statistical/programming sense.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We can't see what you are seeing; showing us some of the data and some of the results would help.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Sep 2021 12:43:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/766834#M243065</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-09-09T12:43:33Z</dc:date>
    </item>
    <item>
      <title>Re: removing outliers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/766839#M243068</link>
      <description>&lt;P&gt;I'm also confused because the macro says to use PROC UNIVARIATE, but proceeds to use PROC MEANS to calculate the 25th and 75th percentiles.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a reason you're not just using PROC UNIVARIATE? It can identify these values for you, but I may be misunderstanding something completely.&lt;/P&gt;
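&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal sketch of that approach, assuming the data set and variable names from the original post (abfile, a, b, outabfile) and keeping the macro's 3 * IQR fences. The quartiles data set and the a_ and b_ prefixes are illustrative names, not anything from the original macro, and this has not been run against the real data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Step 1: get the 25th and 75th percentiles for each variable. */
/* Creates one row with a_P25, a_P75, b_P25 and b_P75.          */
proc univariate data=abfile noprint;
  var a b;
  output out=quartiles pctlpts=25 75 pctlpre=a_ b_ pctlname=P25 P75;
run;

/* Step 2: drop any row with a non-missing value outside the fences. */
data outabfile;
  if _n_ = 1 then set quartiles;  /* one-row data set; values are retained */
  set abfile;
  array v(2) a b;
  array q1(2) a_P25 b_P25;
  array q3(2) a_P75 b_P75;
  do i = 1 to dim(v);
    iqr = q3(i) - q1(i);
    if not missing(v(i)) and
       (v(i) &amp;lt; q1(i) - 3 * iqr or v(i) &amp;gt; q3(i) + 3 * iqr) then delete;
  end;
  drop i iqr a_P25 a_P75 b_P25 b_P75;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;These fences come from the quartiles rather than from a known valid maximum, so a narrow interquartile range can remove legitimate values. If the valid range for each variable is already known (for example, 6 to 41 for a), comparing against those limits directly may be simpler.&lt;/P&gt;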
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here are some resources:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/procstat/procstat_univariate_examples03.htm" target="_self"&gt;Identifying Extreme Observations and Extreme Values&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.lexjansen.com/phuse/2011/cc/CC01.pdf" target="_self"&gt;Data cleaning and spotting outliers with UNIVARIATE&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Sep 2021 12:57:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/766839#M243068</guid>
      <dc:creator>maguiremq</dc:creator>
      <dc:date>2021-09-09T12:57:33Z</dc:date>
    </item>
    <item>
      <title>Re: removing outliers</title>
      <link>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/767074#M243136</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/210474"&gt;@Mscarboncopy&lt;/a&gt;,&lt;BR /&gt;Please have a look at this paper:&amp;nbsp;&lt;A href="https://www.lexjansen.com/phuse/2011/cc/CC01.pdf" target="_blank"&gt;https://www.lexjansen.com/phuse/2011/cc/CC01.pdf&lt;/A&gt;.&lt;BR /&gt;It may have a solution for you.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Sep 2021 17:01:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/removing-outliers/m-p/767074#M243136</guid>
      <dc:creator>Sajid01</dc:creator>
      <dc:date>2021-09-10T17:01:54Z</dc:date>
    </item>
  </channel>
</rss>

