<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Macro to identify and output outliers for multiple variables in a dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526165#M143240</link>
    <description>&lt;P&gt;Dear SAS Community&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am writing a program to flag and output outliers in a dataset for multiple variables. I want to output all outlier observations that are above or below 3x the interquartile range into a dataset to query. I would like to do this using a macro to avoid handling the variables one by one, but I am not experienced in macro writing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'd like to produce a data set that lists all the observations that have an outlier, with the following columns:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) PID: The ID for the observation&lt;/P&gt;&lt;P&gt;2) Variable: The name of the variable that is an outlier&lt;/P&gt;&lt;P&gt;3) Value: The value of the outlier variable&lt;/P&gt;&lt;P&gt;4) Issue: The issue with the value, i.e., "Too High" or "Too Low"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was doing it manually by running PROC UNIVARIATE, looking at the distributions for all the variables, and then outputting each one using the attached sample of my code (there are many more variables), but it is taking too long for the number of variables I have.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data outlierqueries;
	keep PID variable value issue;
	set fulldataset;
	if height &amp;gt; xxx then do;
		variable='height';
		value=put(height, 10.);
		issue='Seems too high';
		output;
		end;
	if height &amp;lt; xxx then do;
		variable='height';
		value=put(height, 10.);
		issue='Seems too low';
		output;
		end;
	if weight &amp;gt; xxx then do;
		variable='weight';
		value=put(weight, 10.);
		issue='Seems too high';
		output;
		end;
	if weight &amp;lt; xxx then do;
		variable='weight';
		value=put(weight, 10.);
		issue='Seems too low';
		output;
		end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any suggestions/help you can offer would be much appreciated!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Cara&lt;/P&gt;</description>
    <pubDate>Thu, 10 Jan 2019 19:15:50 GMT</pubDate>
    <dc:creator>cbt2119</dc:creator>
    <dc:date>2019-01-10T19:15:50Z</dc:date>
    <item>
      <title>Macro to identify and output outliers for multiple variables in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526165#M143240</link>
      <description>&lt;P&gt;Dear SAS Community&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am writing a program to flag and output outliers in a dataset for multiple variables. I want to output all outlier observations that are above or below 3x the interquartile range into a dataset to query. I would like to do this using a macro to avoid handling the variables one by one, but I am not experienced in macro writing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'd like to produce a data set that lists all the observations that have an outlier, with the following columns:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) PID: The ID for the observation&lt;/P&gt;&lt;P&gt;2) Variable: The name of the variable that is an outlier&lt;/P&gt;&lt;P&gt;3) Value: The value of the outlier variable&lt;/P&gt;&lt;P&gt;4) Issue: The issue with the value, i.e., "Too High" or "Too Low"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was doing it manually by running PROC UNIVARIATE, looking at the distributions for all the variables, and then outputting each one using the attached sample of my code (there are many more variables), but it is taking too long for the number of variables I have.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data outlierqueries;
	keep PID variable value issue;
	set fulldataset;
	if height &amp;gt; xxx then do;
		variable='height';
		value=put(height, 10.);
		issue='Seems too high';
		output;
		end;
	if height &amp;lt; xxx then do;
		variable='height';
		value=put(height, 10.);
		issue='Seems too low';
		output;
		end;
	if weight &amp;gt; xxx then do;
		variable='weight';
		value=put(weight, 10.);
		issue='Seems too high';
		output;
		end;
	if weight &amp;lt; xxx then do;
		variable='weight';
		value=put(weight, 10.);
		issue='Seems too low';
		output;
		end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any suggestions/help you can offer would be much appreciated!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Cara&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 19:15:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526165#M143240</guid>
      <dc:creator>cbt2119</dc:creator>
      <dc:date>2019-01-10T19:15:50Z</dc:date>
    </item>
    <item>
      <title>Re: Macro to identify and output outliers for multiple variables in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526167#M143242</link>
      <description>&lt;P&gt;Two suggestions ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First, consider whether the 10. format is correct. Perhaps you want one position after the decimal point (for example, 10.1)?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Second, what output data set(s) can you get from PROC UNIVARIATE?&amp;nbsp; The problem can be automated, but macros might not be the best approach.&amp;nbsp; Can you obtain an output data set with the equivalent of three variables:&amp;nbsp; VARIABLE_NAME, LOW_CUTOFF, HIGH_CUTOFF?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 19:24:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526167#M143242</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-01-10T19:24:21Z</dc:date>
    </item>
    <item>
      <title>Re: Macro to identify and output outliers for multiple variables in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526172#M143245</link>
      <description>&lt;P&gt;You can run PROC SUMMARY on all of your variables, and have the interquartile range for each variable output to a data set. Then loop through the variables one by one to identify outliers. Here is an outline of the code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
%macro identify_outliers(variablenames=,datasetname=);
	%local i thisname;
	/* One row of statistics: varname_QRange and varname_Mean for each variable */
	proc summary data=&amp;amp;datasetname;
		var &amp;amp;variablenames;
		output out=stats qrange= mean=/autoname;
	run;

	/* One DATA step, and one output data set, per variable */
	%do i=1 %to %sysfunc(countw(&amp;amp;variablenames));
		%let thisname=%scan(&amp;amp;variablenames,&amp;amp;i,%str( ));
		data outliers_&amp;amp;thisname;
			if _n_=1 then set stats;
			set &amp;amp;datasetname;
			if &amp;amp;thisname&amp;gt;(&amp;amp;thisname._mean + 3*&amp;amp;thisname._qrange) then do;
				/*** Do something here ***/
			end;
		run;
	%end;
%mend;
&lt;/CODE&gt;&lt;/PRE&gt;
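&lt;P&gt;As a rough sketch of how the "Do something here" step might be filled in, the variation below writes every flagged value to a single data set with the PID, Variable, Value and Issue columns asked for in the question. The macro name identify_outliers_long, the parameters outds= and idvar=, and the work data set _stats are made up for illustration. It keeps the mean plus or minus 3 times the interquartile range cutoffs from the outline and assumes the input data has no variables already named variable, value or issue.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
/* Sketch only: identify_outliers_long, outds=, idvar= and _stats are
   hypothetical names, not part of the outline above */
%macro identify_outliers_long(variablenames=, datasetname=, outds=outlierqueries, idvar=PID);
	%local i thisname;

	/* One row holding varname_QRange and varname_Mean for every variable */
	proc summary data=&amp;amp;datasetname;
		var &amp;amp;variablenames;
		output out=_stats qrange= mean= / autoname;
	run;

	/* A single DATA step; the %do loop generates one check per variable */
	data &amp;amp;outds;
		length variable $32 value $10 issue $14;
		keep &amp;amp;idvar variable value issue;
		if _n_=1 then set _stats;
		set &amp;amp;datasetname;
		%do i=1 %to %sysfunc(countw(&amp;amp;variablenames));
			%let thisname=%scan(&amp;amp;variablenames,&amp;amp;i,%str( ));
			/* Skip missing values so they are not flagged as too low */
			if not missing(&amp;amp;thisname) then do;
				if &amp;amp;thisname &amp;gt; (&amp;amp;thisname._mean + 3*&amp;amp;thisname._qrange) then do;
					variable="&amp;amp;thisname";
					value=put(&amp;amp;thisname, 10.);
					issue='Seems too high';
					output;
				end;
				else if &amp;amp;thisname &amp;lt; (&amp;amp;thisname._mean - 3*&amp;amp;thisname._qrange) then do;
					variable="&amp;amp;thisname";
					value=put(&amp;amp;thisname, 10.);
					issue='Seems too low';
					output;
				end;
			end;
		%end;
	run;
%mend identify_outliers_long;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A call such as %identify_outliers_long(variablenames=height weight, datasetname=fulldataset) would then produce OUTLIERQUERIES. If the cutoffs should instead be measured from the quartiles, Q1= and Q3= can be requested in the OUTPUT statement in the same way as QRANGE=.&lt;/P&gt;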
&lt;P&gt;Hashtag: #PROCSUMMARYRULEZ&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 19:36:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526172#M143245</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-01-10T19:36:22Z</dc:date>
    </item>
    <item>
      <title>Re: Macro to identify and output outliers for multiple variables in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526178#M143249</link>
      <description>&lt;P&gt;Thank you, this makes sense. I am just having a small issue when I call the macro and define the variables. I get this error, "ERROR: The keyword parameter INPUT was not defined with the macro.&lt;BR /&gt;ERROR: The keyword parameter VAR was not defined with the macro.&lt;BR /&gt;ERROR: The keyword parameter OUTPUT was not defined with the macro."&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think I'm likely calling it incorrectly. This is my modified code (not all variables are represented here, just a portion):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let variablenames=weight height totalfat totalmass totallean;

 %macro identify_outliers(variablenames=,datasetname=);
	proc summary data=&amp;amp;datasetname;
	    var &amp;amp;variablenames;
		output out=stats qrange= mean=/autoname;
	run;

	%do i=1 %to %sysfunc(countw(&amp;amp;variablenames));
	    %let thisname=%scan(&amp;amp;variablenames,&amp;amp;i,%str( ));
	    data outliers_&amp;amp;thisname;
		    if _n_=1 then set stats;
		    set &amp;amp;datasetname;
                    if &amp;amp;thisname&amp;gt;(&amp;amp;thisname._mean + 3*&amp;amp;thisname._qrange) then do;
			        	variable=&amp;amp;thisname;
					value=put(&amp;amp;thisname, 10.);
					issue='Out of range';
                    end;
             run;
	%end;
%mend;&lt;BR /&gt;&lt;BR /&gt;%identify_outliers(input=alldxa, var=&amp;amp;variablenames, output=outliers);&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 10 Jan 2019 19:55:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526178#M143249</guid>
      <dc:creator>cbt2119</dc:creator>
      <dc:date>2019-01-10T19:55:15Z</dc:date>
    </item>
    <item>
      <title>Re: Macro to identify and output outliers for multiple variables in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526184#M143251</link>
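      <description>&lt;P&gt;There's no OUTPUT parameter in the macro I wrote, and no INPUT or VAR parameter either. The parameters it defines are VARIABLENAMES= and DATASETNAME=.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You'd have to add an OUTPUT parameter to the macro if you want to specify an output data set name.&lt;/P&gt;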
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
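&lt;P&gt;A call that uses only the parameter names defined in the %MACRO statement, VARIABLENAMES= and DATASETNAME=, might look like the sketch below. It assumes the data set alldxa and the variable list from the post above, and it should leave one data set per variable (OUTLIERS_WEIGHT, OUTLIERS_HEIGHT and so on) rather than a single OUTLIERS data set.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
/* Variable list taken from the post above */
%let variablenames=weight height totalfat totalmass totallean;

/* Keyword parameters must match the names in the %macro statement */
%identify_outliers(variablenames=&amp;amp;variablenames, datasetname=alldxa);
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;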
&lt;P&gt;You might also want to add a KEEP statement to the DATA step that creates the output data set, which I have currently named OUTLIERS_&amp;amp;THISNAME.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 20:06:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526184#M143251</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-01-10T20:06:01Z</dc:date>
    </item>
    <item>
      <title>Re: Macro to identify and output outliers for multiple variables in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526200#M143259</link>
      <description>&lt;P&gt;Thanks, I've got it now. Works well!&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jan 2019 20:55:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Macro-to-identify-and-output-outliers-for-multiple-variables-in/m-p/526200#M143259</guid>
      <dc:creator>cbt2119</dc:creator>
      <dc:date>2019-01-10T20:55:21Z</dc:date>
    </item>
  </channel>
</rss>

