<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using SAS ARRAYS to replace extreme values with a percentile value in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270550#M53780</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/85559"&gt;@JonDickens1607&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sorry for the delay. I was away from my workstation. Big thank you to&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;for stepping in!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;A)&lt;/STRONG&gt; As Tom has already explained, I used the min operator, which I could (or rather should) have written &lt;FONT face="courier new,courier"&gt;min&lt;/FONT&gt; rather than &amp;gt;&amp;lt;. My first&amp;nbsp;idea was to write in the DO loop (in my&amp;nbsp;first example) as follows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if v[d+i]&amp;gt;p[i] then v[d+i]=p[i];&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This would have been clearer. Nevertheless, in this context it is equivalent to the shorter assignment statement&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;v[d+i]=v[d+i] min p[i];&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;(now using the less cryptic notation for the min operator).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Honestly, in the past 18 years I have virtually always used the MIN &lt;EM&gt;function&lt;/EM&gt;&amp;nbsp;and virtually never the MIN &lt;EM&gt;operator&lt;/EM&gt;. But these two are &lt;U&gt;not&lt;/U&gt; equivalent. The MIN and MAX&amp;nbsp;&lt;EM&gt;operators&lt;/EM&gt;&amp;nbsp;differ from the MIN and MAX &lt;EM&gt;functions&lt;/EM&gt; in how they handle missing values: The functions compute the minimum (or maximum, respectively) of the &lt;EM&gt;non-missing&lt;/EM&gt; arguments, whereas the operators determine the smaller (or greater, resp.) of the two values surrounding them in terms of the usual sort order for missing, special missing and non-missing values:&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;._ &amp;lt; . &amp;lt; .A &amp;lt; .B &amp;lt; ... &amp;lt; .Z &amp;lt; &lt;EM&gt;non-missing numeric values&lt;/EM&gt;&lt;/PRE&gt;
&lt;P&gt;In your case it would be incorrect to write&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;v[d+i]=min(v[d+i], p[i]);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;U&gt;&lt;EM&gt;if&lt;/EM&gt;&lt;/U&gt; v[d+i] was a missing value for one or more observations and values of &lt;FONT face="courier new,courier"&gt;i&lt;/FONT&gt;! The MIN &lt;EM&gt;function&lt;/EM&gt; would replace the missing value with the percentile p[i], which is most likely not what you want. In contrast, the assignment statement using the MIN &lt;EM&gt;operator&lt;/EM&gt; leaves the missing value unchanged (and the same holds for the conditional assignment statement &lt;FONT face="courier new,courier"&gt;if v[d+i]&amp;gt;p[i] then v[d+i]=p[i];&lt;/FONT&gt;).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The RETAIN&amp;nbsp;statement I used in the very first version of my post (and which I removed only a few minutes later) was redundant. (It had to do with an earlier draft version of the code.) Sorry, if it confused you. Setting D=DIM(P) was just to avoid repetitive calls of the DIM function in the code (i.e. a kind of abbreviation).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The reason for the somewhat "surprising" D+I index in the first example is:&amp;nbsp;With the first SET statement the 3 (in your case: 60) numeric variables from dataset PERCTLS go into the program data vector (PDV). The second SET statement adds all variables from SASHELP.CLASS to the PDV. Hence, the variable list &lt;FONT face="courier new,courier"&gt;_numeric_&lt;/FONT&gt; in the second ARRAY statement includes 2&lt;FONT face="symbol"&gt;*&lt;/FONT&gt;3 (in your case 2&lt;FONT face="symbol"&gt;*&lt;/FONT&gt;60) numeric variables: The percentiles (P75_1, P75_2, P75_3 in the example), followed by the original variables (AGE, HEIGHT, WEIGHT in the example). So,&amp;nbsp;AGE, HEIGHT and WEIGHT are v[&lt;STRONG&gt;4&lt;/STRONG&gt;], v[5] and v[6], respectively, and the corresponding percentiles &lt;SPAN&gt;P75_1, P75_2 and P75_3&amp;nbsp;&lt;/SPAN&gt;are p[&lt;STRONG&gt;1&lt;/STRONG&gt;], p[2] and p[3], resp. D=DIM(P) is the "offset" that must be used in the indices of array V to match elements of V and elements of P correctly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This complication was one reason why I added the second example. Here, I don't use the comprehensive variable list &lt;FONT face="courier new,courier"&gt;_numeric_&lt;/FONT&gt; for the second array definition, but a variable list comprising variables from SASHELP.HEART only. The same variable list was (of course) used in the VAR statement of the PROC SUMMARY step. Thus, the two arrays match 1:1 and there is no need for an "offset".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;B)&lt;/STRONG&gt; The VARNUM option in the PROC CONTENTS statement of the second example is important to determine the correct specification of the abbreviated variable list, because variable lists of this kind refer to the variable order in the PDV (see the documentation I linked to).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;AgeCHDdiag-numeric-Weight&lt;/PRE&gt;
&lt;P&gt;is (in this example) a shorthand notation for&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;AgeCHDdiag AgeAtStart Height Weight&lt;/PRE&gt;
&lt;P&gt;namely all &lt;EM&gt;numeric&lt;/EM&gt; variables in the PDV from AgeCHDdiag to Weight. I added MRW and Cholesterol arbitrarily. Obviously, with 60 variables as in your case, variable lists are particularly convenient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC SUMMARY computes the 99% percentiles and writes them to variables P99_1, P99_2, ..., P99_6 in dataset PERCTLS (which contains only one observation). Default variables _TYPE_ and _FREQ_ of the output dataset are dropped using the DROP= dataset option and the colon abbreviation for "all variables whose names start with an underscore." The list ends with P99_6 (and not P99_888) because it corresponds 1:1 to the variable list in the VAR statement, which consists of 6 variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As described earlier, the two SET statements in the data step bring the percentiles (P99_1, ...) and the original variables of SASHELP.HEART (Status, DeathCause, AgeCHDdiag, ...) together side by side in the PDV. The second ARRAY&amp;nbsp;statement &lt;EM&gt;must&lt;/EM&gt; use the same variable list as was used in PROC SUMMARY. Otherwise, variable values would be -- inappropriately -- compared to percentiles of different variables!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The assignment statement in the DO loop replaces "extremely large values" ("outliers") by the corresponding 99% percentiles, as desired. Finally, the DROP statement removes the index variable I and the percentile variables (using the colon abbreviation for "all variables whose names start with 'P99_'"), assuming you don't need the percentiles (which are constant across all observations!) in dataset WANT. They are available in dataset PERCTLS without duplicates after all.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The possibility to get rid of the percentile variables using the short notation P99_: was the reason for specifying these names in the OUTPUT statement of the PROC SUMMARY step and &lt;EM&gt;not&lt;/EM&gt; using the AUTONAME option instead (which would have created variable names such as Height_P99, Weight_P99 etc.). However, thinking again about it, the variable list&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;AgeCHDdiag_P99--Cholesterol_P99&lt;/PRE&gt;
&lt;P&gt;which could replace &lt;FONT face="courier new,courier"&gt;p99_:&lt;/FONT&gt;&amp;nbsp;in the DROP statement would not have been much longer. So,&amp;nbsp;it might even be better to use this in conjunction with the AUTONAME option in PROC SUMMARY:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;output out=perctls(drop=_:) p99= /autoname;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The advantage would be that the name of each percentile variable contains the name of the corresponding original variable, which is probably helpful when dealing with 60 variables (unless these are numbered like VAR1, ..., VAR60&amp;nbsp;anyway).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;(Edit: only removed some vertical white space)&lt;/P&gt;</description>
    <pubDate>Sat, 14 May 2016 22:23:35 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2016-05-14T22:23:35Z</dc:date>
    <item>
      <title>Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270500#M53757</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I have a data set containing 60 variables with positively skew distributions, bounded below by 0 but containing very large extreme values which are most likely data errors.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I have used the array method to replace all the erroneous negative values with 0.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I would like to replace all the large values in each variable with the 99th percentile value which is much closer to the rest of the values. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am using SAS University Edition with SAS Studio 3.5&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I would appreciate it if you could help me with this task. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I have tried to do this using arrays in a data step and the program runs without error but when I check the output data with Proc Means, the very large values are still there. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I am not sure how to use the Percentile Function: P99 = PCTL(99, OF D[K]) where D is an array. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 10:14:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270500#M53757</guid>
      <dc:creator>JonDickens1607</dc:creator>
      <dc:date>2016-05-14T10:14:14Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270505#M53759</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/85559"&gt;@JonDickens1607&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It was a good idea to use arrays. But the PCTL function applied to an array operates on the array values within the respective observation, i.e., it computes one percentile per &lt;EM&gt;row&lt;/EM&gt;, whereas you need one percentile per &lt;EM&gt;column.&lt;/EM&gt;&lt;/P&gt;
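&lt;P&gt;To illustrate the row-wise behaviour, here is a minimal sketch (the array name D and the variable ROW_P75 are chosen for illustration only). It computes one 75% percentile per observation across AGE, HEIGHT and WEIGHT, which is not the per-variable percentile you are after:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Illustration only: PCTL works across the array within each row */

data _null_;
set sashelp.class(obs=3);
array d age height weight;
row_p75=pctl(75, of d[*]); /* one value per observation */
put name= row_p75=;
run;&lt;/CODE&gt;&lt;/PRE&gt;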
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is an example showing how you could proceed:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Compute 75% percentiles */

proc summary data=sashelp.class;
var _numeric_;
output out=perctls(drop=_:) p75=p75_1-p75_3;
run;

/* Replace larger values by the 75% percentiles */

data want;
if _n_=1 then set perctls;
array p _all_;
set sashelp.class;
array v _numeric_;
d=dim(p);
do i=1 to d;
  v[d+i]=v[d+i]&amp;gt;&amp;lt;p[i];
end;
drop d i p75_:;
run;

proc print data=sashelp.class;
run;

proc print data=want;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I use 75% percentiles for demonstration because sashelp.class contains only 19 observations. Just search and replace 75 by 99 and adapt the variable lists &lt;FONT face="courier new,courier"&gt;_numeric_&lt;/FONT&gt; (2 instances) and the "3" in&amp;nbsp;&lt;FONT face="courier new,courier"&gt;p75_1-p75_&lt;STRONG&gt;3&lt;/STRONG&gt;&lt;/FONT&gt;. And please feel free to ask if anything is unclear.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit: Depending on how you define the variable lists, the indices might need to be adapted, too.&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 12:03:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270505#M53759</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-05-14T12:03:47Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270507#M53761</link>
      <description>Thank you for your response to my question.&lt;BR /&gt;&lt;BR /&gt;I will apply your suggestion to my SAS Code and then let you know the result.&lt;BR /&gt;&lt;BR /&gt;Cheers</description>
      <pubDate>Sat, 14 May 2016 12:01:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270507#M53761</guid>
      <dc:creator>JonDickens1607</dc:creator>
      <dc:date>2016-05-14T12:01:55Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270513#M53763</link>
      <description>&lt;P&gt;Here is another example, very similar to the previous one, but possibly even closer to your case. Your 60 variables shall be represented by 6 variables, arbitrarily selected from SASHELP.HEART.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc contents data=sashelp.heart varnum; /* VARNUM is important! */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Select variables and form variable list (cf. &lt;A href="http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#p0wphcpsfgx6o7n1sjtqzizp1n39.htm" target="_blank"&gt;documentation&lt;/A&gt;) based on PROC CONTENTS output:&lt;/P&gt;
&lt;PRE&gt;                    Variables &lt;FONT color="#008000"&gt;&lt;STRONG&gt;in Creation Order&lt;/STRONG&gt;&lt;/FONT&gt;

 #    Variable          Type    Len    Label

 1    Status            Char      5
 2    DeathCause        Char     26    Cause of Death
&lt;FONT color="#FF0000"&gt; 3    AgeCHDdiag        Num       8    Age CHD Diagnosed&lt;/FONT&gt;
 4    Sex               Char      6
 &lt;FONT color="#FF0000"&gt;5    AgeAtStart        Num       8    Age at Start&lt;/FONT&gt;
&lt;FONT color="#FF0000"&gt; 6    Height            Num       8
 7    Weight            Num       8&lt;/FONT&gt;
 8    Diastolic         Num       8
 9    Systolic          Num       8
&lt;FONT color="#FF0000"&gt;10    MRW               Num       8    Metropolitan Relative Weight&lt;/FONT&gt;
11    Smoking           Num       8
12    AgeAtDeath        Num       8    Age at Death
&lt;FONT color="#FF0000"&gt;13    Cholesterol       Num       8&lt;/FONT&gt;
14    Chol_Status       Char     10    Cholesterol Status
...&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Compute 99% percentiles of selected variables */

proc summary data=sashelp.heart;
var AgeCHDdiag-numeric-Weight MRW Cholesterol;
output out=perctls(drop=_:) p99=p99_1-p99_888; /* It doesn't hurt that 888 is way too large: */
run;                                           /* Only p99_1-p99_6 are in fact created.      */

/* Replace larger values by the 99% percentiles */ 

data want;
if _n_=1 then set perctls;
array p _all_;
set sashelp.heart;
array v AgeCHDdiag-numeric-Weight MRW Cholesterol;
do i=1 to dim(v);
  v[i]=v[i]&amp;gt;&amp;lt;p[i];
end;
drop i p99_:;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Due to the different way of specifying the variable list for array v, both arrays now have the same dimension (6), which simplifies the indices in the DO loop.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Checks of dataset WANT could include a calculation of maximum values of the modified variables (these should now match the percentiles in dataset PERCTLS) and a PROC COMPARE with the original dataset (showing the modified values):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc means data=want max;
var AgeCHDdiag-numeric-Weight MRW Cholesterol;
run;

proc print data=perctls;
run;

proc compare data=sashelp.heart c=want;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 14:07:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270513#M53763</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-05-14T14:07:49Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270535#M53776</link>
      <description>Thank you for your kind assistance.&lt;BR /&gt;&lt;BR /&gt;A. Please explain the syntax:&lt;BR /&gt;&lt;BR /&gt;solution 1: V[D+I] = V[D+I] &amp;gt;&amp;lt; P[I]&lt;BR /&gt;Using Array P and Array V and Retaining D = DIM(P).&lt;BR /&gt;&lt;BR /&gt;solution 2: V[I] = V[I] &amp;gt;&amp;lt; P[I]&lt;BR /&gt;Using Array P and Array V without D&lt;BR /&gt;&lt;BR /&gt;B. Please explain the logic behind your SAS Code.&lt;BR /&gt;&lt;BR /&gt;Thank you</description>
      <pubDate>Sat, 14 May 2016 19:14:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270535#M53776</guid>
      <dc:creator>JonDickens1607</dc:creator>
      <dc:date>2016-05-14T19:14:55Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270536#M53777</link>
      <description>&lt;P&gt;This&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; V[D+I] &amp;gt;&amp;lt; P[I]&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;is using the min operator. &amp;nbsp;It would probably be clearer to use the MIN() function instead.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; min(V[D+I],P[I])&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So the result will be that outliers will be truncated and replaced with 99th percentile value for that variable.&lt;/P&gt;
&lt;P&gt;Note that this is only going to truncate the high values. &amp;nbsp;If you also have extremely low values, then you will also need to output the 1st percentile and use a similar technique to set extremely low values to the 1st percentile value.&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 19:33:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270536#M53777</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2016-05-14T19:33:50Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270537#M53778</link>
      <description>Hi Tom,&lt;BR /&gt;&lt;BR /&gt;Thanks for the clarification.&lt;BR /&gt;&lt;BR /&gt;Given that I am using a function with arrays, which is the best syntax to use:&lt;BR /&gt;&lt;BR /&gt;a. MIN( V[D+I] , P[I] )&lt;BR /&gt;&lt;BR /&gt;b. MIN( OF V[D+I] , P[I] )&lt;BR /&gt;&lt;BR /&gt;c. MIN( V[*] , P[*] )&lt;BR /&gt;&lt;BR /&gt;d. MIN( OF V[*] , P[*] )&lt;BR /&gt;&lt;BR /&gt;Please explain your response.&lt;BR /&gt;&lt;BR /&gt;Cheers</description>
      <pubDate>Sat, 14 May 2016 19:47:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270537#M53778</guid>
      <dc:creator>JonDickens1607</dc:creator>
      <dc:date>2016-05-14T19:47:55Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270538#M53779</link>
      <description>&lt;P&gt;Depends what you want to do. Let's first check them out.&lt;/P&gt;
&lt;P&gt;A takes the minimum of two values. &amp;nbsp;Not sure why the index is different, but the syntax is valid as long as the values of D+I and I are within the range of index values for the respective arrays.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;B is valid and is really the same as A. &amp;nbsp;Note that for B SAS will first separate on the commas and then check the use of the&amp;nbsp;OF keyword in the first parameter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;C is invalid. You cannot use * as the index of the array without using the OF keyword to support a list of values.&lt;/P&gt;
&lt;P&gt;D is invalid also. &amp;nbsp;The first parameter (of v(*)) is valid, but you cannot use the * index in the second parameter. &amp;nbsp;Either remove the unneeded comma or add another OF keyword.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that if you corrected C and D to use valid syntax, say by using min(of v(*) p(*)), then it would work but have a totally different meaning than A and B. &amp;nbsp;A and B take the min of two values, whereas the corrected C takes the min over all of the elements of the two arrays.&lt;/P&gt;
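&lt;P&gt;A small sketch of that difference, using two made-up three-element arrays (the initial values are purely illustrative) and the "add another OF keyword" form of the correction:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
array v[3] (5 8 2);
array p[3] (4 9 6);
a=min(v[1], p[1]);       /* smaller of two single values: 4 */
c=min(of v[*], of p[*]); /* smallest of all six elements: 2 */
put a= c=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So the element-by-element form (A or B) inside a DO loop is the one that fits the truncation task.&lt;/P&gt;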
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 20:06:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270538#M53779</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2016-05-14T20:06:17Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270550#M53780</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/85559"&gt;@JonDickens1607&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Sorry for the delay. I was away from my workstation. Big thank you to&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;for stepping in!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;A)&lt;/STRONG&gt; As Tom has already explained, I used the min operator, which I could (or rather should) have written &lt;FONT face="courier new,courier"&gt;min&lt;/FONT&gt; rather than &amp;gt;&amp;lt;. My first&amp;nbsp;idea was to write in the DO loop (in my&amp;nbsp;first example) as follows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if v[d+i]&amp;gt;p[i] then v[d+i]=p[i];&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This would have been clearer. Nevertheless, in this context it is equivalent to the shorter assignment statement&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;v[d+i]=v[d+i] min p[i];&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;(now using the less cryptic notation for the min operator).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Honestly, in the past 18 years I have virtually always used the MIN &lt;EM&gt;function&lt;/EM&gt;&amp;nbsp;and virtually never the MIN &lt;EM&gt;operator&lt;/EM&gt;. But these two are &lt;U&gt;not&lt;/U&gt; equivalent. The MIN and MAX&amp;nbsp;&lt;EM&gt;operators&lt;/EM&gt;&amp;nbsp;differ from the MIN and MAX &lt;EM&gt;functions&lt;/EM&gt; in how they handle missing values: The functions compute the minimum (or maximum, respectively) of the &lt;EM&gt;non-missing&lt;/EM&gt; arguments, whereas the operators determine the smaller (or greater, resp.) of the two values surrounding them in terms of the usual sort order for missing, special missing and non-missing values:&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;._ &amp;lt; . &amp;lt; .A &amp;lt; .B &amp;lt; ... &amp;lt; .Z &amp;lt; &lt;EM&gt;non-missing numeric values&lt;/EM&gt;&lt;/PRE&gt;
&lt;P&gt;In your case it would be incorrect to write&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;v[d+i]=min(v[d+i], p[i]);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;U&gt;&lt;EM&gt;if&lt;/EM&gt;&lt;/U&gt; v[d+i] was a missing value for one or more observations and values of &lt;FONT face="courier new,courier"&gt;i&lt;/FONT&gt;! The MIN &lt;EM&gt;function&lt;/EM&gt; would replace the missing value with the percentile p[i], which is most likely not what you want. In contrast, the assignment statement using the MIN &lt;EM&gt;operator&lt;/EM&gt; leaves the missing value unchanged (and the same holds for the conditional assignment statement &lt;FONT face="courier new,courier"&gt;if v[d+i]&amp;gt;p[i] then v[d+i]=p[i];&lt;/FONT&gt;).&lt;/P&gt;
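&lt;P&gt;The difference can be checked with a tiny test step. This is only a sketch: X, P, OP and FN are illustrative names, and 10 stands in for a percentile.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
x=.;           /* missing data value */
p=10;          /* stand-in for a percentile */
op=x min p;    /* operator: missing sorts lowest, so the result stays missing */
fn=min(x, p);  /* function: ignores the missing argument, so the result is 10 */
put op= fn=;
run;&lt;/CODE&gt;&lt;/PRE&gt;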
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The RETAIN&amp;nbsp;statement I used in the very first version of my post (and which I removed only a few minutes later) was redundant. (It had to do with an earlier draft version of the code.) Sorry, if it confused you. Setting D=DIM(P) was just to avoid repetitive calls of the DIM function in the code (i.e. a kind of abbreviation).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The reason for the somewhat "surprising" D+I index in the first example is:&amp;nbsp;With the first SET statement the 3 (in your case: 60) numeric variables from dataset PERCTLS go into the program data vector (PDV). The second SET statement adds all variables from SASHELP.CLASS to the PDV. Hence, the variable list &lt;FONT face="courier new,courier"&gt;_numeric_&lt;/FONT&gt; in the second ARRAY statement includes 2&lt;FONT face="symbol"&gt;*&lt;/FONT&gt;3 (in your case 2&lt;FONT face="symbol"&gt;*&lt;/FONT&gt;60) numeric variables: The percentiles (P75_1, P75_2, P75_3 in the example), followed by the original variables (AGE, HEIGHT, WEIGHT in the example). So,&amp;nbsp;AGE, HEIGHT and WEIGHT are v[&lt;STRONG&gt;4&lt;/STRONG&gt;], v[5] and v[6], respectively, and the corresponding percentiles &lt;SPAN&gt;P75_1, P75_2 and P75_3&amp;nbsp;&lt;/SPAN&gt;are p[&lt;STRONG&gt;1&lt;/STRONG&gt;], p[2] and p[3], resp. D=DIM(P) is the "offset" that must be used in the indices of array V to match elements of V and elements of P correctly.&lt;/P&gt;
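&lt;P&gt;If you want to verify this pairing for your own data, a sketch like the following writes the matched variable names to the log. It assumes that dataset PERCTLS from the first example already exists; P_NAME and V_NAME are helper variables introduced here for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
if _n_=1 then set perctls;
array p _all_;
set sashelp.class;
array v _numeric_;
length p_name v_name $32; /* character helpers, defined after both arrays */
d=dim(p);
do i=1 to d;
  p_name=vname(p[i]);     /* name of the i-th percentile variable */
  v_name=vname(v[d+i]);   /* name of the variable it is compared with */
  put i= p_name= v_name=;
end;
stop;                     /* one pass is enough for this check */
run;&lt;/CODE&gt;&lt;/PRE&gt;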
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This complication was one reason why I added the second example. Here, I don't use the comprehensive variable list &lt;FONT face="courier new,courier"&gt;_numeric_&lt;/FONT&gt; for the second array definition, but a variable list comprising variables from SASHELP.HEART only. The same variable list was (of course) used in the VAR statement of the PROC SUMMARY step. Thus, the two arrays match 1:1 and there is no need for an "offset".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;B)&lt;/STRONG&gt; The VARNUM option in the PROC CONTENTS statement of the second example is important to determine the correct specification of the abbreviated variable list, because variable lists of this kind refer to the variable order in the PDV (see the documentation I linked to).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;AgeCHDdiag-numeric-Weight&lt;/PRE&gt;
&lt;P&gt;is (in this example) a shorthand notation for&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;AgeCHDdiag AgeAtStart Height Weight&lt;/PRE&gt;
&lt;P&gt;namely all &lt;EM&gt;numeric&lt;/EM&gt; variables in the PDV from AgeCHDdiag to Weight. I added MRW and Cholesterol arbitrarily. Obviously, with 60 variables as in your case, variable lists are particularly convenient.&lt;/P&gt;
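&lt;P&gt;One way to double-check what such a list expands to, sketched here under the assumption that the same name range list is used in a KEEP= dataset option, is to let PROC CONTENTS show only the selected variables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Should list only the numeric variables from AgeCHDdiag to Weight */

proc contents data=sashelp.heart(keep=AgeCHDdiag-numeric-Weight) varnum;
run;&lt;/CODE&gt;&lt;/PRE&gt;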
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC SUMMARY computes the 99% percentiles and writes them to variables P99_1, P99_2, ..., P99_6 in dataset PERCTLS (which contains only one observation). Default variables _TYPE_ and _FREQ_ of the output dataset are dropped using the DROP= dataset option and the colon abbreviation for "all variables whose names start with an underscore." The list ends with P99_6 (and not P99_888) because it corresponds 1:1 to the variable list in the VAR statement, which consists of 6 variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As described earlier, the two SET statements in the data step bring the percentiles (P99_1, ...) and the original variables of SASHELP.HEART (Status, DeathCause, AgeCHDdiag, ...) together side by side in the PDV. The second ARRAY&amp;nbsp;statement &lt;EM&gt;must&lt;/EM&gt; use the same variable list as was used in PROC SUMMARY. Otherwise, variable values would be -- inappropriately -- compared to percentiles of different variables!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The assignment statement in the DO loop replaces "extremely large values" ("outliers") by the corresponding 99% percentiles, as desired. Finally, the DROP statement removes the index variable I and the percentile variables (using the colon abbreviation for "all variables whose names start with 'P99_'"), assuming you don't need the percentiles (which are constant across all observations!) in dataset WANT. They are available in dataset PERCTLS without duplicates after all.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The possibility to get rid of the percentile variables using the short notation P99_: was the reason for specifying these names in the OUTPUT statement of the PROC SUMMARY step and &lt;EM&gt;not&lt;/EM&gt; using the AUTONAME option instead (which would have created variable names such as Height_P99, Weight_P99 etc.). However, thinking again about it, the variable list&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;AgeCHDdiag_P99--Cholesterol_P99&lt;/PRE&gt;
&lt;P&gt;which could replace &lt;FONT face="courier new,courier"&gt;p99_:&lt;/FONT&gt;&amp;nbsp;in the DROP statement would not have been much longer. So,&amp;nbsp;it might even be better to use this in conjunction with the AUTONAME option in PROC SUMMARY:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;output out=perctls(drop=_:) p99= /autoname;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The advantage would be that the name of each percentile variable contains the name of the corresponding original variable, which is probably helpful when dealing with 60 variables (unless these are numbered like VAR1, ..., VAR60&amp;nbsp;anyway).&lt;/P&gt;
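&lt;P&gt;Put together, the AUTONAME variant of the second example might look as follows. This is an untested sketch: it assumes that AUTONAME creates the six variables in the order of the VAR statement (AgeCHDdiag_P99, ..., Cholesterol_P99), so that the name range list in the DROP statement covers exactly these.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc summary data=sashelp.heart;
var AgeCHDdiag-numeric-Weight MRW Cholesterol;
output out=perctls(drop=_:) p99= /autoname;
run;

data want;
if _n_=1 then set perctls;
array p _all_;
set sashelp.heart;
array v AgeCHDdiag-numeric-Weight MRW Cholesterol;
do i=1 to dim(v);
  v[i]=v[i] min p[i]; /* MIN operator: missing values stay missing */
end;
drop i AgeCHDdiag_P99--Cholesterol_P99;
run;&lt;/CODE&gt;&lt;/PRE&gt;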
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;(Edit: only removed some vertical white space)&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 22:23:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270550#M53780</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-05-14T22:23:35Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS ARRAYS to replace extreme values with a percentile value</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270551#M53781</link>
      <description>Thank you very much for your detailed explanation.&lt;BR /&gt;&lt;BR /&gt;It has been most helpful and I have applied your code to my SAS program and it has worked perfectly.&lt;BR /&gt;&lt;BR /&gt;After many frustrating hours of trial and error, it feels good to have a simple solution which I understand and could adapt in future if I need to do so.</description>
      <pubDate>Sat, 14 May 2016 22:32:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-ARRAYS-to-replace-extreme-values-with-a-percentile/m-p/270551#M53781</guid>
      <dc:creator>JonDickens1607</dc:creator>
      <dc:date>2016-05-14T22:32:11Z</dc:date>
    </item>
  </channel>
</rss>

