I am trying to write syntax that will identify outliers. I am using the instructions at https://sasexamplecode.com/3-easy-ways-to-find-outliers-in-sas/ as guidance.
I am running into errors in the DATA step, including the warning "Apparent symbolic reference AMP not resolved." I know nothing about macros in SAS.
This is my syntax:
/* 3 Easy Ways to Find Outliers in SAS */
/* https://sasexamplecode.com/3-easy-ways-to-find-outliers-in-sas/ */
/* 1. Test the Assumption of Normality */
/* The first step is to test the normality assumption. */
/* In SAS, you can use PROC UNIVARIATE to check if your data follow a normal distribution. */
/* You do this by adding the NORMAL option to the UNIVARIATE statement. */
ods output TestsForNormality = work.normal_test;
ods output BasicMeasures = work.measures;
proc univariate data=meta.data_01 normal;
var age;
histogram age / normal;
run;
proc print data=work.normal_test noobs;
run;
proc print data=work.measures noobs;
run;
/* 2. Save the Mean and Standard Deviation as Macro Variables */
/* The second step to find outliers is to save the Mean and Standard Deviation as macro variables. */
/* PROC UNIVARIATE can also create a dataset with summary statistics such as the p-value of the normality test, the mean, and the standard deviation. */
/* To do so, we use the ODS OUTPUT statement. */
/* See above */
/* To make your code reusable and to find the outliers more efficiently, we save the p-value of the Shapiro-Wilk test, */
/* the mean, and the standard deviation as three macro variables with a SELECT INTO statement. */
/* See below */
proc sql;
select pValue label= 'p-value' into :pvalue from work.normal_test where test = 'Shapiro-Wilk';
select LocValue label = 'Mean' into :mean from work.measures where LocMeasure ='Mean';
select VarValue label = 'Std Dev' into :stddev from work.measures where VarMeasure ='Std Deviation';
quit;
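Before running step 3, it can help to confirm that step 2 actually created the macro variables. The sketch below writes them to the log with `%PUT` and reads the Shapiro-Wilk p-value directly from WORK.NORMAL_TEST. It reads the data set rather than `&pvalue.` because the macro variable holds the formatted text of the p-value, which may not be a plain number. The 0.05 significance level is an assumption, not a value taken from the tutorial.

```sas
/* Write the three macro variables from PROC SQL to the log */
%put NOTE: pvalue=&pvalue. mean=&mean. stddev=&stddev.;

/* Read the Shapiro-Wilk p-value from the ODS output data set. */
/* The 0.05 cutoff is an assumed significance level.           */
data _null_;
  set work.normal_test;
  where Test = 'Shapiro-Wilk';
  if pValue lt 0.05 then
    put "WARNING: Shapiro-Wilk p < 0.05, so the 3-SD rule may not be appropriate.";
  else
    put "NOTE: Shapiro-Wilk p >= 0.05, no evidence against normality.";
run;
```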
/* 3. Filter the Outliers */
/* The third step to find outliers in SAS is to filter all observations that lie more than 3 standard deviations above or below the mean. */
data work.outliers_normaldistr;
set meta.data_01;
if age lt (&mean. - 3*&stddev.)
or age gt (&mean. + 3*&stddev.) then output;
run;
proc print data=work.outliers_normaldistr noobs;
run;
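The AMP warnings most likely mean that the submitted program contained the HTML entity `&amp;` in place of each `&` (for example `&amp;mean.`), which can happen when code is copied from a web page. SAS reads `&amp` as a reference to a macro variable named AMP, and the `;` that ends the entity also ends the IF statement early, which would explain the 386 and 180 errors. The sketch below retypes step 3 with plain `&` characters and also lists the outliers by Participant_ID. It assumes the macro variables from step 2 resolve and that META.DATA_01 contains Participant_ID as in the sample data. The names WORK.OUTLIERS_BY_ID, LOWER, and UPPER are hypothetical. In SAS comparisons a missing numeric value is smaller than any number, so the sketch excludes missing ages explicitly; otherwise they would be flagged as low outliers.

```sas
/* Keep the ID, the age, and both cutoffs for each outlier */
data work.outliers_by_id;
  set meta.data_01;
  lower = &mean. - 3*&stddev.;   /* lower cutoff */
  upper = &mean. + 3*&stddev.;   /* upper cutoff */
  /* Exclude missing ages, which would otherwise compare as low */
  if not missing(age) and (age lt lower or age gt upper);
  keep Participant_ID age lower upper;
run;

/* Order the listing by participant */
proc sort data=work.outliers_by_id;
  by Participant_ID;
run;

proc print data=work.outliers_by_id noobs;
  var Participant_ID age lower upper;
run;
```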
Here is the log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 data work.outliers_normaldistr;
70 set meta.data_01;
71
72 if age lt (&mean. - 3*&stddev.)
_ _____
386 180
76
WARNING: Apparent symbolic reference AMP not resolved.
WARNING: Apparent symbolic reference AMP not resolved.
72 if age lt (&mean. - 3*&stddev.)
_______
180
ERROR 386-185: Expecting an arithmetic expression.
ERROR 180-322: Statement is not valid or it is used out of proper order.
ERROR 76-322: Syntax error, statement will be ignored.
73 or age gt (&mean. + 3*&stddev.) then output;
_____ _______
180 180
WARNING: Apparent symbolic reference AMP not resolved.
WARNING: Apparent symbolic reference AMP not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.
74 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.OUTLIERS_NORMALDISTR may be incomplete. When this step was stopped there were 0 observations and 164 variables.
WARNING: Data set WORK.OUTLIERS_NORMALDISTR was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1424.18k
OS Memory 36008.00k
Timestamp 11/17/2021 09:54:53 PM
Step Count 282 Switch Count 0
Page Faults 0
Page Reclaims 92
Page Swaps 0
Voluntary Context Switches 6
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
75
76 proc print data=work.outliers_normaldistr noobs;
77 run;
NOTE: No variables in data set WORK.OUTLIERS_NORMALDISTR.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 478.43k
OS Memory 35488.00k
Timestamp 11/17/2021 09:54:53 PM
Step Count 283 Switch Count 0
Page Faults 0
Page Reclaims 16
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
78
79 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
89
Here is a sample data set:
data WORK.DATA_04;
/* No DSD: DSD makes the comma the default delimiter, but these values are separated by spaces */
infile datalines truncover;
input Participant_ID:$3. Group:$1. Sex:$2. Age:BEST12. Efflux_V1:BEST12. Efflux_V2:BEST12. Efflux_V3:BEST12. ApoA1_V1:BEST12. ApoA1_V2:BEST12. ApoA1_V3:BEST12. ApoC1_V1:BEST12. ApoC1_V2:BEST12. ApoC1_V3:BEST12.;
format Age BEST12. Efflux_V1 BEST12. Efflux_V2 BEST12. Efflux_V3 BEST12. ApoA1_V1 BEST12. ApoA1_V2 BEST12. ApoA1_V3 BEST12. ApoC1_V1 BEST12. ApoC1_V2 BEST12. ApoC1_V3 BEST12.;
datalines;
1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
13 A F 71 13.17 12.69 11.33 259.03 265.83 255.03 61.34 67.46 73.5
14 B M 54 10.51 7.96 8.28 211.39 192.76 192.17 41.14 36.83 34.86
15 A F 66 7.34 6.74 8.69 240.58 160.97 205.72 35.8 25.89 44.28
16 B F 69 11.07 13.44 10.08 236.45 242.66 214.03 54.07 55.34 37.61
17 A F 58 8.1 7.62 8.03 188.51 159.8 164.22 36.04 32.35 30.78
18 B F 63 10.14 10.06 10.78 229.05 252.06 228.63 57.49 63.17 50.44
;;;;
Run;
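If the macro variables remain a source of trouble, one macro-free alternative is to compute the mean and standard deviation with PROC MEANS and attach them to every row with the `if _n_ = 1 then set` idiom. This is a sketch, not the tutorial's method. It assumes the WORK.DATA_04 step above has been run, and the names WORK.AGE_STATS, WORK.AGE_OUTLIERS, AGE_MEAN, and AGE_STD are hypothetical. PROC MEANS reports the sample standard deviation by default, the same quantity PROC UNIVARIATE labels Std Deviation.

```sas
/* One-row data set holding the mean and standard deviation of Age */
proc means data=work.data_04 noprint;
  var Age;
  output out=work.age_stats mean=age_mean std=age_std;
run;

/* Read the statistics once; variables from SET are retained on every row */
data work.age_outliers;
  if _n_ = 1 then set work.age_stats(keep=age_mean age_std);
  set work.data_04;
  lower = age_mean - 3*age_std;
  upper = age_mean + 3*age_std;
  /* Missing ages would compare as low, so skip them */
  if not missing(Age) and (Age lt lower or Age gt upper);
  keep Participant_ID Age lower upper;
run;

proc print data=work.age_outliers noobs;
  var Participant_ID Age lower upper;
run;
```

On the 10-row sample the mean age is 64.3 and the cutoffs are roughly 42.6 and 86.0, so no rows are returned. With n observations, no value can lie more than (n-1)/sqrt(n) sample standard deviations from the mean (about 2.85 for n = 10), so the 3-SD rule cannot flag anything in a sample this small.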
Finally, if possible, it would be helpful to list the outliers by Participant_ID. Is there a way to modify this syntax to do so?
Thank you.