BookmarkSubscribeRSS Feed
Anthony_eng
Obsidian | Level 7

Hello,

 

I am trying to identify outliers in my data and then create a table using only the outlier data (data below 4 and above 15).  I have included a sample of the code that I am trying to use.  I intend on displaying the outlier data graphically once I am able to correct the code.  

 

I commented out the case statement that I was trying to use to identify data, I keep receiving the following syntax error when applying it.  

 

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, ',', AS, FORMAT, FROM, INFORMAT, INTO, LABEL, LEN,
LENGTH, TRANSCODE.

ERROR 76-322: Syntax error, statement will be ignored.

 

PROC SQL;
CREATE TABLE WORK.AVGTRANSLATED AS
SELECT EXPERIMENTID, TRIALKEY, LOCATIONID,
AVG(TRANSLATED) AS AvgOfTranslated format=comma12.2,
MIN(TRANSLATED) AS MinTranslated format=comma12.2,
MAX(TRANSLATED) AS MaxTranslated format=comma12.2,
COUNT(EXPERIMENTID) AS KeyCounts format=comma12.2
/*CASE
WHEN MinTranslated <= 14.5 OR MaxTranslated >= 4.5 THEN COUNT(EXPERIMENTID) AS OutCount format=comma12.2
END AS OutlierCount*/
FROM WORK.QUERY_FOR_VOLTAGE_TRANSLATED
GROUP BY EXPERIMENTID, TRIALKEY, LOCATIONID
ORDER BY TRIALKEY;
QUIT;

 

 

Thank you.

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Please show us the entire log for this PROC SQL, every single line, every single character, all of it, for this PROC SQL. Please do not separate the ERRORs in the log from the code as shown in the log.

 

When you show us the log, please maintain the formatting of the log, by copying the log as text and pasting it into the window that appears when you click on the </> icon here in the SAS Community.

Insert Log Icon in SAS Communities.png

--
Paige Miller
Anthony_eng
Obsidian | Level 7
1                                                          The SAS System                          13:02 Thursday, December 16, 2021

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET SYSLAST=WORK.QUERY_FOR_VOLTAGE_TRANSLATED;
4          %LET _CLIENTTASKLABEL='FindAvgTranslated';
5          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
6          %LET _CLIENTPROJECTPATH='/Shared Data/Shared Projects/dVoltage.egp';
7          %LET _CLIENTPROJECTPATHHOST='';
8          %LET _CLIENTPROJECTNAME='dVoltage.egp';
9          %LET _SASPROGRAMFILE='';
10         %LET _SASPROGRAMFILEHOST='';
11         
12         ODS _ALL_ CLOSE;
13         OPTIONS DEV=SVG;
14         GOPTIONS XPIXELS=0 YPIXELS=0;
15         %macro HTML5AccessibleGraphSupported;
16             %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
17         %mend;
18         FILENAME EGHTML TEMP;
19         ODS HTML5(ID=EGHTML) FILE=EGHTML
20             OPTIONS(BITMAP_MODE='INLINE')
21             %HTML5AccessibleGraphSupported
22             ENCODING='utf-8'
23             STYLE=HtmlBlue
24             NOGTITLE
25             NOGFOOTNOTE
26             GPATH=&sasworklocation
27         ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
28         
29         DATA ;
30         	
31         RUN;

NOTE: Compression was disabled for data set WORK.DATA1 because compression overhead would increase the size of the data set.
NOTE: The data set WORK.DATA1 has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

32         
33         PROC SQL;
34         	CREATE TABLE WORK.AVGTRANSLATED AS
35         	SELECT EXPERIMENTID, TRIALKEY, LOCATIONID,
36         		AVG(TRANSLATED) AS AvgOfTranslated format=comma12.2,
37         		MIN(TRANSLATED) AS MinTranslated format=comma12.2,
38         		MAX(TRANSLATED) AS MaxTranslated format=comma12.2,
39         		COUNT(EXPERIMENTID) AS KeyCounts format=comma12.2
40         		/*WHERE (MaxTranslated LE 12) OR (MinTranslated GE 14.95)*/
41         		CASE
             ____
             22
             76
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, ',', AS, FORMAT, FROM, INFORMAT, INTO, LABEL, LEN, 
              LENGTH, TRANSCODE.  

ERROR 76-322: Syntax error, statement will be ignored.

2                                                          The SAS System                          13:02 Thursday, December 16, 2021

42         		WHEN MaxTranslated <= 4.5 OR MinTranslated >= 14.5 THEN COUNT(EXPERIMENTID) AS OutCount format=comma12.2
43         		END AS OutlierCount
44         	FROM WORK.QUERY_FOR_VOLTAGE_TRANSLATED
45         	GROUP BY EXPERIMENTID, TRIALKEY, LOCATIONID
46         	ORDER BY TRIALKEY;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
47         QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      
48         
49         %LET _CLIENTTASKLABEL=;
50         %LET _CLIENTPROCESSFLOWNAME=;
51         %LET _CLIENTPROJECTPATH=;
52         %LET _CLIENTPROJECTPATHHOST=;
53         %LET _CLIENTPROJECTNAME=;
54         %LET _SASPROGRAMFILE=;
55         %LET _SASPROGRAMFILEHOST=;
56         
57         ;*';*";*/;quit;run;
58         ODS _ALL_ CLOSE;
59         
60         
61         QUIT; RUN;
62         
PaigeMiller
Diamond | Level 26

Thank you. There is a comma missing at the end of the line above the error.

 

If I may be so bold, SQL is a relatively poor choice for finding outliers. PROC UNIVARIATE (and even PROC MEANS) offers many more features that might be useful in identifying outliers.

--
Paige Miller
Anthony_eng
Obsidian | Level 7

Thank you, I will try the methods you mentioned.

PaigeMiller
Diamond | Level 26

As stated above, a better problem description would help.

 

However, you say:

 

My end goal is to depict the data using a scatter plot once isolated.

 

Scatterplot requires two variables (or more) and you have never mentioned more than one variable. So I don't really know what you want to do here.

--
Paige Miller
Reeza
Super User
PROC SQL;
CREATE TABLE WORK.AVGTRANSLATED AS
SELECT EXPERIMENTSKEY, MATCHKEY, LOCATIONID,
AVG(TRANSLATED) AS AvgOfTranslated format=comma12.2,
MIN(TRANSLATED) AS MinTranslated format=comma12.2,
MAX(TRANSLATED) AS MaxTranslated format=comma12.2,
COUNT(EXPERIMENTSKEY) AS KeyCounts format=comma12.2
sum(calculated MinTranslated <= 14.5 OR calculated MaxTranslated >= 4.5)  as OutCount format=comma12.2
FROM WORK.QUERY_FOR_VOLTAGE_TRANSLATED
GROUP BY EXPERIMENTSKEY, MATCHKEY, LOCATIONID
ORDER BY MATCHKEY;
QUIT;

 

That'll get your query working somewhat but I'm uncertain about your methodology. Why would you want your minimum/maximum values of the group to be used to identify outliers, isn't it usually at an individual level rather than at an aggregate level? 

 

EDIT: I think your other post answered why - you're looking for groups where all values are outside the range. 

I merged your questions, as they are identical. 

Anthony_eng
Obsidian | Level 7

Hello,

 

I am trying to determine the best way to isolate data that is above or below certain thresholds.  I am using SAS EG.  I am not sure on the best method to isolate the desired data.  I am trying to isolate all ExperimentSKeys where are all of the data in each is outside of the 4.5 to 14.5 range.  For example, ExperimentID number 1 has 10 pieces of data.  I want to create a table that shows this data only if all 10 observations were outside of the range.  My end goal is to depict the data using a scatter plot once isolated.  

 

PROC SQL;
	CREATE TABLE WORK.AVGTRANSLATED AS
	SELECT EXPERIMENTID, TRIALKEY, LOCATIONID,
		AVG(TRANSLATED) AS AvgOfTranslated format=comma12.2,
		MIN(TRANSLATED) AS MinTranslated format=comma12.2,
		MAX(TRANSLATED) AS MaxTranslated format=comma12.2,
		COUNT(EXPERIMENTID) AS KeyCounts format=comma12.2,
		/*WHERE (MaxTranslated LE 12) OR (MinTranslated GE 14.95)*/
		CASE   /* Should I just use an if-then statement here instead?  */
		WHEN MaxTranslated <= 4.5 OR MinTranslated >= 14.5 THEN COUNT(EXPERIMENTID) AS OutCount format=comma12.2,
		END
	FROM WORK.QUERY_FOR_VOLTAGE_TRANSLATED
	GROUP BY EXPERIMENTID, TRIALKEY, LOCATIONID
	ORDER BY TRIALKEY;
QUIT;

 

 

maguiremq
SAS Super FREQ

Please provide us with some example data. There is a macro here that will pop out a DATALINES statement with the variables configured to run for SAS users.

 

If your data are sensitive, please enter a contrived example using the DATALINES statement.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 971 views
  • 2 likes
  • 4 in conversation