Hello friends,
I run the following query in order to calculate max for each group.
Please note that the raw data set is very big (500 million rows).
First question:
I ask to calculate max only for group that is not null or zero.
In resulted data set I see one row with value null (for field Agreement_Account_Id)-WHY???
Second question:
Here are warnings I recieve:
NOTE: SQL generation will be used to perform the initial summarization.
WARNING: The format F was not located on the database. In-database processing will proceed without it.
WARNING: Syntax error or access violation Syntax error: expected something between '(' and the string 'F'.
WARNING: The format F was not located on the database. In-database processing will proceed without it.
What should I do to prevent these warnings?Why these warnings exist?
proc summary data=TeraData.VBM410_ACTIVE_LOAN_BAL(Where=(Agreement_Account_Id not in (0,.)));
var Fund_Actual_Payment_Count;
class Agreement_Account_Id;
output out=VBM410_ACTIVE_LOAN_BAL max=Mis_Tash_Paid;
run;
First question: look at a part of your data WITH YOUR OWN EYES and see if you can figure out why you only get one row in the output. Please look at the log and see what it is telling you.
Second question: please learn that we can't help you at all if you show us a partial log, where we cannot see the code as it appears in the log.
It really looks like you are accessing an external DBMS table. Which means that perhaps it doesn't understand what your intent was when passing the . for missing to it.
Also, does Teradata (have guess this is the DBMS as you haven't stated) support formats natively? The warning about the F format not found sounds likely to be a pretty fundamental problem especially with this
Syntax error: expected something between '(' and the string 'F'
as there is no place that you are passing a string F after ( so that may be the symptom from the
Agreement_Account_Id not in (0,.)
Just how many unique values of Agreement_Account_Id
are there? 500 million "rows" may be a bit much for Class processing and may want to be sorted and use By processing instead.
In a RDMS, null is different than in SAS.
Are you using a Viya installation or some other type that will pass a proc summary to SQL summarization? I cannot recall ever seeing that note for a proc summary/means before.
@Ronein wrote:
Hello friends,
I run the following query in order to calculate max for each group.
Please note that the raw data set is very big (500 million rows).
First question:
I ask to calculate max only for group that is not null or zero.
In resulted data set I see one row with value null (for field
Agreement_Account_Id)-WHY???
Second question:
Here are warnings I recieve:
NOTE: SQL generation will be used to perform the initial summarization.
WARNING: The format F was not located on the database. In-database processing will proceed without it.
WARNING: Syntax error or access violation Syntax error: expected something between '(' and the string 'F'.
WARNING: The format F was not located on the database. In-database processing will proceed without it.
What should I do to prevent these warnings?Why these warnings exist?
proc summary data=TeraData.VBM410_ACTIVE_LOAN_BAL(Where=(Agreement_Account_Id not in (0,.))); var Fund_Actual_Payment_Count; class Agreement_Account_Id; output out=VBM410_ACTIVE_LOAN_BAL max=Mis_Tash_Paid; run;
I strongly believe that no database designer worth their money will store an ID as a number. So I guess your WHERE is faulty.
You should be able to replace the CLASS statement with a BY statement.
The only difference in the output would be that the CLASS statement without the NWAY option on the PROC SUMMARY statement will include and extra observation with the overall maximum. If you need it you can generate it by taking the MAX of the resulting summary dataset.
So see if that fixes your issue.
You might also want to just move the WHERE filtering to apply to the output instead of the input. There might be a performance hit on the generated Teradata query to summarize those two extra groups, but it should be simple to remove them from the resulting output dataset instead.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.