Despite me specifying the "message" option in the IC CREATE statement, SAS fails to print the custom error message to the log. Here's my code (note that the red x is an intentional error so that the integrity constraint is triggered):
data exampledata; length ID $8 DATE 4 RECORD_QUALITY $1; input ID $ DATE :yymmdd10. RECORD_QUALITY $; format DATE yymmddd10.; datalines; 11111111 2019-04-25 E 22222222 2021-09-16 E 33333333 2021-02-12 F 44444444 2020-05-02 X 55555555 2019-11-21 P ; proc datasets; modify exampledata; ic create id_notnull = not null(ID) message = 'ID must not be missing'; ic create id_unique = unique(ID) message = 'Table must contain one record per ID'; ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd)) message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31'; ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P'))) message = 'Valid values for RECORD_QUALITY are either E, F, or P'; quit;
And here's what I get in the log:
Why is SAS not printing my custom error message?
Once the "bad" data is in the data set the constraint is basically invalid, as the error message shows. It would require the removal of either the value or the observation and that is not the role that proc datasets plays.
If you create a clean data set that will allow the constraint and then attempt to add values that violate the constraint then the message will appear:(which is I believe the intent of these constraints)
data exampledata; length ID $8 DATE 4 RECORD_QUALITY $1; input ID $ DATE :yymmdd10. RECORD_QUALITY $; format DATE yymmddd10.; datalines; 11111111 2019-04-25 E 22222222 2021-09-16 E 33333333 2021-02-12 F 55555555 2019-11-21 P ; data otherdata; length ID $8 DATE 4 RECORD_QUALITY $1; input ID $ DATE :yymmdd10. RECORD_QUALITY $; format DATE yymmddd10.; datalines; 44444444 2020-05-02 X ; proc datasets; modify exampledata; ic create id_notnull = not null(ID) message = 'ID must not be missing'; ic create id_unique = unique(ID) message = 'Table must contain one record per ID'; ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd)) message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31'; ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P'))) message = 'Valid values for RECORD_QUALITY are either E, F, or P'; quit; proc append base=exampledata data=otherdata; run;
Which is more inline with the documentation of Proc Datasets in the definition of constraints:
Hi, thank you for you reply. What you are saying here makes sense, but my issue is I was hoping to use these integrity constraints to basically validate a dataset after it is created. For example, I send SAS code to a colleague that performs a Proc SQL pull from a database (what the datalines step is a stand in for in my example), I was hoping that after the pull is performed, the proc dataset statement could be used to then check the pulled data to make sure it is all correct. From the SAS 9.4 documentation I thought this would be a valid use of it, but it seems not?
"Before a constraint is added to a table that contains existing data, all the data is checked to determine that it satisfies the constraints."
I know that these integrity constants can also be created using Proc SQL, but I was hoping to provide examples of both the Proc Dataset and Proc SQL methods, since this is for a training.
Create an empty set, apply the constraints to that set then append the data you want to check;
See if this comes close to what you need:
data exampledata; length ID $8 DATE 4 RECORD_QUALITY $1; input ID $ DATE :yymmdd10. RECORD_QUALITY $; format DATE yymmddd10.; datalines; 11111111 2019-04-25 E 22222222 2021-09-16 E 33333333 2021-02-12 F 44444444 2020-05-02 X 55555555 2019-11-21 P ; /* the Obs=0 means that set Pattern will have variable names and properties the same but no values */ data pattern; set exampledata (obs=0); run; proc datasets; modify pattern; ic create id_notnull = not null(ID) message = 'ID must not be missing'; ic create id_unique = unique(ID) message = 'Table must contain one record per ID'; ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd)) message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31'; ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P'))) message = 'Valid values for RECORD_QUALITY are either E, F, or P'; quit; proc append base=pattern data=exampledata; run;
Hi, this is great thank you! I was actually just coming up with something similar (though I was using an empty datalines statement instead, I like your version better as it doesn't require repeating the column formatting twice). Follow-up question: is there anyway to make integrity constraints return an ERROR in the log, rather than a WARNING as it currently does?
I didn't find option how to turn on Error. But I observed 2 things:
1) the warning is not true warning:
2) if you need just the "ERROR:" word at the beginning of the line you could "trick" SAS like this (notice LS=64):
options ls=64;
proc datasets;
modify pattern;
ic create id_notnull = not null(ID)
message = 'ID must not be missing';
ic create id_unique = unique(ID)
message = 'Table must contain one record per ID';
ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd))
message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31';
ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P')))
message = '************************************************** ERROR: Valid values for RECORD_QUALITY are either E, F, or P.' MSGTYPE=USER;
run;
quit;
but as I wrote it's still not a real error:
:-):-)
Bart
Oh thank you for this, it is so strange it does not generate a true warning, what is SAS thinking with that?
I normally don't bother with constraints for SAS tables not the least because when loading into such a table and there is a violation then the record gets rejected but not even a warning condition gets created.
There was in the last years one occasion where I did bother because the integrity constraints where part of the input data model of a SAS solution.
In our case the input data model were SAS tables but they could also have been DB tables where constraints do make sense (like a PK constraint).
The solution provided a DDL to create the tables.
The approach I've taken:
1. Execute the solution DDL to create empty table structures
2. Using Proc Datasets add an audit trail to the tables to capture any rejected records
3. Load the tables (proc append).
4. If any data in the error audit trail export to an error table (err_<table name>)
5. Delete the audit trail (=restore table back exactly as per original DDL)
6. Check if there are any error tables (err_<table name>).
- if error tables found create error report and stop further processing
- if no error tables found move data tables to solution input area
Above approach sounds more complicated than it was to design and implement (using some dynamic code of course).
The advantage of above approach was that I didn't have to write code to check the data prior to loading and that any changes to the DDL (i.e. added column in later versions with more constraints like NOT NULL) get automatically reflected.
Why would you work so hard to use a complicated tool like that to do a simple job?
data _null_;
set pattern;
by id_unique;
if missing(id_unique) then put 'ERROR: ID must not be missing. ' _n_= ;
if not (first.id_unique and last.id_unique) then
put 'ERROR: Duplicate ID. ' _n_= id_unique= ;
if not ('01JAN2019'd <= date <= '21DEC2021'd) then
put 'ERROR: DATE is outside the expected range of 2019-01-01 to 2021-12-31. ' _n_= id_unique= date=;
if not RECORD_QUALITY in ('E' 'F' 'P')) then
put 'ERROR: Valid values for RECORD_QUALITY are either E, F, or P. ' _n_= id_unique= record_quality=;
run;
Oh gosh thank you so much, I hadn't thought about doing it this way as I was focused on the integrity constraints method my coworker had shared with me and the corresponding SAS documentation. Thank you! I wonder if you can explain why integrity constraints even exist in SAS if this method is also available?
Edit: Actually just tried it and I get errors, where are you getting id_unique from, it is not a column?
@Fable wrote:
Oh gosh thank you so much, I hadn't thought about doing it this way as I was focused on the integrity constraints method my coworker had shared with me and the corresponding SAS documentation. Thank you! I wonder if you can explain why integrity constraints even exist in SAS if this method is also available?
Edit: Actually just tried it and I get errors, where are you getting id_unique from, it is not a column?
Use the variables names that exist in your dataset. I suspect I accidentally copied a name you were using to name the integrity constraint instead of the name of the variable you were trying use the constraint to test.
IC's are intended to prevent values that would make the database confusing or inaccurate from entering the database. Which is a different purpose than cleaning data that you already have in a SAS dataset.
I have no idea why anyone would use IC in SAS. If they really need that level of sophistication in their database then they should probably be storing the data in an actual DBMS and just connect to that with SAS to run their analyses.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.