BookmarkSubscribeRSS Feed
Fable
Calcite | Level 5

Despite me specifying the "message" option in the IC CREATE statement, SAS fails to print the custom error message to the log. Here's my code (note that the red x is an intentional error so that the integrity constraint is triggered):

data exampledata;
  length ID $8 DATE 4 RECORD_QUALITY $1;
  input ID $ DATE :yymmdd10. RECORD_QUALITY $;
  format DATE yymmddd10.;
datalines;
11111111  2019-04-25  E
22222222  2021-09-16  E
33333333  2021-02-12  F 
44444444  2020-05-02  X
55555555  2019-11-21  P
;

proc datasets;
  modify exampledata;
    ic create id_notnull = not null(ID)
       message = 'ID must not be missing';
    ic create id_unique = unique(ID)
       message = 'Table must contain one record per ID';
    ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd))
       message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31';
    ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P')))
       message = 'Valid values for RECORD_QUALITY are either E, F, or P'; 
quit;

And here's what I get in the log:

Fable_0-1701375057354.png

Why is SAS not printing my custom error message?

 

 

10 REPLIES 10
ballardw
Super User

Once the "bad" data is in the data set the constraint is basically invalid, as the error message shows. It would require the removal of either the value or the observation and that is not the role that proc datasets plays.

 

If you create a clean data set that will allow the constraint and then attempt to add values that violate the constraint then the message will appear:(which is I believe the intent of these constraints)

data exampledata;
  length ID $8 DATE 4 RECORD_QUALITY $1;
  input ID $ DATE :yymmdd10. RECORD_QUALITY $;
  format DATE yymmddd10.;
datalines;
11111111  2019-04-25  E
22222222  2021-09-16  E
33333333  2021-02-12  F 
55555555  2019-11-21  P
;

data otherdata;
  length ID $8 DATE 4 RECORD_QUALITY $1;
  input ID $ DATE :yymmdd10. RECORD_QUALITY $;
  format DATE yymmddd10.;
datalines;
44444444  2020-05-02  X
;



proc datasets;
  modify exampledata;
    ic create id_notnull = not null(ID)
       message = 'ID must not be missing';
    ic create id_unique = unique(ID)
       message = 'Table must contain one record per ID';
    ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd))
       message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31';
    ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P')))
       message = 'Valid values for RECORD_QUALITY are either E, F, or P'; 
quit;

proc append base=exampledata data=otherdata;
run;

Which is more inline with the documentation of Proc Datasets in the definition of constraints:

Integrity constraints are a set of data validation rules that you can specify in order to restrict the data values that can be stored for a variable in a SAS data file.
 
If this is not sufficient for your purpose you may have to provide some more details as to exactly what you need.
Fable
Calcite | Level 5

Hi, thank you for you reply. What you are saying here makes sense, but my issue is I was hoping to use these integrity constraints to basically validate a dataset after it is created. For example, I send SAS code to a colleague that performs a Proc SQL pull from a database (what the datalines step is a stand in for in my example), I was hoping that after the pull is performed, the proc dataset statement could be used to then check the pulled data to make sure it is all correct. From the SAS 9.4 documentation I thought this would be a valid use of it, but it seems not? 

 

"Before a constraint is added to a table that contains existing data, all the data is checked to determine that it satisfies the constraints."

I know that these integrity constants can also be created using Proc SQL, but I was hoping to provide examples of both the Proc Dataset and Proc SQL methods, since this is for a training. 

ballardw
Super User

Create an empty set, apply the constraints to that set then append the data you want to check;

See if this comes close to what you need:

data exampledata;
  length ID $8 DATE 4 RECORD_QUALITY $1;
  input ID $ DATE :yymmdd10. RECORD_QUALITY $;
  format DATE yymmddd10.;
datalines;
11111111  2019-04-25  E
22222222  2021-09-16  E
33333333  2021-02-12  F 
44444444  2020-05-02  X
55555555  2019-11-21  P
;
/* the Obs=0 means that set Pattern
will have variable names and properties the same but no values
*/
data pattern;
   set exampledata (obs=0);
run;

proc datasets;
  modify pattern;
    ic create id_notnull = not null(ID)
       message = 'ID must not be missing';
    ic create id_unique = unique(ID)
       message = 'Table must contain one record per ID';
    ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd))
       message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31';
    ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P')))
       message = 'Valid values for RECORD_QUALITY are either E, F, or P'; 
quit;

proc append base=pattern data=exampledata;
run;
Fable
Calcite | Level 5

Hi, this is great thank you! I was actually just coming up with something similar (though I was using an empty datalines statement instead, I like your version better as it doesn't require repeating the column formatting twice). Follow-up question: is there anyway to make integrity constraints return an ERROR in the log, rather than a WARNING as it currently does?

yabwon
Onyx | Level 15

I didn't find option how to turn on Error. But I observed 2 things:

 

1) the warning is not true warning:

yabwon_0-1701381828015.png

 

2) if you need just the "ERROR:" word at the beginning of the line you could "trick" SAS like this (notice LS=64):

options ls=64;
proc datasets;
  modify pattern;
    ic create id_notnull = not null(ID)
       message = 'ID must not be missing';
    ic create id_unique = unique(ID)
       message = 'Table must contain one record per ID';
    ic create date_check = check(where=(DATE >= '01jan2019'd and DATE <= '31dec2021'd))
       message = 'DATE is outside the expected range of 2019-01-01 to 2021-12-31';
    ic create record_quality_check = check(where=(RECORD_QUALITY in ('E','F','P')))
       message = '************************************************** ERROR: Valid values for RECORD_QUALITY are either E, F, or P.' MSGTYPE=USER;
    run;
quit;

but as I wrote it's still not a real error:

yabwon_1-1701382054899.png

:-):-)

 

 

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Fable
Calcite | Level 5

Oh thank you for this, it is so strange it does not generate a true warning, what is SAS thinking with that?

Patrick
Opal | Level 21

I normally don't bother with constraints for SAS tables not the least because when loading into such a table and there is a violation then the record gets rejected but not even a warning condition gets created. 

There was in the last years one occasion where I did bother because the integrity constraints where part of the input data model of a SAS solution. 

 

In our case the input data model were SAS tables but they could also have been DB tables where constraints do make sense (like a PK constraint). 

The solution provided a DDL to create the tables. 

 

The approach I've taken:

1. Execute the solution DDL to create empty table structures

2. Using Proc Datasets add an audit trail to the tables to capture any rejected records

3. Load the tables (proc append).

4. If any data in the error audit trail export to an error table (err_<table name>)

5. Delete the audit trail (=restore table back exactly as per original DDL)

6. Check if there are any error tables (err_<table name>). 

 - if error tables found create error report and stop further processing

 - if no error tables found move data tables to solution input area

 

Above approach sounds more complicated than it was to design and implement (using some dynamic code of course). 

The advantage of above approach was that I didn't have to write code to check the data prior to loading and that any changes to the DDL (i.e. added column in later versions with more constraints like NOT NULL) get automatically reflected.

Tom
Super User Tom
Super User

Why would you work so hard to use a complicated tool like that to do a simple job?

data _null_;
  set pattern;
  by id_unique;
  if missing(id_unique) then put 'ERROR: ID must not be missing. ' _n_= ;
  if not (first.id_unique and last.id_unique) then
   put 'ERROR: Duplicate ID. ' _n_= id_unique= ;
  if not ('01JAN2019'd <= date <= '21DEC2021'd) then
   put 'ERROR: DATE is outside the expected range of 2019-01-01 to 2021-12-31. ' _n_= id_unique= date=;
  if not RECORD_QUALITY in ('E' 'F' 'P')) then 
   put 'ERROR: Valid values for RECORD_QUALITY are either E, F, or P. ' _n_= id_unique= record_quality=; 
run;
Fable
Calcite | Level 5

Oh gosh thank you so much, I hadn't thought about doing it this way as I was focused on the integrity constraints method my coworker had shared with me and the corresponding SAS documentation. Thank you! I wonder if you can explain why integrity constraints even exist in SAS if this method is also available? 

 

Edit: Actually just tried it and I get errors, where are you getting id_unique from, it is not a column?

Tom
Super User Tom
Super User

@Fable wrote:

Oh gosh thank you so much, I hadn't thought about doing it this way as I was focused on the integrity constraints method my coworker had shared with me and the corresponding SAS documentation. Thank you! I wonder if you can explain why integrity constraints even exist in SAS if this method is also available? 

 

Edit: Actually just tried it and I get errors, where are you getting id_unique from, it is not a column?


Use the variables names that exist in your dataset.  I suspect I accidentally copied a name you were using to name the integrity constraint instead of the name of the variable you were trying use the constraint to test.

 

IC's are intended to prevent values that would make the database confusing or inaccurate from entering the database.  Which is a different purpose than cleaning data that you already have in a SAS dataset.

 

I have no idea why anyone would use IC in SAS.  If they really need that level of sophistication in their database then they should probably be storing the data in an actual DBMS and just connect to that with SAS to run their analyses.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1103 views
  • 6 likes
  • 5 in conversation