BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michan22
Quartz | Level 8

Hi guys,

 

I was wondering if I can use SAS to detect any data entry errors, such as entering characters in numerical field, numbers in date fields, or entering something not in a certain format (e.g. xx/xxx/xx instead of just xx/xxx). Right now I just use proc print to see if anything looks out of place, but if the dataset becomes large then an automated way would be nicer.

Any advice or where I can find some clue would be greatly appreciated!

Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Yes, SAS is great for finding invalid data.  Look into all the SAS functions. 

 

For example, imagine you have a character variable that should only have letters in it, you can check that with:

 

data want;
  input var $3.;
  if notalpha(var) then put "Invalid value " var=;
  cards;
AAA
B1B
CCC
;
run;

 

Ron Cody wrote an excellent book on using SAS for data cleaning.  Thir edition just came out.

https://www.sas.com/store/books/categories/usage-and-reference/cody-s-data-cleaning-techniques-using...

The Boston Area SAS Users Group is hosting free webinars!
Next up: Bart Jablonski and I present 53 (+3) ways to do a table lookup on Wednesday Sep 18.
Register now at https://www.basug.org/events.

View solution in original post

5 REPLIES 5
Quentin
Super User

Yes, SAS is great for finding invalid data.  Look into all the SAS functions. 

 

For example, imagine you have a character variable that should only have letters in it, you can check that with:

 

data want;
  input var $3.;
  if notalpha(var) then put "Invalid value " var=;
  cards;
AAA
B1B
CCC
;
run;

 

Ron Cody wrote an excellent book on using SAS for data cleaning.  Thir edition just came out.

https://www.sas.com/store/books/categories/usage-and-reference/cody-s-data-cleaning-techniques-using...

The Boston Area SAS Users Group is hosting free webinars!
Next up: Bart Jablonski and I present 53 (+3) ways to do a table lookup on Wednesday Sep 18.
Register now at https://www.basug.org/events.
michan22
Quartz | Level 8

Thank you for the advice!

I found a few chapters of the second edition online, will try to read it and see what I can find!

Astounding
PROC Star

Here's the method I like for detecting invalid data when a character field is supposed to contain a valid number:

 

if (varname > ' ') and input(varname, ?? 12.) = .;

kiranv_
Rhodochrosite | Level 12

Instead of doing corrections at later stage it is good to have Integrity constaints up front. So that data entry errors even do not happen. It is very difficult to analyse data entry errors and becomes really cumbersome after a point. 

 

In my previous project we used to have regex table(a lookup table) with valid values for a particular column. If the values from your table matches to that of lookup table then it marked as good record with a flag value 'Y' else it was marked as a bad record with a Flag value 'N'. This helps your team to understand what are good records and what are bad reords.

  This was handled within a macro, so  similar tables could be tested in production.

For this things to have you need to define your business rules too(i.e. what does a bad record mean).

 

Link regarding integrity constraints

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000403555.htm

ballardw
Super User

In addition to @kiranv_ I routinely create custom informats to trap many classes of invalid data: Range of numerics, specific values of numeric or character values and even turn text such as Yes, No Null into 1/0/. .

This works best when you read external data with a data step where you assign the informats. If you have an informat with an other = _error_ for the out of range responses the LOG will show the errors (unless you have so many that you exceed the number of displayed errors your system is set to display).

The _error_ will also result in a missing value in the data set.

 

This approach does indicate either a willingness to correct errors manually in the input data and re-read or leave missing data for unexpected/ invalid values.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1084 views
  • 3 likes
  • 5 in conversation