Use SAS to detect data entry error

Accepted Solution Solved
Reply
Contributor
Posts: 54
Accepted Solution

Use SAS to detect data entry error

Hi guys,

 

I was wondering if I can use SAS to detect any data entry errors, such as entering characters in numerical field, numbers in date fields, or entering something not in a certain format (e.g. xx/xxx/xx instead of just xx/xxx). Right now I just use proc print to see if anything looks out of place, but if the dataset becomes large then an automated way would be nicer.

Any advice or where I can find some clue would be greatly appreciated!

Thank you. 


Accepted Solutions
Solution
‎06-14-2017 05:50 PM
PROC Star
Posts: 1,291

Re: Use SAS to detect data entry error

Yes, SAS is great for finding invalid data.  Look into all the SAS functions. 

 

For example, imagine you have a character variable that should only have letters in it, you can check that with:

 

data want;
  input var $3.;
  if notalpha(var) then put "Invalid value " var=;
  cards;
AAA
B1B
CCC
;
run;

 

Ron Cody wrote an excellent book on using SAS for data cleaning.  Thir edition just came out.

https://www.sas.com/store/books/categories/usage-and-reference/cody-s-data-cleaning-techniques-using...

View solution in original post


All Replies
Solution
‎06-14-2017 05:50 PM
PROC Star
Posts: 1,291

Re: Use SAS to detect data entry error

Yes, SAS is great for finding invalid data.  Look into all the SAS functions. 

 

For example, imagine you have a character variable that should only have letters in it, you can check that with:

 

data want;
  input var $3.;
  if notalpha(var) then put "Invalid value " var=;
  cards;
AAA
B1B
CCC
;
run;

 

Ron Cody wrote an excellent book on using SAS for data cleaning.  Thir edition just came out.

https://www.sas.com/store/books/categories/usage-and-reference/cody-s-data-cleaning-techniques-using...

Contributor
Posts: 54

Re: Use SAS to detect data entry error

Thank you for the advice!

I found a few chapters of the second edition online, will try to read it and see what I can find!

Super User
Posts: 5,366

Re: Use SAS to detect data entry error

Here's the method I like for detecting invalid data when a character field is supposed to contain a valid number:

 

if (varname > ' ') and input(varname, ?? 12.) = .;

PROC Star
Posts: 283

Re: Use SAS to detect data entry error

Instead of doing corrections at later stage it is good to have Integrity constaints up front. So that data entry errors even do not happen. It is very difficult to analyse data entry errors and becomes really cumbersome after a point. 

 

In my previous project we used to have regex table(a lookup table) with valid values for a particular column. If the values from your table matches to that of lookup table then it marked as good record with a flag value 'Y' else it was marked as a bad record with a Flag value 'N'. This helps your team to understand what are good records and what are bad reords.

  This was handled within a macro, so  similar tables could be tested in production.

For this things to have you need to define your business rules too(i.e. what does a bad record mean).

 

Link regarding integrity constraints

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000403555.htm

Super User
Posts: 11,134

Re: Use SAS to detect data entry error

In addition to @kiranv_ I routinely create custom informats to trap many classes of invalid data: Range of numerics, specific values of numeric or character values and even turn text such as Yes, No Null into 1/0/. .

This works best when you read external data with a data step where you assign the informats. If you have an informat with an other = _error_ for the out of range responses the LOG will show the errors (unless you have so many that you exceed the number of displayed errors your system is set to display).

The _error_ will also result in a missing value in the data set.

 

This approach does indicate either a willingness to correct errors manually in the input data and re-read or leave missing data for unexpected/ invalid values.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 125 views
  • 3 likes
  • 5 in conversation