BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michan22
Quartz | Level 8

Hi guys,

 

I was wondering if I can use SAS to detect any data entry errors, such as entering characters in numerical field, numbers in date fields, or entering something not in a certain format (e.g. xx/xxx/xx instead of just xx/xxx). Right now I just use proc print to see if anything looks out of place, but if the dataset becomes large then an automated way would be nicer.

Any advice or where I can find some clue would be greatly appreciated!

Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Yes, SAS is great for finding invalid data.  Look into all the SAS functions. 

 

For example, imagine you have a character variable that should only have letters in it, you can check that with:

 

data want;
  input var $3.;
  if notalpha(var) then put "Invalid value " var=;
  cards;
AAA
B1B
CCC
;
run;

 

Ron Cody wrote an excellent book on using SAS for data cleaning.  Thir edition just came out.

https://www.sas.com/store/books/categories/usage-and-reference/cody-s-data-cleaning-techniques-using...

BASUG is hosting free webinars Next up: Mike Sale presenting Data Warehousing with SAS April 10 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

5 REPLIES 5
Quentin
Super User

Yes, SAS is great for finding invalid data.  Look into all the SAS functions. 

 

For example, imagine you have a character variable that should only have letters in it, you can check that with:

 

data want;
  input var $3.;
  if notalpha(var) then put "Invalid value " var=;
  cards;
AAA
B1B
CCC
;
run;

 

Ron Cody wrote an excellent book on using SAS for data cleaning.  Thir edition just came out.

https://www.sas.com/store/books/categories/usage-and-reference/cody-s-data-cleaning-techniques-using...

BASUG is hosting free webinars Next up: Mike Sale presenting Data Warehousing with SAS April 10 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
michan22
Quartz | Level 8

Thank you for the advice!

I found a few chapters of the second edition online, will try to read it and see what I can find!

Astounding
PROC Star

Here's the method I like for detecting invalid data when a character field is supposed to contain a valid number:

 

if (varname > ' ') and input(varname, ?? 12.) = .;

kiranv_
Rhodochrosite | Level 12

Instead of doing corrections at later stage it is good to have Integrity constaints up front. So that data entry errors even do not happen. It is very difficult to analyse data entry errors and becomes really cumbersome after a point. 

 

In my previous project we used to have regex table(a lookup table) with valid values for a particular column. If the values from your table matches to that of lookup table then it marked as good record with a flag value 'Y' else it was marked as a bad record with a Flag value 'N'. This helps your team to understand what are good records and what are bad reords.

  This was handled within a macro, so  similar tables could be tested in production.

For this things to have you need to define your business rules too(i.e. what does a bad record mean).

 

Link regarding integrity constraints

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000403555.htm

ballardw
Super User

In addition to @kiranv_ I routinely create custom informats to trap many classes of invalid data: Range of numerics, specific values of numeric or character values and even turn text such as Yes, No Null into 1/0/. .

This works best when you read external data with a data step where you assign the informats. If you have an informat with an other = _error_ for the out of range responses the LOG will show the errors (unless you have so many that you exceed the number of displayed errors your system is set to display).

The _error_ will also result in a missing value in the data set.

 

This approach does indicate either a willingness to correct errors manually in the input data and re-read or leave missing data for unexpected/ invalid values.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 795 views
  • 3 likes
  • 5 in conversation