BookmarkSubscribeRSS Feed
C_L
Calcite | Level 5 C_L
Calcite | Level 5

Hi,

 

I have a dataset with one column named "EAN". The data is not cleaned beforehand and I want to clean it by selecting all the valid ones I would like to keep and creating a new table for the output.

 

This EAN column is a character type column as for now and is containing all kinds of unformatted characters and numbers (like 0098652312313, 00006231, 446723, abc123def, 123abc445566, etc.). What I want to keep are the variables that contain numerical data ONLY. That means I want to EXCLUDE whatever variables that contain any characters other than numbers (like abc123def and 123abc445566 I just mentioned). The output shall thus only include numbers and created as a new table. 

 

In fact, to keep a record, I also want to keep the original table but now adding a new column "FLAG". FLAG invalid data I just excluded from my new table as 1, and flag the data I just kept in the new table as 0.

 

Then, from the new table, I want to select all EAN that have EITHER 8, 12, 13, OR 14 digits (including the 0s if there's any). For example, the data 0098652312313 is seen as a 13-digital number and shall be selected, and 446723 is a 6-digital number and shall be excluded. I don't know which function I should use to select numbers that have a certain number of digits only. Hope there's a function in SAS that you could kindly teach me to solve this problem.

 

I'm a new learner and would really appreciate a detailed answer. Thanks a lot if you can help!

 

1 REPLY 1
Kurt_Bremser
Super User

I take it that you want to select observations(rows) where the EAN column(variable) contains only digits.

Selecting variables is done with KEEP or DROP statements, or KEEP= or DROP= dataset options, and I am quite confident that is not what you have in mind.

 

You can search for characters that are not digits with the NOTDIGIT function. Use STRIP to remove trailing and leading blanks, as these are also considered non-digits.

 

So the first step will look like this:

data
  valid
  invalid
;
set have;
if notdigit(strip(ean)) /* a value of 0 (false) means no non-digits were found */
then output invalid;
else output valid;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 2481 views
  • 0 likes
  • 2 in conversation