
Remove invalid characters

Frequent Contributor
Posts: 90

Remove invalid characters

Hi,

I have a dataset with vendor numbers that contain invalid characters. I would like to exclude any vendor number that contains characters other than A-Z, 0-9, or a dash (-). We could use the COMPRESS function, but I'm not sure which invalid characters are in the data.

Example:

data test;
  input vendor :$30.;   /* read VENDOR as character, up to 30 bytes */
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
;
run;

REQUIRED OUTPUT :

Vendor

789567890

908765TYR

Please let me know.

Thanks in Advance.
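Since the exact set of invalid characters is unknown, one way to find out is to strip every allowed character and look at what is left. A minimal sketch (not from the original thread), assuming the allowed set is exactly A-Z, 0-9 and the dash as stated above; the dataset names `vendors` and `invalid_chars` and the variable `bad` are hypothetical. The allowed characters are spelled out literally rather than through COMPRESS modifiers, so the result does not depend on how the session locale classifies letters.

```sas
/* Sample data copied from the question; VENDOR is read as character */
data vendors;
  input vendor :$30.;
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
;
run;

/* Remove every allowed character (dash, A-Z, 0-9).
   Whatever remains in BAD is the set of invalid characters for that row. */
data invalid_chars;
  set vendors;
  length bad $30;
  bad = compress(vendor, '-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');
  if bad ne ' ';   /* keep only rows that still have something left */
run;

proc print data=invalid_chars noobs;
run;
```

With the sample data this should flag only the first two rows, showing Ž and ^ as the leftover characters.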


Trusted Advisor
Posts: 1,300

Remove invalid characters

compress(vendor,,'adk');
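FE's expression on its own only returns the cleaned string. A minimal sketch of using it as a row filter, with the dash added to the keep list because the OP allows it (that addition, the dataset name `vendors_adk` and the extra 908765-TYR line are illustrative, not part of FE's post). As Art points out, the 'a' modifier may also treat non-English letters as alphabetic, so whether 111948722-070Ž is rejected can depend on the session encoding.

```sas
data vendors_adk;
  input vendor :$30.;
  /* 'a' = alphabetic characters, 'd' = digits, 'k' = KEEP the listed
     characters instead of removing them; '-' is added explicitly.
     A value passes only if nothing was removed. */
  if compress(vendor, '-', 'adk') eq vendor;
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
908765-TYR
;
run;
```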

Solution
12-08-2011 11:38 AM
PROC Star
Posts: 7,363

Re: Remove invalid characters

Sort of like FE's suggestion, but because the 'a' modifier will accept non-English letters, I'd suggest the following to limit it to English letters only. However, this assumes that underscores are also valid for your purpose; otherwise, one additional check would be needed:

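data test;
  informat vendor $30.;
  input vendor &;
  /* keep '-' plus digits (d) and English letters/underscore (f);
     'k' keeps the listed characters instead of removing them */
  if compress(vendor,'-','dfk') eq vendor;
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
;
run;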
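If underscores are not valid either, the "one additional check" mentioned above could look like this. A minimal sketch, assuming INDEXC is used to reject any value that contains '_' (my choice of function; the dataset name `test2` and the extra 908765_TYR line are illustrative only).

```sas
data test2;
  informat vendor $30.;
  input vendor &;
  /* 'dfk' lets digits, English letters, underscore and '-' through;
     the INDEXC test then rejects any value containing an underscore */
  if compress(vendor, '-', 'dfk') eq vendor
     and indexc(vendor, '_') eq 0;
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
908765_TYR
;
run;
```

This should keep only 789567890 and 908765TYR.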

Trusted Advisor
Posts: 1,300

Remove invalid characters

data foo;
  informat vendor $30.;
  input vendor & $30.;
  /* NOTALNUM returns 0 when every character is a letter or digit */
  if notalnum(strip(vendor))=0;
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
;
run;

PROC Star
Posts: 7,363

Remove invalid characters

FE, that wouldn't correctly handle an entry like 908765-TYR.

Trusted Advisor
Posts: 1,300

Remove invalid characters

Art,

I tested it, and it seems to work for me?

NOTALNUM returns a value greater than 0 (the position of the first offending character) for any string that contains a character that is not a letter or digit.

data foo;
  informat vendor $30.;
  input vendor & $30.;
  if notalnum(strip(vendor))=0;
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
908765-TYR
;
run;

Output:

789567890

908765TYR

PROC Star
Posts: 7,363

Remove invalid characters

The OP considered a dash as a valid character.

Trusted Advisor
Posts: 1,300

Remove invalid characters

That's what I get for skimming the post.
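For completeness, FE's NOTALNUM approach can be adapted to allow dashes by removing them before the test. A minimal sketch, not posted in the thread (the dataset name `foo2` is hypothetical). Like the 'a' modifier, NOTALNUM may follow the session locale's idea of a letter, so a non-English letter such as Ž might still pass.

```sas
data foo2;
  informat vendor $30.;
  input vendor &;
  /* COMPRESS with a second argument removes only the dashes;
     NOTALNUM then returns 0 if everything left is a letter or digit */
  if notalnum(strip(compress(vendor, '-'))) eq 0;
cards;
111948722-070Ž
1119789^78A
789567890
908765TYR
908765-TYR
;
run;
```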

Super User
Posts: 9,676

Remove invalid characters

data test;
  informat vendor $30.;
  input vendor &;
  /* 'k' makes FINDC look for characters NOT in the list: '-', digits (d), uppercase letters (u) */
  if not findc(strip(vendor),'-','duk');
cards;
111948722-070^
1119789^78A
789567890
908765TYR
;
run;
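Instead of only discarding bad rows, the same FINDC test can route them to a separate dataset for review. A minimal sketch; the dataset names `valid` and `rejected` and the variable `badpos` are hypothetical. Because the 'k' modifier makes FINDC return the position of the first character not in the list, `badpos` also points at the offending character.

```sas
data valid rejected;
  informat vendor $30.;
  input vendor &;
  /* position of the first character that is not '-', a digit or an
     uppercase letter; STRIP avoids counting trailing blanks; 0 = clean */
  badpos = findc(strip(vendor), '-', 'duk');
  if badpos eq 0 then output valid;
  else output rejected;
cards;
111948722-070^
1119789^78A
789567890
908765TYR
;
run;
```

With this data, `rejected` should hold the first two vendor numbers, with `badpos` equal to 14 and 8.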



Ksharp
