Identifying not alphanumeric characters in a string

Contributor
Posts: 31

Count and display the number, type, and position of the special characters present in each sample string (spaces are an exception and can be ignored).

 

Create 3 additional variables:

  1. Number: the number of special characters present in the string
  2. Type: each special character, in the order they appear in the string
  3. Position: the position number of each special character in the string

 

See the first observation for an example:

 

String                                      Number  Type   Position
S%pecial C*haract`er                        3       % * !  2, 10, 17
No Special Character
This li`st of
random! words
will he@lp spark
your creative
@imagination if you're
$looking to think
%up a d$omain n%ame,
a ban%d name,
a %project name
or wh%atever.
This %tool uses
thousands of
hand%picked
r%andom nouns
and ver!bs to max^im&ize
t%he possibility
of creating interesting
!b@r#a$i%n^s&t*o*r(m)i-n*g/ -i=d~e#a$s^.

I tried answering this question but have run into a lot of issues.

 

This has been my code so far:

data one;
  length String $100.;
  infile 'new1.txt' dlm='09'x;
  input String$;
run;

data two;
  set one;
  l=length(String);
  array a{40} _temporary_;
  array Position{40} P1 - P40;
  do i=1 to l;
    a{i}=anypunct(String);
    if a{i}=0 then Position{i}=compress(put(i,best.));
  end;
  do x=1 to 39;
    Pos1=Compress(put(Position{x},best.)||", "||put(Position{x+1},best.));
    Pos=Compress(Pos1 ||", ");
  end;
run;

proc print data=two;
run;

 

Please advise.

Super User
Posts: 23,332

Re: Identifying not alphanumeric characters in a string

I don't have time to finish this up, but this gets you a bit closer:

 

data want;
	set have;
	len_word=length(rText);
	length position symbols $50.;
	position='';
	symbols='';
	count=0;

	do i=1 to len_word;
		entry=char(rText, i);

		if notalnum(entry) and entry ne ' ' then
			do;
				position=catx(',', strip(position), put(i, 8. -l));
				symbols=catx(',', strip(symbols), entry);
				count+1;
			end;
	end;
run;

Note that you could probably make this more efficient by calling NOTALNUM on the whole string (rather than character by character) to find the non-alphanumeric characters, though it would still flag spaces.
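A rough sketch of that whole-string idea, not a finished solution. It assumes a data set HAVE with a character variable named String (as in the sample data posted in this thread); the names len, pos, and ch are just illustrative helpers. NOTALNUM accepts an optional start position, so each call can resume right after the previous hit:

```sas
data want;
    set have;                            /* assumption: HAVE has a character variable String */
    length position symbols $200 ch $1;
    len   = length(String);              /* position of the last non-blank character */
    count = 0;
    pos   = notalnum(String);            /* first non-alphanumeric character, or 0 if none */

    do while (0 < pos <= len);
        ch = char(String, pos);
        if ch ne ' ' then do;            /* blanks are flagged too, so skip them */
            position = catx(',', position, pos);
            symbols  = catx(',', symbols, ch);
            count + 1;
        end;
        if pos = len then leave;         /* nothing left to scan */
        pos = notalnum(String, pos + 1); /* resume the search after this hit */
    end;

    drop len pos ch;
run;
```

The loop only visits the characters NOTALNUM reports, so long runs of letters and digits are skipped instead of being tested one by one.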

 



PROC Star
Posts: 1,593

Re: Identifying not alphanumeric characters in a string

data have;
infile cards truncover;
input String $100. ;
cards;
S%pecial C*haract`er
No Special Character
This li`st of
random! words
will he@lp spark
your creative
@imagination if you're
$looking to think
%up a d$omain n%ame,
a ban%d name,
a %project name
or wh%atever.
This %tool uses
thousands of
hand%picked
r%andom nouns
and ver!bs to max^im&ize
t%he possibility
of creating interesting
!b@r#a$i%n^s&t*o*r(m)i-n*g/ -i=d~e#a$s^.
;


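data want;
set have;
length position type $50;
/* keep only characters that are not blanks, letters, digits or underscores */
type=compress(string,' ','ani');
_start=1;
do _n_=1 to lengthn(type);
  /* search from _start so that a repeated character reports its own
     position rather than the position of its first occurrence */
  _p=findc(string,substr(type,_n_,1),_start);
  position=catx(',',position,put(_p,8.));
  _start=_p+1;
end;
Number=lengthn(type);
drop _start _p;
run;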
Super User
Posts: 13,347

Re: Identifying not alphanumeric characters in a string

I would start with a clear definition of "special". If my values are email addresses, then @ and . would not be special characters, but any other punctuation would be.

 

If my data were medical dosage information with drug and dose, then % might not be a special character, as drugs often contain a percentage solution.

 

If my data were social security numbers, then - might not be a "special" character, but anything else except the digits 0 through 9 would be.
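
To make that concrete, here is a minimal sketch of one such rule, assuming a data set HAVE with a character variable String. The choice of @ and . as "ordinary" characters is only the hypothetical email case from above, and the variable name special is made up for illustration:

```sas
data want;
    set have;                 /* assumption: HAVE has a character variable String */
    length special $100;

    /* Hypothetical rule for email-like values: letters (a), digits (d),
       whitespace (s) and the listed characters @ and . are ordinary.
       COMPRESS removes them, so whatever is left counts as "special". */
    special = compress(String, '@.', 'ads');
    number  = lengthn(special);
run;
```

Changing the second argument (for example to '-' for the social security case, together with only the 'ds' modifiers) changes the definition without touching the rest of the step.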

Respected Advisor
Posts: 4,687

Re: Identifying not alphanumeric characters in a string

@DOBBINHO

Taking it from where you started, the code below should do.

data have;
  infile cards truncover;
  input String $100.;
  cards;
S%pecial C*haract`er  
No Special Character      
This li`st of     
random! words     
will he@lp spark      
your creative     
@imagination if you're      
$looking to think     
%up a d$omain n%ame,      
a ban%d name,     
a %project name     
or wh%atever.     
This %tool uses     
thousands of      
hand%picked     
r%andom nouns     
and ver!bs to max^im&ize      
t%he possibility      
of creating interesting     
!b@r#a$i%n^s&t*o*r(m)i-n*g/ -i=d~e#a$s^.    
;
run;

data want(drop=_:);
  set have;
  length number 8 type $100 position $400;

  array _type {100} $100  _temporary_;
  array _pos  {100} $3    _temporary_; 
  call missing(of _type[*], of _pos[*]);
  number=0;

  _l=length(String);

  do _i=1 to _l;
    _found=findc(string,' ',_i,'ka');
    if _found>0 then
      do;
        _i=_found;
        number=sum(number,1);
        _type[_i]=substr(string,_i,1);
        _pos[_i]=put(_i,f3.);
      end;
    else leave;
  end; 

  type=cats(of _type[*]);
  position=catx(', ',of _pos[*]);

run;

 

What constitutes a "special character" would need further definition. In the above code anything except a letter or a blank is considered special. If you need to change this then simply amend the selections in the findc() function accordingly.

findc(string,' ',_i,'ka')
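
As one possible amendment (a sketch only, with a made-up sample value): the 'k' modifier makes FINDC look for characters that are not in the list, and 'a' adds letters to that list. Adding 'd' puts digits in the list as well, so digits stop being reported as special:

```sas
data _null_;
    length s $40;
    s = 'Room 12: a%b';                    /* hypothetical sample value */

    /* blank + letters are ordinary, so the first hit is a digit */
    p_letters = findc(s, ' ', 1, 'ka');

    /* blank + letters + digits are ordinary, so the first hit is punctuation */
    p_alnum   = findc(s, ' ', 1, 'kad');

    put p_letters= p_alnum=;
run;
```

Swapping 'ka' for 'kad' in the data step above would apply the same change there.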

http://support.sas.com/documentation/cdl//en/lefunctionsref/69762/HTML/default/viewer.htm#n1mdh2gvd5...  
