Arjumand
Calcite | Level 5

Hi,

I want to determine the ASCII value of some special characters, such as the Greek symbols α, η and ω, using the code snippet below:

 

data ascii (encoding="utf-8");
ascii_val=rank('Φ');
put ascii_val=;
run;

 

But these symbols get mangled and I get the wrong ASCII values in the log.
What I actually want is to read a file with the INFILE statement and scan every character in each line. If a special symbol (ASCII value greater than 127) is found, the data step should issue an error message and terminate.

 

 

5 REPLIES
Patrick
Opal | Level 21

A variation of the code below should give you what you're after.

data have;
  length string $256;
  string = collate(0,255);
  output;
  string = collate(0,127);
  output;
run;

data 
  want (keep=string)
  error (keep=rownum string)
  exceptions (keep=rownum pos ASCII_Col_Seq)
  ;

  set have;
  rownum=_n_;

  retain prxid;
  if _n_=1 then
    prxid=prxparse('/[\x80-\xFF]/');

  _start=1;
  _stop=lengthn(string);
  /* CALL PRXNEXT takes six arguments; the last one receives the match length */
  call prxnext(prxid, _start, _stop, string, pos, _len);
  if pos>0 then
    do;
      output error;
      do while (pos > 0);
        ASCII_Col_Seq = rank(substr(string, pos,1));
        output exceptions;
/*        put ASCII_Col_Seq= pos=;*/
        call prxnext(prxid, _start, _stop, string, pos, _len);
      end;
    end;
      end;
    end;
  else
    output want;
run;
Arjumand
Calcite | Level 5

Could you please explain this code a bit?

Patrick
Opal | Level 21

Hi,

 

Some brief explanations as requested.

My thinking behind this code was that it's often very helpful to get a list of all "forbidden" characters together with their positions in the string. This gives the best support for further investigation and problem resolution.

 

- Data HAVE: Two rows get created; the first one contains "forbidden" characters (in real life you might also want to exclude characters in the low range, e.g. hex 00).

- The ERROR table simply contains a copy of the records from HAVE where an issue has been found.

- The EXCEPTIONS table contains one row per issue found in a row from HAVE. It gives you the exact position of the offending character in the string as well as its position in the ASCII collating sequence (ASCII_Col_Seq, a decimal number). ROWNUM allows you to trace back where the issue originates (you could also use the business key columns instead, if there are any).

 

- prxid=prxparse('/[\x80-\xFF]/');  This compiles a regular expression that searches for characters in the hex range 80 to FF (single-byte encoded characters greater than decimal 127).

- The way I've used PRXNEXT() lets you loop over the source string from HAVE, finding the forbidden characters one by one. The syntax is close to the example in the documentation, where it is explained:  https://support.sas.com/documentation/cdl/en/lefunctionsref/67960/HTML/default/viewer.htm#n1obc9u7z3...
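
To make the PRXNEXT loop easier to follow in isolation, here is a minimal sketch that runs the same search on a single literal string and writes each hit to the log with its position, decimal value and hex value. The test string and the variable names (string, start, stop, len, ch, dec) are made up for illustration, and a single-byte session encoding such as Latin1 is assumed.

```sas
data _null_;
  length ch $1;
  /* hypothetical test string: 'ab', byte x'80', 'c', byte x'FF' */
  string = 'ab' || '80'x || 'c' || 'FF'x;
  prxid  = prxparse('/[\x80-\xFF]/');
  start  = 1;
  stop   = lengthn(string);
  /* each call updates START, POS and LEN; POS is 0 when nothing more is found */
  call prxnext(prxid, start, stop, string, pos, len);
  do while (pos > 0);
    ch  = substr(string, pos, 1);
    dec = rank(ch);
    /* $HEX2. shows the byte as two hex digits */
    put pos= dec= ch= $hex2.;
    call prxnext(prxid, start, stop, string, pos, len);
  end;
run;
```

Because START is advanced past each match by PRXNEXT itself, the loop needs no manual bookkeeping of the search position.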

 

Hope this sheds some light on the code I've posted.

 

Thanks,

Patrick

MikeZdeb
Rhodochrosite | Level 12

Hi, here's another idea.  The first data step creates a data set with one variable that might contain one or more ASCII characters with a value of more than 127.  The next data step compares the length of the string with its length after removing all characters with ASCII value 128+.

 

data have;
length x $10;
do i=1 to 20;
x=' ';
do j=1 to 10;
   y = ceil(100*ranuni(99)) + 32;
   x = catt(x,byte(y));
end;
output;
end;
keep x;
run;

 

* check for characters with ASCII value 128+ ... OK = 1 means there are none;

data want;
set have;
ok = lengthn(x) eq lengthn(compress(x,collate(128,255)));
run;

 

data set WANT ...

Obs        x         ok

  1    @$[e8€l`}\     0
  2    ,wQbkKƒ?fK     0
  3    .w>L+brMdA     1
  4    +04ƒFKhn<a     0
  5    Q8[U/?{K_y     1
  6    Lt(Bzl{+Wy     1
  7    h„9Zm0kZ7C     0
  8    _xb+RLpa_k     1
  9    4C6Qs&M^#]     1
 10    q3$ypchlqC     1
 11    4SrN>?Xspa     1
 12    jvr|1_X}fT     1
 13    {1|Q}DWQ0i     1
 14    ~f]>Yjz7Gm     1
 15    86K08O*€g1     0
 16    H/sE6ITbSi     1
 17    +b_8J5I?=v     1
 18    O4~vtC3ZPw     1
 19    MFr€-CuAeI     0
 20    h„D24xiM}i     0

 

If you want the data step to just stop when characters 128+ are encountered, you could just use ...

 

if lengthn(x) ne lengthn(compress(x,collate(128,255))) then stop;

 

If you are reading raw data rather than a data set, you could use ...

 

data want;
infile 'z:\ascii.txt';
input;
if lengthn(_infile_) ne lengthn(compress(_infile_,collate(128,255))) then stop;
run;
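
The original question also asked for an error message before the step ends. A minimal sketch of that, assuming the same file path as above and a single-byte session encoding, could look like the step below. The variable names pos and code and the wording of the message are illustrative only.

```sas
data _null_;
  infile 'z:\ascii.txt';   /* path taken from the example above */
  input;
  /* FINDC returns the position of the first byte in the range 128-255, or 0 */
  pos = findc(_infile_, collate(128,255));
  if pos > 0 then do;
    code = rank(char(_infile_, pos));
    /* a message starting with ERROR: is highlighted as an error in the log */
    putlog 'ERROR: byte value ' code 'found in line ' _n_ 'at position ' pos;
    stop;
  end;
run;
```

The step stops at the first offending line, so only one message is written. Dropping the STOP statement would instead report the first bad byte of every line.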

 

 

MikeZdeb
Rhodochrosite | Level 12

Hi, you can also get a list of "bad" characters in an ERROR data set without resorting to PRX functions (for those of us who have never "gotten the hang of PRX") ... that smiley face in data set WANT is an HTML thing ...

 

* make variable X length 20 to increase the chance of 2+ bad characters;

data have;
length x $20;
do i=1 to 20;
x=' ';
do j=1 to 20;
y = ceil(100*ranuni(99)) + 32;
x = catt(x,byte(y));
end;
output;
end;
keep x;
run;

 

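data error (keep=rec pos character ascii) want(keep=x ok);
set have;
* LENGTHN returns 0 for an empty result, so a string made up only of bad characters is still flagged;
ok = lengthn(x) eq lengthn(compress(x,collate(128,255)));
output want;
rec=_n_;
start=1;
* find each bad character in turn, resuming the search one position past the last hit;
do until (pos = 0);
   pos = findc(x,collate(128,255),start);
   if pos then do;
      character=char(x,pos);
      ascii=rank(character);
      start=pos+1;
      output error;
   end;
end;
run;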

 

the ERROR data set ...

Obs rec pos character ascii

1    1   6    €       128
2    1  17    ƒ       131
3    2  14    ƒ       131
4    4   2    „       132
5    8   8    €       128
6   10   4    €       128
7   10  12    „       132
8   11  17    „       132
9   18  10            129
10  19   4    ‚       130
11  19   8    €       128
12  19  13    ƒ       131
13  20   2    €       128

 

the WANT data set ...

Obs       x            ok

1  @$[e8€l`}\,wQbkKƒ?fK 0
2  .w>L+brMdA+04ƒFKhn<a 0
3  Q8[U/?{K_yLt(Bzl{+Wy 1
4  h„9Zm0kZ7C_xb+RLpa_k 0
5  4C6Qs&M^#]q3$ypchlqC 1
6  4SrN>?Xspajvr|1_X}fT 1
7  {1|Q}DWQ0i~f]>Yjz7Gm 1
8  86K08O*€g1H/sE6ITbSi 0
9  +b_8J5I?=vO4~vtC3ZPw 1
10 MFr€-CuAeIh„D24xiM}i 0
11 n>n$/_a[|.1oK!78„sMx 0
12 S?m-{2]|&6'i$oI{3#T  1
13 X$4Cq#igu&'*eJIgqLw# 1
14 ]5ewe:ppE3't$}DsII0$ 1
15 U;N=cO*iy}xQ_%uICg^` 1
16 Vw.H\=Y?rI[]u^4g/M)O 1
17 M<pLy)%3wg\?bef`76"^ 1
18 +aJFURsNc_/|HMvDsL   0
19 t1q‚C@S€WCq7ƒK]ƒi3\" 0
20 T€S3sWm">P,!*k1NC08C 0
