BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pdhokriya
Pyrite | Level 9

Currently, I am facing special characters issue, how shall I replace these special char with 1 space.

 

data cm0_;
cmtrt = "Metformin Hydrochloride Tablets"; output;
cmtrt = "Newdatarequired";output;
cmtrt = "";output;
run;

 

Requirement: Metformin Hydrochloride Tablets

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Typically I just replace the strange things with spaces and then use COMPBL() to collapse the multiple spaces.

What do you consider "special"?

Perhaps anything that is not between a space and a tilde?

data test;
cmtrt = "Metformin Hydrochloride Tablets"; 
want = compbl(translate(cmtrt,' ',collate(0,31)||collate(127,255)));
put (_all_) (=/);
run;

 

View solution in original post

10 REPLIES 10
maguiremq
SAS Super FREQ

Is that the only pattern in your dataset? If so, this works.

data want;
	set cm0_;
		cmtrt_2 = tranwrd(cmtrt, " ", " ");
run;
cmtrt 	                                                cmtrt_2
Metformin Hydrochloride Tablets 	Metformin Hydrochloride Tablets
Newdatarequired 	                                Newdatarequired
  	 
pdhokriya
Pyrite | Level 9

Hi, No this is hardcoding, I have other special chars also.

Tom
Super User Tom
Super User

Typically I just replace the strange things with spaces and then use COMPBL() to collapse the multiple spaces.

What do you consider "special"?

Perhaps anything that is not between a space and a tilde?

data test;
cmtrt = "Metformin Hydrochloride Tablets"; 
want = compbl(translate(cmtrt,' ',collate(0,31)||collate(127,255)));
put (_all_) (=/);
run;

 

pdhokriya
Pyrite | Level 9

This is working:


data test; cmtrt = "Metformin Hydrochloride Tablets"; want = compbl(translate(cmtrt,' ',collate(127,255))); put (_all_) (=/); run;

 

I want to know more about this: what is happening when 0,31 and 127,255 considered?

 

collate(0,31)||collate(127,255)

Tom
Super User Tom
Super User

In a single byte encoding there are only 256 possible codes from 0 to 255. The COLLATE() function just makes it easy to generate a series of characters by there code.

 

The ASCII code for a space is 32.  Any byte less than that is a control character, like a tab or a linefeed.

The ASCII code for a tilde is 126.  127 is a DELETE character.  Anything from 128 to 255 has the 8th bit and so is not part of the normal 7-bit ASCII codes.  That is where all of the accented characters and other strange glyphs live.

 

FreelanceReinh
Jade | Level 19

Hello @pdhokriya,

 

This works for your example in a SAS session using WLATIN1 encoding  ...

data want;
set cm0_;
cmtrt=kpropdata(cmtrt,' ','utf-8');
run;

... and suggests that the ' ' string originally was a UTF-8 character of some sort. Therefore, the same solution might work for other "unprintable" UTF-8 characters as well. You can specify 'wlatin1' explicitly as the fourth argument if needed (see documentation of the KPROPDATA function).

pdhokriya
Pyrite | Level 9
Hi , Thank you for reply, this does not work.
FreelanceReinh
Jade | Level 19

@pdhokriya wrote:
Hi , Thank you for reply, this does not work.

What happened when you tried it?

 

See how it works on my workstation:

1    proc options option=encoding;
2    run;

    SAS (r) Proprietary Software Release 9.4  TS1M5

 ENCODING=WLATIN1  Specifies the default character-set encoding for the SAS session.
NOTE: PROCEDURE OPTIONS used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


3
4    data cm0_;
5    cmtrt = "Metformin Hydrochloride Tablets"; output;
6    cmtrt = "Newdatarequired";output;
7    cmtrt = "";output;
8    run;

NOTE: The data set WORK.CM0_ has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds


9
10   data want;
11   set cm0_;
12   cmtrt=kpropdata(cmtrt,' ','utf-8');
13   run;

NOTE: There were 3 observations read from the data set WORK.CM0_.
NOTE: The data set WORK.WANT has 3 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


14
15   proc print data=want;
16   run;

NOTE: There were 3 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


17
18   proc print data=want;
19   format cmtrt $hex70.;
20   run;

Results:

Obs    cmtrt

 1     Metformin Hydrochloride Tablets
 2     Newdatarequired
 3
Obs    cmtrt

 1     4D6574666F726D696E20487964726F63686C6F72696465205461626C65747320202020
 2     4E65776461746172657175697265642020202020202020202020202020202020202020
 3     2020202020202020202020202020202020202020202020202020202020202020202020

The two '20'x characters highlighted above in obs. 1 are the blanks that replaced the original 3-byte UTF-8 characters.

 

Meanwhile I found out that ' ' = 'E38080'x is indeed a space character in UTF-8, called "ideographic space" (code point U+3000) and used in the context of Chinese, Japanese and Korean languages.

References: https://unicode.org/charts/nameslist/n_3000.html and (for the conversion between code points like U+3000 and hexadecimal UTF-8 codes like E38080) https://en.wikipedia.org/wiki/UTF-8

ChrisNZ
Tourmaline | Level 20

Another way:

data T;
  OLD = "Metformin Hydrochloride Tablets";
  NEW = compbl(prxchange('s/[^a-zA-Z]/ /',-1,OLD));
  put (_ALL_) (=/);
run;

OLD=Metformin Hydrochloride Tablets
NEW=Metformin Hydrochloride Tablets


Note that ã€€ is UTF-8 for IDEOGRAPHIC SPACE

When reading an UTF-8 string, you should use UTF-8 encoding.

pdhokriya
Pyrite | Level 9
THank you for ur input

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2306 views
  • 8 likes
  • 5 in conversation