BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
J_J_J
Obsidian | Level 7

Hi

I want to extract characters between two symbols.

Number of characters is always different, it can be 3, 4 or more: #ABC# or #ABCD# etc.  

Example:

blablablabla #ABC# blablabla

the output i want to get is ABC

 

I tried to use:

scan(t1.COMMENT_TXT, 2, "##"), but output was blablablabla blablalba

Maybe it should be combination on FIND and SUBSTRING, but I don't know how to use unknown number of characters in SUBSTRING function.

Can anybody help me please?

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Alternatively..

 

data _null_;
    string='blablablabla #ABC# blablabla';
    newstring=prxchange('s/.*#(.+)#.*/\1/', -1, string);
    put newstring=;
run;

Result:

 

newstring=ABC

View solution in original post

11 REPLIES 11
Jagadishkatam
Amethyst | Level 16

you are almost right, but a small modification to scan(t1.COMMENT_TXT, 2, "#")

 

data have;
input text&$100.;
news=scan(text,2,'#');
cards;
blablablabla #ABC# blablabla
;
run;
Thanks,
Jag
J_J_J
Obsidian | Level 7

Thanks, Jag

Jagadishkatam
Amethyst | Level 16

they will not , see the output i am getting below

 

image.png

Thanks,
Jag
J_J_J
Obsidian | Level 7

I don't use cards, because there are thousands of row with different data

data have;
input text&$100.;
news=scan(text,2,'#');
cards;blablablabla #ABC# blablabla;
run;

I use ode:

 

PROC SQL;
CREATE TABLE WANT AS
SELECT DISTINCT

t1.ID,
t1.CREATED,
t1.COMMENT_TXT,
(scan(t1.COMMENT_TXT, 2, "#")) AS ABC
FROM data.COMMENT t1
;QUIT;

ballardw
Super User

@J_J_J wrote:

I don't use cards, because there are thousands of row with different data

data have;
input text&$100.;
news=scan(text,2,'#');
cards;blablablabla #ABC# blablabla;
run;

I use ode:

 

PROC SQL;
CREATE TABLE WANT AS
SELECT DISTINCT

t1.ID,
t1.CREATED,
t1.COMMENT_TXT,
(scan(t1.COMMENT_TXT, 2, "#")) AS ABC
FROM data.COMMENT t1
;QUIT;


A cards statement was used to provide data as an example. There is no implication that you need to use cards.

 

BTW, you need to pay very close attention when using cards or datalines in line data. The data step you posted in response:

data have;
input text&$100.;
news=scan(text,2,'#');
cards;blablablabla #ABC# blablabla;
run;

Does not run for several reasons. The only thing that can be on a line with CARDS; or DATALINES;  or the Cards4; version is the cards and dataline and ending semicolon.

Also the semicolon to end the data MUST be on a separate line or the last line of data is not read.

1    data have;
2    input text&$100.;
3    news=scan(text,2,'#');
4    cards;

NOTE: The data set USER.HAVE has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds


4  !       blablablabla #ABC# blablabla;
           ------------
           180

ERROR 180-322: Statement is not valid or it is used out of proper order.

5    run;

So that data step really needs to be

data have;
input text&$100.;
news=scan(text,2,'#');
cards;
blablablabla #ABC# blablabla
;

the Run; is actually optional as the semicolon at the end of the data lines ends the data step. I include "run" only to make the editor show the code folding and start/end of procedures properly.

 

You also if using scan in a data step should set the length of the new variable to be the longest expected value. Otherwise you may have either truncated data for very long values or lots of wasted storage space with the default length of the new variable matching that of the scanned text.

Tom
Super User Tom
Super User

the Run; is actually optional as the semicolon at the end of the data lines ends the data step. I include "run" only to make the editor show the code folding and start/end of procedures properly.

That sounds like a bug in the Editor.  Perhaps you should raise it as an SAS ballot option.  

 

PeterClemmensen
Tourmaline | Level 20

Alternatively..

 

data _null_;
    string='blablablabla #ABC# blablabla';
    newstring=prxchange('s/.*#(.+)#.*/\1/', -1, string);
    put newstring=;
run;

Result:

 

newstring=ABC
J_J_J
Obsidian | Level 7

Draycut, it works! Thank you!!!

J_J_J
Obsidian | Level 7
Could you please explain metecharacters you've used: prxchange('s/.*#(.+)#.*/\1/', -1, string)
Tom
Super User Tom
Super User

@J_J_J wrote:

Hi

I want to extract characters between two symbols.

Number of characters is always different, it can be 3, 4 or more: #ABC# or #ABCD# etc.  

Example:

blablablabla #ABC# blablabla

the output i want to get is ABC

 

I tried to use:

scan(t1.COMMENT_TXT, 2, "##"), but output was blablablabla blablalba

Maybe it should be combination on FIND and SUBSTRING, but I don't know how to use unknown number of characters in SUBSTRING function.

Can anybody help me please?


You have listed the same delimiter character twice in your list of delimiter characters. The extra # in the list of delimiters does nothing. SCAN() will treat any occurrence of any of the delimiter characters the same.

 

You cannot get that result from that function call on your posted input.  To get that result the input string would need to look like:

some other text#blablablabla blablalba

Or 

some other text#blablablabla blablalba#more other text

Do all of your examples have # characters? 

blablablabla blablalba

Do they have 2 and only 2 # characters? Could there be just one?

some other text#blablablabla blablalba

Could there be three?

some other text#ABC#CDE#more text

Could there be four or more?

some other text#ABC#more text#CDE#and even more text

Is the # character ever the first character?

#AB#some other text
J_J_J
Obsidian | Level 7

Do all of your examples have # characters? 

blablablabla blablalba

Not all examples have # characters.

Do they have 2 and only 2 # characters? Could there be just one?

some other text#blablablabla blablalba

No, some lines have more than 2 # characters, but not less than 2

Could there be three?

some other text#ABC#CDE#more text

It could be text #abc# text #abs# text

Could there be four or more?

some other text#ABC#more text#CDE#and even more text

Yes

Is the # character ever the first character?

#AB#some other text

# character can be first, in the middle of the text or last in the sentence:

#abc# text

text #abc# text

Text #abc#

text #abc# text #abc# text

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 5905 views
  • 1 like
  • 5 in conversation