Hi
I want to extract characters between two symbols.
Number of characters is always different, it can be 3, 4 or more: #ABC# or #ABCD# etc.
Example:
blablablabla #ABC# blablabla
the output i want to get is ABC
I tried to use:
scan(t1.COMMENT_TXT, 2, "##"), but output was blablablabla blablalba
Maybe it should be combination on FIND and SUBSTRING, but I don't know how to use unknown number of characters in SUBSTRING function.
Can anybody help me please?
Alternatively..
data _null_;
string='blablablabla #ABC# blablabla';
newstring=prxchange('s/.*#(.+)#.*/\1/', -1, string);
put newstring=;
run;
Result:
newstring=ABC
you are almost right, but a small modification to scan(t1.COMMENT_TXT, 2, "#")
data have;
input text&$100.;
news=scan(text,2,'#');
cards;
blablablabla #ABC# blablabla
;
run;
Thanks, Jag!
scan(t1.COMMENT_TXT, 2, "#") works, but the result is ABC blablabla, all characters after last # stay too 😞
they will not , see the output i am getting below
I don't use cards, because there are thousands of row with different data
data have;
input text&$100.;
news=scan(text,2,'#');
cards;blablablabla #ABC# blablabla;
run;
I use ode:
PROC SQL;
CREATE TABLE WANT AS
SELECT DISTINCT
t1.ID,
t1.CREATED,
t1.COMMENT_TXT,
(scan(t1.COMMENT_TXT, 2, "#")) AS ABC
FROM data.COMMENT t1
;QUIT;
@J_J_J wrote:
I don't use cards, because there are thousands of row with different data
data have; input text&$100.; news=scan(text,2,'#'); cards;blablablabla #ABC# blablabla; run;
I use ode:
PROC SQL;
CREATE TABLE WANT AS
SELECT DISTINCTt1.ID,
t1.CREATED,
t1.COMMENT_TXT,
(scan(t1.COMMENT_TXT, 2, "#")) AS ABC
FROM data.COMMENT t1
;QUIT;
A cards statement was used to provide data as an example. There is no implication that you need to use cards.
BTW, you need to pay very close attention when using cards or datalines in line data. The data step you posted in response:
data have; input text&$100.; news=scan(text,2,'#'); cards;blablablabla #ABC# blablabla; run;
Does not run for several reasons. The only thing that can be on a line with CARDS; or DATALINES; or the Cards4; version is the cards and dataline and ending semicolon.
Also the semicolon to end the data MUST be on a separate line or the last line of data is not read.
1 data have; 2 input text&$100.; 3 news=scan(text,2,'#'); 4 cards; NOTE: The data set USER.HAVE has 0 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.00 seconds 4 ! blablablabla #ABC# blablabla; ------------ 180 ERROR 180-322: Statement is not valid or it is used out of proper order. 5 run;
So that data step really needs to be
data have; input text&$100.; news=scan(text,2,'#'); cards; blablablabla #ABC# blablabla ;
the Run; is actually optional as the semicolon at the end of the data lines ends the data step. I include "run" only to make the editor show the code folding and start/end of procedures properly.
You also if using scan in a data step should set the length of the new variable to be the longest expected value. Otherwise you may have either truncated data for very long values or lots of wasted storage space with the default length of the new variable matching that of the scanned text.
the Run; is actually optional as the semicolon at the end of the data lines ends the data step. I include "run" only to make the editor show the code folding and start/end of procedures properly.
That sounds like a bug in the Editor. Perhaps you should raise it as an SAS ballot option.
Alternatively..
data _null_;
string='blablablabla #ABC# blablabla';
newstring=prxchange('s/.*#(.+)#.*/\1/', -1, string);
put newstring=;
run;
Result:
newstring=ABC
Draycut, it works! Thank you!!!
@J_J_J wrote:
Hi
I want to extract characters between two symbols.
Number of characters is always different, it can be 3, 4 or more: #ABC# or #ABCD# etc.
Example:
blablablabla #ABC# blablabla
the output i want to get is ABC
I tried to use:
scan(t1.COMMENT_TXT, 2, "##"), but output was blablablabla blablalba
Maybe it should be combination on FIND and SUBSTRING, but I don't know how to use unknown number of characters in SUBSTRING function.
Can anybody help me please?
You have listed the same delimiter character twice in your list of delimiter characters. The extra # in the list of delimiters does nothing. SCAN() will treat any occurrence of any of the delimiter characters the same.
You cannot get that result from that function call on your posted input. To get that result the input string would need to look like:
some other text#blablablabla blablalba
Or
some other text#blablablabla blablalba#more other text
Do all of your examples have # characters?
blablablabla blablalba
Do they have 2 and only 2 # characters? Could there be just one?
some other text#blablablabla blablalba
Could there be three?
some other text#ABC#CDE#more text
Could there be four or more?
some other text#ABC#more text#CDE#and even more text
Is the # character ever the first character?
#AB#some other text
Do all of your examples have # characters?
blablablabla blablalba
Not all examples have # characters.
Do they have 2 and only 2 # characters? Could there be just one?
some other text#blablablabla blablalba
No, some lines have more than 2 # characters, but not less than 2
Could there be three?
some other text#ABC#CDE#more text
It could be text #abc# text #abs# text
Could there be four or more?
some other text#ABC#more text#CDE#and even more text
Yes
Is the # character ever the first character?
#AB#some other text
# character can be first, in the middle of the text or last in the sentence:
#abc# text
text #abc# text
Text #abc#
text #abc# text #abc# text
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.