- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi
I want to extract characters between two symbols.
Number of characters is always different, it can be 3, 4 or more: #ABC# or #ABCD# etc.
Example:
blablablabla #ABC# blablabla
the output i want to get is ABC
I tried to use:
scan(t1.COMMENT_TXT, 2, "##"), but output was blablablabla blablalba
Maybe it should be combination on FIND and SUBSTRING, but I don't know how to use unknown number of characters in SUBSTRING function.
Can anybody help me please?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Alternatively..
data _null_;
string='blablablabla #ABC# blablabla';
newstring=prxchange('s/.*#(.+)#.*/\1/', -1, string);
put newstring=;
run;
Result:
newstring=ABC
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
you are almost right, but a small modification to scan(t1.COMMENT_TXT, 2, "#")
data have;
input text&$100.;
news=scan(text,2,'#');
cards;
blablablabla #ABC# blablabla
;
run;
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, Jag!
scan(t1.COMMENT_TXT, 2, "#") works, but the result is ABC blablabla, all characters after last # stay too 😞
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
they will not , see the output i am getting below
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't use cards, because there are thousands of row with different data
data have;
input text&$100.;
news=scan(text,2,'#');
cards;blablablabla #ABC# blablabla;
run;
I use ode:
PROC SQL;
CREATE TABLE WANT AS
SELECT DISTINCT
t1.ID,
t1.CREATED,
t1.COMMENT_TXT,
(scan(t1.COMMENT_TXT, 2, "#")) AS ABC
FROM data.COMMENT t1
;QUIT;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@J_J_J wrote:
I don't use cards, because there are thousands of row with different data
data have; input text&$100.; news=scan(text,2,'#'); cards;blablablabla #ABC# blablabla; run;
I use ode:
PROC SQL;
CREATE TABLE WANT AS
SELECT DISTINCTt1.ID,
t1.CREATED,
t1.COMMENT_TXT,
(scan(t1.COMMENT_TXT, 2, "#")) AS ABC
FROM data.COMMENT t1
;QUIT;
A cards statement was used to provide data as an example. There is no implication that you need to use cards.
BTW, you need to pay very close attention when using cards or datalines in line data. The data step you posted in response:
data have; input text&$100.; news=scan(text,2,'#'); cards;blablablabla #ABC# blablabla; run;
Does not run for several reasons. The only thing that can be on a line with CARDS; or DATALINES; or the Cards4; version is the cards and dataline and ending semicolon.
Also the semicolon to end the data MUST be on a separate line or the last line of data is not read.
1 data have; 2 input text&$100.; 3 news=scan(text,2,'#'); 4 cards; NOTE: The data set USER.HAVE has 0 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.00 seconds 4 ! blablablabla #ABC# blablabla; ------------ 180 ERROR 180-322: Statement is not valid or it is used out of proper order. 5 run;
So that data step really needs to be
data have; input text&$100.; news=scan(text,2,'#'); cards; blablablabla #ABC# blablabla ;
the Run; is actually optional as the semicolon at the end of the data lines ends the data step. I include "run" only to make the editor show the code folding and start/end of procedures properly.
You also if using scan in a data step should set the length of the new variable to be the longest expected value. Otherwise you may have either truncated data for very long values or lots of wasted storage space with the default length of the new variable matching that of the scanned text.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
the Run; is actually optional as the semicolon at the end of the data lines ends the data step. I include "run" only to make the editor show the code folding and start/end of procedures properly.
That sounds like a bug in the Editor. Perhaps you should raise it as an SAS ballot option.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Alternatively..
data _null_;
string='blablablabla #ABC# blablabla';
newstring=prxchange('s/.*#(.+)#.*/\1/', -1, string);
put newstring=;
run;
Result:
newstring=ABC
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Draycut, it works! Thank you!!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@J_J_J wrote:
Hi
I want to extract characters between two symbols.
Number of characters is always different, it can be 3, 4 or more: #ABC# or #ABCD# etc.
Example:
blablablabla #ABC# blablabla
the output i want to get is ABC
I tried to use:
scan(t1.COMMENT_TXT, 2, "##"), but output was blablablabla blablalba
Maybe it should be combination on FIND and SUBSTRING, but I don't know how to use unknown number of characters in SUBSTRING function.
Can anybody help me please?
You have listed the same delimiter character twice in your list of delimiter characters. The extra # in the list of delimiters does nothing. SCAN() will treat any occurrence of any of the delimiter characters the same.
You cannot get that result from that function call on your posted input. To get that result the input string would need to look like:
some other text#blablablabla blablalba
Or
some other text#blablablabla blablalba#more other text
Do all of your examples have # characters?
blablablabla blablalba
Do they have 2 and only 2 # characters? Could there be just one?
some other text#blablablabla blablalba
Could there be three?
some other text#ABC#CDE#more text
Could there be four or more?
some other text#ABC#more text#CDE#and even more text
Is the # character ever the first character?
#AB#some other text
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do all of your examples have # characters?
blablablabla blablalba
Not all examples have # characters.
Do they have 2 and only 2 # characters? Could there be just one?
some other text#blablablabla blablalba
No, some lines have more than 2 # characters, but not less than 2
Could there be three?
some other text#ABC#CDE#more text
It could be text #abc# text #abs# text
Could there be four or more?
some other text#ABC#more text#CDE#and even more text
Yes
Is the # character ever the first character?
#AB#some other text
# character can be first, in the middle of the text or last in the sentence:
#abc# text
text #abc# text
Text #abc#
text #abc# text #abc# text