- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 02-24-2010 05:11 AM
(2609 views)
Hi all
I have a set of strings which end with "_".
I want to extract the substrings without the "_".
I don't know how the rest of the strings will look like.
I.e.
have: TESTDATAV1_20100110
want: TESTDATAV1
I would prefer to do it with SQL but I just can't figure out an elegant solution.
Any suggestions very welcome.
data have;
input TestString $20.;
datalines;
TESTDATAV1_20100110
TESTDATAV1_20100111
TESTDATAV2_20100110
TESTDATAV2_20100111
TEST_DATAV1_20100110
TEST_DATAV1_20100111
TEST_DATAV2_20100110
TEST_DATAV2_20100111
;
run;
proc sql;
select ???(TestString) as TestStringRoot
from have
;
quit;
Thanks
Patrick
P.S:
The real world problem I have to solve is creating views with data sets following the pattern as given in the example.
I.e: TESTDATAV1_20100110 and TESTDATAV1_20100111 should be selected to build the view TESTDATAV1.
The only thing I know for sure is that the library will only contain tables following the naming pattern (=end with "_").
I think I can solve this problem - but suggestions/experience from the field/a code example are very welcome.
I have a set of strings which end with "_
I want to extract the substrings without the "_
I don't know how the rest of the strings will look like.
I.e.
have: TESTDATAV1_20100110
want: TESTDATAV1
I would prefer to do it with SQL but I just can't figure out an elegant solution.
Any suggestions very welcome.
data have;
input TestString $20.;
datalines;
TESTDATAV1_20100110
TESTDATAV1_20100111
TESTDATAV2_20100110
TESTDATAV2_20100111
TEST_DATAV1_20100110
TEST_DATAV1_20100111
TEST_DATAV2_20100110
TEST_DATAV2_20100111
;
run;
proc sql;
select ???(TestString) as TestStringRoot
from have
;
quit;
Thanks
Patrick
P.S:
The real world problem I have to solve is creating views with data sets following the pattern as given in the example.
I.e: TESTDATAV1_20100110 and TESTDATAV1_20100111 should be selected to build the view TESTDATAV1.
The only thing I know for sure is that the library will only contain tables following the naming pattern (=end with "_
I think I can solve this problem - but suggestions/experience from the field/a code example are very welcome.
7 REPLIES 7
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can easily do this with the proper combination of text functions.
One possibility would be:
[pre]
proc sql noprint;
select reverse(scan(reverse(TESTSTRING),2,'_')) as TESTSTRING from HAVE;
quit;
[/pre]
See the following info about reverse and scan functions:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000245941.htm
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000214639.htm
Cheers from Portugal
Daniel Santos @ www.cgd.pt
One possibility would be:
[pre]
proc sql noprint;
select reverse(scan(reverse(TESTSTRING),2,'_')) as TESTSTRING from HAVE;
quit;
[/pre]
See the following info about reverse and scan functions:
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000245941.htm
http://support.sas.com/documentation/cdl/en/lrdict/62618/HTML/default/a000214639.htm
Cheers from Portugal
Daniel Santos @ www.cgd.pt
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Daniel
The only thing I know for sure is that the string ends with "_".
There can be 1 to n underscores in the string.
The solution you proposed results for "TEST_DATAV1_20100110" in "DATAV1".
The result I need is "TEST_DATAV1".
It's about substracting a text pattern from a string - but only if the pattern is found at the end of the string.
PRXCHANGE() comes to my mind for something like this. I'm just asking myself if there isn't a simple solution which can be used within PROC SQL and I just don't get it.
Cheers
Patrick Message was edited by: Patrick
The only thing I know for sure is that the string ends with "_
There can be 1 to n underscores in the string.
The solution you proposed results for "TEST_DATAV1_20100110" in "DATAV1".
The result I need is "TEST_DATAV1".
It's about substracting a text pattern from a string - but only if the pattern is found at the end of the string.
PRXCHANGE() comes to my mind for something like this. I'm just asking myself if there isn't a simple solution which can be used within PROC SQL and I just don't get it.
Cheers
Patrick Message was edited by: Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Consider using the FIND function with a negative third argument of some max value like -99 -- this resulting value (minus 1) will resolve your third argument to SUBSTR.
Scott Barry
SBBWorks, Inc.
Scott Barry
SBBWorks, Inc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Scott
That works.
Thanks
Patrick
data have;
input TestString $20.;
datalines;
TESTDATAV1_20100110
TESTDATAV1_20100111
TESTDATAV2_20100110
TESTDATAV2_20100111
TEST_DATAV1_20100110
TEST_DATAV1_20100111
TEST_DATAV2_20100110
TEST_DATAV2_20100111
;
run;
proc sql;
select substr(TestString,1,find(TestString,'_',-length(TestString))-1) as TestStringRoot
from have
;
quit;
That works.
Thanks
Patrick
data have;
input TestString $20.;
datalines;
TESTDATAV1_20100110
TESTDATAV1_20100111
TESTDATAV2_20100110
TESTDATAV2_20100111
TEST_DATAV1_20100110
TEST_DATAV1_20100111
TEST_DATAV2_20100110
TEST_DATAV2_20100111
;
run;
proc sql;
select substr(TestString,1,find(TestString,'_',-length(TestString))-1) as TestStringRoot
from have
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Patrick.
OK, then, if it is possible to have more than a underscore, then switch the scan function to a combination of the substr/indexc functions, like this:
[pre]
reverse(substr(reverse(TESTSTRING),indexc(reverse(TESTSTRING),'_')+1))
[/pre]
Same logic, which is reverse the string to find the first underscore, trim, then reverse the result again.
Both FIND, INDEXC or even INDEX will work.
FIND/INDEX searchs a substring in a string, while INDEXC searchs a character (or list of) in a string.
Cheers from Portugal.
Daniel Santos @ www.cgd.pt
OK, then, if it is possible to have more than a underscore, then switch the scan function to a combination of the substr/indexc functions, like this:
[pre]
reverse(substr(reverse(TESTSTRING),indexc(reverse(TESTSTRING),'_')+1))
[/pre]
Same logic, which is reverse the string to find the first underscore, trim, then reverse the result again.
Both FIND, INDEXC or even INDEX will work.
FIND/INDEX searchs a substring in a string, while INDEXC searchs a character (or list of) in a string.
Cheers from Portugal.
Daniel Santos @ www.cgd.pt
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Daniel
Thanks. Scott and you are now in sync 🙂
By the way: Since SAS 9 (I think) strings can also be processed backwards by many functions.
I.e:
reverse(scan(reverse(TESTSTRING),2,'_'))
Could also be written as:
scan(TESTSTRIN),-2,'_')
Cheers from Downunder
Thanks. Scott and you are now in sync 🙂
By the way: Since SAS 9 (I think) strings can also be processed backwards by many functions.
I.e:
reverse(scan(reverse(TESTSTRING),2,'_'))
Could also be written as:
scan(TESTSTRIN),-2,'_')
Cheers from Downunder
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Even this will work..
proc sql;
select reverse(substr(reverse(trim(teststring)),10))
from have;
quit;
Reverse scan mentioned by you is perfect.
Jignesh
proc sql;
select reverse(substr(reverse(trim(teststring)),10))
from have;
quit;
Reverse scan mentioned by you is perfect.
Jignesh