Hi everyone, I am wondering if there is a function that can help identify observations for a variable that contain all the valid charaters.
For example, I have a dataset called test which is comprised of two variables. 1. observation, 2 code. Below is the table of info:
Observation Code
1 abccccccccccc
2 aaaaaaaaaaaa
3 bbbcccccccccc
4 aaaaacccccccc
5 bbbbbbbbbbbb
6 bbbbbbbaaaaa
7 cccccbbbbbbbc
I would like to identify the oberversation 3 and observation 7 that contain all the characters "b" and "c" only NOT "b" or "c" only. What function(s) can help me get that?
I tried to use verify (trim(Code),"bc"), and it ends up giving me the observation 3, observation 5 and observation 7, but I only want to have observation 3 and observation 7 in my result.
Note: I know I could use find( trim(code),"bbbcccccccccc") to find obersavtion 3 and find(trim(code),"cccccbbbbbbbc") to find obersavtion 7, but that requires the exact match not just in terms of all characters in a string, but also in terms of their positions in a string.
Is there a more efficient function that can help identify the observation 3 and observation 7 without specifying the entire characters in a string as well as the exact positioning of those characters in a string?
Any help on this would be greatly appreciated.
Thank you!
There is probably an easier way, but I think that the following will do what you want:
data want; set have; test=verify(strip(code),'acdefghijklmnopqrstuvwxyz')>0 and verify(strip(code),'abdefghijklmnopqrstuvwxyz') > 0 and verify(strip(code),'bc') eq 0; run;
Art, CEO, AnalystFinder.com
Somebody will come up with a pxparse solution that I won't understand, but here is a reasonable way:
data want;
set have;
test_code = compress(code, 'b');
if 0 < lengthn(test_code) < lengthn(code);
if compress(test_code, 'c') = ' ';
drop test_code;
run;
Here's another option
data want;
set have;
where indexc(code,'b')>0 and indexc(code,'c')>0 and indexc(code,'adefghijklmnopqrstuvwxyz') = 0;
run;
:
Just use the function (compress, not trim) twice:
data want;
bcode=compress(code,'b');
if bcode=' ' then delete;
if compress(code,'c')=' ';
if compress(bcode,'c')=' ';
run;
Editted, thanks to Art's observation.
I'd have to think that @mkeintz meant:
data want (drop=bcode); set have; bcode=compress(code,'b'); if bcode=' ' then delete; if compress(bcode,'c')=' '; run;
Art, CEO, AnalystFinder.com
I think @mkeintz meant that too. Will edit.
Hi,
Here is my solution to your question
data have;
length Observation 3 Code $20;
input Observation Code;
datalines;
1 abccccccccccc
2 aaaaaaaaaaaa
3 bbbcccccccccc
4 aaaaacccccccc
5 bbbbbbbbbbbb
6 bbbbbbbaaaaa
7 cccccbbbbbbbc
;
run;
data want;
set have(where=(prxmatch('/[^bc]/',strip(code))=0)); /* Only read observation with nothing but b/c/both */
if (findc(code,'c') AND findc(code,'b')); /* Only keep observations with both b&c */
run;
Contains both B and C
indexc(string,'b') and indexc(string,'c')
But doesn't have something other than b and c.
and not verify(string,'b c')
Try it.
data test;
input string $10. ;
ok = indexc(string,'b') and indexc(string,'c') and not verify(string,' bc') ;
put (_all_) (=);
cards;
abccccccccccc
aaaaaaaaaaaa
bbbcccccccccc
aaaaacccccccc
bbbbbbbbbbbb
bbbbbbbaaaaa
cccccbbbbbbbc
abc
bbbcc
bc
.
;
Thank you so much for all the ideas, which really helped stimulate my thinking, I built upon some of your ideas, and arriived at the following codes for the task.
data test; input Obs Code $20. ; cards; 1 abccccccccccc 2 aaaaaaaaaaaa 3 bbbcccccccccc 4 aaaaacccccccc 5 bbbbbbbbbbbb 6 bbbbbbbbaaaaa 7 cccccbbbbbbbc ; run; data final; set test; if verify(strip(code),"b")>0 and verify(strip(code),"c")>0 and verify(strip(code),"bc")=0; run;
Once again, thanks so much for all your help on this task.
Best,
data have;
input Observation Code :$20.;
datalines;
1 abccccccccccc
2 aaaaaaaaaaaa
3 bbbcccccccccc
4 aaaaacccccccc
5 bbbbbbbbbbbb
6 bbbbbbbaaaaa
7 cccccbbbbbbbc
;
data want;
check = "bc";
set have;
ok = verify(check, trim(code))=0 and verify(trim(code), check)=0;
run;
proc print; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.