Hi everyone, I am wondering if there is a function that can help identify observations for a variable that contain all the valid charaters.
For example, I have a dataset called test which is comprised of two variables. 1. observation, 2 code. Below is the table of info:
Observation Code
1 abccccccccccc
2 aaaaaaaaaaaa
3 bbbcccccccccc
4 aaaaacccccccc
5 bbbbbbbbbbbb
6 bbbbbbbaaaaa
7 cccccbbbbbbbc
I would like to identify the oberversation 3 and observation 7 that contain all the characters "b" and "c" only NOT "b" or "c" only. What function(s) can help me get that?
I tried to use verify (trim(Code),"bc"), and it ends up giving me the observation 3, observation 5 and observation 7, but I only want to have observation 3 and observation 7 in my result.
Note: I know I could use find( trim(code),"bbbcccccccccc") to find obersavtion 3 and find(trim(code),"cccccbbbbbbbc") to find obersavtion 7, but that requires the exact match not just in terms of all characters in a string, but also in terms of their positions in a string.
Is there a more efficient function that can help identify the observation 3 and observation 7 without specifying the entire characters in a string as well as the exact positioning of those characters in a string?
Any help on this would be greatly appreciated.
Thank you!
There is probably an easier way, but I think that the following will do what you want:
data want; set have; test=verify(strip(code),'acdefghijklmnopqrstuvwxyz')>0 and verify(strip(code),'abdefghijklmnopqrstuvwxyz') > 0 and verify(strip(code),'bc') eq 0; run;
Art, CEO, AnalystFinder.com
Somebody will come up with a pxparse solution that I won't understand, but here is a reasonable way:
data want;
set have;
test_code = compress(code, 'b');
if 0 < lengthn(test_code) < lengthn(code);
if compress(test_code, 'c') = ' ';
drop test_code;
run;
Here's another option
data want;
set have;
where indexc(code,'b')>0 and indexc(code,'c')>0 and indexc(code,'adefghijklmnopqrstuvwxyz') = 0;
run;
:
Just use the function (compress, not trim) twice:
data want;
bcode=compress(code,'b');
if bcode=' ' then delete;
if compress(code,'c')=' ';
if compress(bcode,'c')=' ';
run;
Editted, thanks to Art's observation.
I'd have to think that @mkeintz meant:
data want (drop=bcode); set have; bcode=compress(code,'b'); if bcode=' ' then delete; if compress(bcode,'c')=' '; run;
Art, CEO, AnalystFinder.com
I think @mkeintz meant that too. Will edit.
Hi,
Here is my solution to your question
data have;
length Observation 3 Code $20;
input Observation Code;
datalines;
1 abccccccccccc
2 aaaaaaaaaaaa
3 bbbcccccccccc
4 aaaaacccccccc
5 bbbbbbbbbbbb
6 bbbbbbbaaaaa
7 cccccbbbbbbbc
;
run;
data want;
set have(where=(prxmatch('/[^bc]/',strip(code))=0)); /* Only read observation with nothing but b/c/both */
if (findc(code,'c') AND findc(code,'b')); /* Only keep observations with both b&c */
run;
Contains both B and C
indexc(string,'b') and indexc(string,'c')
But doesn't have something other than b and c.
and not verify(string,'b c')
Try it.
data test;
input string $10. ;
ok = indexc(string,'b') and indexc(string,'c') and not verify(string,' bc') ;
put (_all_) (=);
cards;
abccccccccccc
aaaaaaaaaaaa
bbbcccccccccc
aaaaacccccccc
bbbbbbbbbbbb
bbbbbbbaaaaa
cccccbbbbbbbc
abc
bbbcc
bc
.
;
Thank you so much for all the ideas, which really helped stimulate my thinking, I built upon some of your ideas, and arriived at the following codes for the task.
data test; input Obs Code $20. ; cards; 1 abccccccccccc 2 aaaaaaaaaaaa 3 bbbcccccccccc 4 aaaaacccccccc 5 bbbbbbbbbbbb 6 bbbbbbbbaaaaa 7 cccccbbbbbbbc ; run; data final; set test; if verify(strip(code),"b")>0 and verify(strip(code),"c")>0 and verify(strip(code),"bc")=0; run;
Once again, thanks so much for all your help on this task.
Best,
data have;
input Observation Code :$20.;
datalines;
1 abccccccccccc
2 aaaaaaaaaaaa
3 bbbcccccccccc
4 aaaaacccccccc
5 bbbbbbbbbbbb
6 bbbbbbbaaaaa
7 cccccbbbbbbbc
;
data want;
check = "bc";
set have;
ok = verify(check, trim(code))=0 and verify(trim(code), check)=0;
run;
proc print; run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.