BookmarkSubscribeRSS Feed
Dancer_on_data
Obsidian | Level 7

Hi everyone, I am wondering if there is a function that can help identify observations for a variable that contain all the valid charaters.

 

For example, I have a dataset called test which is comprised of two variables. 1. observation, 2 code. Below is the table of info:

 

Observation Code

1                  abccccccccccc

2                  aaaaaaaaaaaa

3                  bbbcccccccccc

4                  aaaaacccccccc

5                  bbbbbbbbbbbb

6                  bbbbbbbaaaaa

7                  cccccbbbbbbbc

 

I would like to identify the oberversation 3 and observation 7  that contain all the characters "b" and "c"  only NOT "b" or "c" only. What function(s) can help me get that?

 

I tried to use verify (trim(Code),"bc"), and it ends up giving me the observation 3, observation 5  and observation 7, but I only want to have observation 3 and  observation 7 in my result.

 

Note:  I know I could use find( trim(code),"bbbcccccccccc") to find obersavtion 3 and find(trim(code),"cccccbbbbbbbc") to find obersavtion 7, but that requires the exact match not just in terms of all characters in a string, but also in terms of their positions in a string.

 

Is there a more efficient function that can help identify the observation 3 and observation 7 without specifying the entire characters in a string as well as the exact positioning of those characters in a string?

 

Any help on this would be greatly appreciated. Smiley Happy

 

Thank you!

11 REPLIES 11
art297
Opal | Level 21

There is probably an easier way, but I think that the following will do what you want:

 

data want;
  set have;
  test=verify(strip(code),'acdefghijklmnopqrstuvwxyz')>0 and
    verify(strip(code),'abdefghijklmnopqrstuvwxyz') > 0 and
    verify(strip(code),'bc') eq 0;
run;

Art, CEO, AnalystFinder.com

 

Astounding
PROC Star

Somebody will come up with a pxparse solution that I won't understand, but here is a reasonable way:

 

data want;

set have;

test_code = compress(code, 'b');

if 0 < lengthn(test_code) < lengthn(code);

if compress(test_code, 'c') = ' ';

drop test_code;

run;

nehalsanghvi
Pyrite | Level 9

Here's another option

data want;
set have;
where indexc(code,'b')>0 and indexc(code,'c')>0 and indexc(code,'adefghijklmnopqrstuvwxyz') = 0;
run;

:

 

mkeintz
PROC Star

Just use the function (compress, not trim) twice:

 

 

data want;

   bcode=compress(code,'b');

   if bcode=' ' then delete;

   if compress(code,'c')=' ';

   if compress(bcode,'c')=' ';

run;

 

Editted, thanks to Art's observation.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
art297
Opal | Level 21

I'd have to think that @mkeintz meant:

 

data want (drop=bcode);
  set have;
  bcode=compress(code,'b');
  if bcode=' ' then delete;
  if compress(bcode,'c')=' ';
run;

Art, CEO, AnalystFinder.com

 

mkeintz
PROC Star

@art297

 

I think @mkeintz meant that too.  Will edit.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
AhmedAl_Attar
Rhodochrosite | Level 12

Hi,

 

Here is my solution to your question

 

data have;
length Observation 3 Code $20;
input Observation Code;
datalines;
1                  abccccccccccc
2                  aaaaaaaaaaaa
3                  bbbcccccccccc
4                  aaaaacccccccc
5                  bbbbbbbbbbbb
6                  bbbbbbbaaaaa
7                  cccccbbbbbbbc
;
run;

 

data want;
    set have(where=(prxmatch('/[^bc]/',strip(code))=0));  /* Only read observation with nothing but b/c/both */
    if (findc(code,'c') AND findc(code,'b')); /* Only keep observations with both b&c */
run;

Tom
Super User Tom
Super User

Contains both B and C

indexc(string,'b') and indexc(string,'c')

But doesn't have something other than b and c.

and not verify(string,'b c')

Try it.

data test;
  input string $10. ;
  ok = indexc(string,'b') and indexc(string,'c') and not verify(string,' bc') ;
  put (_all_) (=);
cards;
abccccccccccc
aaaaaaaaaaaa
bbbcccccccccc
aaaaacccccccc
bbbbbbbbbbbb
bbbbbbbaaaaa
cccccbbbbbbbc
abc
bbbcc
bc
.
;
Dancer_on_data
Obsidian | Level 7

Thank you so much for all the ideas, which really helped stimulate my thinking, I built upon some of your ideas, and arriived at the following codes for the task.

 

data test;
  input Obs Code $20. ;
cards;
1 abccccccccccc
2 aaaaaaaaaaaa
3 bbbcccccccccc
4 aaaaacccccccc
5 bbbbbbbbbbbb
6 bbbbbbbbaaaaa
7 cccccbbbbbbbc
;
run;

data final;
set test;
if verify(strip(code),"b")>0 and
verify(strip(code),"c")>0 and
verify(strip(code),"bc")=0;
run;

 

 

Once again, thanks so much for all your help on this task.Smiley Happy

 

Best,

PGStats
Opal | Level 21
data have;
input Observation Code :$20.;
datalines;
1                  abccccccccccc
2                  aaaaaaaaaaaa
3                  bbbcccccccccc
4                  aaaaacccccccc
5                  bbbbbbbbbbbb
6                  bbbbbbbaaaaa
7                  cccccbbbbbbbc
;

data want;
check = "bc";
set have;
ok = verify(check, trim(code))=0 and verify(trim(code), check)=0;
run;

proc print; run;
PG
Yavuz
Quartz | Level 8
I am reading all creative solutions admiringly.
Thanks a lot for all.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1255 views
  • 9 likes
  • 9 in conversation