cosmid
Lapis Lazuli | Level 10

Hi,

 

Sorry, the Subject wasn't clear. So, what I am trying to do is, I have a variable with values like the following:

ID     exampleVar
1      dog
2      cat
3      apple

 

I want to return the list of IDs whose value contains any of the characters 'o', 'g', or 'e', so the result should be 1 and 3. I thought of the verify function and prxmatch, but I couldn't quite come up with a solution using them. Is a manual search the only solution?

 

Thanks

 

ACCEPTED SOLUTION

ballardw
Super User

I think you may want to look at the KINDEXC function, or KINDEXB if you are using a double-byte character set.

data example;
   x='This value contains an é in the middle.';
   y=kindexc(x,'àäèéëïĳöü');
run;

KINDEXC searches the first argument (the source) for any of the characters in the second and returns the position of the first one found, or 0 if none is found.
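
Applied to the original question, the same call can act as a row filter. This is only a minimal sketch: the data set names HAVE and WANT, the sample values, and the shortened character list are assumptions for illustration, not the poster's real data.

```sas
/* Hypothetical sample data, for illustration only */
data have;
   length exampleVar $40;
   ID = 1; exampleVar = 'café';  output;
   ID = 2; exampleVar = 'cat';   output;
   ID = 3; exampleVar = 'naïve'; output;
run;

data want;
   set have;
   /* Subsetting IF: keep the row only when at least one listed
      character occurs (KINDEXC returns 0 when none is found) */
   if kindexc(exampleVar, 'àäèéëïöü') > 0;
run;
```

A list of 30+ characters would simply go into the second argument as one longer string, so no chain of OR conditions is needed.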

 

Or, if the next step is to replace those letters with something more English-looking, skip straight to the TRANSLATE function:

data example2;
   x='à ä è é ë ï ĳ ö ü';
   y=translate(x,'aAeEeiIou','àäèéëïĳöü');
run;

TRANSLATE replaces each character listed in the third argument with the character at the same position in the second argument. Note: be very careful to align them. I varied the case of the repeated letters just as an example.
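
The argument order is easy to get backwards, so here is a small plain-ASCII sketch of the positional pairing. The value 'banana' and the replacement letters are made up for illustration.

```sas
data _null_;
   x = 'banana';
   /* Second argument = replacement characters, third = characters to replace.
      Pairing is by position: 'a' -> 'A', 'n' -> 'O'. */
   y = translate(x, 'AO', 'an');
   put y=;   /* log shows: y=bAOAOA */
run;
```

If the replacement list is shorter than the list of characters to replace, the unmatched characters are translated to blanks, which is why the alignment matters.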

 

 


6 REPLIES
PaigeMiller
Diamond | Level 26
data want;
    set have;
    if find(examplevar,'o') or find(examplevar,'g') or find(examplevar,'e') then output;
run;

I'm going to guess this doesn't work on your real problem, even though it does work on your made-up problem. So, please tell us the real problem you are trying to solve.

--
Paige Miller
cosmid
Lapis Lazuli | Level 10
Thanks for the find function. To answer your question, the real problem is:
There is a variable containing foreign characters, something like àäèéëïijöü. The objective is to list all of the observations that contain at least one of these characters. However, the actual list is about 30+ characters, so I would have to write 30+ find() calls joined with OR. As I am writing this reply, I just realized I may be able to use the verify function and just pick out observations with normal alphanumerics.
PaigeMiller
Diamond | Level 26

@cosmid wrote:
Thanks for the find function. To answer your question, the real problem is:
There is a variable containing foreign characters, something like àäèéëïijöü. The objective is to list all of the observations that contain at least one of these characters. However, the actual list is about 30+ characters, so I would have to write 30+ find() calls joined with OR. As I am writing this reply, I just realized I may be able to use the verify function and just pick out observations with normal alphanumerics.

It is always helpful to start by telling us the real problem, rather than a fake problem that differs from the real one in important ways.

--
Paige Miller
Tom
Super User

VERIFY does the opposite test from what you describe. It finds the first occurrence of a character that is NOT in the list. Instead you want the INDEXC() function (or, if case does not matter, perhaps the FINDC() function).

where indexc(examplevar,'oge');
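
To make the contrast concrete, here is a small sketch comparing the three functions on the question's 'apple' value. The variable names are arbitrary and chosen only for illustration.

```sas
data _null_;
   s = 'apple';
   in_list     = indexc(s, 'oge');            /* first character that IS in the list: 'e' */
   not_in_list = verify(s, 'oge');            /* first character that is NOT in the list: 'a' */
   any_case    = findc('APPLE', 'oge', 'i');  /* 'i' modifier ignores case, so 'E' matches */
   put in_list= not_in_list= any_case=;
   /* log shows: in_list=5 not_in_list=1 any_case=5 */
run;
```

A nonzero INDEXC or FINDC result means the value contains at least one listed character, which is the test the question asks for.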

But in your later comments you mention non-ASCII characters. If you are not using a single-byte encoding then you will need to use KINDEXC() instead. That is because INDEXC() treats each byte of a multibyte character as something to find, so you will get a lot of false positives.
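
A sketch of how such a false positive can arise, assuming a UTF-8 session (the result depends on the session encoding, so no output is shown). In UTF-8, 'ã' and 'é' are each stored as two bytes and share the same first byte. The sample value is made up for illustration.

```sas
/* Assumes a UTF-8 SAS session; results differ under a single-byte encoding */
data _null_;
   x = 'São';                    /* contains 'ã' but not 'é' */
   byte_hit = indexc(x, 'é');    /* byte-based: may match the lead byte that 'ã' shares with 'é' */
   char_hit = kindexc(x, 'é');   /* character-based: compares whole characters */
   put byte_hit= char_hit=;
run;
```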

 

Ksharp
Super User
/* The best choice is to use a Perl regular expression */
data have;
   input ID exampleVar $;
cards;
1       dog
2      cat
3      apple
;
option noquotelenmax;
data want;
   set have;
   if prxmatch('/o|g|e/i',exampleVar);
run;
