Hi,
Sorry the subject wasn't clear. I have a variable with values like the following:
ID exampleVar
1 dog
2 cat
3 apple
I want to return the IDs of the observations whose value contains any of the characters 'o', 'g', or 'e', so the result should be 1 and 3. I thought of the VERIFY function and PRXMATCH, but I couldn't quite come up with a solution using them. Is a manual search the only option?
Thanks
I think that you may want to look at the KINDEXC or KINDEXB functions (if you are using a double-byte character set).
data example; x='This value contains an é in the middle.'; y=kindexc(x,'àäèéëïijöü'); run;
KINDEXC searches the first parameter, the source, for any of the characters in the second parameter and returns the position of the first one it finds (0 if none is found).
Or, if the next step is to replace those letters with something more English-looking, skip straight to the TRANSLATE function:
data example2; x='à ä è é ë ï ij ö ü'; y=translate(x,'aAeEeiIou','àäèéëïijöü'); run;
TRANSLATE replaces each character listed in the third parameter with the character at the same position in the second parameter. Note: be very careful to align the two lists. I changed case for the repeated letters just as an example.
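To make the alignment rule concrete, here is a minimal sketch using only plain ASCII characters, so it does not depend on the session encoding. The sample string and the replacement digits are made up for illustration.

```sas
data _null_;
  x = 'dog cat apple';
  /* Second argument = replacements, third argument = characters to replace. */
  /* Position by position: 'o' -> '0', 'g' -> '9', 'e' -> '3'.               */
  y = translate(x, '093', 'oge');
  put y=;   /* writes y=d09 cat appl3 to the log */
run;
```

If the two lists are shifted by even one position, every later character is mapped to the wrong replacement, which is why the alignment matters.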
data want;
set have;
if find(examplevar,'o') or find(examplevar,'g') or find(examplevar,'e') then output;
run;
I'm going to guess this doesn't work on your real problem, even though it does work on your made-up problem. So, please tell us the real problem you are trying to solve.
@cosmid wrote:
Thanks for the find function. To answer your question, the real problem is:
There is a variable with foreign characters, something like àäèéëïijöü. The objective is to list all of the observations that contain at least one of these characters. However, the actual list is about 30+ characters, so I would have to write 30+ find() calls joined with OR. As I am writing this reply, I just realized I may be able to use the VERIFY function and just pick out the observations with normal alphanumerics.
Always helpful to start by telling us the real problem, rather than a fake problem that is different from the real problem in important ways.
VERIFY does the opposite test from what you describe. It finds the first occurrence of a character that is NOT in the list. Instead you want the INDEXC() function (or, if case does not matter, perhaps the FINDC() function).
where indexc(examplevar,'oge');
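The difference between the three functions can be seen side by side in a small sketch. The literal 'dog' and the list 'oge' come from the question; the uppercase 'DOG' is added here only to show the case-insensitive modifier.

```sas
data _null_;
  /* VERIFY: position of the first character NOT in the list ('d' at 1). */
  v = verify('dog', 'oge');
  /* INDEXC: position of the first character that IS in the list ('o' at 2). */
  i = indexc('dog', 'oge');
  /* INDEXC is case-sensitive, so nothing in 'DOG' matches a lowercase list. */
  u = indexc('DOG', 'oge');
  /* FINDC with the 'i' modifier ignores case ('O' at 2). */
  f = findc('DOG', 'oge', 'i');
  put v= i= u= f=;   /* writes v=1 i=2 u=0 f=2 to the log */
run;
```

Because all of these return 0 when nothing qualifies, the result can be used directly as a true/false condition in an IF or WHERE statement.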
But in your later comments you mention non-ASCII characters. If you are not using a single-byte encoding then you will need to use KINDEXC() instead. That is because INDEXC() treats each byte of a multi-byte character as something to find, so you will get a lot of false positives.
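A minimal sketch of that filter, assuming a multi-byte session encoding such as UTF-8 and the HAVE data set and examplevar variable from the question. The character list is copied from the sample in this thread and stands in for the real 30+ character list.

```sas
data want;
  set have;
  /* KINDEXC compares whole characters rather than single bytes,      */
  /* so a multi-byte character such as é is matched as one unit.      */
  /* Replace the list below with the full set of characters to find.  */
  if kindexc(examplevar, 'àäèéëïijöü') > 0;
run;
```

A single call covers the whole list, so there is no need to chain one FIND() per character.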
/* The best choice is to use a Perl regular expression */
data have;
input ID exampleVar $;
cards;
1 dog
2 cat
3 apple
;
option noquotelenmax;
data want;
set have;
if prxmatch('/o|g|e/i',exampleVar);
run;
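For a longer list of single characters, the alternation can also be written as a character class, which is shorter to maintain. This is a sketch of an equivalent pattern for the sample data, assuming the HAVE data set created above; the output data set name want_class is made up here.

```sas
data want_class;
  set have;
  /* [oge] matches any one of the listed characters; /i ignores case. */
  if prxmatch('/[oge]/i', exampleVar);
run;
```

On the sample data this keeps IDs 1 and 3, the same rows as the o|g|e pattern.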