Hello,
Maybe my question is silly but I have spent several hours trying to solve my problem and it still doesn't work!:smileycry:
I have a list of domains and a list of numbers / numbers-letters combinations. If domain contains any of those numbers or numbers-letters combination, I have to exclude it from the table.
Example:
Forbidden combination: "hotnail";
Thus, domains with typo such as e.g. hotnail.co.uk have to be excluded from the table.
If I use
t1.Domain NOT CONTAINS 'hotnail', then it not only excludes "hotnail" domains but also those which contain HOT in them, such as correct "hotmail" ones.
Which operator should I use in order to exclude only the exact phrase? CONTAINS work in a similar way with LIKE which is not suitable for me.
Thank you very much in advance!
NOT (t1.Domain CONTAINS 'hotnail' or t1.Domain CONTAINS 'notmail' or substr(t1.Domain,1,2) = 'it' )
index or find functions?
The CONTAINS operator searches for the string specified, so your expression (t1.Domain NOT CONTAINS 'hotnail') should not be excluding strings containing "hotmail" if it is looking for "hotnail".
You can confirm this simply:
data test ;
input var $7.;
cards;
hotmail
hotnail
hot
hotmall
hotmaii
hotness
;
run;
proc sql;
create table want as
select * from test
where var NOT contains 'hotnail';
;
quit;
Are you using explicit pass-through?
Fugue, what is "explicit pass-through"? I am not very experienced in SAS programming. The problem is that my code eliminates strings that are not specified in NOT CONTAINS expressions. I don't know how to solve this problem.
The reason I asked about explicit pass-through is because the CONTAINS operator is SAS specific. It will not work in explicit pass-through.
I suspect that you are making a logic error when trying filter for multiple excluded terms.
If you code (X NE 'Y') or (X NE 'Z') then nothing will be eliminate because for any value of X one of those two conditions is true.
You probably want to code something like this:
NOT (t1.Domain CONTAINS 'hotnail' or t1.Domain CONTAINS 'notmail' or .... )
Thank you Tom, I have understood what is the problem. First of all I used AND instead of OR. Then corrected the code as you suggested. I have also found out why does the code eliminate strings that don't have to be eliminated. For example, I want to eliminate all emails which prefix is "it". Like it@hotmail.com. But if I state NOT (t1.Prefix CONTAINS 'it' OR...), it will also eliminate 'white', because white has 'IT' in it. How to handle this?
NOT (t1.Domain CONTAINS 'hotnail' or t1.Domain CONTAINS 'notmail' or substr(t1.Domain,1,2) = 'it' )
Thank you friends, it is now more clear for me how to combine different functions in the code. Thank you for the help! I don't close this topic for some time, maybe new questions will arise.
If I want to eliminate all email prefixes that contain only numbers, is it somehow possible to write it in a code? For example, I have
I separate prefixes and get:
Prefix
12345
34958409
Is it possible to declare in a code that if t1:Prefix = numeric, then exclude? I cannot assign numeric type to the whole variable, because it can also contain nominal values and its type is character. t1.Prefix IN ('1', '2', '3', '4', '5', '6', '7', '8', '9') doesn't work.
Thanks.
Have you taken a look at the regular expression functions?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.