01-20-2014 11:56 AM
Maybe my question is silly but I have spent several hours trying to solve my problem and it still doesn't work!:smileycry:
I have a list of domains and a list of numbers / numbers-letters combinations. If domain contains any of those numbers or numbers-letters combination, I have to exclude it from the table.
Forbidden combination: "hotnail";
Thus, domains with typo such as e.g. hotnail.co.uk have to be excluded from the table.
If I use
t1.Domain NOT CONTAINS 'hotnail', then it not only excludes "hotnail" domains but also those which contain HOT in them, such as correct "hotmail" ones.
Which operator should I use in order to exclude only the exact phrase? CONTAINS work in a similar way with LIKE which is not suitable for me.
Thank you very much in advance!
01-20-2014 02:08 PM
The CONTAINS operator searches for the string specified, so your expression (t1.Domain NOT CONTAINS 'hotnail') should not be excluding strings containing "hotmail" if it is looking for "hotnail".
You can confirm this simply:
data test ;
input var $7.;
create table want as
select * from test
where var NOT contains 'hotnail';
01-26-2014 12:44 PM
Fugue, what is "explicit pass-through"? I am not very experienced in SAS programming. The problem is that my code eliminates strings that are not specified in NOT CONTAINS expressions. I don't know how to solve this problem.
01-26-2014 04:21 PM
I suspect that you are making a logic error when trying filter for multiple excluded terms.
If you code (X NE 'Y') or (X NE 'Z') then nothing will be eliminate because for any value of X one of those two conditions is true.
You probably want to code something like this:
NOT (t1.Domain CONTAINS 'hotnail' or t1.Domain CONTAINS 'notmail' or .... )
01-27-2014 02:59 AM
Thank you Tom, I have understood what is the problem. First of all I used AND instead of OR. Then corrected the code as you suggested. I have also found out why does the code eliminate strings that don't have to be eliminated. For example, I want to eliminate all emails which prefix is "it". Like email@example.com. But if I state NOT (t1.Prefix CONTAINS 'it' OR...), it will also eliminate 'white', because white has 'IT' in it. How to handle this?
01-28-2014 03:51 AM
Thank you friends, it is now more clear for me how to combine different functions in the code. Thank you for the help! I don't close this topic for some time, maybe new questions will arise.
01-30-2014 03:22 AM
If I want to eliminate all email prefixes that contain only numbers, is it somehow possible to write it in a code? For example, I have
I separate prefixes and get:
Is it possible to declare in a code that if t1refix = numeric, then exclude? I cannot assign numeric type to the whole variable, because it can also contain nominal values and its type is character. t1.Prefix IN ('1', '2', '3', '4', '5', '6', '7', '8', '9') doesn't work.