How to delete rows based on different fields using contains or like op...

buddha_d · Posted 05-04-2020 11:53 PM

Dear Community,

I have a messy real time data and that almost looks like this data. I am trying to delete rows where caller (name) is distributed among 4 fields on the same row (caller_info1 - caller_info4). If this caller name is there in between concatenated 4 fields then I want to delete those rows.

(eg: TULSA REAL ESTATES HOLDING COMPANY LLC is distributed with additional information in caller_info1 - caller_info4 fields).

I am proving a program and I listed the output as well .

data customer;
INFILE CARDS dsd DeLiMiter= '  ' missover ;
input caller & $38. caller_info1 & $18. caller_info2 & $10. caller_info3 & $10. caller_info4 & $20. ; 
cards;
TULSA REAL ESTATES HOLDING COMPANY LLC  FOR:TULSA REAL EST  ATES HOLDI  NG COMPANY  LLC
Siva T Brown  For further credit  SIVA BROWN  1110  1234
*Christopher F Lindner 2010 special Tr  RE:Christopher F L  indner 201  0 special   Trust dtd 11-4-02
SILVER LINING PARTNERS LLC  FOR FINAL CREDIT  FOR SILVER   LINING   PARTNERS LLC
RONNIE B MILLER  MEMO: RON MILLER 
Warren K Buffet  Reference #1234
;
run;

/*
I want to see the output window which contains only this row

Warren K Buffet  Reference #1234
*/

I used compress function with 'ka' option and other stuff. But, my attempts were futile. Any solutions or thoughts to solve this issue please.

Thanks

Shmuel · Posted 05-05-2020 12:55 AM

Use option firstobs=n to skip n lines as in:

data customer;
 set customer(firstobs=6);
run;

or if you want only this line use where to select required lines:

data customer;
 set customer;
    where index(name,"Warren") > 0; 
run;

buddha_d · Posted 05-05-2020 05:20 AM

Shmuel, I am not just trying to go to line 6 and get the result. I have more data and I am trying to filter it out by matching word pattern. Thanks for your response.

SASKiwi · Posted 05-05-2020 03:01 AM

I don't think this is easy. However, first off I would read each data line into one variable so you have a single text string. By reading a line with multiple variables you are making assumptions about where word boundaries are, so it is best not to create a problem like this.

Then I would try to write a rule in plain English like: find the first word, delimited by blanks, in the string and then search for the same pattern of characters in the rest of the string. This rule works for data lines 1,2,4 and 6, if you ignore case, but not 3 (*Christopher doesn't match with Christopher) and 5 (RONNIE doesn't match with RON).

If you add - * - as a word delimiter to the rule then that will fix line 3. To match line 5 you would have to compare the third word in the string (MILLER) and see if the same character pattern exists in the rest of the string.

I'm guessing that there are many examples in your data like this that you haven't posted that will cause even more problems trying to match. If that is the case then it is going to be extremely difficult to cater for all possible scenarios and you may have to live with a solution that works less than 100 percent of the time.

buddha_d · Posted 05-05-2020 05:19 AM

SASKiwi, Thanks for the reply. Even if it works for 1,2,4 and 6 that would help me a lot. Like I said, I am trying to delete most of the matching data. I read your comments, but didn't get what you are trying to convey ("I don't think this is easy. However, first off I would read each data line into one variable so you have a single text string. By reading a line with multiple variables you are making assumptions about where word boundaries are, so it is best not to create a problem like this.

Then I would try to write a rule in plain English like: find the first word, delimited by blanks, in the string and then search for the same pattern of characters in the rest of the string"). Could you help me here.

Thanks !!

ballardw · Posted 05-05-2020 10:42 AM

If possible I might go back to the source of this and ask if it can be prepared in a more consistent form such as a CSV file.

Or if there is a document that describes what the content of this file should look like such as start/end columns for values, lengths of variables or something.

IF there is a specific value (key word such as "Reference #") that occurs in the data you might be able to search for that such as

data customer;
INFILE CARDS dsd DeLiMiter= '  ' missover ;
input string $ 1-100; 
if index(string,"Reference #");
cards;
TULSA REAL ESTATES HOLDING COMPANY LLC  FOR:TULSA REAL EST  ATES HOLDI  NG COMPANY  LLC
Siva T Brown  For further credit  SIVA BROWN  1110  1234
*Christopher F Lindner 2010 special Tr  RE:Christopher F L  indner 201  0 special   Trust dtd 11-4-02
SILVER LINING PARTNERS LLC  FOR FINAL CREDIT  FOR SILVER   LINING   PARTNERS LLC
RONNIE B MILLER  MEMO: RON MILLER 
Warren K Buffet  Reference #1234
;
run;

How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Re: How to delete rows based on different fields using contains or like operators to a single field

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!