DATA Step, Macro, Functions and more

Operator Contain not working while searching text in CLOB column containing special character

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

Operator Contain not working while searching text in CLOB column containing special character

Hi,

From the below dataset, I want to search all text from table "event" in 'paragraph' column of table Master.

As the event table in my project contain around 10000 adv_event which need to be search against 5 crores CLOB data, I am not getting the data on time.

Is there any alternative without using below approach so that i can get the output without using Hash table.

 

Data Master;
input pid paragraph : & $500.;
cards;
101 patient suffereing with "celc disease" (coeliac disease). Continued drug therapy started Osteochondrosis.
102 Treatement discontinue after 'Asthma)
103 no single intervention strategy can improve the adherence of all patient
104 Mutual collaboration fosters greater patient satisfaction with Osteocndrosis
105 "Osteochondrosis in patient
107 'Adrenal insufficiency' may happen
;
run;

data event;
input ADV_EVENT : & $20.;
cards;
asthma
Adrenal insufficiency
coeliac disease
Osteochondrosis
;
run;

proc sql;
create table final_data as
select pid, adv_event from master as a,event as b
where upper(compress(paragraph)) contains compress(upper(adv_event));
quit;

 

 

 


Accepted Solutions
Solution
‎05-25-2017 06:07 AM
Super User
Super User
Posts: 7,942

Re: Operator Contain not working while searching text in CLOB column containing special character

Call execute() needs a string, this string is put into the compiler after the datastep finishes.  The concatenates are buidling a valid SAS statement:

if <condition> then <result>;

with specific logic.  So in this case the condition is, if adv_event found in paragraph (index() is descried in the docs), then set result to 1.

If you look at the log of what is generated you will see this.

View solution in original post


All Replies
Super User
Super User
Posts: 7,942

Re: Operator Contain not working while searching text in CLOB column containing special character

If you want to avoid merging, which is where the resource is being drained, then try this.  It takes your find dataset, and uses that to create a datastep which executes after the data _null_ and has one if statement for each adv_event to find.  Try it out and see what is genreated in the log.  This way you just have one base datastep.

data master;
  input pid paragraph : & $500.;
cards;
101 patient suffereing with "celc disease" (coeliac disease). Continued drug therapy started Osteochondrosis.
102 Treatement discontinue after 'Asthma)
103 no single intervention strategy can improve the adherence of all patient
104 Mutual collaboration fosters greater patient satisfaction with Osteocndrosis
105 "Osteochondrosis in patient
107 'Adrenal insufficiency' may happen
;
run;
data event;
  input adv_event : & $20.;
cards;
asthma
Adrenal insufficiency
coeliac disease
Osteochondrosis
;
run;

data _null_;
  set event end=last;
  if _n_=1 then call execute('data want; set master;');
  call execute('if index(paragraph,"'||strip(adv_event)||'") > 0 then result=1;');
  if last then call execute('run;');
run;

I have just set result to 1 if found, but you could filter, or do something else, and use perl if index() doesn't fulfil your needs.

Contributor
Posts: 63

Re: Operator Contain not working while searching text in CLOB column containing special character

Thank you RW for you quick response.

 

I am not able to understand the below line. Could you please explain me. How index work here and few concatenation you have added.

 

 call execute('if index(paragraph,"'||strip(adv_event)||'") > 0 then result=1;');

 Thanks in advance

Solution
‎05-25-2017 06:07 AM
Super User
Super User
Posts: 7,942

Re: Operator Contain not working while searching text in CLOB column containing special character

Call execute() needs a string, this string is put into the compiler after the datastep finishes.  The concatenates are buidling a valid SAS statement:

if <condition> then <result>;

with specific logic.  So in this case the condition is, if adv_event found in paragraph (index() is descried in the docs), then set result to 1.

If you look at the log of what is generated you will see this.

Respected Advisor
Posts: 4,173

Re: Operator Contain not working while searching text in CLOB column containing special character

[ Edited ]

@Abraham

And just as a small amendment to @RW9s code adding an ELSE to the generated code and using find() with option "i" to make the search case insensitive.

data _null_;
  set event end=last;
  if _n_=1 then 
    do;
      call execute('data want; set master;');
      call execute('if find(paragraph,"'||strip(adv_event)||'","i") > 0 then result=1;');
    end;
  else  
    do;
      call execute('else if find(paragraph,"'||strip(adv_event)||'","i") > 0 then result=1;');
    end;
  if last then call execute('run;');
run;

This generates then the following code

NOTE: CALL EXECUTE generated line.
1         + data want; set master;
2         + if find(paragraph,"asthma","i") > 0 then result=1;
3         + else if find(paragraph,"Adrenal insufficienc","i") > 0 then result=1;
4         + else if find(paragraph,"coeliac disease","i") > 0 then result=1;
5         + else if find(paragraph,"Osteochondrosis","i") > 0 then result=1;
6         + run;

 

An alternative approach would be a Regular Expression.

data master;
  input pid paragraph : & $500.;
cards;
101 patient suffereing with "celc disease" (coeliac disease). Continued drug therapy started Osteochondrosis.
102 Treatement discontinue after 'Asthma)
103 no single intervention strategy can improve the adherence of all patient
104 Mutual collaboration fosters greater patient satisfaction with Osteocndrosis
105 "Osteochondrosis in patient
107 'Adrenal insufficiency' may happen
;
run;
data event;
  input adv_event : & $20.;
cards;
asthma
Adrenal insufficiency
coeliac disease
Osteochondrosis
;
run;

proc sql noprint;
  select adv_event into :regex separated by '|'
  from event;
quit;

data want;
  set master;
  find_pos=prxmatch("/\b&regex\b/i",paragraph);
  find_flg= (prxmatch("/\b&regex\b/i",paragraph)>0);
run;
Contributor
Posts: 63

Re: Operator Contain not working while searching text in CLOB column containing special character

Thank you RW and Patrick for the solution.

I could learn a new trick in regular expression. Thanks Patrick once again

 

Contributor
Posts: 63

Re: Operator Contain not working while searching text in CLOB column containing special character

Hi Patrick,
When I use the Prxmatch function, I got below error
ERROR: The text expression length (65538) exceeds maximum length (65534). The text expression has been truncated to
65534 characters.

The event table contain >7000 adv_events.
Respected Advisor
Posts: 4,173

Re: Operator Contain not working while searching text in CLOB column containing special character

@Abraham

I didn't expect such volumes....

That means also for the If..Then..Else approach that you're tendentially going to scan the "Clob" many times before you get a hit.

 

You could split up your Event data into multiple RegEx. Not sure which one of the two approaches would perform better. I believe it's "if..then..else".

 

If you've got any idea which events should happen more often (i.e. you've got some additonal distribution info) then make sure to pre-sort the event table accordingly from most common to most rare events.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 175 views
  • 3 likes
  • 3 in conversation