Hi Everyone!
In my below example, I would like to search the ID where the clob variable "Inform" contain only the text as "bypoll".
If the Id contain strings as both "bypoll" and "poll" , then those id should be excluded.
I am not getting the output.
data medi_history;
input pid inform : & $200.;
cards;
201 Tunisians went to the bypoll on Sunday to pick their first directly elected president
202 Apple unlike itself revealed the pricing details of iPad Ai 2 and iPad Mini 3 in bypoll and poll election
203 Lewis Hamilton clinched his second bypolldrivers world championship in poll section.
204 Defending leopards in human populated areas as not necessarily or bypoll
;
run;
proc sql;
create table a1 as
select pid, compress(inform) as inform
from medi_history;
quit;
proc sql;
create table a2 as
select * from a1
where inform like '%bypoll%'
and inform not like '%poll%'
;
quit;
Advance thanks for
Which of those lines do you want output?
Thank you Arthur.
I want the output as below.
201 Tunisians went to the bypoll on Sunday to pick their first directly elected president
204 Defending leopards in human populated areas as not necessarily or bypoll
The remaining two Id contain strings as bypoll and poll. So i don't want those output.
I can't test this at the moment, but wouldn't the following do what you want?
proc sql;
create table want as
select pid, compress(inform) as inform
from medi_history
having findw(inform,'bypoll') gt 0 and
findw(inform,'poll') le 0
;
quit;
A data step approach ( can also use index/find function family):
data medi_history;
input pid inform : & $200.;
cards;
201 Tunisians went to the bypoll on Sunday to pick their first directly elected president
202 Apple unlike itself revealed the pricing details of iPad Ai 2 and iPad Mini 3 in bypoll and poll election
203 Lewis Hamilton clinched his second bypolldrivers world championship in poll section.
204 Defending leopards in human populated areas as not necessarily or bypoll
;
run;
data want;
set medi_history;
if prxmatch('/\bbypoll\b/i', inform) then if not prxmatch('/\bpoll\b/i', inform) then
output;
run;
And, of course, if you prefer prx expressions like suggested you could also use them in proc sql:
proc sql;
create table want as
select pid, compress(_inform) as inform
from medi_history (rename=(inform=_inform))
having prxmatch('/\bbypoll\b/i', _inform) and
not prxmatch('/\bpoll\b/i', _inform)
;
quit;
Art,
I am puzzled by your decision of using 'having' instead of 'where'. It seems to me that the former is most likely to be seen when there is a summary call.
BTW, do you have a 'Turkey Day' in Canada?
: You should be puzzled. I've never really given it much thought since they function the same (mostly) in this situation. However, since where would process before the table is read, the rename option wouldn't be necessary. In short, "better" code would have been:
proc sql;
create table want as
select pid, compress(inform) as inform
from medi_history
where prxmatch('/\bbypoll\b/i', inform) and
not prxmatch('/\bpoll\b/i', inform)
;
quit;
:smileylaugh: Nice, Art! In my book, heaven is where we celebrate every holiday on this planet, and you are 1 step closer!
Because your logic is incorrect.
inform not like '%poll%'
will NOT match any line that has bypoll in it since poll is just a subset of bypoll.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.