Hey guys,
I have two tables - one with list of names and another one with some strings like this:
Tom has a cat
John has a dog
Anna has a parrot
Frank has a dog
Anna has a cow
John has a tiger
Tom AB has a tiger
Tom A41 has a lion
Tom B12 has a dolphin
Tom 2 has a horse
Tom-32 has a dog
And I want to count occurrence of each name (that exists in the table with list of names) in table with strings, however it shouldn't count name "Tom" when there is a string "AB"
or any number or "Ax" or "Bx" (where "x" is a number) after the name
So the result should be:
Tom 1
John 2
Frank 1
Anna 2
I know how to count number of names (using for example findw) but I have no idea how to exclude from counting name "Tom" with conditions I've written above.
Any guidance will be greatly appreciated.
Hi,
maybe this will help you, I added some "special" cases for testing.
Bart
data have;
input string $ 50.;
cards4;
Tom has a cat
John has a dog
Anna has a parrot
Frank has a dog
Anna has a cow
John has a tiger
Tom AB has a tiger
Tom A41 has a lion
Tom B12 has a dolphin
Tom 2 has a horse
Tom-32 has a dog
Bob has a parrot and Cedrick has a parrot
Alex has a cat and Alex has a dog and Alex AB13 has a pig
;;;;
run;
data names;
input name $ 12.;
cards;
Tom
John
Frank
Anna
Bob
Cedrick
Alex
;
run;
data _NULL_;
call symputX("namesNo", nobs);
stop;
set names nobs=nobs;
run;
data _null_;
declare hash H();
H.defineKey('name');
H.defineData('name', 'cnt');
H.defineDone();
do until(eoN);
set names end = eoN;
cnt = 0;
_RC_ = H.add();
end;
do until(eof);
set have end = eof;
_RC_ = 1;
do until(name = "");
name = scan(string, _RC_, "-", "S");
if H.find(key:name) = 0 then
do;
nextWord = scan(string, _RC_+1, "-", "S");
if not(prxmatch('/^(AB|A\d|B\d|\d)/', nextWord)) then
do;
cnt + 1;
H.replace();
end;
end;
_RC_ + 1;
end;
end;
H.output(dataset:"want");
run;
proc print data = want;
run;
Here is a sample.
data namelist;
input name $ 10.;
datalines;
Tom
John
Frank
Anna
;
run;
data have;
input strings $ 40.;
datalines;
Tom has a cat
John has a dog
Anna has a parrot
Frank has a dog
Anna has a cow
John has a tiger
Tom AB has a tiger
Tom A41 has a lion
Tom B12 has a dolphin
Tom 2 has a horse
Tom-32 has a dog
;
run;
data _NULL_;
length id 8 name $10 strings $40 count 8;
call missing(name, strings, count);
declare hash h(suminc:"count", multidata:"y", ordered:"y");
rc=h.definekey("id");
rc=h.definedata("id", "strings", "name", "count");
rc=h.definedone();
declare hash nl(suminc:"count", multidata:"y", ordered:"y");
rc=nl.definekey("name");
rc=nl.definedata("name", "id");
rc=nl.definedone();
do until(aaa);
set namelist end=aaa;
id+1;
nl.add(key:name, data:name, data:id);
end;
do until(eof);
set have end=eof;
count=1;
name=scan(strings, 1, " ");
rc=nl.find();
if rc=0 and prxmatch("s/(Tom)?(A|B)\d+|(AB)-?|( \d)/match/i", strings)=0 then
do;
h.ref();
h.sum(sum:count);
h.replace();
end;
end;
if eof then
h.output(dataset:"work.want(drop=strings id)");
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.