Hey guys,
I have two tables - one with list of names and another one with some strings like this:
Tom has a cat
John has a dog
Anna has a parrot
Frank has a dog
Anna has a cow
John has a tiger
Tom AB has a tiger
Tom A41 has a lion
Tom B12 has a dolphin
Tom 2 has a horse
Tom-32 has a dog
And I want to count occurrence of each name (that exists in the table with list of names) in table with strings, however it shouldn't count name "Tom" when there is a string "AB"
or any number or "Ax" or "Bx" (where "x" is a number) after the name
So the result should be:
Tom 1
John 2
Frank 1
Anna 2
I know how to count number of names (using for example findw) but I have no idea how to exclude from counting name "Tom" with conditions I've written above.
Any guidance will be greatly appreciated.
Hi,
maybe this will help you, I added some "special" cases for testing.
Bart
data have;
input string $ 50.;
cards4;
Tom has a cat
John has a dog
Anna has a parrot
Frank has a dog
Anna has a cow
John has a tiger
Tom AB has a tiger
Tom A41 has a lion
Tom B12 has a dolphin
Tom 2 has a horse
Tom-32 has a dog
Bob has a parrot and Cedrick has a parrot
Alex has a cat and Alex has a dog and Alex AB13 has a pig
;;;;
run;
data names;
input name $ 12.;
cards;
Tom
John
Frank
Anna
Bob
Cedrick
Alex
;
run;
data _NULL_;
call symputX("namesNo", nobs);
stop;
set names nobs=nobs;
run;
data _null_;
declare hash H();
H.defineKey('name');
H.defineData('name', 'cnt');
H.defineDone();
do until(eoN);
set names end = eoN;
cnt = 0;
_RC_ = H.add();
end;
do until(eof);
set have end = eof;
_RC_ = 1;
do until(name = "");
name = scan(string, _RC_, "-", "S");
if H.find(key:name) = 0 then
do;
nextWord = scan(string, _RC_+1, "-", "S");
if not(prxmatch('/^(AB|A\d|B\d|\d)/', nextWord)) then
do;
cnt + 1;
H.replace();
end;
end;
_RC_ + 1;
end;
end;
H.output(dataset:"want");
run;
proc print data = want;
run;
Here is a sample.
data namelist;
input name $ 10.;
datalines;
Tom
John
Frank
Anna
;
run;
data have;
input strings $ 40.;
datalines;
Tom has a cat
John has a dog
Anna has a parrot
Frank has a dog
Anna has a cow
John has a tiger
Tom AB has a tiger
Tom A41 has a lion
Tom B12 has a dolphin
Tom 2 has a horse
Tom-32 has a dog
;
run;
data _NULL_;
length id 8 name $10 strings $40 count 8;
call missing(name, strings, count);
declare hash h(suminc:"count", multidata:"y", ordered:"y");
rc=h.definekey("id");
rc=h.definedata("id", "strings", "name", "count");
rc=h.definedone();
declare hash nl(suminc:"count", multidata:"y", ordered:"y");
rc=nl.definekey("name");
rc=nl.definedata("name", "id");
rc=nl.definedone();
do until(aaa);
set namelist end=aaa;
id+1;
nl.add(key:name, data:name, data:id);
end;
do until(eof);
set have end=eof;
count=1;
name=scan(strings, 1, " ");
rc=nl.find();
if rc=0 and prxmatch("s/(Tom)?(A|B)\d+|(AB)-?|( \d)/match/i", strings)=0 then
do;
h.ref();
h.sum(sum:count);
h.replace();
end;
end;
if eof then
h.output(dataset:"work.want(drop=strings id)");
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.