Hello everyone,
I have a dataset that looks pretty much like this:
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;Now, I want to know which word appears with another word in how many articles. For example word1 and word2 appear 2 times together, word1 and word3 3 times.
word1 word2 word3 word4 word5 word6
word1 3 2 3 1 2 1
word2 2 2 2 1 2 0
word3 3 3 3 1 2 1
The output I want to generate should look like this or similar, you get the idea 🙂
I have tried this for a long time now, but everything I managed just resulted in error messages.
Is there some easy way to do it? Or do you have an idea how I could approach the problem?
Thanks a lot in advance.
If you want symmetry Matrix :
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
run;
data temp;
array x{9999} $ 50 _temporary_;
length a b $ 50;
do i=1 by 1 until(last.article);
set have;
by article;
x{i}=word;
end;
do m=1 to i-1;
do n=m+1 to i;
a=x{m};b=x{n};output;
a=x{n};b=x{m};output;
end;
end;
keep a b;
run;
proc sql;
create table levels as
select a.word as a length=50,b.word as b length=50
from
(select distinct word from have) as a,
(select distinct word from have) as b
order by 1,2;
quit;
proc tabulate data=temp classdata=levels;
class a b;
table a='',b=''*n=''/misstext='0';
run;
If you want symmetry Matrix :
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
run;
data temp;
array x{9999} $ 50 _temporary_;
length a b $ 50;
do i=1 by 1 until(last.article);
set have;
by article;
x{i}=word;
end;
do m=1 to i-1;
do n=m+1 to i;
a=x{m};b=x{n};output;
a=x{n};b=x{m};output;
end;
end;
keep a b;
run;
proc sql;
create table levels as
select a.word as a length=50,b.word as b length=50
from
(select distinct word from have) as a,
(select distinct word from have) as b
order by 1,2;
quit;
proc tabulate data=temp classdata=levels;
class a b;
table a='',b=''*n=''/misstext='0';
run;
Hi,
Odd request. Not sure what you want to do when words are missing. This bit of code gets you the combinations of words from the list, maybe elaborate a bit more on the problem if this doesn't get you there.
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
run;
proc transpose data=have out=t_have;
by article;
var word;
id word;
run;
data want (keep=combination);
set t_have (drop=article _name_);
array word{*} word:;
do i=1 to dim(word)-1;
combination=catx('/',word{i},word{i+1});
output;
end;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.