Hello everyone,
I have a dataset that looks pretty much like this:
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
Now, I want to know which word appears with another word in how many articles. For example word1 and word2 appear 2 times together, word1 and word3 3 times.
word1 word2 word3 word4 word5 word6
word1 3 2 3 1 2 1
word2 2 2 2 1 2 0
word3 3 3 3 1 2 1
The output I want to generate should look like this or similar, you get the idea 🙂
I have tried this for a long time now, but everything I managed just resulted in error messages.
Is there some easy way to do it? Or do you have an idea how I could approach the problem?
Thanks a lot in advance.
If you want symmetry Matrix :
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
run;
data temp;
array x{9999} $ 50 _temporary_;
length a b $ 50;
do i=1 by 1 until(last.article);
set have;
by article;
x{i}=word;
end;
do m=1 to i-1;
do n=m+1 to i;
a=x{m};b=x{n};output;
a=x{n};b=x{m};output;
end;
end;
keep a b;
run;
proc sql;
create table levels as
select a.word as a length=50,b.word as b length=50
from
(select distinct word from have) as a,
(select distinct word from have) as b
order by 1,2;
quit;
proc tabulate data=temp classdata=levels;
class a b;
table a='',b=''*n=''/misstext='0';
run;
If you want symmetry Matrix :
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
run;
data temp;
array x{9999} $ 50 _temporary_;
length a b $ 50;
do i=1 by 1 until(last.article);
set have;
by article;
x{i}=word;
end;
do m=1 to i-1;
do n=m+1 to i;
a=x{m};b=x{n};output;
a=x{n};b=x{m};output;
end;
end;
keep a b;
run;
proc sql;
create table levels as
select a.word as a length=50,b.word as b length=50
from
(select distinct word from have) as a,
(select distinct word from have) as b
order by 1,2;
quit;
proc tabulate data=temp classdata=levels;
class a b;
table a='',b=''*n=''/misstext='0';
run;
Hi,
Odd request. Not sure what you want to do when words are missing. This bit of code gets you the combinations of words from the list, maybe elaborate a bit more on the problem if this doesn't get you there.
data have; infile datalines dsd; input word $ article ; datalines; word1, 123456 word2, 123456 word3, 123456 word4, 123456 word5, 123456 word1, 234567 word2, 234567 word3, 234567 word5, 234567 word1, 345678 word3, 345678 word6, 345678 ; run; proc transpose data=have out=t_have; by article; var word; id word; run; data want (keep=combination); set t_have (drop=article _name_); array word{*} word:; do i=1 to dim(word)-1; combination=catx('/',word{i},word{i+1}); output; end; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.