Hello everyone,
I have a dataset that looks pretty much like this:
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
Now, I want to know which word appears with another word in how many articles. For example word1 and word2 appear 2 times together, word1 and word3 3 times.
word1 word2 word3 word4 word5 word6
word1 3 2 3 1 2 1
word2 2 2 2 1 2 0
word3 3 3 3 1 2 1
The output I want to generate should look like this or similar, you get the idea 🙂
I have tried this for a long time now, but everything I managed just resulted in error messages.
Is there some easy way to do it? Or do you have an idea how I could approach the problem?
Thanks a lot in advance.
If you want symmetry Matrix :
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
run;
data temp;
array x{9999} $ 50 _temporary_;
length a b $ 50;
do i=1 by 1 until(last.article);
set have;
by article;
x{i}=word;
end;
do m=1 to i-1;
do n=m+1 to i;
a=x{m};b=x{n};output;
a=x{n};b=x{m};output;
end;
end;
keep a b;
run;
proc sql;
create table levels as
select a.word as a length=50,b.word as b length=50
from
(select distinct word from have) as a,
(select distinct word from have) as b
order by 1,2;
quit;
proc tabulate data=temp classdata=levels;
class a b;
table a='',b=''*n=''/misstext='0';
run;
If you want symmetry Matrix :
data have;
infile datalines dsd;
input word $ article ;
datalines;
word1, 123456
word2, 123456
word3, 123456
word4, 123456
word5, 123456
word1, 234567
word2, 234567
word3, 234567
word5, 234567
word1, 345678
word3, 345678
word6, 345678
;
run;
data temp;
array x{9999} $ 50 _temporary_;
length a b $ 50;
do i=1 by 1 until(last.article);
set have;
by article;
x{i}=word;
end;
do m=1 to i-1;
do n=m+1 to i;
a=x{m};b=x{n};output;
a=x{n};b=x{m};output;
end;
end;
keep a b;
run;
proc sql;
create table levels as
select a.word as a length=50,b.word as b length=50
from
(select distinct word from have) as a,
(select distinct word from have) as b
order by 1,2;
quit;
proc tabulate data=temp classdata=levels;
class a b;
table a='',b=''*n=''/misstext='0';
run;
Hi,
Odd request. Not sure what you want to do when words are missing. This bit of code gets you the combinations of words from the list, maybe elaborate a bit more on the problem if this doesn't get you there.
data have; infile datalines dsd; input word $ article ; datalines; word1, 123456 word2, 123456 word3, 123456 word4, 123456 word5, 123456 word1, 234567 word2, 234567 word3, 234567 word5, 234567 word1, 345678 word3, 345678 word6, 345678 ; run; proc transpose data=have out=t_have; by article; var word; id word; run; data want (keep=combination); set t_have (drop=article _name_); array word{*} word:; do i=1 to dim(word)-1; combination=catx('/',word{i},word{i+1}); output; end; run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.