I have a dataset like
words | c1 | c2 | c3 | c4 |
ally | 21 | 32 | 45 | 65 |
ally traitor | 56 | 76 | 45 | 67 |
barriar overcome | 87 | 98 | 21 | 76 |
barriar overcome instruction | 32 | 42 | 12 | 32 |
cat | 67 | 36 | 43 | 85 |
cat pat | 18 | 92 | 18 | 27 |
cat pa | 82 | 54 | 21 | 34 |
I want to group these data by words if they have similar words and calculate their sum of c1 c2 c3 c4 seperately.
For example, I want 'the first level' group calculation if the first word in the first column are similar. that is to say, something like:
first level | c1 | c2 | c3 | c4 |
ally | 77 | 108 | 90 | 132 |
barriar | 119 | 140 | 33 | 108 |
cat | 167 | 182 | 82 | 146 |
How do I do this? I come up with some ideas about 'like' operator, but there are thousands of words in the first column, I don't know whether there is some loop that I can do with this process, can some one help me out?
Hi hx,
I have a solution that fits the data you have presented, however it is dependent on you wanting to group your data based on a match of the first word in the "words" variable (you could edit the wordmtch variable to include the first two words, or write a macro to change this for you, if you also require that) and I'm sure there is a better way to go about doing this, but maybe it will help you.
data have;
input words $ 1-28 c1 c2 c3 c4;
wordmtch = scan(words,1);
cards;
ally 21 32 45 65
ally traitor 56 76 45 67
barriar overcome 87 98 21 76
barriar overcome instruction 32 42 12 32
cat 67 36 43 85
cat pat 18 92 18 27
cat pa 82 54 21 34
;
run;
data want;
set have;
by wordmtch;
retain c1tot c2tot c3tot c4tot;
if first.wordmtch then do
c1tot = c1;
c2tot = c2;
c3tot = c3;
c4tot = c4;
end;
else do;
c1tot = c1tot+c1;
c2tot = c2tot+c2;
c3tot = c3tot+c3;
c4tot = c4tot+c4;
end;
if last.wordmtch then output;
drop c1 c2 c3 c4 words;
rename c1tot = c1 c2tot = c2 c3tot = c3 c4tot = c4 wordmtch = words;
run;
Hope this helps,
Lee
Hi hx,
I have a solution that fits the data you have presented, however it is dependent on you wanting to group your data based on a match of the first word in the "words" variable (you could edit the wordmtch variable to include the first two words, or write a macro to change this for you, if you also require that) and I'm sure there is a better way to go about doing this, but maybe it will help you.
data have;
input words $ 1-28 c1 c2 c3 c4;
wordmtch = scan(words,1);
cards;
ally 21 32 45 65
ally traitor 56 76 45 67
barriar overcome 87 98 21 76
barriar overcome instruction 32 42 12 32
cat 67 36 43 85
cat pat 18 92 18 27
cat pa 82 54 21 34
;
run;
data want;
set have;
by wordmtch;
retain c1tot c2tot c3tot c4tot;
if first.wordmtch then do
c1tot = c1;
c2tot = c2;
c3tot = c3;
c4tot = c4;
end;
else do;
c1tot = c1tot+c1;
c2tot = c2tot+c2;
c3tot = c3tot+c3;
c4tot = c4tot+c4;
end;
if last.wordmtch then output;
drop c1 c2 c3 c4 words;
rename c1tot = c1 c2tot = c2 c3tot = c3 c4tot = c4 wordmtch = words;
run;
Hope this helps,
Lee
Thank you Lee!
You gave me a great idea on how to dealing with this issue and it worked. I will try some macro or adjustment based on your code to fit my original data.
Using PROC SUMMARY will also work.
data have;
input words $ 1-28 c1 c2 c3 c4;
wordmtch = scan(words,1);
cards;
ally 21 32 45 65
ally traitor 56 76 45 67
barriar overcome 87 98 21 76
barriar overcome instruction 32 42 12 32
cat 67 36 43 85
cat pat 18 92 18 27
cat pa 82 54 21 34
;
run;
proc sort;
by wordmtch;
proc summary data=have;
by wordmtch;
var c1 c2 c3 c4;
output out=have2
sum=;
run;
proc print data=have2;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.