BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hx
Calcite | Level 5 hx
Calcite | Level 5

I have a dataset like 

wordsc1c2c3c4
ally21324565
ally traitor56764567
barriar overcome87982176
barriar overcome instruction32421232
cat67364385
cat pat18921827
cat pa82542134

I want to group these data by words if they have similar words and calculate their sum of c1 c2 c3 c4 seperately. 

For example, I want 'the first level' group calculation if the first word in the first column are similar. that is to say, something like:

first levelc1c2c3c4
ally7710890132
barriar11914033108
cat16718282146

How do I do this? I come up with some ideas about 'like' operator, but there are thousands of words in the first column, I don't know whether there is some loop that I can do with this process, can some one help me out?

1 ACCEPTED SOLUTION

Accepted Solutions
BenbowL
Fluorite | Level 6

Hi hx,

 

I have a solution that fits the data you have presented, however it is dependent on you wanting to group your data based on a match of the first word in the "words" variable (you could edit the wordmtch variable to include the first two words, or write a macro to change this for you, if you also require that) and I'm sure there is a better way to go about doing this, but maybe it will help you.

 

data have;
input words $ 1-28 c1 c2 c3 c4;
wordmtch = scan(words,1);
cards;
ally                         21 32 45 65
ally traitor                 56 76 45 67
barriar overcome             87 98 21 76
barriar overcome instruction 32 42 12 32
cat                          67 36 43 85
cat pat                      18 92 18 27
cat pa                       82 54 21 34
;
run;


data want;
set have;
by wordmtch;
retain c1tot c2tot c3tot c4tot;
if first.wordmtch then do
c1tot = c1;
c2tot = c2;
c3tot = c3;
c4tot = c4;
end;
else do;
c1tot = c1tot+c1;
c2tot = c2tot+c2;
c3tot = c3tot+c3;
c4tot = c4tot+c4;
end;
if last.wordmtch then output;
drop c1 c2 c3 c4 words;
rename c1tot = c1 c2tot = c2 c3tot = c3 c4tot = c4 wordmtch = words;
run;

Hope this helps,

 

Lee

View solution in original post

4 REPLIES 4
BenbowL
Fluorite | Level 6

Hi hx,

 

I have a solution that fits the data you have presented, however it is dependent on you wanting to group your data based on a match of the first word in the "words" variable (you could edit the wordmtch variable to include the first two words, or write a macro to change this for you, if you also require that) and I'm sure there is a better way to go about doing this, but maybe it will help you.

 

data have;
input words $ 1-28 c1 c2 c3 c4;
wordmtch = scan(words,1);
cards;
ally                         21 32 45 65
ally traitor                 56 76 45 67
barriar overcome             87 98 21 76
barriar overcome instruction 32 42 12 32
cat                          67 36 43 85
cat pat                      18 92 18 27
cat pa                       82 54 21 34
;
run;


data want;
set have;
by wordmtch;
retain c1tot c2tot c3tot c4tot;
if first.wordmtch then do
c1tot = c1;
c2tot = c2;
c3tot = c3;
c4tot = c4;
end;
else do;
c1tot = c1tot+c1;
c2tot = c2tot+c2;
c3tot = c3tot+c3;
c4tot = c4tot+c4;
end;
if last.wordmtch then output;
drop c1 c2 c3 c4 words;
rename c1tot = c1 c2tot = c2 c3tot = c3 c4tot = c4 wordmtch = words;
run;

Hope this helps,

 

Lee

hx
Calcite | Level 5 hx
Calcite | Level 5

Thank you Lee!

You gave me a great idea on how to dealing with this issue and it worked. I will try some macro or adjustment based on your code to fit my original data. 

GreggB
Pyrite | Level 9

Using PROC SUMMARY will also work.

 

data have;
input words $ 1-28 c1 c2 c3 c4;
wordmtch = scan(words,1);
cards;
ally 21 32 45 65
ally traitor 56 76 45 67
barriar overcome 87 98 21 76
barriar overcome instruction 32 42 12 32
cat 67 36 43 85
cat pat 18 92 18 27
cat pa 82 54 21 34
;
run;
proc sort;
by wordmtch;
proc summary data=have;
by wordmtch;
var c1 c2 c3 c4;
output out=have2
sum=;
run;
proc print data=have2;
run;

hx
Calcite | Level 5 hx
Calcite | Level 5
thank you greggB, the thing is I have almost 10 thousands observations for the 'words', and many of them have some typos. for example, the word 'cat' may be typed as 'catt' and I still want them to be treated as the same, should I include 'like' operator somewhere?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1072 views
  • 0 likes
  • 3 in conversation