Hi all,
I have a data set as below. I am trying to get the sum of scores within each cluster based on some rules:
1. Within the same cluster if there are OI and FT items then sum only OI items' scores.
2. If there are only FT items, then sum all of the FT items' scores.
I also need the frequency of clusterids. I was kind of able to get the frequency and sum of OI items when there are only OI items, but it looks complicated to me now. Will appreciate for any help.
Have
clusterid item_type score
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
Want
clusterid item_type score sum Frequency
AA11 OI 1 2 3
AA11 FT 1 2 3
AA11 OI 1 2 3
AA22 FT 1 3 2
AA22 FT 2 3 2
Here's my solution. One of the things I like about this is that you can see the intermediate results (from PROC FREQ and PROC TRANSPOSE) and so you can better understand the underlying logic. Yes, of course, you can do the whole thing in one DATA step (@andreas_lds) or one SQL (@Ksharp), but for beginners in SAS, I like this better.
Please note I have created data set named HAVE as SAS data step code. @dustychair: Please follow this example in future questions, provide data as working SAS data step code.
data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc freq data=have;
tables clusterid*item_type/noprint out=_counts_;
run;
proc transpose data=_counts_ out=_counts_t;
by clusterid;
var count;
id item_type;
run;
proc sql;
create table want as select a.*
,case when b.ft>0 and b.oi>0 then sum(a.score*(a.item_type='OI'))
when b.ft>0 and b.oi<1 then sum(a.score*(a.item_type='FT')) end as sum
,count(a.clusterid) as n
from have as a left join _counts_t as b
on a.clusterid=b.clusterid
group by a.clusterid;
quit;
What have you tried?
The problem can be solved by using DOW-loops, other ways exist, of course, using multiple steps.
data want_dow;
if 0 then set have; /* maintain variable order */
freq = 0;
oi = 0;
ft = 0;
do _n_ = 1 by 1 until(last.clusterid);
set have;
by clusterid;
freq = freq + 1;
ft = ft + ifn(item_type = 'FT', score, 0);
oi = oi + ifn(item_type = 'OI', score, 0);
end;
if oi > 0 then sum = oi;
else sum = ft;
do _n_ = 1 by 1 until(last.clusterid);
set have;
by clusterid;
output;
end;
drop ft oi;
run;
Hi @andreas_lds ,
Thank you for providing a possible solution. Some results are not as I expected but we are very close. It is working when there is a cluster that has only FT or only OI. However, when there are FT and OI items together in the same cluster it is not working for some cases. Please see a small data from the results. According to this data the sum should be 0 for all rows however it is 2. Is it because FT=2?
clusterid item_type score sum
s2224 FT 2 2
s2224 OI 0 2
s2224 OI 0 2
s2224 OI 0 2
s2224 FT 0 2
/* Assuming there are only two levels (OI FI) in item_type. */ data have; input clusterid $ item_type $ score; cards; AA11 OI 1 AA11 FT 1 AA11 OI 1 AA22 FT 1 AA22 FT 2 ; proc sql; create table temp as select *,case when sum(item_type='OI') and sum(item_type='FT') and item_type='OI' then score when sum(item_type='FT')=count(item_type) then score else 0 end as temp, count(item_type) as Frequency from have group by clusterid ; create table want(drop=temp) as select *,sum(temp) as sum from temp group by clusterid ; quit;
Here's my solution. One of the things I like about this is that you can see the intermediate results (from PROC FREQ and PROC TRANSPOSE) and so you can better understand the underlying logic. Yes, of course, you can do the whole thing in one DATA step (@andreas_lds) or one SQL (@Ksharp), but for beginners in SAS, I like this better.
Please note I have created data set named HAVE as SAS data step code. @dustychair: Please follow this example in future questions, provide data as working SAS data step code.
data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc freq data=have;
tables clusterid*item_type/noprint out=_counts_;
run;
proc transpose data=_counts_ out=_counts_t;
by clusterid;
var count;
id item_type;
run;
proc sql;
create table want as select a.*
,case when b.ft>0 and b.oi>0 then sum(a.score*(a.item_type='OI'))
when b.ft>0 and b.oi<1 then sum(a.score*(a.item_type='FT')) end as sum
,count(a.clusterid) as n
from have as a left join _counts_t as b
on a.clusterid=b.clusterid
group by a.clusterid;
quit;
Hi @PaigeMiller,
Thank you for providing a possible solution. I like the idea of being simple. I tried your code. The sum column has results only for FT items the rest is missing. I don't know what part is not working correctly.
Show us the code you used, or better yet the log. Don't ever say "it isn't working" without providing the code or the log.
Hi @PaigeMiller ,
Here it is. The variables names are different; also I added studentid since i want it per student.
Thanks
proc sql;
create table sci_me_gr05 as
select distinct a.*,b.Item_Type, b.interactionTypes, b.ItemName
from sg.sci_me_gr05 a left join TM.metadata b
on a.itemuin =b.itemreference;
quit;
data sci_me_gr05;
length clusterid $100.;
set sci_me_gr05;
if scan(lowcase(itemname),2,"-")="tao21" then clusterid=UPCASE(scan(itemname,3,"_"));
else if substr(lowcase(itemname),1,2)="sc" then clusterid=UPCASE(scan(itemname,3,"_"));
else if substr(lowcase(itemname),1,2)="nm" and index(itemname,"_") then clusterid=UPCASE(scan(itemname,2,"_"));
else clusterid=UPCASE(scan(itemname,4,"-"));
if index(lowcase(clusterid),"item") then clusterid=scan(clusterid,1,"_ ");
if missing(clusterid) then clusterid=UPCASE(scan(scan(itemname,2,"_"),1," "));
if clusterid="ITEM1" then clusterid=UPCASE(scan(scan(itemname,4,"-"),1,"_"));
run;
proc sort data=sci_me_gr05;
by StudentSystemUIN;
run;
proc freq data=sci_me_gr05;
tables clusterid*itemstatus/noprint out=_counts_;
by StudentSystemUIN;
run;
proc transpose data=_counts_ out=_counts_t;
by clusterid;
var count;
id itemstatus;
run;
proc sql;
create table CTT1 as select a.*
,case when b.Field_Test>0 and b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational'))
when b.Field_Test>0 and b.Operational<1 then sum(a.itemscore*(a.itemstatus='Field Test')) end as sum
,count(a.clusterid) as n
from sci_me_gr05 as a left join _counts_t as b
on a.clusterid=b.clusterid
group by a.clusterid, a.StudentSystemUIN;
quit;
Hi @PaigeMiller,
I just added one statement to your code. it resolved those missing sums.
thank you so much.
Here is the final version of the code.
proc sql;
create table CTT1 as select a.*
,case when b.Field_Test>0 and b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational'))
when b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational'))
when b.Field_Test>0 and b.Operational<1 then sum(a.itemscore*(a.itemstatus='Field Test')) end as sum
,count(a.clusterid) as n
from sci_me_gr05 as a left join _counts_t as b
on a.clusterid=b.clusterid
group by a.clusterid, a.StudentSystemUIN;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.