Solved: Re: conditional sum

dustychair · Posted 01-30-2023 12:15 AM

Hi all,

I have a data set as below. I am trying to get the sum of scores within each cluster based on some rules:

1. Within the same cluster if there are OI and FT items then sum only OI items' scores.

2. If there are only FT items, then sum all of the FT items' scores.

I also need the frequency of clusterids. I was kind of able to get the frequency and sum of OI items when there are only OI items, but it looks complicated to me now. Will appreciate for any help.

Have

clusterid item_type score

AA11 OI 1

AA11 FT 1

AA11 OI 1

AA22 FT 1

AA22 FT 2

Want

clusterid item_type score sum Frequency

AA11 OI 1 2 3

AA11 FT 1 2 3

AA11 OI 1 2 3

AA22 FT 1 3 2

AA22 FT 2 3 2

PaigeMiller · Posted 01-30-2023 07:51 AM

Here's my solution. One of the things I like about this is that you can see the intermediate results (from PROC FREQ and PROC TRANSPOSE) and so you can better understand the underlying logic. Yes, of course, you can do the whole thing in one DATA step (@andreas_lds) or one SQL (@Ksharp), but for beginners in SAS, I like this better.

Please note I have created data set named HAVE as SAS data step code. @dustychair: Please follow this example in future questions, provide data as working SAS data step code.

data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc freq data=have;
    tables clusterid*item_type/noprint out=_counts_;
run;
proc transpose data=_counts_ out=_counts_t;
    by clusterid;
    var count;
    id item_type;
run;

proc sql;
    create table want as select a.*
        ,case when b.ft>0 and b.oi>0 then sum(a.score*(a.item_type='OI')) 
            when b.ft>0 and b.oi<1 then sum(a.score*(a.item_type='FT')) end as sum
        ,count(a.clusterid) as n

        from have as a left join _counts_t as b
        on a.clusterid=b.clusterid
        group by a.clusterid;
quit;

--
Paige Miller

View solution in original post

andreas_lds · Posted 01-30-2023 01:44 AM

What have you tried?

The problem can be solved by using DOW-loops, other ways exist, of course, using multiple steps.

Spoiler

data want_dow;
   if 0 then set have; /* maintain variable order */
   
   freq = 0;
   oi = 0;
   ft = 0;
   
   do _n_ = 1 by 1 until(last.clusterid);
      set have;
      by clusterid;
      
      freq = freq + 1;      
      ft = ft + ifn(item_type = 'FT', score, 0);
      oi = oi + ifn(item_type = 'OI', score, 0);
   end;
   
   if oi > 0 then sum = oi;
   else sum = ft;
   
   do _n_ = 1 by 1 until(last.clusterid);
      set have;
      by clusterid;
      output;
   end;
   
   drop ft oi;
run;

data want_dow; if 0 then set have; /* maintain variable order */ freq = 0; oi = 0; ft = 0; do _n_ = 1 by 1 until(last.clusterid); set have; by clusterid; freq = freq + 1; ft = ft + ifn(item_type = 'FT', score, 0); oi = oi + ifn(item_type = 'OI', score, 0); end; if oi > 0 then sum = oi; else sum = ft; do _n_ = 1 by 1 until(last.clusterid); set have; by clusterid; output; end; drop ft oi; run;

dustychair · Posted 02-01-2023 01:08 AM

Hi @andreas_lds ,

Thank you for providing a possible solution. Some results are not as I expected but we are very close. It is working when there is a cluster that has only FT or only OI. However, when there are FT and OI items together in the same cluster it is not working for some cases. Please see a small data from the results. According to this data the sum should be 0 for all rows however it is 2. Is it because FT=2?

clusterid item_type score sum

s2224 FT 2 2

s2224 OI 0 2

s2224 FT 0 2

Ksharp · Posted 01-30-2023 05:39 AM

/*
Assuming there are only two levels (OI FI) in item_type.
*/
data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc sql;
create table temp as
select *,case 
when sum(item_type='OI') and sum(item_type='FT') and item_type='OI' then score
when  sum(item_type='FT')=count(item_type) then score
else 0 end as temp,
count(item_type) as Frequency
 from have
  group by clusterid ;

create table want(drop=temp) as
select *,sum(temp) as sum
 from temp
  group by clusterid ;
quit;

PaigeMiller · Posted 01-30-2023 07:51 AM

Here's my solution. One of the things I like about this is that you can see the intermediate results (from PROC FREQ and PROC TRANSPOSE) and so you can better understand the underlying logic. Yes, of course, you can do the whole thing in one DATA step (@andreas_lds) or one SQL (@Ksharp), but for beginners in SAS, I like this better.

Please note I have created data set named HAVE as SAS data step code. @dustychair: Please follow this example in future questions, provide data as working SAS data step code.

data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc freq data=have;
    tables clusterid*item_type/noprint out=_counts_;
run;
proc transpose data=_counts_ out=_counts_t;
    by clusterid;
    var count;
    id item_type;
run;

proc sql;
    create table want as select a.*
        ,case when b.ft>0 and b.oi>0 then sum(a.score*(a.item_type='OI')) 
            when b.ft>0 and b.oi<1 then sum(a.score*(a.item_type='FT')) end as sum
        ,count(a.clusterid) as n

        from have as a left join _counts_t as b
        on a.clusterid=b.clusterid
        group by a.clusterid;
quit;

--
Paige Miller

dustychair · Posted 02-01-2023 01:13 AM

Hi @PaigeMiller,

Thank you for providing a possible solution. I like the idea of being simple. I tried your code. The sum column has results only for FT items the rest is missing. I don't know what part is not working correctly.

PaigeMiller · Posted 02-01-2023 05:58 AM

Show us the code you used, or better yet the log. Don't ever say "it isn't working" without providing the code or the log.

--
Paige Miller

dustychair · Posted 02-01-2023 10:53 AM

Hi @PaigeMiller ,

Here it is. The variables names are different; also I added studentid since i want it per student.

Thanks


proc sql;
create table sci_me_gr05 as
select distinct a.*,b.Item_Type, b.interactionTypes, b.ItemName
from  sg.sci_me_gr05  a left join TM.metadata   b
on a.itemuin =b.itemreference;
quit;

data sci_me_gr05;
	length clusterid $100.;
	set sci_me_gr05;

	if scan(lowcase(itemname),2,"-")="tao21" then clusterid=UPCASE(scan(itemname,3,"_"));
	else if substr(lowcase(itemname),1,2)="sc" then clusterid=UPCASE(scan(itemname,3,"_"));
	else if substr(lowcase(itemname),1,2)="nm" and index(itemname,"_") then clusterid=UPCASE(scan(itemname,2,"_"));
	else clusterid=UPCASE(scan(itemname,4,"-"));

	if index(lowcase(clusterid),"item") then clusterid=scan(clusterid,1,"_ ");
	if missing(clusterid) then clusterid=UPCASE(scan(scan(itemname,2,"_"),1," "));
	if clusterid="ITEM1" then clusterid=UPCASE(scan(scan(itemname,4,"-"),1,"_"));
run;

proc sort data=sci_me_gr05;
by StudentSystemUIN;
run;
proc freq data=sci_me_gr05;
    tables clusterid*itemstatus/noprint out=_counts_;
	by StudentSystemUIN;
run;
proc transpose data=_counts_ out=_counts_t;
    by clusterid;
    var count;
    id itemstatus;
run;


proc sql;
    create table CTT1 as select a.*
        ,case when b.Field_Test>0 and b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational')) 
            when b.Field_Test>0 and b.Operational<1 then sum(a.itemscore*(a.itemstatus='Field Test')) end as sum
        ,count(a.clusterid) as n

        from sci_me_gr05 as a left join _counts_t as b
        on a.clusterid=b.clusterid
		group by a.clusterid, a.StudentSystemUIN;
quit;

dustychair · Posted 02-01-2023 12:57 PM

Hi @PaigeMiller,

I just added one statement to your code. it resolved those missing sums.

thank you so much.

Here is the final version of the code.


proc sql;
    create table CTT1 as select a.*
        ,case when b.Field_Test>0 and b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational')) 
         when b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational')) 
 when b.Field_Test>0 and b.Operational<1 then sum(a.itemscore*(a.itemstatus='Field Test')) end as sum
        ,count(a.clusterid) as n

        from sci_me_gr05 as a left join _counts_t as b
        on a.clusterid=b.clusterid
		group by a.clusterid, a.StudentSystemUIN;
quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away