BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
dustychair
Pyrite | Level 9

Hi all,

I have a data set as below. I am trying to get the sum of scores within each cluster based on some rules:

1. Within the same cluster if there are OI and FT items then sum only OI items' scores.

2. If there are only FT items, then sum all of the FT items' scores.

 

I also need the frequency of clusterids. I was kind of able to get the frequency and sum of OI items when there are only OI items, but it looks complicated to me now. Will appreciate for any help.

Have

clusterid item_type score

AA11 OI 1

AA11 FT 1

AA11 OI 1

AA22 FT 1

AA22 FT 2

 

Want

clusterid item_type score sum Frequency

AA11 OI 1 2 3

AA11 FT 1 2 3

AA11 OI 1 2 3

AA22 FT 1 3 2

AA22 FT 2 3 2

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Here's my solution. One of the things I like about this is that you can see the intermediate results (from PROC FREQ and PROC TRANSPOSE) and so you can better understand the underlying logic. Yes, of course, you can do the whole thing in one DATA step (@andreas_lds) or one SQL (@Ksharp), but for beginners in SAS, I like this better.

 

Please note I have created data set named HAVE as SAS data step code. @dustychair: Please follow this example in future questions, provide data as working SAS data step code.

 

data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc freq data=have;
    tables clusterid*item_type/noprint out=_counts_;
run;
proc transpose data=_counts_ out=_counts_t;
    by clusterid;
    var count;
    id item_type;
run;

proc sql;
    create table want as select a.*
        ,case when b.ft>0 and b.oi>0 then sum(a.score*(a.item_type='OI')) 
            when b.ft>0 and b.oi<1 then sum(a.score*(a.item_type='FT')) end as sum
        ,count(a.clusterid) as n

        from have as a left join _counts_t as b
        on a.clusterid=b.clusterid
        group by a.clusterid;
quit; 

  

--
Paige Miller

View solution in original post

8 REPLIES 8
andreas_lds
Jade | Level 19

What have you tried?

The problem can be solved by using DOW-loops, other ways exist, of course, using multiple steps.

Spoiler
data want_dow;
   if 0 then set have; /* maintain variable order */
   
   freq = 0;
   oi = 0;
   ft = 0;
   
   do _n_ = 1 by 1 until(last.clusterid);
      set have;
      by clusterid;
      
      freq = freq + 1;      
      ft = ft + ifn(item_type = 'FT', score, 0);
      oi = oi + ifn(item_type = 'OI', score, 0);
   end;
   
   if oi > 0 then sum = oi;
   else sum = ft;
   
   do _n_ = 1 by 1 until(last.clusterid);
      set have;
      by clusterid;
      output;
   end;
   
   drop ft oi;
run;
dustychair
Pyrite | Level 9

Hi @andreas_lds ,

Thank you for providing a possible solution. Some results are not as I expected but we are very close. It is working when there is a cluster that has only FT or only OI. However, when there are FT and OI items together in the same cluster it is not working for some cases. Please see  a small data from the results. According to this data the sum should be 0 for all rows however it is 2. Is it because FT=2?

clusterid item_type score sum

s2224 FT 2 2

s2224 OI 0 2

s2224 OI 0 2

s2224 OI 0 2

s2224 FT 0 2

Ksharp
Super User
/*
Assuming there are only two levels (OI FI) in item_type.
*/
data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc sql;
create table temp as
select *,case 
when sum(item_type='OI') and sum(item_type='FT') and item_type='OI' then score
when  sum(item_type='FT')=count(item_type) then score
else 0 end as temp,
count(item_type) as Frequency
 from have
  group by clusterid ;

create table want(drop=temp) as
select *,sum(temp) as sum
 from temp
  group by clusterid ;
quit;
PaigeMiller
Diamond | Level 26

Here's my solution. One of the things I like about this is that you can see the intermediate results (from PROC FREQ and PROC TRANSPOSE) and so you can better understand the underlying logic. Yes, of course, you can do the whole thing in one DATA step (@andreas_lds) or one SQL (@Ksharp), but for beginners in SAS, I like this better.

 

Please note I have created data set named HAVE as SAS data step code. @dustychair: Please follow this example in future questions, provide data as working SAS data step code.

 

data have;
input clusterid $ item_type $ score;
cards;
AA11 OI 1
AA11 FT 1
AA11 OI 1
AA22 FT 1
AA22 FT 2
;
proc freq data=have;
    tables clusterid*item_type/noprint out=_counts_;
run;
proc transpose data=_counts_ out=_counts_t;
    by clusterid;
    var count;
    id item_type;
run;

proc sql;
    create table want as select a.*
        ,case when b.ft>0 and b.oi>0 then sum(a.score*(a.item_type='OI')) 
            when b.ft>0 and b.oi<1 then sum(a.score*(a.item_type='FT')) end as sum
        ,count(a.clusterid) as n

        from have as a left join _counts_t as b
        on a.clusterid=b.clusterid
        group by a.clusterid;
quit; 

  

--
Paige Miller
dustychair
Pyrite | Level 9

Hi @PaigeMiller,

Thank you for providing a possible solution. I like the idea of being simple. I tried your code. The sum column has results only for FT items the rest is missing. I don't know what part is not working correctly.

PaigeMiller
Diamond | Level 26

Show us the code you used, or better yet the log. Don't ever say "it isn't working" without providing the code or the log.

--
Paige Miller
dustychair
Pyrite | Level 9

Hi @PaigeMiller ,

Here it is. The variables names are different; also I added studentid since i want it per student.

Thanks


proc sql;
create table sci_me_gr05 as
select distinct a.*,b.Item_Type, b.interactionTypes, b.ItemName
from  sg.sci_me_gr05  a left join TM.metadata   b
on a.itemuin =b.itemreference;
quit;

data sci_me_gr05;
	length clusterid $100.;
	set sci_me_gr05;

	if scan(lowcase(itemname),2,"-")="tao21" then clusterid=UPCASE(scan(itemname,3,"_"));
	else if substr(lowcase(itemname),1,2)="sc" then clusterid=UPCASE(scan(itemname,3,"_"));
	else if substr(lowcase(itemname),1,2)="nm" and index(itemname,"_") then clusterid=UPCASE(scan(itemname,2,"_"));
	else clusterid=UPCASE(scan(itemname,4,"-"));

	if index(lowcase(clusterid),"item") then clusterid=scan(clusterid,1,"_ ");
	if missing(clusterid) then clusterid=UPCASE(scan(scan(itemname,2,"_"),1," "));
	if clusterid="ITEM1" then clusterid=UPCASE(scan(scan(itemname,4,"-"),1,"_"));
run;

proc sort data=sci_me_gr05;
by StudentSystemUIN;
run;
proc freq data=sci_me_gr05;
    tables clusterid*itemstatus/noprint out=_counts_;
	by StudentSystemUIN;
run;
proc transpose data=_counts_ out=_counts_t;
    by clusterid;
    var count;
    id itemstatus;
run;


proc sql;
    create table CTT1 as select a.*
        ,case when b.Field_Test>0 and b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational')) 
            when b.Field_Test>0 and b.Operational<1 then sum(a.itemscore*(a.itemstatus='Field Test')) end as sum
        ,count(a.clusterid) as n

        from sci_me_gr05 as a left join _counts_t as b
        on a.clusterid=b.clusterid
		group by a.clusterid, a.StudentSystemUIN;
quit; 
dustychair
Pyrite | Level 9

Hi @PaigeMiller,

I just added one statement to your code. it resolved those missing sums.

thank you so much.

Here is the final version of the code.


proc sql;
    create table CTT1 as select a.*
        ,case when b.Field_Test>0 and b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational')) 
         when b.Operational>0 then sum(a.itemscore*(a.itemstatus='Operational')) 
 when b.Field_Test>0 and b.Operational<1 then sum(a.itemscore*(a.itemstatus='Field Test')) end as sum
        ,count(a.clusterid) as n

        from sci_me_gr05 as a left join _counts_t as b
        on a.clusterid=b.clusterid
		group by a.clusterid, a.StudentSystemUIN;
quit; 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1122 views
  • 2 likes
  • 4 in conversation