BookmarkSubscribeRSS Feed
CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

hello,

I'm having trouble sorting and collapsing my data in a proc report. I have created a sample dataset. Prior programming has replaced any frequency percent <3.9% as 'other' in the drugcombination variable. In the report, I want to do the following:

 

1) to collapse the 'other' category within each combogroup.

2) to place 'Other' combogroup at the end and 'other' drugcombinations at the end within combogroup. I believe I have done this with the dummy sort variables I have created. 

3) I want to put the drugcombinationchemo in an order based on the descending value of the percent column (while still keeping 'other' at the end if possible). 

 

I have tried various ways to show the percent column in descending order plus collapsing 'other' and nothing seems to work. Do I need to do a proc tabulate to sum the 'other' values and then do proc report?

 

I am getting the following output which is not accomplishing item 1 and item 3 above.

 

SAS Output


Combination Group Combination Percent
Dar Monodaratumumab100.00
IMiD Monolenalidomide94.29
 pomalidomide1.43
 thalidomide4.29
PI Monobortezomib93.88
 carfilzomib5.94
 other0.17
PI-Cyclo±bortezomib - cyclophosphamide99.25
 carfilzomib - cyclophosphamide0.75
PI-Dox±bortezomib - doxorubicin lipos100.00
PI-IMiDbortezomib - lenalidomide98.36
 bortezomib - pomalidomide1.09
 other0.55
Otherbortezomib - daratumumab17.24
 cyclophosphamide10.34
 elotuzumab13.79
 elotuzumab - lenalidomide6.90
 investigational drugs34.48
 other3.45
   
   
   
   
 
code:
DATA have ;
infile datalines delimiter=',' ;
format combogroup $12. DrugCombinationChemo $30. Percent 8.2 drugsort 8.0 drugsort2 8.0 ;
input combogroup $ DrugCombinationChemo $ Percent drugsort drugsort2 ;
datalines ;
Dar Mono,daratumumab,100,0,0
IMiD Mono,lenalidomide,94.2857142857143,0,0
IMiD Mono,thalidomide,4.28571428571429,0,0
IMiD Mono,pomalidomide,1.42857142857143,0,0
Other,investigational drugs,34.4827586206897,1,0
Other,bortezomib - daratumumab,17.2413793103448,1,0
Other,elotuzumab,13.7931034482759,1,0
Other,cyclophosphamide,10.3448275862069,1,0
Other,elotuzumab - lenalidomide,6.89655172413793,1,0
Other,other,3.44827586206897,1,1
Other,other,3.44827586206897,1,1
Other,other,3.44827586206897,1,1
Other,other,3.44827586206897,1,1
Other,other,3.44827586206897,1,1
PI Mono,bortezomib,93.8811188811189,0,0
PI Mono,carfilzomib,5.94405594405594,0,0
PI Mono,other,0.174825174825175,0,1
PI-Cyclo±,bortezomib - cyclophosphamide,99.2481203007519,0,0
PI-Cyclo±,carfilzomib - cyclophosphamide,0.75187969924812,0,0
PI-Dox±,bortezomib - doxorubicin liposomal,100,0,0
PI-IMiD,bortezomib - lenalidomide,98.3606557377049,0,0
PI-IMiD,bortezomib - pomalidomide,1.09289617486339,0,0
PI-IMiD,other,0.546448087431694,0,1

;
run;
proc print; run;

proc sort data=have ;
by drugsort combogroup drugsort2 descending percent ;
run;
proc print; run;

proc report data=have nowd
style(report)=[background=ltgray ]
style(column)=[background=gwh]
style(header)=[background=ltgray foreground=white];
column drugsort combogroup drugcombinationchemo percent ;
define drugsort /order order=internal noprint;
define combogroup /order order=data group 'Combination Group' ;
define percent /analysis sum order order=internal;
define drugcombinationchemo /group 'Combination' ;
run;
7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, not here to write an output for you.  What I will say is that when I do these types of output I make sure the dataset looks like the output I want.  So you would sort the data - based on some ID you assign in a datastep - in the way the output should look, you don't have to report the ID, just make sure the column has order=data option on.  This fixes how the data gets displayed.  In fact for all my reports I have id's for each row stating what page it should be on, and then one id for each subgroup within page, this way when I report I just use the data order to report.

Cynthia_sas
Diamond | Level 26

Hi:

  In looking at all your variables, and changing your code slightly, it seems that between DRUGSORT and DRUGSORT2, you have too many ways for the "OTHER" category to be put on the report. Some OTHER rows have DRUGSORT2=0 and DRUGSORT=1 and then other rows have DRUGSORT2=1 and DRUGSORT=0 and DRUGSORT2=1 and DRUGSORT=1 -- so there are 3 possible ways for an "OTHER" row to be placed on the report.

 

  I'm not sure of your logic, so don't know what to suggest. Using both DRUGSORT2 and DRUGSORT on the report seems to lead closer to what you say you want, but since I am not entirely sure of what you want to see from your description I don't know. Whatever comes first on the COLUMN statement is going to control the order of the rows. So if you only put DRUGSORT in the COLUMN statement and NOT DRUGSORT2, you may not get the results you want.

 

  Here's a little test I ran changing a few things in your PROC REPORT to show the impact of having both DRUGSORT2 and DRUGSORT in the column statement. I also have some questions about the data values that look wonky to me.

 

cynthia

proc_report_drugsort.png

CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

Cynthia,

Thanks for taking a look at this. There was a step I didn't identify which might explain the 'other' issue.

 

There will always be an OTHER combogroup (DRUGSORT=1 IF COMBOGROUP=OTHER OR ELSE IT IS 0). Within each combogroup category, including this OTHER combo group, there are many drugs. The percentages should add to 100% within each combogroup. We only want to show the drugs that most contribute to the category so drug names have been changed to "other" if the share is small with the intent of collapsing the 'other' category and summing the percents for that group. This means that there will even be 'other' drugcombinationchemo within the OTHER combogroup. So, if I add drugsort2 to the columns and define it as an ORDER option I will get "Other" combogroup and "other" drugcombinationchemo showing up at the end of the table. This is ok but the problem is I really want the percents to be in descending order WITHIN combogroup too. For example the following code gets me there ALMOST. If you run it you will see that for the IMiD Mono category pomalidomide is only 1% and is listed before thalidomide which is 4%. If I have to choose between "other" being at the end of each category OR values be in descending order, I'd rather have the values in descending order.

 

The other problem which can be fixed with a proc tabulate is that the multiple 'others' need to be summed and collapsed into one row within each combogroup category. I was hoping this could be done in the proc report to avoid unnecessary multiple steps but maybe it's not possible ??? What I am noticing is that it only shows one of the 'other' rows in the category and blanks out the others.

 

Your sort order is not in the right order and I believe this is because drugsort needs to be before drugsort2. If this helps identify my problem better and you think of anything please pass it along. Thanks!

 

proc report data=have nowd
style(report)=[background=ltgray ]
style(column)=[background=gwh]
style(header)=[background=ltgray foreground=white];
column drugsort combogroup drugsort2 drugcombinationchemo percent ;
define drugsort /order order=internal noprint;
define drugsort2 /order order=internal noprint;
define combogroup /order order=data group 'Combination Group' ;
define percent /analysis sum order order=internal;
define drugcombinationchemo /group 'Combination' ;
run;

 

 

Cynthia_sas
Diamond | Level 26
Hi:
This will NOT work in PROC REPORT --
define percent /analysis sum order order=internal;
you can't collapse and summarize and then also try to order. The variable is either ANALYSIS SUM -or- it is ORDER -- it cannot be both. And you need it to be ANALYSIS SUM so the rows collapse.

I guess I still don't understand your logic. Are you "pre-calculating" the percents with another procedure? The challenge you will face is that the OTHER rows with DRUGSORT=0 will NEVER get summed with the OTHER rows when DRUGSORT=1. So if that is the basis of your logic, that is something you have to change or revisit.

cynthia
CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

The 'have' sample dataset was originally created from a proc freq using ods output to a dataset.  I appreciate you helping me to understand the proc report better but it looks like I'd be better off doing my reporting in Excel. However, just so I don't leave you hanging, this is what I want to end up with (created in Excel from the SAS 'have' dataset in my example).

 

Combination GroupCombinationPercent
Dar Monodaratumumab100
IMiD Monolenalidomide94.29
 thalidomide4.29
 pomalidomide1.43
PI Monobortezomib93.88
 carfilzomib5.94
 other0.17
PI-Cyclo±bortezomib - cyclophosphamide99.25
 carfilzomib - cyclophosphamide0.75
PI-Dox±bortezomib - doxorubicin lipos100
PI-IMiDbortezomib - lenalidomide98.36
 bortezomib - pomalidomide1.09
 other0.55
Otherinvestigational drugs34.48
 bortezomib - daratumumab17.24
 elotuzumab13.79
 cyclophosphamide10.34
 elotuzumab - lenalidomide6.9
 other17.25
   
ballardw
Super User

Consider a proper sort and data step sometimes.

proc sort data=have;
  by drugsort combogroup descending percent;
run;
data want;
   set have;
   by drugsort combogroup DrugCombinationChemo notsorted;
   retain otsum;
   if first.DrugCombinationChemo then otsum=.;
   otsum= sum(otsum,percent);
   if last.DrugCombinationChemo then do;
      percent=otsum;
      output;
   end;
run;

proc print data=want (drop=drugsort drugsort2 otsum) noobs;
run;

Cynthia_sas
Diamond | Level 26

Now I am confused again. I thought you wanted ALL the "other" to be at the bottom and yet, you show other under PI-IMiD and under PI Mono. But that is very doable with PROC REPORT just changing your code a bit.

 

Using your sample data, but without regard to DRUGSORT2, and only changing the usages to be consistent (usage of GROUP does collapsing), this is what I get:

not_drugsort2_and_group.png


cynthia

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2032 views
  • 1 like
  • 4 in conversation