BookmarkSubscribeRSS Feed
Cheesiepoof05
Obsidian | Level 7

I'm new to using SAS Enterprise Guide and am trying to learn how to summarize data and wasn't sure whether code or some SAS EG report is the best method.  See below for my current data as well as the results I'm trying to get afterwards.  Thank you for your help.

 

I have the following data:

CATEGORYNAMESTATUSCOUNT_2021MARKET_SHARE_2021COUNT_2022MARKET_SHARE_2022
ANIMALSDOGREAL520.00%281.17%
ANIMALSDOGIMAGINARY1768.00%235298.66%
ANIMALSFISHREAL....
ANIMALSBEARREAL312.00%40.17%
COLORSGREENIMAGINARY825.00%70.18%
COLORSORANGEREAL....
COLORSBLACKIMAGINARY2475.00%388599.82%
SHAPESCIRCLEREAL1979.17%27100.00%
SHAPESTRIANGLEIMAGINARY520.83%..
SHAPESSQUAREIMAGINARY....

 

I'm trying to add subtotal lines that break at Category only and sum the counts and market shares such as:  (red lines are what I'm seeking to add)

CATEGORYNAMESTATUSCOUNT_2021MARKET_SHARE_2021COUNT_2022MARKET_SHARE_2022
ANIMALSDOGREAL520.00%281.17%
ANIMALSDOGIMAGINARY1768.00%235298.66%
ANIMALSFISHREAL....
ANIMALSBEARREAL312.00%40.17%
ANIMALSTOTAL 25100.00%2384100.00%
COLORSGREENIMAGINARY825.00%70.18%
COLORSORANGEREAL....
COLORSBLACKIMAGINARY2475.00%388599.82%
COLORSTOTAL 32100.00%3892100.00%
SHAPESCIRCLEREAL1979.17%27100.00%
SHAPESTRIANGLEIMAGINARY520.83%..
SHAPESSQUAREIMAGINARY....
SHAPESTOTAL 24100.00%27100.00%
8 REPLIES 8
PaigeMiller
Diamond | Level 26

PROC REPORT makes this relatively easy. You do not need to first create columns with year in the column name; in fact this is a mistake. Use the original (long) data set, rather than the (wide) data set with year in the column name. See Maxim 19.

 

data fake_data;
input category $ name $ year count marketshare;
cards;
ANIMALS DOG 2021 5 0.002
ANIMALS CAT 2021 9 0.078
ANIMALS BEAR 2021 14 .92
ANIMALS DOG 2022 13 0.042
ANIMALS CAT 2022 22 0.038
ANIMALS BEAR 2022 88 .92
COLORS GREEN 2021 5 0.002
COLORS RED 2021 9 0.078
COLORS STRIPES 2021 14 .92
COLORS GREEN 2022 13 0.042
COLORS RED 2022 22 0.038
COLORS STRIPES 2022 88 .92
;

proc report data=fake_data;
    columns category name year,(count marketshare);
    define year/across 'YEAR';
    define category/group 'CATEGORY';
    define name/group 'GROUP';
    define count/sum 'COUNT';
    define marketshare/sum format=percent10.2 'PERCENT';
    break after category/summarize;
run;

 

--
Paige Miller
Cheesiepoof05
Obsidian | Level 7

@PaigeMiller Thanks for the suggestion.  Unfortunately my data is in two different locations for the prior quarter and current quarter so I didn't split them apart, rather they were brought in from separate sources.  Any way to modify your PROC REPORT to my current format?

PaigeMiller
Diamond | Level 26

I would modify the data to be in the format I showed. Just because you receive data in a bad format doesn't mean you have to leave it in the bad format.

--
Paige Miller
Cheesiepoof05
Obsidian | Level 7

@PaigeMiller Even though this report is the last step of my process?  I understand if I was building a database, but I'm just creating a quarterly report.

Reeza
Super User

This report is easier to create if the raw data is in a different form. If you're developing this process I would strongly agree there's a point in redesigning this to be more dynamic and to do the years dynamically. Otherwise next year you're going to have modify the code to account for a new year. And then each year thereafter.
It's easier to make the code cleaner from the start.

 

If you do it manually you'll also need to add in another variable to control the sort order to have it sorted to the end. Here's an example of how it can be done. Some of this is hardcoded so you'll need to adapt of course. Now imagine adding a new year and/or different category levels. 

 

You would modify your output statement to get the totals needed. This also depends on if you want a table or report. 

 

proc means data=sashelp.cars noprint;
class make model;
types make make*model;


output out=summary sum(mpg_city) = mpg_city sum(mpg_highway) = mpg_highway;
run;

proc sort data=summary;
by make;
run;



data want;
set summary;
by make _type_;
if _type_ = 2 then do;
 model = 'Total';
 sort_order = 9999;
end;
else if first._type_ and _type_ = 3 then sort_order=1;
else sort_order+1;

drop _freq_;
run;

proc sort data=want;
by make sort_order;
run;
PaigeMiller
Diamond | Level 26

What I would do is exactly what I explained. To me that is the easiest path, and if you ever get 2023 data, then the code I provided will continue to work without modification, whereas your code will have to be modified to create columns that have 2023 in the variable name. Good code has value in the long run.

 

It's fine if you don't want to do that, that's up to you. 

--
Paige Miller
PaigeMiller
Diamond | Level 26

Starting with the data in your original table (the first one you showed)

 

data re_arrange;
    set have;
    year=2021;
    count=count_2021;
    market_share=market_share_2021;
    output;
    year=2022;
    count=count_2022;
    market_share=market_share_2022;
    output;
    drop count_2021 count_2022 market_share_2021 market_share_2022;
run;

 

Now this data set will run through my PROC REPORT code without additional modification (if I did it right, I can't test this code because I don't have your data in a usable form). Of course, at this point, you lose the future benefit of not having to change the program when 2023 data is available, the code in this message will have to be modified for 2023. (That again is the benefit that @Reeza and I described, by doing it right the first time, re-writing the entire system of code rather than adding kludge upon kludge)

--
Paige Miller
Reeza
Super User
data want;
set have;
by category;
output;
if first.category then do;
 count_2021_sum = count_2021;
 count_2022_sum= count_2022;
end;
else do;
 count_2021_sum +count_2021;
count_2022_sum+count_2022;
end;

if last.category then do;
Name='Total';
count_2021 = count_2021_sum;
count_2022=count_2022_sum;
output;
end;

run;

Another manual option - note you need to add in other variables.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1261 views
  • 3 likes
  • 3 in conversation