hello,
I am relatively new to using SAS and trying to replicate a report that is currently in excel.
I have a table that has products sold by month of the year. the dataset looks like this:
Month | Product1 | product 2 | product 3
1 | 350 | 342 | 234 | sum of Jan
2 | 433 | 435 | 753 | sum of Feb
3 | 644 | 245 | 253 | sum of March
total | sum of product 1 | sum of product 2 | sum of product 3
I have already calculated the totals both by Month and by product using proc means and an array sum, but now I want to create a new table with the same variables (month and products), but have each observation a percentage of the total sold by month.
for example, I want January for product 1 (where 350 currently is) to be 350 / sum of product 1. this is easy in excel, by simply dividing b2 by e2, but I am having trouble replicating that in SAS.
Any ideas or thoughts would be appreciated, thank you for your time.
Simple summaries may not require as many steps as you think. sometimes.
In this case:
data work.month; input Month $ Product1 Product2 Product3 ; tot = sum(of product:); datalines; Jan 3 4 5 Feb 2 3 6 Mar 2 5 4 ; run; Proc tabulate data= work.month; class month /order=data; var Product: tot; table month='' , (Product: tot)*colpctsum=''*f=f3. /box=month ; run;
Note that data set built with a data step. Others can copy it and likely show the proc report version.
Proc Tabulate has very limited abilities to combine multiple variables but can sum rows or within classification variables, in this case column sums or request the cell percentage of that sum. Tabulate doesn't by default use percent signs in the output as they can overload a table appearance and when all of the values are percentages unneeded.
The use of Product: is just to get all of the variables with the same root name without typing all of them out.
The (product: tot) parentheses indicate we are going to request the same statistic (or group of statistics) for all of the variables inside.
Use of month=' ' is to suppress the word "month" appearing in a different location. The option box=month places the label of the variable month into the box in the upper left corner.
The statistic is COLPCTSUM which requests the percent of the column sum. The =' ' is to suppress the word colpctsum from appearing in the output. You can provide any label text you might want such as '% Sales' or what ever though for multiple variables it may be redundant. The *f= says to display using the specified format, default would be two decimals.
Note that
colpctsum=' '*f=f3.*(Product: tot)
would also be acceptable syntax. The difference is that the label for colpctsum would only appear once if used.
Depending on you data if you had started with data in a somewhat different layout you may be able to skip any intermediate calculations:
data example; informat date mmddyy10. product $10.; format date mmddyy10.; input date product sales; datalines; 01/01/2017 Apples 123 01/05/2017 Apples 236 01/01/2017 Pears 222 01/01/2017 Carrots 6 02/01/2017 Apples 88 02/11/2017 Apples 22 02/21/2017 Pears 77 03/05/2017 Carrots 66 03/01/2017 Apples 100 03/11/2017 Carrots 44 03/21/2017 Pears 99 03/05/2017 Carrots 22 ; run; proc tabulate data=example; class date; format date monname3.; class product; var sales; tables date='', (product='' All='Total')*sales=''*colpctsum=''*f=f3. /box='Month' misstext='0' ; run;
Note that here we use actual SAS date values and a FORMAT allows us to create the grouping we want (month). If your data went across years this month total would sum across years. So some care in selecting the format is needed but there are MANY date formats. You could change to a week of year, quarter of year or day of the week just by changing the format. No additional data needed.
The (product=' ' All='Total') says to use "all product" to group the data to generate a statistic for the combined data.
Look into Proc Report or Proc Tabulate
Provide an actual data sample we can code against. If you have an existing data set Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
It isn't really clear which totals of what type you have at the moment (at least to me)
And depending on what you have it may require a pass through a data step.
Thank you for the quick response! You're right, not very clear in hindsight. Maybe this will help.
What I have:
Month | Product 1 | Product 2 | Product 3 | Total |
Jan | 3 | 4 | 5 | 12 |
Feb | 2 | 3 | 6 | 11 |
Mar | 2 | 5 | 4 | 11 |
Total | 7 | 12 | 15 | 34 |
What I want:
Month | Product 1 | Product 2 | Product 3 | Total |
Jan | 43% | 33% | 33% | 35% |
Feb | 29% | 25% | 40% | 32% |
Mar | 29% | 42% | 27% | 32% |
Does this make more sense? Sorry for the lack of clarity. Appreciate the help you giving.
if it matters at all, this is the code I used to create the total row and column.
proc means noprint =work.month;
output out=summary sum=;
run;
data month;
set work.month work.summary (in=a);
if a then Month='Total';
drop _type_ _freq_;
run;
data month;
set work.month;
array n {*} _numeric_;
Total=sum(of n[*]);
run;
Something along the line of below should work.
data have;
infile datalines dlm=' ';
input Month $ Product_1 Product_2 Product_3;
datalines;
1Jan 3 4 5 12
2Feb 2 3 6 11
3Mar 2 5 4 11
;
run;
data inter;
set have;
array _prod {*} Product_1 Product_2 Product_3;
length Product $32;
do _i=1 to dim(_prod);
product=vname(_prod[_i]);
amount=_prod[_i];
output;
end;
run;
proc tabulate data=inter noseps order=data;
class month product;
var amount;
keylabel colpctsum=' ' all='Total';
table
month,
(product=' ' all)*amount=' '*(colpctsum*f=pctfmt9.)
;
run;
N.B: If working with dates of any kind using SAS date values is highly recommended.
@huffa9299 my contribution:
data have;
input Month $ Product1 Product2 Product3 Total;
datalines;
Jan 3 4 5 12
Feb 2 3 6 11
Mar 2 5 4 11
Total 7 12 15 34
;
proc transpose data=have(where=(month='Total')) out=want(drop=month);;
by month notsorted;
var product: total;
run;
data final_want;
if _N_ = 1 then do;
if 0 then set want;
declare hash h(dataset:'want');
h.defineKey('_name_');
h.defineData('col1');
h.defineDone();
end;
set have(where=(month ne 'Total'));
array t(*)Product: total;
do _n_=1 to dim(t);
_temp=vname(t(_n_));
if h.find(key:_temp)=0 then t(_n_)=divide(t(_n_),col1);
end;
drop _: col1;
run;
Hi,
You can use Cartesian and do in single PROC SQL.
DATA Have;
INFILE DATALINES dlm=",";
INPUT Month $ Product1 Product2 Product3 Total;
DATALINES;
Jan,3,4,5,12
Feb,2,3,6,11
Mar,2,5,4,11
Total,7,12,15,34
;
RUN;
proc sql;
create table want as
select Month,(a.Product1/b.Product1_) format percent. as Product_1,
(a.Product2/b.Product2_) format percent. as Product_2,
(a.Product3/b.Product3_) format percent. as Product_3
from (select * from have
where month<>"Total") a,
(select Product1 as Product1_,
Product2 as Product2_,
Product3 as Product3_
from have
where month="Total") b
;
quit;
Simple summaries may not require as many steps as you think. sometimes.
In this case:
data work.month; input Month $ Product1 Product2 Product3 ; tot = sum(of product:); datalines; Jan 3 4 5 Feb 2 3 6 Mar 2 5 4 ; run; Proc tabulate data= work.month; class month /order=data; var Product: tot; table month='' , (Product: tot)*colpctsum=''*f=f3. /box=month ; run;
Note that data set built with a data step. Others can copy it and likely show the proc report version.
Proc Tabulate has very limited abilities to combine multiple variables but can sum rows or within classification variables, in this case column sums or request the cell percentage of that sum. Tabulate doesn't by default use percent signs in the output as they can overload a table appearance and when all of the values are percentages unneeded.
The use of Product: is just to get all of the variables with the same root name without typing all of them out.
The (product: tot) parentheses indicate we are going to request the same statistic (or group of statistics) for all of the variables inside.
Use of month=' ' is to suppress the word "month" appearing in a different location. The option box=month places the label of the variable month into the box in the upper left corner.
The statistic is COLPCTSUM which requests the percent of the column sum. The =' ' is to suppress the word colpctsum from appearing in the output. You can provide any label text you might want such as '% Sales' or what ever though for multiple variables it may be redundant. The *f= says to display using the specified format, default would be two decimals.
Note that
colpctsum=' '*f=f3.*(Product: tot)
would also be acceptable syntax. The difference is that the label for colpctsum would only appear once if used.
Depending on you data if you had started with data in a somewhat different layout you may be able to skip any intermediate calculations:
data example; informat date mmddyy10. product $10.; format date mmddyy10.; input date product sales; datalines; 01/01/2017 Apples 123 01/05/2017 Apples 236 01/01/2017 Pears 222 01/01/2017 Carrots 6 02/01/2017 Apples 88 02/11/2017 Apples 22 02/21/2017 Pears 77 03/05/2017 Carrots 66 03/01/2017 Apples 100 03/11/2017 Carrots 44 03/21/2017 Pears 99 03/05/2017 Carrots 22 ; run; proc tabulate data=example; class date; format date monname3.; class product; var sales; tables date='', (product='' All='Total')*sales=''*colpctsum=''*f=f3. /box='Month' misstext='0' ; run;
Note that here we use actual SAS date values and a FORMAT allows us to create the grouping we want (month). If your data went across years this month total would sum across years. So some care in selecting the format is needed but there are MANY date formats. You could change to a week of year, quarter of year or day of the week just by changing the format. No additional data needed.
The (product=' ' All='Total') says to use "all product" to group the data to generate a statistic for the combined data.
This worked great, Thank you all for sharing your knowledge!
data have;
infile datalines dlm=' ';
input Month $ Product_1 Product_2 Product_3;
datalines;
Jan 3 4 5 12
Feb 2 3 6 11
Mar 2 5 4 11
;
run;
proc iml;
use have;
read all var _num_ into x[c=vname];
read all var{month};
close;
temp=x||x[,+];
want=temp/temp[+,];
mattrib want f=percent8.2;
create want from want[r=month c=(vname||'Total')];
append from want[r=month];
close;
quit;
proc print noobs;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.