DATA Step, Macro, Functions and more

how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

how do i calculate the percentage of each obs in a var compared to the total sum of each var?

hello,

 

I am relatively new to using SAS and trying to replicate a report that is currently in excel.

I have a table that has products sold by month of the year. the dataset looks like this:

 

Month | Product1 | product 2 | product 3 

1 | 350 | 342 | 234 | sum of Jan

2 | 433 | 435 | 753 | sum of Feb

3 | 644 | 245 | 253 | sum of March

total | sum of product 1 | sum of product 2 | sum of product 3

 

I have already calculated the totals both by Month and by product using proc means and an array sum, but now I want to create a new table with the same variables (month and products), but have each observation a percentage of the total sold by month.

 

for example, I want January for product 1 (where 350 currently is) to be 350 / sum of product 1. this is easy in excel, by simply dividing b2 by e2, but I am having trouble replicating that in SAS.

 

Any ideas or thoughts would be appreciated, thank you for your time.


Accepted Solutions
Solution
‎02-18-2018 02:06 PM
Super User
Posts: 12,996

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Posted in reply to huffa9299

Simple summaries may not require as many steps as you think. sometimes.

In this case:

data work.month;
   input Month $ Product1 Product2 Product3 ;
   tot = sum(of product:);
datalines; 
Jan 3 4 5
Feb 2 3 6
Mar 2 5 4
;
run;


Proc tabulate data= work.month;
   class month /order=data;
   var Product: tot;
   table month='' ,
         (Product: tot)*colpctsum=''*f=f3.
         /box=month
  ;
run;

Note that data set built with a data step. Others can copy it and likely show the proc report version.

 

Proc Tabulate has very limited abilities to combine multiple variables but can sum rows or within classification variables, in this case column sums or request the cell percentage of that sum. Tabulate doesn't by default use percent signs in the output as they can overload a table appearance and when all of the values are percentages unneeded.

The use of Product: is just to get all of the variables with the same root name without typing all of them out.

The (product: tot) parentheses indicate we are going to request the same statistic (or group of statistics) for all of the variables inside.

Use of month=' ' is to suppress the word "month" appearing in a different location. The option box=month places the label of the variable month into the box in the upper left corner.

The statistic is COLPCTSUM which requests the percent of the column sum. The =' ' is to suppress the word colpctsum from appearing in the output. You can provide any label text you might want such as '% Sales' or what ever though for multiple variables it may be redundant. The *f= says to display using the specified format, default would be two decimals.

Note that

colpctsum=' '*f=f3.*(Product: tot)

would also be acceptable syntax. The difference is that the label for colpctsum would only appear once if used.

 

 

Depending on you data if you had started with data in a somewhat different layout you may be able to skip any intermediate calculations:

data example;
   informat date mmddyy10. product $10.;
   format date mmddyy10.;
   input date product sales;
datalines;
01/01/2017  Apples 123
01/05/2017  Apples 236
01/01/2017  Pears 222
01/01/2017  Carrots 6
02/01/2017  Apples 88
02/11/2017  Apples 22
02/21/2017  Pears 77
03/05/2017  Carrots 66
03/01/2017  Apples 100
03/11/2017  Carrots 44
03/21/2017  Pears 99
03/05/2017  Carrots 22
;
run;

proc tabulate data=example;
   class date;
   format date monname3.;
   class product;
   var sales;
   tables date='',
          (product='' All='Total')*sales=''*colpctsum=''*f=f3.
          /box='Month' misstext='0'
   ;
run;

Note that here we use actual SAS date values and a FORMAT allows us to create the grouping we want (month). If your data went across years this month total would sum across years. So some care in selecting the format is needed but there are MANY date formats. You could change to a week of year,  quarter of year or day of the week just by changing the format. No additional data needed.

 

The (product=' ' All='Total') says to use "all product" to group the data to generate a statistic for the combined data.

View solution in original post


All Replies
Respected Advisor
Posts: 4,538

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Posted in reply to huffa9299

Look into Proc Report or Proc Tabulate

Super User
Posts: 12,996

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Posted in reply to huffa9299

Provide an actual data sample we can code against. If you have an existing data set Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

It isn't really clear which totals of what type you have at the moment (at least to me)

 

And depending on what you have it may require a pass through a data step.

New Contributor
Posts: 4

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Thank you for the quick response! You're right, not very clear in hindsight.  Maybe this will help.

 

What I have:

MonthProduct 1Product 2Product 3Total
Jan34512
Feb23611
Mar25411
Total7121534

 

What I want:

MonthProduct 1Product 2Product 3Total
Jan43%33%33%35%
Feb29%25%40%32%
Mar29%42%27%32%

 

Does this make more sense? Sorry for the lack of clarity. Appreciate the help you giving.

 

if it matters at all, this is the code I used to create the total row and column.

 

proc means noprint =work.month;

output out=summary sum=;

run;

 

data month;

set work.month work.summary (in=a);

if a then Month='Total';

drop _type_ _freq_;

run;

data month;

set work.month;

array n {*} _numeric_;

Total=sum(of n[*]);

run;

Respected Advisor
Posts: 4,538

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

[ Edited ]
Posted in reply to huffa9299

@huffa9299

Something along the line of below should work.

data have;
  infile datalines dlm=' ';
  input Month $ Product_1 Product_2 Product_3;
  datalines;
1Jan 3 4 5 12
2Feb 2 3 6 11
3Mar 2 5 4 11
;
run;

data inter;
  set have;
  array _prod {*} Product_1 Product_2 Product_3;
  length Product $32;
  do _i=1 to dim(_prod);
    product=vname(_prod[_i]);
    amount=_prod[_i];
    output;
  end;
run;

proc tabulate data=inter noseps order=data;
  class month product;
  var amount;
  keylabel colpctsum=' ' all='Total';
  table 
    month, 
    (product=' ' all)*amount=' '*(colpctsum*f=pctfmt9.)
    ;
run;

 

N.B: If working with dates of any kind using SAS date values is highly recommended. 

PROC Star
Posts: 1,289

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

@huffa9299 my contribution:

data have;
input Month $	Product1	Product2	Product3	Total;
datalines;
Jan	3	4	5	12
Feb	2	3	6	11
Mar	2	5	4	11
Total	7	12	15	34
;

proc transpose data=have(where=(month='Total')) out=want(drop=month);;
by month notsorted;
var product: total;
run;

data final_want;
  if _N_ = 1 then do;
  if 0 then set want;
      declare hash h(dataset:'want');
      h.defineKey('_name_');
      h.defineData('col1');
      h.defineDone();
    end;
set have(where=(month ne 'Total'));
array t(*)Product: total;
do _n_=1 to dim(t);
_temp=vname(t(_n_));
if h.find(key:_temp)=0 then t(_n_)=divide(t(_n_),col1);
end;
drop _: col1;
run;
Super Contributor
Posts: 448

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Posted in reply to huffa9299

Hi,

 

You can use Cartesian and do in single PROC SQL.

 

DATA Have;
INFILE DATALINES dlm=",";
INPUT Month $ Product1 Product2 Product3 Total;
DATALINES;
Jan,3,4,5,12
Feb,2,3,6,11
Mar,2,5,4,11
Total,7,12,15,34
;
RUN;

proc sql;
create table want as
select Month,(a.Product1/b.Product1_) format percent. as Product_1,
		(a.Product2/b.Product2_) format percent. as Product_2,
		(a.Product3/b.Product3_) format percent. as Product_3
	from (select * from have 
					where month<>"Total") a,
		 (select Product1 as Product1_,
				 Product2 as Product2_,
				 Product3 as Product3_
				from have 
					where month="Total") b
					  ;
quit;

Thanks,
Suryakiran
Solution
‎02-18-2018 02:06 PM
Super User
Posts: 12,996

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Posted in reply to huffa9299

Simple summaries may not require as many steps as you think. sometimes.

In this case:

data work.month;
   input Month $ Product1 Product2 Product3 ;
   tot = sum(of product:);
datalines; 
Jan 3 4 5
Feb 2 3 6
Mar 2 5 4
;
run;


Proc tabulate data= work.month;
   class month /order=data;
   var Product: tot;
   table month='' ,
         (Product: tot)*colpctsum=''*f=f3.
         /box=month
  ;
run;

Note that data set built with a data step. Others can copy it and likely show the proc report version.

 

Proc Tabulate has very limited abilities to combine multiple variables but can sum rows or within classification variables, in this case column sums or request the cell percentage of that sum. Tabulate doesn't by default use percent signs in the output as they can overload a table appearance and when all of the values are percentages unneeded.

The use of Product: is just to get all of the variables with the same root name without typing all of them out.

The (product: tot) parentheses indicate we are going to request the same statistic (or group of statistics) for all of the variables inside.

Use of month=' ' is to suppress the word "month" appearing in a different location. The option box=month places the label of the variable month into the box in the upper left corner.

The statistic is COLPCTSUM which requests the percent of the column sum. The =' ' is to suppress the word colpctsum from appearing in the output. You can provide any label text you might want such as '% Sales' or what ever though for multiple variables it may be redundant. The *f= says to display using the specified format, default would be two decimals.

Note that

colpctsum=' '*f=f3.*(Product: tot)

would also be acceptable syntax. The difference is that the label for colpctsum would only appear once if used.

 

 

Depending on you data if you had started with data in a somewhat different layout you may be able to skip any intermediate calculations:

data example;
   informat date mmddyy10. product $10.;
   format date mmddyy10.;
   input date product sales;
datalines;
01/01/2017  Apples 123
01/05/2017  Apples 236
01/01/2017  Pears 222
01/01/2017  Carrots 6
02/01/2017  Apples 88
02/11/2017  Apples 22
02/21/2017  Pears 77
03/05/2017  Carrots 66
03/01/2017  Apples 100
03/11/2017  Carrots 44
03/21/2017  Pears 99
03/05/2017  Carrots 22
;
run;

proc tabulate data=example;
   class date;
   format date monname3.;
   class product;
   var sales;
   tables date='',
          (product='' All='Total')*sales=''*colpctsum=''*f=f3.
          /box='Month' misstext='0'
   ;
run;

Note that here we use actual SAS date values and a FORMAT allows us to create the grouping we want (month). If your data went across years this month total would sum across years. So some care in selecting the format is needed but there are MANY date formats. You could change to a week of year,  quarter of year or day of the week just by changing the format. No additional data needed.

 

The (product=' ' All='Total') says to use "all product" to group the data to generate a statistic for the combined data.

New Contributor
Posts: 4

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

This worked great, Thank you all for sharing your knowledge!

Super User
Posts: 10,609

Re: how do i calculate the percentage of each obs in a var compared to the total sum of each var?

Posted in reply to huffa9299
data have;
  infile datalines dlm=' ';
  input Month $ Product_1 Product_2 Product_3;
  datalines;
Jan 3 4 5 12
Feb 2 3 6 11
Mar 2 5 4 11
;
run;
proc iml;
use have;
read all var _num_ into x[c=vname];
read all var{month};
close;

temp=x||x[,+];
want=temp/temp[+,];
mattrib want  f=percent8.2;
create want from want[r=month c=(vname||'Total')];
append from want[r=month];
close;
quit;

proc print noobs;run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 204 views
  • 0 likes
  • 6 in conversation