I have data like below.
| Date | Cat1 | Cat2 | Cat3 | Cat4 | Cat5 | Cat6 | Cat7 | Value | 
| Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 
| Jan-18 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 
| Jan-18 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 
| 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | |
| Jan-18 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 
| Jan-18 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 
| Jan-18 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 
| Jan-18 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 
| Jan-18 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 
| Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 
| Jan-18 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 
| Feb-18 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 
| Feb-18 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 
| Feb-18 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 
| Feb-18 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 
| Feb-18 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 
| Feb-18 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 
| Feb-18 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 
| Feb-18 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 
| Mar-18 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 
| Mar-18 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 
| Mar-18 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 
| Mar-18 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 
| Mar-18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 
| Mar-18 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 
| Mar-18 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 
| Mar-18 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 
| Mar-18 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 
| Mar-18 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 
| Mar-18 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 
| Mar-18 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 
| Mar-18 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 
| Mar-18 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 
| Mar-18 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 
I would like to create in sas the following summary table:
| Rate Cat 1 | Rate Cat 2 | Rate Cat 3 | Rate Cat 4 | Rate Cat 5 | Rate Cat 6 | Rate Cat 7 | |
| Jan | 75% | 50% | 50% | 75% | 67% | 57% | 71% | 
| Feb | 60% | 20% | 50% | 25% | 50% | 50% | 50% | 
| March | 63% | 42% | 50% | 40% | 57% | 50% | 50% | 
These are rates are calculated in excel using an average if.
E.g for for jan Rate cat1:
So the jan rate cat1 = the average of the `value` in January, when Cat 1 =1.
I am looking to do the same thing as what I have done in excel, but in SAS
My bad, I misunderstood. Like this?
proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;
Well, there may be a more elegant way to do it, but you could use multiple arrays, two at a time. The first would hold the 1 or 0 for the category and the second would hold the value to be averaged. As you loop through the 1/0 array, the vale of the second array is added to an accumulator, and then at the end of the iteration for a given month, the average is computed. Subsequent categories would be computed in like manner, all results being stored in a results array.
Something like the below. See results below code. Is this along the lines of what you're looking for?
DATA	Have;
	FORMAT	Date	YYMMD7.;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	INPUT	
		Date	:	ANYDTDTE11.	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value;
DATALINES;
Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1
Feb-18-2021	1	1	1	0	1	0	1	0
Feb-18-2021	0	1	0	1	0	0	0	0
Feb-18-2021	1	0	1	0	0	1	0	1
Feb-18-2021	0	0	1	0	1	1	0	1
Feb-18-2021	1	1	0	0	1	0	1	1
Feb-18-2021	1	0	0	1	0	0	1	1
Feb-18-2021	0	1	1	1	0	1	0	0
Feb-18-2021	1	1	0	1	1	1	1	0
Mar-18-2021	0	1	0	1	1	0	0	0
Mar-18-2021	1	1	0	1	0	1	0	0
Mar-18-2021	1	1	1	0	0	1	0	1
Mar-18-2021	0	1	0	0	0	1	1	0
Mar-18-2021	1	1	1	0	1	1	0	1
Mar-18-2021	1	1	1	1	1	0	1	1
Mar-18-2021	1	0	0	1	0	1	0	0
Mar-18-2021	1	1	0	0	0	0	0	0
Mar-18-2021	1	1	1	1	1	1	0	1
Mar-18-2021	0	1	0	1	1	1	1	1
Mar-18-2021	0	1	1	0	0	0	0	0
Mar-18-2021	0	1	1	1	0	0	0	0
Mar-18-2021	0	1	1	1	1	1	1	0
Mar-18-2021	0	0	1	1	1	1	1	0
Mar-18-2021	1	0	0	1	0	1	1	1
;
RUN;
DATA	Want;
	DROP	_:	Cat1 - Cat7	Value;
	SET	Have;
		BY	Date	NOTSORTED;
	ARRAY	Cats	{*}	Cat1	-	Cat7;
	ARRAY	Sums	{7}	_TEMPORARY_;
	ARRAY	Rates	{7}	Rate1	-	Rate7;
	FORMAT	Rate1	-	Rate7	PERCENT10.2;
	DO	_i						=	1	TO	DIM(Cats);
		IF	Cats{_i}						THEN
			Sums{_i}			+	Value;
	END;
	IF	Last.Date							THEN
		DO;
			DO	_i				=	1	TO	DIM(Cats);
				Rates{_i}		=	Sums{_i}	/	7;
				CALL	MISSING(Sums{_i});
			END;
			OUTPUT;
		END;
	ELSE
		DO;
			DELETE;			
		END;
RUN;
Results:
Jim
The values in the summary table don't match the source table you have posted. What is the role of the variable "Value"?
If you want to use sas, i strongly recommend to drop the idea of doing things the same way you would do them using Excel, you won't get far, if you stick to the concepts of the inferior software.
Value is used in the rate calculation. The data i have posted is fine.
Take the rows in January in which cat1 = 1
| Date | Cat1 | Cat2 | Cat3 | Cat4 | Cat5 | Cat6 | Cat7 | Value | 
| Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 
| Jan-18 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 
| Jan-18 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 
| 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | |
| Jan-18 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 
| Jan-18 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 
| Jan-18 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 
| Jan-18 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 
| Jan-18 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 
| Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 
CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4
@sasprogramming wrote:
CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4
OK, well, then what I posted earlier is a bit off and needs adjustment. Let me tweak it a bit.
Jim
I think I have slightly different data based on the original posting -- or I'm misunderstanding (which is entirely possible).
Based on the data I have (see posted below), I get a 5 for the denominator for January, which is the count of "1's" in the Cat1 column. For the numerator, I get a 3, which is the count of "1's" in the Value column with a corresponding "1" in the Cat1 column. So, 3/5 = 0.6 or 60%, yes? Full results, below, followed by revised code.
January Data:
Jan-18-2021 1 0 0 0 1 1 0 0 Jan-18-2021 0 1 1 0 0 1 1 0 Jan-18-2021 0 1 0 1 0 1 1 1 Jan-18-2021 1 1 0 0 1 0 0 Jan-18-2021 1 0 0 0 0 0 0 1 Jan-18-2021 0 1 0 1 0 0 1 0 Jan-18-2021 0 1 0 1 1 0 1 1 Jan-18-2021 0 0 1 0 0 1 1 1 Jan-18-2021 1 1 1 0 0 0 1 1 Jan-18-2021 1 0 0 0 1 1 0 1 Jan-18-2021 0 0 0 1 0 1 1 1
Results (all months):
Code:
DATA	Have;
	FORMAT	Date	YYMMD7.;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	INPUT	
		Date	:	ANYDTDTE11.	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value;
DATALINES;
Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1
Feb-18-2021	1	1	1	0	1	0	1	0
Feb-18-2021	0	1	0	1	0	0	0	0
Feb-18-2021	1	0	1	0	0	1	0	1
Feb-18-2021	0	0	1	0	1	1	0	1
Feb-18-2021	1	1	0	0	1	0	1	1
Feb-18-2021	1	0	0	1	0	0	1	1
Feb-18-2021	0	1	1	1	0	1	0	0
Feb-18-2021	1	1	0	1	1	1	1	0
Mar-18-2021	0	1	0	1	1	0	0	0
Mar-18-2021	1	1	0	1	0	1	0	0
Mar-18-2021	1	1	1	0	0	1	0	1
Mar-18-2021	0	1	0	0	0	1	1	0
Mar-18-2021	1	1	1	0	1	1	0	1
Mar-18-2021	1	1	1	1	1	0	1	1
Mar-18-2021	1	0	0	1	0	1	0	0
Mar-18-2021	1	1	0	0	0	0	0	0
Mar-18-2021	1	1	1	1	1	1	0	1
Mar-18-2021	0	1	0	1	1	1	1	1
Mar-18-2021	0	1	1	0	0	0	0	0
Mar-18-2021	0	1	1	1	0	0	0	0
Mar-18-2021	0	1	1	1	1	1	1	0
Mar-18-2021	0	0	1	1	1	1	1	0
Mar-18-2021	1	0	0	1	0	1	1	1
;
RUN;
DATA	Want;
	DROP	_:	Cat1 - Cat7	Value;
	SET	Have;
		BY	Date	NOTSORTED;
	ARRAY	Cats	{*}	Cat1	-	Cat7;
	ARRAY	Sums	{7}	_TEMPORARY_;
	ARRAY	Cat_Cnt	{7}	_TEMPORARY_;
	ARRAY	Rates	{7}	Rate1	-	Rate7;
	FORMAT	Rate1	-	Rate7	PERCENT10.2;
	DO	_i						=	1	TO	DIM(Cats);
		IF	Cats{_i}						THEN
			DO;
				Sums{_i}		+	Value;
				Cat_Cnt{_i}		+	1;
			END;
	END;
	IF	Last.Date							THEN
		DO;
			DO	_i				=	1	TO	DIM(Cats);
				Rates{_i}		=	Sums{_i}	/	Cat_Cnt{_i};
				CALL	MISSING(Sums{_i}, Cat_Cnt{_i});
			END;
			OUTPUT;
		END;
	ELSE
		DO;
			DELETE;			
		END;
RUN;Jim
Like this?
proc sql;
  select DATE
       , sum(VALUE)/sum(CAT1) as RATE1
       , sum(VALUE)/sum(CAT2) as RATE2
       ...
from HAVE
group by DATE;My bad, I misunderstood. Like this?
proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;
@ChrisNZ wrote:proc sql; select DATE , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1 ... from HAVE group by DATE;
Oh! Now, that's danged clever. And I suppose that if the CATn variables had possible values greater than 1, then we could do the following:
proc sql;
  select DATE
       , sum(VALUE*(CAT1>0))/sum(IFN(CAT1>0,CAT1,0)) as RATE1	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT2>0))/sum(IFN(CAT2>0,CAT2,0)) as RATE2	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT3>0))/sum(IFN(CAT3>0,CAT3,0)) as RATE3	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT4>0))/sum(IFN(CAT4>0,CAT4,0)) as RATE4	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT5>0))/sum(IFN(CAT5>0,CAT5,0)) as RATE5	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT6>0))/sum(IFN(CAT6>0,CAT6,0)) as RATE6	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT7>0))/sum(IFN(CAT7>0,CAT7,0)) as RATE7	FORMAT=PERCENT10.2
from HAVE
group by DATE;
Jim
@sasprogramming wrote:
Value is used in the rate calculation. The data i have posted is fine.
Take the rows in January in which cat1 = 1
Date Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value Jan-18 1 0 0 0 1 1 0 0 Jan-18 0 1 1 0 0 1 1 0 Jan-18 0 1 0 1 0 1 1 1 0 1 1 0 0 1 0 0 Jan-18 1 0 0 0 0 0 0 1 Jan-18 0 1 0 1 0 0 1 0 Jan-18 0 1 0 1 1 0 1 1 Jan-18 0 0 1 0 0 1 1 1 Jan-18 1 1 1 0 0 0 1 1 Jan-18 1 0 0 0 1 1 0 1 
CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4
I cannot understand where the 3 and 4 come from in your calculations. There are four observations with CAT1=1. Is that how you got 4? But then where did the 3 come from?
Is that value of date supposed to be missing?
First step: make sure all of your "grouping" variables, such as a date are not missing. Fill in if they are missing before running any of the suggested solutions.
Second, your "want" only shows Jan, Feb etc. Do you have data from more than one year in the Date variable? If so, is the "want" supposed to only reflect one year, apparently 18 (bad juju can arise with reliance on 2-digit years).
PROC REPORT can do this easily. The mean of the 0/1 binary variables is the percent you want.
proc report data=have;
    columns date cat1-cat7;
    define date/group "Date";
    define cat1/mean format=percent8.2;
    define cat2/mean format=percent8.2;
    define cat3/mean format=percent8.2;
    define cat4/mean format=percent8.2;
    define cat5/mean format=percent8.2;
    define cat6/mean format=percent8.2;
    define cat7/mean format=percent8.2;
run;
For the Rate: If the numerator is determined by the Value column and the denominator is determined by the Cat column, would a Proc Report give us the result we would want in this case? An excellent technique in general though; one that didn't even occur to me.
Jim
Yes, PROC REPORT could be modified so that the value column is used in the denominator. It was never clear to me from reading the original post that the percents used the value column in the calculations. and the Excel example is something I ignore, as I don't like and don't use Excel.
Seems simple enough.
data want;
  do until(last.date);
    set have;
    by date;
    array cat[7];
    array top[7];
    array bottom[7];
    array rate[7];
    do index=1 to dim(cat);
      if cat[index] then do;
        top[index]=sum(top[index],value);
        bottom[index]=sum(bottom[index],1);
      end;
    end;
  end;
  do index=1 to dim(cat);
    rate[index]=divide(top[index],bottom[index]);
  end;
  drop index cat: top: bottom: value;
run;Obs date rate1 rate2 rate3 rate4 rate5 rate6 rate7 1 2021-01-18 0.750 0.50000 0.5 0.75 0.66667 0.57143 0.71429 2 2021-02-18 0.600 0.20000 0.5 0.25 0.50000 0.50000 0.50000 3 2021-03-18 0.625 0.41667 0.5 0.40 0.57143 0.50000 0.50000
But it would be much easier with a different structure where the category number is not stored in the NAME of a variable but instead as the VALUE of a variable instead.
data have ;
  input date :date. value @ ;
  do cat=1 to 7;
    input result @;
    output;
  end;
  format date yymmdd10.;
DATALINES;
18-Jan-2021 0 1 0 0 0 1 1 0 0
18-Jan-2021 0 0 1 1 0 0 1 1 0
18-Jan-2021 1 0 1 0 1 0 1 1 1
18-Jan-2021 0 0 1 1 0 0 1 0 0
18-Jan-2021 1 1 0 0 0 0 0 0 1
18-Jan-2021 0 0 1 0 1 0 0 1 0
18-Jan-2021 1 0 1 0 1 1 0 1 1
18-Jan-2021 1 0 0 1 0 0 1 1 1
18-Jan-2021 1 1 1 1 0 0 0 1 1
18-Jan-2021 1 1 0 0 0 1 1 0 1
18-Jan-2021 1 0 0 0 1 0 1 1 1
18-Feb-2021 0 1 1 1 0 1 0 1 0
18-Feb-2021 0 0 1 0 1 0 0 0 0
18-Feb-2021 1 1 0 1 0 0 1 0 1
18-Feb-2021 1 0 0 1 0 1 1 0 1
18-Feb-2021 1 1 1 0 0 1 0 1 1
18-Feb-2021 1 1 0 0 1 0 0 1 1
18-Feb-2021 0 0 1 1 1 0 1 0 0
18-Feb-2021 0 1 1 0 1 1 1 1 0
18-Mar-2021 0 0 1 0 1 1 0 0 0
18-Mar-2021 0 1 1 0 1 0 1 0 0
18-Mar-2021 1 1 1 1 0 0 1 0 1
18-Mar-2021 0 0 1 0 0 0 1 1 0
18-Mar-2021 1 1 1 1 0 1 1 0 1
18-Mar-2021 1 1 1 1 1 1 0 1 1
18-Mar-2021 0 1 0 0 1 0 1 0 0
18-Mar-2021 0 1 1 0 0 0 0 0 0
18-Mar-2021 1 1 1 1 1 1 1 0 1
18-Mar-2021 1 0 1 0 1 1 1 1 1
18-Mar-2021 0 0 1 1 0 0 0 0 0
18-Mar-2021 0 0 1 1 1 0 0 0 0
18-Mar-2021 0 0 1 1 1 1 1 1 0
18-Mar-2021 0 0 0 1 1 1 1 1 0
18-Mar-2021 1 1 0 0 1 0 1 1 1
;
proc sql ;
create table want as
select date,cat
     , sum(result*value)/sum(result) as rate
from have
group by 1,2
;
quit;Which you could then use to make a REPORT that looks like your request.
proc report data=want;
 column date rate,cat ;
 define date/group;
 define rate/sum ' ';
 define cat/across ' ';
run;date 1 2 3 4 5 6 7 2021-01-18 0.75 0.5 0.5 0.75 0.6666667 0.5714286 0.7142857 2021-02-18 0.6 0.2 0.5 0.25 0.5 0.5 0.5 2021-03-18 0.625 0.4166667 0.5 0.4 0.5714286 0.5 0.5
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
