Solved: Re: create an average if table

sasprogramming · Posted 08-09-2021 11:05 PM

I have data like below.

Date	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value
Jan-18	1	0	0	0	1	1	0	0
Jan-18	0	1	1	0	0	1	1	0
Jan-18	0	1	0	1	0	1	1	1
	0	1	1	0	0	1	0	0
Jan-18	1	0	0	0	0	0	0	1
Jan-18	0	1	0	1	0	0	1	0
Jan-18	0	1	0	1	1	0	1	1
Jan-18	0	0	1	0	0	1	1	1
Jan-18	1	1	1	0	0	0	1	1
Jan-18	1	0	0	0	1	1	0	1
Jan-18	0	0	0	1	0	1	1	1
Feb-18	1	1	1	0	1	0	1	0
Feb-18	0	1	0	1	0	0	0	0
Feb-18	1	0	1	0	0	1	0	1
Feb-18	0	0	1	0	1	1	0	1
Feb-18	1	1	0	0	1	0	1	1
Feb-18	1	0	0	1	0	0	1	1
Feb-18	0	1	1	1	0	1	0	0
Feb-18	1	1	0	1	1	1	1	0
Mar-18	0	1	0	1	1	0	0	0
Mar-18	1	1	0	1	0	1	0	0
Mar-18	1	1	1	0	0	1	0	1
Mar-18	0	1	0	0	0	1	1	0
Mar-18	1	1	1	0	1	1	0	1
Mar-18	1	1	1	1	1	0	1	1
Mar-18	1	0	0	1	0	1	0	0
Mar-18	1	1	0	0	0	0	0	0
Mar-18	1	1	1	1	1	1	0	1
Mar-18	0	1	0	1	1	1	1	1
Mar-18	0	1	1	0	0	0	0	0
Mar-18	0	1	1	1	0	0	0	0
Mar-18	0	1	1	1	1	1	1	0
Mar-18	0	0	1	1	1	1	1	0
Mar-18	1	0	0	1	0	1	1	1

I would like to create in sas the following summary table:

	Rate Cat 1	Rate Cat 2	Rate Cat 3	Rate Cat 4	Rate Cat 5	Rate Cat 6	Rate Cat 7
Jan	75%	50%	50%	75%	67%	57%	71%
Feb	60%	20%	50%	25%	50%	50%	50%
March	63%	42%	50%	40%	57%	50%	50%

These are rates are calculated in excel using an average if.

E.g for for jan Rate cat1:

So the jan rate cat1 = the average of the `value` in January, when Cat 1 =1.

I am looking to do the same thing as what I have done in excel, but in SAS

ChrisNZ · Posted 08-10-2021 05:43 AM

My bad, I misunderstood. Like this?

proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;

High-Performance SAS Coding - Third Edition

View solution in original post

jimbarbour · Posted 08-09-2021 11:23 PM

Well, there may be a more elegant way to do it, but you could use multiple arrays, two at a time. The first would hold the 1 or 0 for the category and the second would hold the value to be averaged. As you loop through the 1/0 array, the vale of the second array is added to an accumulator, and then at the end of the iteration for a given month, the average is computed. Subsequent categories would be computed in like manner, all results being stored in a results array.

Something like the below. See results below code. Is this along the lines of what you're looking for?

DATA	Have;
	FORMAT	Date	YYMMD7.;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	INPUT	
		Date	:	ANYDTDTE11.	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value;

DATALINES;
Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1
Feb-18-2021	1	1	1	0	1	0	1	0
Feb-18-2021	0	1	0	1	0	0	0	0
Feb-18-2021	1	0	1	0	0	1	0	1
Feb-18-2021	0	0	1	0	1	1	0	1
Feb-18-2021	1	1	0	0	1	0	1	1
Feb-18-2021	1	0	0	1	0	0	1	1
Feb-18-2021	0	1	1	1	0	1	0	0
Feb-18-2021	1	1	0	1	1	1	1	0
Mar-18-2021	0	1	0	1	1	0	0	0
Mar-18-2021	1	1	0	1	0	1	0	0
Mar-18-2021	1	1	1	0	0	1	0	1
Mar-18-2021	0	1	0	0	0	1	1	0
Mar-18-2021	1	1	1	0	1	1	0	1
Mar-18-2021	1	1	1	1	1	0	1	1
Mar-18-2021	1	0	0	1	0	1	0	0
Mar-18-2021	1	1	0	0	0	0	0	0
Mar-18-2021	1	1	1	1	1	1	0	1
Mar-18-2021	0	1	0	1	1	1	1	1
Mar-18-2021	0	1	1	0	0	0	0	0
Mar-18-2021	0	1	1	1	0	0	0	0
Mar-18-2021	0	1	1	1	1	1	1	0
Mar-18-2021	0	0	1	1	1	1	1	0
Mar-18-2021	1	0	0	1	0	1	1	1
;
RUN;

DATA	Want;
	DROP	_:	Cat1 - Cat7	Value;
	SET	Have;
		BY	Date	NOTSORTED;
	ARRAY	Cats	{*}	Cat1	-	Cat7;
	ARRAY	Sums	{7}	_TEMPORARY_;
	ARRAY	Rates	{7}	Rate1	-	Rate7;
	FORMAT	Rate1	-	Rate7	PERCENT10.2;

	DO	_i						=	1	TO	DIM(Cats);
		IF	Cats{_i}						THEN
			Sums{_i}			+	Value;
	END;

	IF	Last.Date							THEN
		DO;
			DO	_i				=	1	TO	DIM(Cats);
				Rates{_i}		=	Sums{_i}	/	7;
				CALL	MISSING(Sums{_i});
			END;
			OUTPUT;
		END;
	ELSE
		DO;
			DELETE;			
		END;
RUN;

Results:

Jim

andreas_lds · Posted 08-10-2021 12:50 AM

The values in the summary table don't match the source table you have posted. What is the role of the variable "Value"?

If you want to use sas, i strongly recommend to drop the idea of doing things the same way you would do them using Excel, you won't get far, if you stick to the concepts of the inferior software.

sasprogramming · Posted 08-10-2021 01:03 AM

Value is used in the rate calculation. The data i have posted is fine.

Take the rows in January in which cat1 = 1

Date	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value
Jan-18	1	0	0	0	1	1	0	0
Jan-18	0	1	1	0	0	1	1	0
Jan-18	0	1	0	1	0	1	1	1
	0	1	1	0	0	1	0	0
Jan-18	1	0	0	0	0	0	0	1
Jan-18	0	1	0	1	0	0	1	0
Jan-18	0	1	0	1	1	0	1	1
Jan-18	0	0	1	0	0	1	1	1
Jan-18	1	1	1	0	0	0	1	1
Jan-18	1	0	0	0	1	1	0	1

CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4

jimbarbour · Posted 08-10-2021 01:05 AM

@sasprogramming wrote:

CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4

OK, well, then what I posted earlier is a bit off and needs adjustment. Let me tweak it a bit.

Jim

jimbarbour · Posted 08-10-2021 01:17 AM

I think I have slightly different data based on the original posting -- or I'm misunderstanding (which is entirely possible).

Based on the data I have (see posted below), I get a 5 for the denominator for January, which is the count of "1's" in the Cat1 column. For the numerator, I get a 3, which is the count of "1's" in the Value column with a corresponding "1" in the Cat1 column. So, 3/5 = 0.6 or 60%, yes? Full results, below, followed by revised code.

January Data:

Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1

Results (all months):

Code:

DATA	Have;
	FORMAT	Date	YYMMD7.;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	INPUT	
		Date	:	ANYDTDTE11.	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value;

DATALINES;
Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1
Feb-18-2021	1	1	1	0	1	0	1	0
Feb-18-2021	0	1	0	1	0	0	0	0
Feb-18-2021	1	0	1	0	0	1	0	1
Feb-18-2021	0	0	1	0	1	1	0	1
Feb-18-2021	1	1	0	0	1	0	1	1
Feb-18-2021	1	0	0	1	0	0	1	1
Feb-18-2021	0	1	1	1	0	1	0	0
Feb-18-2021	1	1	0	1	1	1	1	0
Mar-18-2021	0	1	0	1	1	0	0	0
Mar-18-2021	1	1	0	1	0	1	0	0
Mar-18-2021	1	1	1	0	0	1	0	1
Mar-18-2021	0	1	0	0	0	1	1	0
Mar-18-2021	1	1	1	0	1	1	0	1
Mar-18-2021	1	1	1	1	1	0	1	1
Mar-18-2021	1	0	0	1	0	1	0	0
Mar-18-2021	1	1	0	0	0	0	0	0
Mar-18-2021	1	1	1	1	1	1	0	1
Mar-18-2021	0	1	0	1	1	1	1	1
Mar-18-2021	0	1	1	0	0	0	0	0
Mar-18-2021	0	1	1	1	0	0	0	0
Mar-18-2021	0	1	1	1	1	1	1	0
Mar-18-2021	0	0	1	1	1	1	1	0
Mar-18-2021	1	0	0	1	0	1	1	1
;
RUN;

DATA	Want;
	DROP	_:	Cat1 - Cat7	Value;
	SET	Have;
		BY	Date	NOTSORTED;
	ARRAY	Cats	{*}	Cat1	-	Cat7;
	ARRAY	Sums	{7}	_TEMPORARY_;
	ARRAY	Cat_Cnt	{7}	_TEMPORARY_;
	ARRAY	Rates	{7}	Rate1	-	Rate7;
	FORMAT	Rate1	-	Rate7	PERCENT10.2;

	DO	_i						=	1	TO	DIM(Cats);
		IF	Cats{_i}						THEN
			DO;
				Sums{_i}		+	Value;
				Cat_Cnt{_i}		+	1;
			END;
	END;

	IF	Last.Date							THEN
		DO;
			DO	_i				=	1	TO	DIM(Cats);
				Rates{_i}		=	Sums{_i}	/	Cat_Cnt{_i};
				CALL	MISSING(Sums{_i}, Cat_Cnt{_i});
			END;
			OUTPUT;
		END;
	ELSE
		DO;
			DELETE;			
		END;
RUN;

Jim

ChrisNZ · Posted 08-10-2021 05:38 AM

Like this?

proc sql;
  select DATE
       , sum(VALUE)/sum(CAT1) as RATE1
       , sum(VALUE)/sum(CAT2) as RATE2
       ...
from HAVE
group by DATE;

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 08-10-2021 05:43 AM

My bad, I misunderstood. Like this?

proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;

High-Performance SAS Coding - Third Edition

jimbarbour · Posted 08-10-2021 12:03 PM

@ChrisNZ wrote:

proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;

Oh! Now, that's danged clever. And I suppose that if the CATn variables had possible values greater than 1, then we could do the following:

proc sql;
  select DATE
       , sum(VALUE*(CAT1>0))/sum(IFN(CAT1>0,CAT1,0)) as RATE1	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT2>0))/sum(IFN(CAT2>0,CAT2,0)) as RATE2	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT3>0))/sum(IFN(CAT3>0,CAT3,0)) as RATE3	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT4>0))/sum(IFN(CAT4>0,CAT4,0)) as RATE4	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT5>0))/sum(IFN(CAT5>0,CAT5,0)) as RATE5	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT6>0))/sum(IFN(CAT6>0,CAT6,0)) as RATE6	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT7>0))/sum(IFN(CAT7>0,CAT7,0)) as RATE7	FORMAT=PERCENT10.2
from HAVE
group by DATE;

Jim

Tom · Posted 08-10-2021 03:10 PM

@sasprogramming wrote:

Value is used in the rate calculation. The data i have posted is fine.

Take the rows in January in which cat1 = 1

Date Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value

Jan-18 1 0 0 0 1 1 0 0

Jan-18 0 1 1 0 0 1 1 0

Jan-18 0 1 0 1 0 1 1 1

0 1 1 0 0 1 0 0

Jan-18 1 0 0 0 0 0 0 1

Jan-18 0 1 0 1 0 0 1 0

Jan-18 0 1 0 1 1 0 1 1

Jan-18 0 0 1 0 0 1 1 1

Jan-18 1 1 1 0 0 0 1 1

Jan-18 1 0 0 0 1 1 0 1

CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4

I cannot understand where the 3 and 4 come from in your calculations. There are four observations with CAT1=1. Is that how you got 4? But then where did the 3 come from?

ballardw · Posted 08-10-2021 06:05 AM

Is that value of date supposed to be missing?

First step: make sure all of your "grouping" variables, such as a date are not missing. Fill in if they are missing before running any of the suggested solutions.

Second, your "want" only shows Jan, Feb etc. Do you have data from more than one year in the Date variable? If so, is the "want" supposed to only reflect one year, apparently 18 (bad juju can arise with reliance on 2-digit years).

PaigeMiller · Posted 08-10-2021 08:08 AM

PROC REPORT can do this easily. The mean of the 0/1 binary variables is the percent you want.

proc report data=have;
    columns date cat1-cat7;
    define date/group "Date";
    define cat1/mean format=percent8.2;
    define cat2/mean format=percent8.2;
    define cat3/mean format=percent8.2;
    define cat4/mean format=percent8.2;
    define cat5/mean format=percent8.2;
    define cat6/mean format=percent8.2;
    define cat7/mean format=percent8.2;
run;

--
Paige Miller

jimbarbour · Posted 08-10-2021 11:56 AM

@PaigeMiller,

For the Rate: If the numerator is determined by the Value column and the denominator is determined by the Cat column, would a Proc Report give us the result we would want in this case? An excellent technique in general though; one that didn't even occur to me.

Jim

PaigeMiller · Posted 08-10-2021 12:22 PM

Yes, PROC REPORT could be modified so that the value column is used in the denominator. It was never clear to me from reading the original post that the percents used the value column in the calculations. and the Excel example is something I ignore, as I don't like and don't use Excel.

--
Paige Miller

Tom · Posted 08-10-2021 03:36 PM

Seems simple enough.

data want;
  do until(last.date);
    set have;
    by date;
    array cat[7];
    array top[7];
    array bottom[7];
    array rate[7];
    do index=1 to dim(cat);
      if cat[index] then do;
        top[index]=sum(top[index],value);
        bottom[index]=sum(bottom[index],1);
      end;
    end;
  end;
  do index=1 to dim(cat);
    rate[index]=divide(top[index],bottom[index]);
  end;
  drop index cat: top: bottom: value;
run;

Obs          date    rate1     rate2     rate3    rate4     rate5      rate6      rate7

 1     2021-01-18    0.750    0.50000     0.5      0.75    0.66667    0.57143    0.71429
 2     2021-02-18    0.600    0.20000     0.5      0.25    0.50000    0.50000    0.50000
 3     2021-03-18    0.625    0.41667     0.5      0.40    0.57143    0.50000    0.50000

But it would be much easier with a different structure where the category number is not stored in the NAME of a variable but instead as the VALUE of a variable instead.

data have ;
  input date :date. value @ ;
  do cat=1 to 7;
    input result @;
    output;
  end;
  format date yymmdd10.;
DATALINES;
18-Jan-2021 0 1 0 0 0 1 1 0 0
18-Jan-2021 0 0 1 1 0 0 1 1 0
18-Jan-2021 1 0 1 0 1 0 1 1 1
18-Jan-2021 0 0 1 1 0 0 1 0 0
18-Jan-2021 1 1 0 0 0 0 0 0 1
18-Jan-2021 0 0 1 0 1 0 0 1 0
18-Jan-2021 1 0 1 0 1 1 0 1 1
18-Jan-2021 1 0 0 1 0 0 1 1 1
18-Jan-2021 1 1 1 1 0 0 0 1 1
18-Jan-2021 1 1 0 0 0 1 1 0 1
18-Jan-2021 1 0 0 0 1 0 1 1 1
18-Feb-2021 0 1 1 1 0 1 0 1 0
18-Feb-2021 0 0 1 0 1 0 0 0 0
18-Feb-2021 1 1 0 1 0 0 1 0 1
18-Feb-2021 1 0 0 1 0 1 1 0 1
18-Feb-2021 1 1 1 0 0 1 0 1 1
18-Feb-2021 1 1 0 0 1 0 0 1 1
18-Feb-2021 0 0 1 1 1 0 1 0 0
18-Feb-2021 0 1 1 0 1 1 1 1 0
18-Mar-2021 0 0 1 0 1 1 0 0 0
18-Mar-2021 0 1 1 0 1 0 1 0 0
18-Mar-2021 1 1 1 1 0 0 1 0 1
18-Mar-2021 0 0 1 0 0 0 1 1 0
18-Mar-2021 1 1 1 1 0 1 1 0 1
18-Mar-2021 1 1 1 1 1 1 0 1 1
18-Mar-2021 0 1 0 0 1 0 1 0 0
18-Mar-2021 0 1 1 0 0 0 0 0 0
18-Mar-2021 1 1 1 1 1 1 1 0 1
18-Mar-2021 1 0 1 0 1 1 1 1 1
18-Mar-2021 0 0 1 1 0 0 0 0 0
18-Mar-2021 0 0 1 1 1 0 0 0 0
18-Mar-2021 0 0 1 1 1 1 1 1 0
18-Mar-2021 0 0 0 1 1 1 1 1 0
18-Mar-2021 1 1 0 0 1 0 1 1 1
;

proc sql ;
create table want as
select date,cat
     , sum(result*value)/sum(result) as rate
from have
group by 1,2
;
quit;

Which you could then use to make a REPORT that looks like your request.

proc report data=want;
 column date rate,cat ;
 define date/group;
 define rate/sum ' ';
 define cat/across ' ';
run;

        date          1          2          3          4          5          6          7
  2021-01-18       0.75        0.5        0.5       0.75  0.6666667  0.5714286  0.7142857
  2021-02-18        0.6        0.2        0.5       0.25        0.5        0.5        0.5
  2021-03-18      0.625  0.4166667        0.5        0.4  0.5714286        0.5        0.5

Registration is open

SAS Training: Just a Click Away