I have data like below.
Date | Cat1 | Cat2 | Cat3 | Cat4 | Cat5 | Cat6 | Cat7 | Value |
Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
Jan-18 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
Jan-18 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | |
Jan-18 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Jan-18 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
Jan-18 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
Jan-18 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
Jan-18 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
Jan-18 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
Feb-18 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
Feb-18 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Feb-18 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
Feb-18 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
Feb-18 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
Feb-18 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
Feb-18 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
Feb-18 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
Mar-18 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
Mar-18 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
Mar-18 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
Mar-18 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
Mar-18 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
Mar-18 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
Mar-18 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
Mar-18 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Mar-18 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
Mar-18 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
Mar-18 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Mar-18 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
Mar-18 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
Mar-18 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
Mar-18 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
I would like to create in sas the following summary table:
Rate Cat 1 | Rate Cat 2 | Rate Cat 3 | Rate Cat 4 | Rate Cat 5 | Rate Cat 6 | Rate Cat 7 | |
Jan | 75% | 50% | 50% | 75% | 67% | 57% | 71% |
Feb | 60% | 20% | 50% | 25% | 50% | 50% | 50% |
March | 63% | 42% | 50% | 40% | 57% | 50% | 50% |
These are rates are calculated in excel using an average if.
E.g for for jan Rate cat1:
So the jan rate cat1 = the average of the `value` in January, when Cat 1 =1.
I am looking to do the same thing as what I have done in excel, but in SAS
My bad, I misunderstood. Like this?
proc sql;
select DATE
, sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
...
from HAVE
group by DATE;
Well, there may be a more elegant way to do it, but you could use multiple arrays, two at a time. The first would hold the 1 or 0 for the category and the second would hold the value to be averaged. As you loop through the 1/0 array, the vale of the second array is added to an accumulator, and then at the end of the iteration for a given month, the average is computed. Subsequent categories would be computed in like manner, all results being stored in a results array.
Something like the below. See results below code. Is this along the lines of what you're looking for?
DATA Have;
FORMAT Date YYMMD7.;
INFILE DATALINES DSD DLM='09'X MISSOVER;
INPUT
Date : ANYDTDTE11. Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value;
DATALINES;
Jan-18-2021 1 0 0 0 1 1 0 0
Jan-18-2021 0 1 1 0 0 1 1 0
Jan-18-2021 0 1 0 1 0 1 1 1
Jan-18-2021 1 1 0 0 1 0 0
Jan-18-2021 1 0 0 0 0 0 0 1
Jan-18-2021 0 1 0 1 0 0 1 0
Jan-18-2021 0 1 0 1 1 0 1 1
Jan-18-2021 0 0 1 0 0 1 1 1
Jan-18-2021 1 1 1 0 0 0 1 1
Jan-18-2021 1 0 0 0 1 1 0 1
Jan-18-2021 0 0 0 1 0 1 1 1
Feb-18-2021 1 1 1 0 1 0 1 0
Feb-18-2021 0 1 0 1 0 0 0 0
Feb-18-2021 1 0 1 0 0 1 0 1
Feb-18-2021 0 0 1 0 1 1 0 1
Feb-18-2021 1 1 0 0 1 0 1 1
Feb-18-2021 1 0 0 1 0 0 1 1
Feb-18-2021 0 1 1 1 0 1 0 0
Feb-18-2021 1 1 0 1 1 1 1 0
Mar-18-2021 0 1 0 1 1 0 0 0
Mar-18-2021 1 1 0 1 0 1 0 0
Mar-18-2021 1 1 1 0 0 1 0 1
Mar-18-2021 0 1 0 0 0 1 1 0
Mar-18-2021 1 1 1 0 1 1 0 1
Mar-18-2021 1 1 1 1 1 0 1 1
Mar-18-2021 1 0 0 1 0 1 0 0
Mar-18-2021 1 1 0 0 0 0 0 0
Mar-18-2021 1 1 1 1 1 1 0 1
Mar-18-2021 0 1 0 1 1 1 1 1
Mar-18-2021 0 1 1 0 0 0 0 0
Mar-18-2021 0 1 1 1 0 0 0 0
Mar-18-2021 0 1 1 1 1 1 1 0
Mar-18-2021 0 0 1 1 1 1 1 0
Mar-18-2021 1 0 0 1 0 1 1 1
;
RUN;
DATA Want;
DROP _: Cat1 - Cat7 Value;
SET Have;
BY Date NOTSORTED;
ARRAY Cats {*} Cat1 - Cat7;
ARRAY Sums {7} _TEMPORARY_;
ARRAY Rates {7} Rate1 - Rate7;
FORMAT Rate1 - Rate7 PERCENT10.2;
DO _i = 1 TO DIM(Cats);
IF Cats{_i} THEN
Sums{_i} + Value;
END;
IF Last.Date THEN
DO;
DO _i = 1 TO DIM(Cats);
Rates{_i} = Sums{_i} / 7;
CALL MISSING(Sums{_i});
END;
OUTPUT;
END;
ELSE
DO;
DELETE;
END;
RUN;
Results:
Jim
The values in the summary table don't match the source table you have posted. What is the role of the variable "Value"?
If you want to use sas, i strongly recommend to drop the idea of doing things the same way you would do them using Excel, you won't get far, if you stick to the concepts of the inferior software.
Value is used in the rate calculation. The data i have posted is fine.
Take the rows in January in which cat1 = 1
Date | Cat1 | Cat2 | Cat3 | Cat4 | Cat5 | Cat6 | Cat7 | Value |
Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
Jan-18 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
Jan-18 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | |
Jan-18 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Jan-18 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
Jan-18 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
Jan-18 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
Jan-18 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
Jan-18 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4
@sasprogramming wrote:
CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4
OK, well, then what I posted earlier is a bit off and needs adjustment. Let me tweak it a bit.
Jim
I think I have slightly different data based on the original posting -- or I'm misunderstanding (which is entirely possible).
Based on the data I have (see posted below), I get a 5 for the denominator for January, which is the count of "1's" in the Cat1 column. For the numerator, I get a 3, which is the count of "1's" in the Value column with a corresponding "1" in the Cat1 column. So, 3/5 = 0.6 or 60%, yes? Full results, below, followed by revised code.
January Data:
Jan-18-2021 1 0 0 0 1 1 0 0 Jan-18-2021 0 1 1 0 0 1 1 0 Jan-18-2021 0 1 0 1 0 1 1 1 Jan-18-2021 1 1 0 0 1 0 0 Jan-18-2021 1 0 0 0 0 0 0 1 Jan-18-2021 0 1 0 1 0 0 1 0 Jan-18-2021 0 1 0 1 1 0 1 1 Jan-18-2021 0 0 1 0 0 1 1 1 Jan-18-2021 1 1 1 0 0 0 1 1 Jan-18-2021 1 0 0 0 1 1 0 1 Jan-18-2021 0 0 0 1 0 1 1 1
Results (all months):
Code:
DATA Have;
FORMAT Date YYMMD7.;
INFILE DATALINES DSD DLM='09'X MISSOVER;
INPUT
Date : ANYDTDTE11. Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value;
DATALINES;
Jan-18-2021 1 0 0 0 1 1 0 0
Jan-18-2021 0 1 1 0 0 1 1 0
Jan-18-2021 0 1 0 1 0 1 1 1
Jan-18-2021 1 1 0 0 1 0 0
Jan-18-2021 1 0 0 0 0 0 0 1
Jan-18-2021 0 1 0 1 0 0 1 0
Jan-18-2021 0 1 0 1 1 0 1 1
Jan-18-2021 0 0 1 0 0 1 1 1
Jan-18-2021 1 1 1 0 0 0 1 1
Jan-18-2021 1 0 0 0 1 1 0 1
Jan-18-2021 0 0 0 1 0 1 1 1
Feb-18-2021 1 1 1 0 1 0 1 0
Feb-18-2021 0 1 0 1 0 0 0 0
Feb-18-2021 1 0 1 0 0 1 0 1
Feb-18-2021 0 0 1 0 1 1 0 1
Feb-18-2021 1 1 0 0 1 0 1 1
Feb-18-2021 1 0 0 1 0 0 1 1
Feb-18-2021 0 1 1 1 0 1 0 0
Feb-18-2021 1 1 0 1 1 1 1 0
Mar-18-2021 0 1 0 1 1 0 0 0
Mar-18-2021 1 1 0 1 0 1 0 0
Mar-18-2021 1 1 1 0 0 1 0 1
Mar-18-2021 0 1 0 0 0 1 1 0
Mar-18-2021 1 1 1 0 1 1 0 1
Mar-18-2021 1 1 1 1 1 0 1 1
Mar-18-2021 1 0 0 1 0 1 0 0
Mar-18-2021 1 1 0 0 0 0 0 0
Mar-18-2021 1 1 1 1 1 1 0 1
Mar-18-2021 0 1 0 1 1 1 1 1
Mar-18-2021 0 1 1 0 0 0 0 0
Mar-18-2021 0 1 1 1 0 0 0 0
Mar-18-2021 0 1 1 1 1 1 1 0
Mar-18-2021 0 0 1 1 1 1 1 0
Mar-18-2021 1 0 0 1 0 1 1 1
;
RUN;
DATA Want;
DROP _: Cat1 - Cat7 Value;
SET Have;
BY Date NOTSORTED;
ARRAY Cats {*} Cat1 - Cat7;
ARRAY Sums {7} _TEMPORARY_;
ARRAY Cat_Cnt {7} _TEMPORARY_;
ARRAY Rates {7} Rate1 - Rate7;
FORMAT Rate1 - Rate7 PERCENT10.2;
DO _i = 1 TO DIM(Cats);
IF Cats{_i} THEN
DO;
Sums{_i} + Value;
Cat_Cnt{_i} + 1;
END;
END;
IF Last.Date THEN
DO;
DO _i = 1 TO DIM(Cats);
Rates{_i} = Sums{_i} / Cat_Cnt{_i};
CALL MISSING(Sums{_i}, Cat_Cnt{_i});
END;
OUTPUT;
END;
ELSE
DO;
DELETE;
END;
RUN;
Jim
Like this?
proc sql;
select DATE
, sum(VALUE)/sum(CAT1) as RATE1
, sum(VALUE)/sum(CAT2) as RATE2
...
from HAVE
group by DATE;
My bad, I misunderstood. Like this?
proc sql;
select DATE
, sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
...
from HAVE
group by DATE;
@ChrisNZ wrote:proc sql; select DATE , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1 ... from HAVE group by DATE;
Oh! Now, that's danged clever. And I suppose that if the CATn variables had possible values greater than 1, then we could do the following:
proc sql;
select DATE
, sum(VALUE*(CAT1>0))/sum(IFN(CAT1>0,CAT1,0)) as RATE1 FORMAT=PERCENT10.2
, sum(VALUE*(CAT2>0))/sum(IFN(CAT2>0,CAT2,0)) as RATE2 FORMAT=PERCENT10.2
, sum(VALUE*(CAT3>0))/sum(IFN(CAT3>0,CAT3,0)) as RATE3 FORMAT=PERCENT10.2
, sum(VALUE*(CAT4>0))/sum(IFN(CAT4>0,CAT4,0)) as RATE4 FORMAT=PERCENT10.2
, sum(VALUE*(CAT5>0))/sum(IFN(CAT5>0,CAT5,0)) as RATE5 FORMAT=PERCENT10.2
, sum(VALUE*(CAT6>0))/sum(IFN(CAT6>0,CAT6,0)) as RATE6 FORMAT=PERCENT10.2
, sum(VALUE*(CAT7>0))/sum(IFN(CAT7>0,CAT7,0)) as RATE7 FORMAT=PERCENT10.2
from HAVE
group by DATE;
Jim
@sasprogramming wrote:
Value is used in the rate calculation. The data i have posted is fine.
Take the rows in January in which cat1 = 1
Date Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value Jan-18 1 0 0 0 1 1 0 0 Jan-18 0 1 1 0 0 1 1 0 Jan-18 0 1 0 1 0 1 1 1 0 1 1 0 0 1 0 0 Jan-18 1 0 0 0 0 0 0 1 Jan-18 0 1 0 1 0 0 1 0 Jan-18 0 1 0 1 1 0 1 1 Jan-18 0 0 1 0 0 1 1 1 Jan-18 1 1 1 0 0 0 1 1 Jan-18 1 0 0 0 1 1 0 1
CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4
I cannot understand where the 3 and 4 come from in your calculations. There are four observations with CAT1=1. Is that how you got 4? But then where did the 3 come from?
Is that value of date supposed to be missing?
First step: make sure all of your "grouping" variables, such as a date are not missing. Fill in if they are missing before running any of the suggested solutions.
Second, your "want" only shows Jan, Feb etc. Do you have data from more than one year in the Date variable? If so, is the "want" supposed to only reflect one year, apparently 18 (bad juju can arise with reliance on 2-digit years).
PROC REPORT can do this easily. The mean of the 0/1 binary variables is the percent you want.
proc report data=have;
columns date cat1-cat7;
define date/group "Date";
define cat1/mean format=percent8.2;
define cat2/mean format=percent8.2;
define cat3/mean format=percent8.2;
define cat4/mean format=percent8.2;
define cat5/mean format=percent8.2;
define cat6/mean format=percent8.2;
define cat7/mean format=percent8.2;
run;
For the Rate: If the numerator is determined by the Value column and the denominator is determined by the Cat column, would a Proc Report give us the result we would want in this case? An excellent technique in general though; one that didn't even occur to me.
Jim
Yes, PROC REPORT could be modified so that the value column is used in the denominator. It was never clear to me from reading the original post that the percents used the value column in the calculations. and the Excel example is something I ignore, as I don't like and don't use Excel.
Seems simple enough.
data want;
do until(last.date);
set have;
by date;
array cat[7];
array top[7];
array bottom[7];
array rate[7];
do index=1 to dim(cat);
if cat[index] then do;
top[index]=sum(top[index],value);
bottom[index]=sum(bottom[index],1);
end;
end;
end;
do index=1 to dim(cat);
rate[index]=divide(top[index],bottom[index]);
end;
drop index cat: top: bottom: value;
run;
Obs date rate1 rate2 rate3 rate4 rate5 rate6 rate7 1 2021-01-18 0.750 0.50000 0.5 0.75 0.66667 0.57143 0.71429 2 2021-02-18 0.600 0.20000 0.5 0.25 0.50000 0.50000 0.50000 3 2021-03-18 0.625 0.41667 0.5 0.40 0.57143 0.50000 0.50000
But it would be much easier with a different structure where the category number is not stored in the NAME of a variable but instead as the VALUE of a variable instead.
data have ;
input date :date. value @ ;
do cat=1 to 7;
input result @;
output;
end;
format date yymmdd10.;
DATALINES;
18-Jan-2021 0 1 0 0 0 1 1 0 0
18-Jan-2021 0 0 1 1 0 0 1 1 0
18-Jan-2021 1 0 1 0 1 0 1 1 1
18-Jan-2021 0 0 1 1 0 0 1 0 0
18-Jan-2021 1 1 0 0 0 0 0 0 1
18-Jan-2021 0 0 1 0 1 0 0 1 0
18-Jan-2021 1 0 1 0 1 1 0 1 1
18-Jan-2021 1 0 0 1 0 0 1 1 1
18-Jan-2021 1 1 1 1 0 0 0 1 1
18-Jan-2021 1 1 0 0 0 1 1 0 1
18-Jan-2021 1 0 0 0 1 0 1 1 1
18-Feb-2021 0 1 1 1 0 1 0 1 0
18-Feb-2021 0 0 1 0 1 0 0 0 0
18-Feb-2021 1 1 0 1 0 0 1 0 1
18-Feb-2021 1 0 0 1 0 1 1 0 1
18-Feb-2021 1 1 1 0 0 1 0 1 1
18-Feb-2021 1 1 0 0 1 0 0 1 1
18-Feb-2021 0 0 1 1 1 0 1 0 0
18-Feb-2021 0 1 1 0 1 1 1 1 0
18-Mar-2021 0 0 1 0 1 1 0 0 0
18-Mar-2021 0 1 1 0 1 0 1 0 0
18-Mar-2021 1 1 1 1 0 0 1 0 1
18-Mar-2021 0 0 1 0 0 0 1 1 0
18-Mar-2021 1 1 1 1 0 1 1 0 1
18-Mar-2021 1 1 1 1 1 1 0 1 1
18-Mar-2021 0 1 0 0 1 0 1 0 0
18-Mar-2021 0 1 1 0 0 0 0 0 0
18-Mar-2021 1 1 1 1 1 1 1 0 1
18-Mar-2021 1 0 1 0 1 1 1 1 1
18-Mar-2021 0 0 1 1 0 0 0 0 0
18-Mar-2021 0 0 1 1 1 0 0 0 0
18-Mar-2021 0 0 1 1 1 1 1 1 0
18-Mar-2021 0 0 0 1 1 1 1 1 0
18-Mar-2021 1 1 0 0 1 0 1 1 1
;
proc sql ;
create table want as
select date,cat
, sum(result*value)/sum(result) as rate
from have
group by 1,2
;
quit;
Which you could then use to make a REPORT that looks like your request.
proc report data=want;
column date rate,cat ;
define date/group;
define rate/sum ' ';
define cat/across ' ';
run;
date 1 2 3 4 5 6 7 2021-01-18 0.75 0.5 0.5 0.75 0.6666667 0.5714286 0.7142857 2021-02-18 0.6 0.2 0.5 0.25 0.5 0.5 0.5 2021-03-18 0.625 0.4166667 0.5 0.4 0.5714286 0.5 0.5
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.