BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I have data like below.

Date Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value
Jan-18 1 0 0 0 1 1 0 0
Jan-18 0 1 1 0 0 1 1 0
Jan-18 0 1 0 1 0 1 1 1
  0 1 1 0 0 1 0 0
Jan-18 1 0 0 0 0 0 0 1
Jan-18 0 1 0 1 0 0 1 0
Jan-18 0 1 0 1 1 0 1 1
Jan-18 0 0 1 0 0 1 1 1
Jan-18 1 1 1 0 0 0 1 1
Jan-18 1 0 0 0 1 1 0 1
Jan-18 0 0 0 1 0 1 1 1
Feb-18 1 1 1 0 1 0 1 0
Feb-18 0 1 0 1 0 0 0 0
Feb-18 1 0 1 0 0 1 0 1
Feb-18 0 0 1 0 1 1 0 1
Feb-18 1 1 0 0 1 0 1 1
Feb-18 1 0 0 1 0 0 1 1
Feb-18 0 1 1 1 0 1 0 0
Feb-18 1 1 0 1 1 1 1 0
Mar-18 0 1 0 1 1 0 0 0
Mar-18 1 1 0 1 0 1 0 0
Mar-18 1 1 1 0 0 1 0 1
Mar-18 0 1 0 0 0 1 1 0
Mar-18 1 1 1 0 1 1 0 1
Mar-18 1 1 1 1 1 0 1 1
Mar-18 1 0 0 1 0 1 0 0
Mar-18 1 1 0 0 0 0 0 0
Mar-18 1 1 1 1 1 1 0 1
Mar-18 0 1 0 1 1 1 1 1
Mar-18 0 1 1 0 0 0 0 0
Mar-18 0 1 1 1 0 0 0 0
Mar-18 0 1 1 1 1 1 1 0
Mar-18 0 0 1 1 1 1 1 0
Mar-18 1 0 0 1 0 1 1 1

 

I would like to create in sas the following summary table:

  Rate Cat 1 Rate Cat 2 Rate Cat 3 Rate Cat 4 Rate Cat 5 Rate Cat 6 Rate Cat 7
Jan 75% 50% 50% 75% 67% 57% 71%
Feb 60% 20% 50% 25% 50% 50% 50%
March 63% 42% 50% 40% 57% 50% 50%

 

These are rates are calculated in excel using an average if.

E.g for for jan Rate cat1:

sasprogramming_0-1628564628198.png

So the jan rate cat1 = the average of the `value` in January, when Cat 1 =1.

 

I am looking to do the same thing as what I have done in excel, but in SAS

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

My bad, I misunderstood. Like this?

proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;

 

View solution in original post

14 REPLIES 14
jimbarbour
Meteorite | Level 14

Well, there may be a more elegant way to do it, but you could use multiple arrays, two at a time.  The first would hold the 1 or 0 for the category and the second would hold the value to be averaged.  As you loop through the 1/0 array, the vale of the second array is added to an accumulator, and then at the end of the iteration for a given month, the average is computed.  Subsequent categories would be computed in like manner, all results being stored in a results array.

 

Something like the below.  See results below code.  Is this along the lines of what you're looking for?

DATA	Have;
	FORMAT	Date	YYMMD7.;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	INPUT	
		Date	:	ANYDTDTE11.	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value;

DATALINES;
Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1
Feb-18-2021	1	1	1	0	1	0	1	0
Feb-18-2021	0	1	0	1	0	0	0	0
Feb-18-2021	1	0	1	0	0	1	0	1
Feb-18-2021	0	0	1	0	1	1	0	1
Feb-18-2021	1	1	0	0	1	0	1	1
Feb-18-2021	1	0	0	1	0	0	1	1
Feb-18-2021	0	1	1	1	0	1	0	0
Feb-18-2021	1	1	0	1	1	1	1	0
Mar-18-2021	0	1	0	1	1	0	0	0
Mar-18-2021	1	1	0	1	0	1	0	0
Mar-18-2021	1	1	1	0	0	1	0	1
Mar-18-2021	0	1	0	0	0	1	1	0
Mar-18-2021	1	1	1	0	1	1	0	1
Mar-18-2021	1	1	1	1	1	0	1	1
Mar-18-2021	1	0	0	1	0	1	0	0
Mar-18-2021	1	1	0	0	0	0	0	0
Mar-18-2021	1	1	1	1	1	1	0	1
Mar-18-2021	0	1	0	1	1	1	1	1
Mar-18-2021	0	1	1	0	0	0	0	0
Mar-18-2021	0	1	1	1	0	0	0	0
Mar-18-2021	0	1	1	1	1	1	1	0
Mar-18-2021	0	0	1	1	1	1	1	0
Mar-18-2021	1	0	0	1	0	1	1	1
;
RUN;

DATA	Want;
	DROP	_:	Cat1 - Cat7	Value;
	SET	Have;
		BY	Date	NOTSORTED;
	ARRAY	Cats	{*}	Cat1	-	Cat7;
	ARRAY	Sums	{7}	_TEMPORARY_;
	ARRAY	Rates	{7}	Rate1	-	Rate7;
	FORMAT	Rate1	-	Rate7	PERCENT10.2;

	DO	_i						=	1	TO	DIM(Cats);
		IF	Cats{_i}						THEN
			Sums{_i}			+	Value;
	END;

	IF	Last.Date							THEN
		DO;
			DO	_i				=	1	TO	DIM(Cats);
				Rates{_i}		=	Sums{_i}	/	7;
				CALL	MISSING(Sums{_i});
			END;
			OUTPUT;
		END;
	ELSE
		DO;
			DELETE;			
		END;
RUN;

 

Results:

jimbarbour_0-1628571577325.png

 

Jim

andreas_lds
Jade | Level 19

The values in the summary table don't match the source table you have posted. What is the role of the variable "Value"?

If you want to use sas, i strongly recommend to drop the idea of doing things the same way you would do them using Excel, you won't get far, if you stick to the concepts of the inferior software.

 

 

sasprogramming
Quartz | Level 8

Value is used in the rate calculation. The data i have posted is fine.

Take the rows in January in which cat1 = 1

Date Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value
Jan-18 1 0 0 0 1 1 0 0
Jan-18 0 1 1 0 0 1 1 0
Jan-18 0 1 0 1 0 1 1 1
  0 1 1 0 0 1 0 0
Jan-18 1 0 0 0 0 0 0 1
Jan-18 0 1 0 1 0 0 1 0
Jan-18 0 1 0 1 1 0 1 1
Jan-18 0 0 1 0 0 1 1 1
Jan-18 1 1 1 0 0 0 1 1
Jan-18 1 0 0 0 1 1 0 1

 

CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4

jimbarbour
Meteorite | Level 14

@sasprogramming wrote:

 

CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4


OK, well, then what I posted earlier is a bit off and needs adjustment.  Let me tweak it a bit.

 

Jim

jimbarbour
Meteorite | Level 14

I think I have slightly different data based on the original posting -- or I'm misunderstanding (which is entirely possible).

Based on the data I have (see posted below), I get a 5 for the denominator for January, which is the count of "1's" in the Cat1 column.  For the numerator, I get a 3, which is the count of "1's" in the Value column with a corresponding "1" in the Cat1 column.  So, 3/5 = 0.6 or 60%, yes?  Full results, below, followed by revised code.

 

January Data:

 

Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1

 

Results (all months):

jimbarbour_0-1628572574475.png

 

Code:

DATA	Have;
	FORMAT	Date	YYMMD7.;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	INPUT	
		Date	:	ANYDTDTE11.	Cat1	Cat2	Cat3	Cat4	Cat5	Cat6	Cat7	Value;

DATALINES;
Jan-18-2021	1	0	0	0	1	1	0	0
Jan-18-2021	0	1	1	0	0	1	1	0
Jan-18-2021	0	1	0	1	0	1	1	1
Jan-18-2021	1	1	0	0	1	0	0
Jan-18-2021	1	0	0	0	0	0	0	1
Jan-18-2021	0	1	0	1	0	0	1	0
Jan-18-2021	0	1	0	1	1	0	1	1
Jan-18-2021	0	0	1	0	0	1	1	1
Jan-18-2021	1	1	1	0	0	0	1	1
Jan-18-2021	1	0	0	0	1	1	0	1
Jan-18-2021	0	0	0	1	0	1	1	1
Feb-18-2021	1	1	1	0	1	0	1	0
Feb-18-2021	0	1	0	1	0	0	0	0
Feb-18-2021	1	0	1	0	0	1	0	1
Feb-18-2021	0	0	1	0	1	1	0	1
Feb-18-2021	1	1	0	0	1	0	1	1
Feb-18-2021	1	0	0	1	0	0	1	1
Feb-18-2021	0	1	1	1	0	1	0	0
Feb-18-2021	1	1	0	1	1	1	1	0
Mar-18-2021	0	1	0	1	1	0	0	0
Mar-18-2021	1	1	0	1	0	1	0	0
Mar-18-2021	1	1	1	0	0	1	0	1
Mar-18-2021	0	1	0	0	0	1	1	0
Mar-18-2021	1	1	1	0	1	1	0	1
Mar-18-2021	1	1	1	1	1	0	1	1
Mar-18-2021	1	0	0	1	0	1	0	0
Mar-18-2021	1	1	0	0	0	0	0	0
Mar-18-2021	1	1	1	1	1	1	0	1
Mar-18-2021	0	1	0	1	1	1	1	1
Mar-18-2021	0	1	1	0	0	0	0	0
Mar-18-2021	0	1	1	1	0	0	0	0
Mar-18-2021	0	1	1	1	1	1	1	0
Mar-18-2021	0	0	1	1	1	1	1	0
Mar-18-2021	1	0	0	1	0	1	1	1
;
RUN;

DATA	Want;
	DROP	_:	Cat1 - Cat7	Value;
	SET	Have;
		BY	Date	NOTSORTED;
	ARRAY	Cats	{*}	Cat1	-	Cat7;
	ARRAY	Sums	{7}	_TEMPORARY_;
	ARRAY	Cat_Cnt	{7}	_TEMPORARY_;
	ARRAY	Rates	{7}	Rate1	-	Rate7;
	FORMAT	Rate1	-	Rate7	PERCENT10.2;

	DO	_i						=	1	TO	DIM(Cats);
		IF	Cats{_i}						THEN
			DO;
				Sums{_i}		+	Value;
				Cat_Cnt{_i}		+	1;
			END;
	END;

	IF	Last.Date							THEN
		DO;
			DO	_i				=	1	TO	DIM(Cats);
				Rates{_i}		=	Sums{_i}	/	Cat_Cnt{_i};
				CALL	MISSING(Sums{_i}, Cat_Cnt{_i});
			END;
			OUTPUT;
		END;
	ELSE
		DO;
			DELETE;			
		END;
RUN;

Jim

ChrisNZ
Tourmaline | Level 20

Like this?

proc sql;
  select DATE
       , sum(VALUE)/sum(CAT1) as RATE1
       , sum(VALUE)/sum(CAT2) as RATE2
       ...
from HAVE
group by DATE;
ChrisNZ
Tourmaline | Level 20

My bad, I misunderstood. Like this?

proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;

 

jimbarbour
Meteorite | Level 14

@ChrisNZ wrote:
proc sql;
  select DATE
       , sum(VALUE*(CAT1=1))/sum(CAT1=1) as RATE1
       ...
from HAVE
group by DATE;

Oh!  Now, that's danged clever.  And I suppose that if the CATn variables had possible values greater than 1, then we could do the following:

 

proc sql;
  select DATE
       , sum(VALUE*(CAT1>0))/sum(IFN(CAT1>0,CAT1,0)) as RATE1	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT2>0))/sum(IFN(CAT2>0,CAT2,0)) as RATE2	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT3>0))/sum(IFN(CAT3>0,CAT3,0)) as RATE3	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT4>0))/sum(IFN(CAT4>0,CAT4,0)) as RATE4	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT5>0))/sum(IFN(CAT5>0,CAT5,0)) as RATE5	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT6>0))/sum(IFN(CAT6>0,CAT6,0)) as RATE6	FORMAT=PERCENT10.2
       , sum(VALUE*(CAT7>0))/sum(IFN(CAT7>0,CAT7,0)) as RATE7	FORMAT=PERCENT10.2
from HAVE
group by DATE;

 

 

Jim

Tom
Super User Tom
Super User

@sasprogramming wrote:

Value is used in the rate calculation. The data i have posted is fine.

Take the rows in January in which cat1 = 1

Date Cat1 Cat2 Cat3 Cat4 Cat5 Cat6 Cat7 Value
Jan-18 1 0 0 0 1 1 0 0
Jan-18 0 1 1 0 0 1 1 0
Jan-18 0 1 0 1 0 1 1 1
  0 1 1 0 0 1 0 0
Jan-18 1 0 0 0 0 0 0 1
Jan-18 0 1 0 1 0 0 1 0
Jan-18 0 1 0 1 1 0 1 1
Jan-18 0 0 1 0 0 1 1 1
Jan-18 1 1 1 0 0 0 1 1
Jan-18 1 0 0 0 1 1 0 1

 

CA rate for cat1 Jan = sum of value = 3 divided by sum of cat = 4, therefore rate = 3/4


I cannot understand where the 3 and 4 come from in your calculations.  There are four observations with CAT1=1.  Is that how you got 4?  But then where did the 3 come from?

ballardw
Super User

Is that value of date supposed to be missing?

First step: make sure all of your "grouping" variables, such as a date are not missing. Fill in if they are missing before running any of the suggested solutions.

Second, your "want" only shows Jan, Feb etc. Do you have data from more than one year in the Date variable? If so, is the "want" supposed to only reflect one year, apparently 18 (bad juju can arise with reliance on 2-digit years).

PaigeMiller
Diamond | Level 26

PROC REPORT can do this easily. The mean of the 0/1 binary variables is the percent you want.

 

proc report data=have;
    columns date cat1-cat7;
    define date/group "Date";
    define cat1/mean format=percent8.2;
    define cat2/mean format=percent8.2;
    define cat3/mean format=percent8.2;
    define cat4/mean format=percent8.2;
    define cat5/mean format=percent8.2;
    define cat6/mean format=percent8.2;
    define cat7/mean format=percent8.2;
run;
--
Paige Miller
jimbarbour
Meteorite | Level 14

@PaigeMiller,

 

For the Rate:  If the numerator is determined by the Value column and the denominator is determined by the Cat column, would a Proc Report give us the result we would want in this case?  An excellent technique in general though; one that didn't even occur to me.

 

Jim

jimbarbour_0-1628610985075.png

 

PaigeMiller
Diamond | Level 26

Yes, PROC REPORT could be modified so that the value column is used in the denominator. It was never clear to me from reading the original post that the percents used the value column in the calculations. and the Excel example is something I ignore, as I don't like and don't use Excel.

--
Paige Miller
Tom
Super User Tom
Super User

Seems simple enough.

data want;
  do until(last.date);
    set have;
    by date;
    array cat[7];
    array top[7];
    array bottom[7];
    array rate[7];
    do index=1 to dim(cat);
      if cat[index] then do;
        top[index]=sum(top[index],value);
        bottom[index]=sum(bottom[index],1);
      end;
    end;
  end;
  do index=1 to dim(cat);
    rate[index]=divide(top[index],bottom[index]);
  end;
  drop index cat: top: bottom: value;
run;
Obs          date    rate1     rate2     rate3    rate4     rate5      rate6      rate7

 1     2021-01-18    0.750    0.50000     0.5      0.75    0.66667    0.57143    0.71429
 2     2021-02-18    0.600    0.20000     0.5      0.25    0.50000    0.50000    0.50000
 3     2021-03-18    0.625    0.41667     0.5      0.40    0.57143    0.50000    0.50000

But it would be much easier with a different structure where the category number is not stored in the NAME of a variable but instead as the VALUE of a variable instead.

data have ;
  input date :date. value @ ;
  do cat=1 to 7;
    input result @;
    output;
  end;
  format date yymmdd10.;
DATALINES;
18-Jan-2021 0 1 0 0 0 1 1 0 0
18-Jan-2021 0 0 1 1 0 0 1 1 0
18-Jan-2021 1 0 1 0 1 0 1 1 1
18-Jan-2021 0 0 1 1 0 0 1 0 0
18-Jan-2021 1 1 0 0 0 0 0 0 1
18-Jan-2021 0 0 1 0 1 0 0 1 0
18-Jan-2021 1 0 1 0 1 1 0 1 1
18-Jan-2021 1 0 0 1 0 0 1 1 1
18-Jan-2021 1 1 1 1 0 0 0 1 1
18-Jan-2021 1 1 0 0 0 1 1 0 1
18-Jan-2021 1 0 0 0 1 0 1 1 1
18-Feb-2021 0 1 1 1 0 1 0 1 0
18-Feb-2021 0 0 1 0 1 0 0 0 0
18-Feb-2021 1 1 0 1 0 0 1 0 1
18-Feb-2021 1 0 0 1 0 1 1 0 1
18-Feb-2021 1 1 1 0 0 1 0 1 1
18-Feb-2021 1 1 0 0 1 0 0 1 1
18-Feb-2021 0 0 1 1 1 0 1 0 0
18-Feb-2021 0 1 1 0 1 1 1 1 0
18-Mar-2021 0 0 1 0 1 1 0 0 0
18-Mar-2021 0 1 1 0 1 0 1 0 0
18-Mar-2021 1 1 1 1 0 0 1 0 1
18-Mar-2021 0 0 1 0 0 0 1 1 0
18-Mar-2021 1 1 1 1 0 1 1 0 1
18-Mar-2021 1 1 1 1 1 1 0 1 1
18-Mar-2021 0 1 0 0 1 0 1 0 0
18-Mar-2021 0 1 1 0 0 0 0 0 0
18-Mar-2021 1 1 1 1 1 1 1 0 1
18-Mar-2021 1 0 1 0 1 1 1 1 1
18-Mar-2021 0 0 1 1 0 0 0 0 0
18-Mar-2021 0 0 1 1 1 0 0 0 0
18-Mar-2021 0 0 1 1 1 1 1 1 0
18-Mar-2021 0 0 0 1 1 1 1 1 0
18-Mar-2021 1 1 0 0 1 0 1 1 1
;

proc sql ;
create table want as
select date,cat
     , sum(result*value)/sum(result) as rate
from have
group by 1,2
;
quit;

Which you could then use to make a REPORT that looks like your request.

proc report data=want;
 column date rate,cat ;
 define date/group;
 define rate/sum ' ';
 define cat/across ' ';
run;
        date          1          2          3          4          5          6          7
  2021-01-18       0.75        0.5        0.5       0.75  0.6666667  0.5714286  0.7142857
  2021-02-18        0.6        0.2        0.5       0.25        0.5        0.5        0.5
  2021-03-18      0.625  0.4166667        0.5        0.4  0.5714286        0.5        0.5

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 2348 views
  • 15 likes
  • 7 in conversation