Re: Create a count using conditional statement

CathyVI

Hello Everyone,

I need help trying to find the count for the multi_event column. Here is a sample code and my output. I was able to get the count for event 1-4 only. I also want to count when event is in more than 1 column.

data have;
input event1 event2 event3 event4;
datalines;
. 1 . .
1 1 . .
. . 1 .
1 . . 1
1 . . .
. 1 . .
. . 1 .
. 1 1 1
. . . 1
1 . . .
. . 1 1
;
run;

SAS output

Row 2 and 4 met my criteria for multi_event so they were output in multi_event.

Row 8 and 11 met the criteria but did not output. I want row 8, 11 or any other row with multiple count to be in the multi_event column.

Here is my attempted code (My coding is not classic I will appreciate learning a better way to code it)

data want;
set have;
if event1 ne . and event2=. and event3=. and event4=. then event1only=1;
if event2 ne . and event1=. and event3=. and event4=. then event2only=1;
if event3 ne . and event1=. and event2=. and event4=. then event3only=1;
if event4 ne . and event1=. and event2=. and event3=. then event4only=1;
if event1 ne . and (event2 ne . or  event3 ne . or event4 ne . ) then multi_event=1;
run;

This is my final desired output is to be able to count each column like this with proc tabulate or report.

Event1only	Event2Only	Event3Only	Event4Only	Multi_event
2	2	2	1	3

Thanks

FreelanceReinh

Hello @CathyVI,

If the values of event1, event2, etc. are either 1 or missing, you can count the non-missing values with the N function and then use the WHICHN function to populate the "only..." variables. The report is easy with PROC REPORT.

data want(drop=_c);
set have;
array only[4];
_c=n(of event:);
if _c>1 then multi=1;
else if _c=1 then only[whichn(1,of event:)]=1;
run;

proc report data=want;
column only: multi;
run;

Note that variable names only1, only2, etc. with a numeric suffix are more convenient to work with than event1only, etc.: see the array definition and the variable list in the COLUMN statement.

CathyVI

@FreelanceReinh @ballardw Thank you for explain this. Both codes worked but i have to read carefully to understand your coding.

@FreelanceReinh how can I add "Total" to the output table you created.

Only1	Only2	Only3	Only4	Multi	Total
2	2	2	1	4	11

FreelanceReinh

@CathyVI wrote:

@FreelanceReinh how can I add "Total" to the output table you created.

Only1

Only2

Only3

Only4

Multi

Total

2

2

2

1

4

11

I would compute the total in the DATA step and then add the computed variable to the COLUMN statement of the PROC REPORT step:

data want(drop=_c);
set have;
array only[4];
_c=n(of event:);
if _c>1 then multi=1;
else if _c=1 then only[whichn(1,of event:)]=1;
total=sum(of only[*], multi, 0);
run;

proc report data=want;
column only: multi total;
run;

The purpose of the zero in the third argument of the SUM function is to avoid log messages about missing values for (hypothetical) observations with missing values in all event... variables.

CathyVI

@FreelanceReinh @ballardw Thank you. I made a 2 silly mistake in my first question. My table is not (.) it is (0). Also, my columns have different names. When I tried to apply your code I did not get the missing value populated.

data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;

Sorry for the confusion. Please help.

ballardw

Looks like the definition of "all missing" would become SUM(of events)=0 .

@CathyVI wrote:

@FreelanceReinh @ballardw Thank you. I made a 2 silly mistake in my first question. My table is not (.) it is (0). Also, my columns have different names. When I tried to apply your code I did not get the missing value populated.
data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;
Sorry for the confusion. Please help.

ballardw

@ballardw wrote:

Looks like the definition of "all missing" would become SUM(of events)=0 .
@CathyVI wrote:

@FreelanceReinh @ballardw Thank you. I made a 2 silly mistake in my first question. My table is not (.) it is (0). Also, my columns have different names. When I tried to apply your code I did not get the missing value populated.
data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;
Sorry for the confusion. Please help.

I have to assume "missing" with the above would be all 0 values correct? None are provided to demonstrate.

If that assumption is correct a minor addition to @Reeza's code:

data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
0 0 0 0
;
run;

data want;
set have;
array _events(*) event1 event2 case32 event46;
length event_type $32.;
if sum(of _events(*))>1 then event_type = 'Multi';
else if sum(of _events(*))=0 then event_type = 'Missing';
else event_type = vname(_events(whichn(1, of _events(*))));

run;

proc tabulate data=want;
class event_type;
table  (event_type all="Total")*N='';
run;

Reeza

data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;

data want;
set have;
array _events(*) event: case:;

length event_type $32.;
if sum(of _events(*))>1 then event_type = 'Multi';
else event_type = vname(_events(whichn(1, of _events(*))));

run;

proc tabulate data=want;
class event_type;
table  (event_type all="Total")*N='';
run;

proc freq data=want noprint;
table event_type / out=summary;
run;

proc print data=summary;
run;

Many different ways to get there. 0/1 actually simplify things IMO.

FreelanceReinh

@CathyVI wrote:

My table is not (.) it is (0). Also, my columns have different names.

Reeza's definition of variable event_type is a great idea for these new requirements and, of course, using the SUM function instead of the N function.

CathyVI

@FreelanceReinh @Patrick @ballardw @Reeza Thank you all. I want to add a final twist to my work. I am sorry if you expected it earlier. I am a step by step learner so I don't want to get confused with too much information at a time. This is my twist.

I want to add "grp" variable to event46. The grp is only available if event46 is available however grp can be available when event46 is present and other event is present e.g., case32 or event2 (this is the multi ).

I would like my outcome to have the event46 outcomes in their respective groups.

data have;
input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 1 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 1 1 1 B
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 
;
run;

Note: if case32 and event46 are both present, I still want them in the multi. I think I can do this in pro tabulate by adding grp but did not work

proc tabulate data=want;
class event_type/ order=formatted;
table grp, (event_type all="Total")*N='';
/*table grp*(event_type all="Total")*N='' */
run;

Again, sorry for not addition this details earlier. Thank you so much.

ballardw

@CathyVI wrote:

@FreelanceReinh @Patrick @ballardw @Reeza Thank you all. I want to add a final twist to my work. I am sorry if you expected it earlier. I am a step by step learner so I don't want to get confused with too much information at a time. This is my twist.

I want to add "grp" variable to event46. The grp is only available if event46 is available however grp can be available when event46 is present and other event is present e.g., case32 or event2 (this is the multi ).

I would like my outcome to have the event46 outcomes in their respective groups.
data have;
input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 1 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 1 1 1 B
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 
;
run;
Note: if case32 and event46 are both present, I still want them in the multi. I think I can do this in pro tabulate by adding grp but did not work
proc tabulate data=want;
class event_type/ order=formatted;
table grp, (event_type all="Total")*N='';
/*table grp*(event_type all="Total")*N='' */
run;
Again, sorry for not addition this details earlier. Thank you so much.

I cannot detect the rule involved with why the first two times you have this the group is A but for the last two lines it is B. You need to provide rules, not examples sometimes.

0 0 0 1

CathyVI

@ballardw Thank you. There is no rule involved. The grp variable is part of the original dataset that grouped event46 into A,B,C. event46 is group in the original data, I did not group it but i want my outout into grouping so I see the count of A only, count of B only and count of C only.

Tom

Sounds like you want to convert EVENT46 + GRP into some unknown number of new 0/1 variables. Then your new problem is the same as the original problem.

Your example data does not seem to have any ID variable(s) to uniquely identify the observations. So let's add one, let's call it ROW.

data have;
  row+1;
  input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 1 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 1 1 1 B
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 
;

Now we can use PROC TRANSPOSE to create EVENT46_A and EVENT46_B etc. Which you can merge back onto your existing data using the key variable(s).


proc transpose data=have prefix=event46_ out=new_vars(drop=_:);
  by row;
  where event46 ;
  id grp;
  var event46;
run;

data new_have;
  merge have new_vars;
  by row;
  drop event46 grp;
run;

Result:

                                            event46_    event46_    event46_
OBS    row    event1    event2    case32        A           C           B

  1      1       0         1         0          .           .           .
  2      2       1         1         0          .           .           .
  3      3       0         0         1          .           .           .
  4      4       1         0         0          1           .           .
  5      5       0         0         0          1           .           .
  6      6       1         0         0          .           .           .
  7      7       0         0         1          .           1           .
  8      8       0         0         0          .           1           .
  9      9       0         1         0          .           .           .
 10     10       0         0         1          .           .           .
 11     11       0         1         1          .           .           1
 12     12       0         0         0          1           .           .
 13     13       1         0         0          .           .           .
 14     14       0         0         1          .           1           .
 15     15       0         0         0          .           .           .
 16     16       0         0         0          .           .           1
 17     17       0         0         0          .           .           1

FreelanceReinh

@CathyVI wrote:

I would like my outcome to have the event46 outcomes in their respective groups.

So you want to split the count of "event46" into counts of the subgroups "event46 A", "event46 B", etc.?

A simple solution would be to modify the value of event_type in Reeza's code accordingly, e.g., insert as the last statement in the DATA step creating dataset WANT:

if event_type='event46' then event_type=catx(' ','event46',grp);

However, this might not suffice for a fancier report with, say, headers "A", "B", "C" and a spanning header "event46".

CathyVI

@FreelanceReinh @Tom @Patrick Thank you. I am afraid to tag the rest because this is not an efficient way of requesting help. I am solving this task bit by bit. Am very sorry if this is annoying.

data have;
input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 2 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 0 2 0 .
0 1 1 1 B
0 0 0 2 A 
2 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 

0 2 0 0 .
2 2 0 0 .
0 0 1 0 .
2 0 0 1 A
0 0 0 1 A 
;
run;

The initial data I shared has 0=missing days, 1=any value<=30 days, 2=value between 31-60 days. I mistakenly again forgot the 2. I was only sharing dataset with 0,1. Am sorry.

When I used @Reeza code, the 2 was part of the multi.

data want;
set have;
array _events(*) event1 event2 case32 event46 ;
length event_type $32.;
if sum(of _events(*))>1 then event_type = 'Multiple';
else if sum(of _events(*))=0 then event_type = 'Missing';
else event_type = vname(_events(whichn(1, of _events(*))));
if event_type='event46' then event_type=catx(' ','event46',grp);
run;

This is not what I want. I want 2 to follow the counting process as 1. if you have a single 2 you will be counted differently may if i can name it event1_60, event2_60 or case32_60 and if you have 2 in more than one group you will be multi_60. Again, I know this is inefficient way of requesting help. Am sorry

This is the overall goal am trying to get.

	Event1 only	Event2 only	Case32 only	Event46 A only	Event46 B only	Event46 C Only	Multiple	Total
30 days
31-60 days
Total

Classroom Training Available!