Solved: Create a count using conditional statement

CathyVI · Posted 04-25-2024 09:55 AM

Hello Everyone,

I need help trying to find the count for the multi_event column. Here is a sample code and my output. I was able to get the count for event 1-4 only. I also want to count when event is in more than 1 column.

data have;
input event1 event2 event3 event4;
datalines;
. 1 . .
1 1 . .
. . 1 .
1 . . 1
1 . . .
. 1 . .
. . 1 .
. 1 1 1
. . . 1
1 . . .
. . 1 1
;
run;

SAS output

Row 2 and 4 met my criteria for multi_event so they were output in multi_event.

Row 8 and 11 met the criteria but did not output. I want row 8, 11 or any other row with multiple count to be in the multi_event column.

Here is my attempted code (My coding is not classic I will appreciate learning a better way to code it)

data want;
set have;
if event1 ne . and event2=. and event3=. and event4=. then event1only=1;
if event2 ne . and event1=. and event3=. and event4=. then event2only=1;
if event3 ne . and event1=. and event2=. and event4=. then event3only=1;
if event4 ne . and event1=. and event2=. and event3=. then event4only=1;
if event1 ne . and (event2 ne . or  event3 ne . or event4 ne . ) then multi_event=1;
run;

This is my final desired output is to be able to count each column like this with proc tabulate or report.

Event1only	Event2Only	Event3Only	Event4Only	Multi_event
2	2	2	1	3

Thanks

FreelanceReinh · Posted 05-01-2024 10:55 AM

With these new specifications the DO-END block after "else if _n0<3 ..." changes:

data want(drop=_:);
set have;
array e[*] event1 event2 case32 event46;
length _s $4 _v event_type $40;
_s=cat(of e[*]);
_n0=countc(_s,'0');
if _n0=3 then do; /* Three zeros indicate that only one type of form has been filled. */
  _vn=verify(_s,'0'); /* This determines which form it was. */
  _v=vname(e[_vn]);
  event_type=catx(' ',_v,grp,'only');
  if _vn<3 then event_type=' '||event_type; /* Indentation improves sort order of report columns. */
  Duration=e[_vn];
end;
else if _n0<3 then do; /* Fewer than 3 zeros indicate that multiple forms have been filled. */
  _n1=countc(_s,'1');
  if _n1=1 then do; /* If value 1 occurs only once, ... */
    _vn=whichn(1, of e[*]); /* ... this determines the corresponding form. */
    _v=vname(e[_vn]);
    if _vn=4 then event_type=catx(' ',_v,grp,'only');
             else event_type=catx(' ',_v,    'only');
    if _vn<3 then event_type=' '||event_type; /* Indentation improves sort order of report columns. */
    Duration=1; /* Value 1 overrides value 2. */
  end;
  else do;
    event_type='Multiple';
    Duration=1+(_n1=0); /* Duration=2 only if value 1 does not occur, otherwise Duration=1. */
  end;
end;
format Duration days.;
run;

First, we count the 1s in the four-digit string _s. If that number _n1=1, we determine where the single 1 occurred (WHICHN function) and conclude that event_type is the corresponding "event" only (variable _v). In the special case where that is "Event46" (i.e., _vn=4), the value of grp is inserted. As previously, we indent "Event1" and "Event2". Definitely, Duration=1, so that value 1 prevails.

If _n1 does not equal 1 (while _n0<3), we must have either two (or more) 1s or zero 1s and two (or more) 2s. Both cases qualify for event_type='Multiple' and the Duration is 1 or 2, respectively.

To obtain an overview of the event_type and Duration values resulting from the various four-digit combinations in _s (and the value of grp), you can omit the DROP= dataset option from the DATA statement (let's call the new output dataset CHK) and run the following PROC FREQ step:

proc freq data=chk;
tables event_type*duration*_s / missing list;
run;

View solution in original post

FreelanceReinh · Posted 04-25-2024 10:42 AM

Hello @CathyVI,

If the values of event1, event2, etc. are either 1 or missing, you can count the non-missing values with the N function and then use the WHICHN function to populate the "only..." variables. The report is easy with PROC REPORT.

data want(drop=_c);
set have;
array only[4];
_c=n(of event:);
if _c>1 then multi=1;
else if _c=1 then only[whichn(1,of event:)]=1;
run;

proc report data=want;
column only: multi;
run;

Note that variable names only1, only2, etc. with a numeric suffix are more convenient to work with than event1only, etc.: see the array definition and the variable list in the COLUMN statement.

CathyVI · Posted 04-25-2024 12:59 PM

@FreelanceReinh @ballardw Thank you for explain this. Both codes worked but i have to read carefully to understand your coding.

@FreelanceReinh how can I add "Total" to the output table you created.

Only1	Only2	Only3	Only4	Multi	Total
2	2	2	1	4	11

FreelanceReinh · Posted 04-25-2024 01:19 PM

@CathyVI wrote:

@FreelanceReinh how can I add "Total" to the output table you created.

Only1

Only2

Only3

Only4

Multi

Total

2

2

2

1

4

11

I would compute the total in the DATA step and then add the computed variable to the COLUMN statement of the PROC REPORT step:

data want(drop=_c);
set have;
array only[4];
_c=n(of event:);
if _c>1 then multi=1;
else if _c=1 then only[whichn(1,of event:)]=1;
total=sum(of only[*], multi, 0);
run;

proc report data=want;
column only: multi total;
run;

The purpose of the zero in the third argument of the SUM function is to avoid log messages about missing values for (hypothetical) observations with missing values in all event... variables.

CathyVI · Posted 04-25-2024 04:15 PM

@FreelanceReinh @ballardw Thank you. I made a 2 silly mistake in my first question. My table is not (.) it is (0). Also, my columns have different names. When I tried to apply your code I did not get the missing value populated.

data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;

Sorry for the confusion. Please help.

ballardw · Posted 04-25-2024 04:22 PM

Looks like the definition of "all missing" would become SUM(of events)=0 .

@CathyVI wrote:

@FreelanceReinh @ballardw Thank you. I made a 2 silly mistake in my first question. My table is not (.) it is (0). Also, my columns have different names. When I tried to apply your code I did not get the missing value populated.
data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;
Sorry for the confusion. Please help.

ballardw · Posted 04-25-2024 06:00 PM

@ballardw wrote:

Looks like the definition of "all missing" would become SUM(of events)=0 .
@CathyVI wrote:

@FreelanceReinh @ballardw Thank you. I made a 2 silly mistake in my first question. My table is not (.) it is (0). Also, my columns have different names. When I tried to apply your code I did not get the missing value populated.
data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;
Sorry for the confusion. Please help.

I have to assume "missing" with the above would be all 0 values correct? None are provided to demonstrate.

If that assumption is correct a minor addition to @Reeza's code:

data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
0 0 0 0
;
run;

data want;
set have;
array _events(*) event1 event2 case32 event46;
length event_type $32.;
if sum(of _events(*))>1 then event_type = 'Multi';
else if sum(of _events(*))=0 then event_type = 'Missing';
else event_type = vname(_events(whichn(1, of _events(*))));

run;

proc tabulate data=want;
class event_type;
table  (event_type all="Total")*N='';
run;

Reeza · Posted 04-25-2024 04:30 PM

data have;
input event1 event2 case32 event46;
datalines;
0 1 0 0
1 1 0 0
0 0 1 0
1 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 1 1 1
0 0 0 1
1 0 0 0
0 0 1 1
;
run;

data want;
set have;
array _events(*) event: case:;

length event_type $32.;
if sum(of _events(*))>1 then event_type = 'Multi';
else event_type = vname(_events(whichn(1, of _events(*))));

run;

proc tabulate data=want;
class event_type;
table  (event_type all="Total")*N='';
run;

proc freq data=want noprint;
table event_type / out=summary;
run;

proc print data=summary;
run;

Many different ways to get there. 0/1 actually simplify things IMO.

FreelanceReinh · Posted 04-26-2024 02:39 AM

@CathyVI wrote:

My table is not (.) it is (0). Also, my columns have different names.

Reeza's definition of variable event_type is a great idea for these new requirements and, of course, using the SUM function instead of the N function.

CathyVI · Posted 04-26-2024 12:10 PM

@FreelanceReinh @Patrick @ballardw @Reeza Thank you all. I want to add a final twist to my work. I am sorry if you expected it earlier. I am a step by step learner so I don't want to get confused with too much information at a time. This is my twist.

I want to add "grp" variable to event46. The grp is only available if event46 is available however grp can be available when event46 is present and other event is present e.g., case32 or event2 (this is the multi ).

I would like my outcome to have the event46 outcomes in their respective groups.

data have;
input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 1 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 1 1 1 B
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 
;
run;

Note: if case32 and event46 are both present, I still want them in the multi. I think I can do this in pro tabulate by adding grp but did not work

proc tabulate data=want;
class event_type/ order=formatted;
table grp, (event_type all="Total")*N='';
/*table grp*(event_type all="Total")*N='' */
run;

Again, sorry for not addition this details earlier. Thank you so much.

ballardw · Posted 04-26-2024 12:23 PM

@CathyVI wrote:

@FreelanceReinh @Patrick @ballardw @Reeza Thank you all. I want to add a final twist to my work. I am sorry if you expected it earlier. I am a step by step learner so I don't want to get confused with too much information at a time. This is my twist.

I want to add "grp" variable to event46. The grp is only available if event46 is available however grp can be available when event46 is present and other event is present e.g., case32 or event2 (this is the multi ).

I would like my outcome to have the event46 outcomes in their respective groups.
data have;
input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 1 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 1 1 1 B
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 
;
run;
Note: if case32 and event46 are both present, I still want them in the multi. I think I can do this in pro tabulate by adding grp but did not work
proc tabulate data=want;
class event_type/ order=formatted;
table grp, (event_type all="Total")*N='';
/*table grp*(event_type all="Total")*N='' */
run;
Again, sorry for not addition this details earlier. Thank you so much.

I cannot detect the rule involved with why the first two times you have this the group is A but for the last two lines it is B. You need to provide rules, not examples sometimes.

0 0 0 1

CathyVI · Posted 04-26-2024 12:47 PM

@ballardw Thank you. There is no rule involved. The grp variable is part of the original dataset that grouped event46 into A,B,C. event46 is group in the original data, I did not group it but i want my outout into grouping so I see the count of A only, count of B only and count of C only.

Tom · Posted 04-26-2024 01:00 PM

Sounds like you want to convert EVENT46 + GRP into some unknown number of new 0/1 variables. Then your new problem is the same as the original problem.

Your example data does not seem to have any ID variable(s) to uniquely identify the observations. So let's add one, let's call it ROW.

data have;
  row+1;
  input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 1 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 1 1 1 B
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 
;

Now we can use PROC TRANSPOSE to create EVENT46_A and EVENT46_B etc. Which you can merge back onto your existing data using the key variable(s).


proc transpose data=have prefix=event46_ out=new_vars(drop=_:);
  by row;
  where event46 ;
  id grp;
  var event46;
run;

data new_have;
  merge have new_vars;
  by row;
  drop event46 grp;
run;

Result:

                                            event46_    event46_    event46_
OBS    row    event1    event2    case32        A           C           B

  1      1       0         1         0          .           .           .
  2      2       1         1         0          .           .           .
  3      3       0         0         1          .           .           .
  4      4       1         0         0          1           .           .
  5      5       0         0         0          1           .           .
  6      6       1         0         0          .           .           .
  7      7       0         0         1          .           1           .
  8      8       0         0         0          .           1           .
  9      9       0         1         0          .           .           .
 10     10       0         0         1          .           .           .
 11     11       0         1         1          .           .           1
 12     12       0         0         0          1           .           .
 13     13       1         0         0          .           .           .
 14     14       0         0         1          .           1           .
 15     15       0         0         0          .           .           .
 16     16       0         0         0          .           .           1
 17     17       0         0         0          .           .           1

FreelanceReinh · Posted 04-26-2024 01:03 PM

@CathyVI wrote:

I would like my outcome to have the event46 outcomes in their respective groups.

So you want to split the count of "event46" into counts of the subgroups "event46 A", "event46 B", etc.?

A simple solution would be to modify the value of event_type in Reeza's code accordingly, e.g., insert as the last statement in the DATA step creating dataset WANT:

if event_type='event46' then event_type=catx(' ','event46',grp);

However, this might not suffice for a fancier report with, say, headers "A", "B", "C" and a spanning header "event46".

CathyVI · Posted 04-26-2024 02:46 PM

@FreelanceReinh @Tom @Patrick Thank you. I am afraid to tag the rest because this is not an efficient way of requesting help. I am solving this task bit by bit. Am very sorry if this is annoying.

data have;
input event1 event2 case32 event46 grp $;
datalines;
0 1 0 0 .
1 2 0 0 .
0 0 1 0 .
1 0 0 1 A
0 0 0 1 A 
1 0 0 0 .
0 0 1 1 C
0 0 0 1 C
0 1 0 0 .
0 0 1 0 .
0 0 2 0 .
0 1 1 1 B
0 0 0 2 A 
2 0 0 0 .
0 0 1 1 C
0 0 0 0 .
0 0 0 1 B 
0 0 0 1 B 

0 2 0 0 .
2 2 0 0 .
0 0 1 0 .
2 0 0 1 A
0 0 0 1 A 
;
run;

The initial data I shared has 0=missing days, 1=any value<=30 days, 2=value between 31-60 days. I mistakenly again forgot the 2. I was only sharing dataset with 0,1. Am sorry.

When I used @Reeza code, the 2 was part of the multi.

data want;
set have;
array _events(*) event1 event2 case32 event46 ;
length event_type $32.;
if sum(of _events(*))>1 then event_type = 'Multiple';
else if sum(of _events(*))=0 then event_type = 'Missing';
else event_type = vname(_events(whichn(1, of _events(*))));
if event_type='event46' then event_type=catx(' ','event46',grp);
run;

This is not what I want. I want 2 to follow the counting process as 1. if you have a single 2 you will be counted differently may if i can name it event1_60, event2_60 or case32_60 and if you have 2 in more than one group you will be multi_60. Again, I know this is inefficient way of requesting help. Am sorry

This is the overall goal am trying to get.

	Event1 only	Event2 only	Case32 only	Event46 A only	Event46 B only	Event46 C Only	Multiple	Total
30 days
31-60 days
Total

Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Re: Create a count using conditional statement

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away