## count select observations based on conditions across rows by group

Solved
Occasional Contributor
Posts: 19

# count select observations based on conditions across rows by group

[ Edited ]

I have data related to visits within a time period after a certain date.  I've set up flags to indicate when a visit and nextvisit occurs.  The certain date is random and is associated with an ID.  So for example:

ID     Certain_Date     A      Visit   NextVisit     Class

A      1/1/2015            1         1        1             001

A      3/4/2015            2         1        0             001

B      2/17/2015          1         0        0             002

B      3/01/2015          2         1        1             002

B      5/03/2015          3         0        0             002

B      9/14/2015          4         1        0             002

C      5/25/2015          1         1        1             001

D      2/22/2015          1         1        0             003

D      10/21/2015        2         0        0             003

Each certain date is actually an 'index' episode per ID, and A is a flag for how many index episodes.  So A has 2, B has 4, C has 1 and D has 2.  Further, each ID belongs to a class.

The goal is to count how many IDs, by class has how many index episodes, mutually exclusive visits and next visits, where A=1 is the first index, A=2 is the second index and A=3 is the third index, and so on.

Class   One_index_only  Two_index_only  Three_index_only  Four_index_only

001           1                               1                         0                               0

002           0                               0                         0                               1

003           0                               1                         0                               0

And additionally, how many IDs, by class had a first visit with the first index, a first visit with the second index, or 2 or more first visits by any index (mutually exclusive).

I ended up transposing the data horizontally as such:

ID      CLASS    A    Visit    NextVisit   A2    Visit2   NextVisit2   A3    Visit3   NextVisit3   A4   Visit4   NextVisit4

A       001          1       1           1           1         1         0              0        0           0            0      0          0

B       002          1       0           0           1         1         1              1        0            0           1      1           0

C       001          1       1           1           0          0        0              0        0            0           0      0           0

D       003          1       1           0           1          0        0              0        0            0           0      0           0

And from there I counted using conditions that go across the columns, for example class 001 has one ID (D) who has only one Visit and no Visit2, Visit3, Visit4. (If Visit=1 and Visit2=0 and Visit3=0 and Visit4=0)

It gets cumbersome however, when there are multiple index episodes (when A = 5 or more), and I need to count for example, between 2-3 valid visits.  It would require listing out the combinations: Visit=1 and Visit2=1 and Visit3=0 and Visit4=0 and Visit5=0,

Visit=1 and Visit2=0 and Visit3=1 and Visit4=0 and Visit5=0, etc).

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

Accepted Solutions
Solution
‎03-21-2017 05:00 PM
Posts: 1,288

## Re: count select observations based on conditions across rows by group

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

Yes.   That is if I understand your goal correctly.  You want frequencies, by class, of

1. Nvisits per id
2. Level of A for the first occurance of visit=1,   A_for_first_visit_1

To do that you need to make one observation per id, with the variables CLASS, NVISITS, and A_for_first_visit_1

data need (keep=id class nvisits A_for_first_visit_1);

set have;

by id;

if last.id then nvisits=a;

retain a_for_first_visit_1;

if first.id then A_for_first_visit_1=.;

if A_for_first_visit_1=. and visit=1 then A_for_first_visit_1=A;

if last.id;

run;

proc tabulate data=need noseps;

class class nvisits  A_for_first_visit_1;

table    class all,nvisits all;

table    class all,a_for_first_visit_1 all;

run;

Note this program assumes (1) that the data set is sorted by id, and (2) there are no "holes" in the sequence of A for each ID  (i.e. A=1, then 2, 3, ...).

All Replies
Super User
Posts: 13,063

## Re: count select observations based on conditions across rows by group

If you are looking for something like a report and not a dataset this may get you started:

```proc tabulate data=have;
class class;
class a;
table class,
a='Index count'*n=''
/misstext='0'
;
run;```

It helps to post data in the form of a datastep. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Occasional Contributor
Posts: 19

## Re: count select observations based on conditions across rows by group

I was not able to get to this project until now but I did use this and on double checking the numbers, it does not give counts of mutually exclusive conditions.
Solution
‎03-21-2017 05:00 PM
Posts: 1,288

## Re: count select observations based on conditions across rows by group

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

Yes.   That is if I understand your goal correctly.  You want frequencies, by class, of

1. Nvisits per id
2. Level of A for the first occurance of visit=1,   A_for_first_visit_1

To do that you need to make one observation per id, with the variables CLASS, NVISITS, and A_for_first_visit_1

data need (keep=id class nvisits A_for_first_visit_1);

set have;

by id;

if last.id then nvisits=a;

retain a_for_first_visit_1;

if first.id then A_for_first_visit_1=.;

if A_for_first_visit_1=. and visit=1 then A_for_first_visit_1=A;

if last.id;

run;

proc tabulate data=need noseps;

class class nvisits  A_for_first_visit_1;

table    class all,nvisits all;

table    class all,a_for_first_visit_1 all;

run;

Note this program assumes (1) that the data set is sorted by id, and (2) there are no "holes" in the sequence of A for each ID  (i.e. A=1, then 2, 3, ...).

Occasional Contributor
Posts: 19

## Re: count select observations based on conditions across rows by group

[ Edited ]

This definitely gets closer to the solution I was looking for, thank you.

Super User
Posts: 10,618

## Re: count select observations based on conditions across rows by group

```
data have;
input ID   \$  Certain_Date  : mmddyy10.   A      Visit   NextVisit     Class \$;
format certain_date mmddyy10.;
cards;
A      1/1/2015            1         1        1             001
A      3/4/2015            2         1        0             001
B      2/17/2015          1         0        0             002
B      3/01/2015          2         1        1             002
B      5/03/2015          3         0        0             002
B      9/14/2015          4         1        0             002
C      5/25/2015          1         1        1             001
D      2/22/2015          1         1        0             003
D      10/21/2015        2         0        0             003
;
run;

proc sql;
create table temp as
select class,id,count(*) as n
from have
group by class,id;

create table temp1 as
select class,n,count(*) as count
from temp
group by class,n;
quit;

proc transpose data=temp1 out=temp2 prefix=only_;
by class;
var count;
id n;
run;
proc stdize data=temp2 out=want missing=0 reponly;
run;
```
☑ This topic is solved.