Solved: Re: count select observations based on conditions across rows by group

appleorange · Posted 03-08-2017 01:54 PM

I have data related to visits within a time period after a certain date. I've set up flags to indicate when a visit and nextvisit occurs. The certain date is random and is associated with an ID. So for example:

ID Certain_Date A Visit NextVisit Class

A 1/1/2015 1 1 1 001

A 3/4/2015 2 1 0 001

B 2/17/2015 1 0 0 002

B 3/01/2015 2 1 1 002

B 5/03/2015 3 0 0 002

B 9/14/2015 4 1 0 002

C 5/25/2015 1 1 1 001

D 2/22/2015 1 1 0 003

D 10/21/2015 2 0 0 003

Each certain date is actually an 'index' episode per ID, and A is a flag for how many index episodes. So A has 2, B has 4, C has 1 and D has 2. Further, each ID belongs to a class.

The goal is to count how many IDs, by class has how many index episodes, mutually exclusive visits and next visits, where A=1 is the first index, A=2 is the second index and A=3 is the third index, and so on.

Class One_index_only Two_index_only Three_index_only Four_index_only

001 1 1 0 0

002 0 0 0 1

003 0 1 0 0

And additionally, how many IDs, by class had a first visit with the first index, a first visit with the second index, or 2 or more first visits by any index (mutually exclusive).

I ended up transposing the data horizontally as such:

ID CLASS A Visit NextVisit A2 Visit2 NextVisit2 A3 Visit3 NextVisit3 A4 Visit4 NextVisit4

A 001 1 1 1 1 1 0 0 0 0 0 0 0

B 002 1 0 0 1 1 1 1 0 0 1 1 0

C 001 1 1 1 0 0 0 0 0 0 0 0 0

D 003 1 1 0 1 0 0 0 0 0 0 0 0

And from there I counted using conditions that go across the columns, for example class 001 has one ID (D) who has only one Visit and no Visit2, Visit3, Visit4. (If Visit=1 and Visit2=0 and Visit3=0 and Visit4=0)

It gets cumbersome however, when there are multiple index episodes (when A = 5 or more), and I need to count for example, between 2-3 valid visits. It would require listing out the combinations: Visit=1 and Visit2=1 and Visit3=0 and Visit4=0 and Visit5=0,

Visit=1 and Visit2=0 and Visit3=1 and Visit4=0 and Visit5=0, etc).

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

mkeintz · Posted 03-08-2017 08:24 PM

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

Yes. That is if I understand your goal correctly. You want frequencies, by class, of

Nvisits per id
Level of A for the first occurance of visit=1, A_for_first_visit_1

To do that you need to make one observation per id, with the variables CLASS, NVISITS, and A_for_first_visit_1

data need (keep=id class nvisits A_for_first_visit_1);

set have;

by id;

if last.id then nvisits=a;

retain a_for_first_visit_1;

if first.id then A_for_first_visit_1=.;

if A_for_first_visit_1=. and visit=1 then A_for_first_visit_1=A;

if last.id;

run;

proc tabulate data=need noseps;

class class nvisits A_for_first_visit_1;

table class all,nvisits all;

table class all,a_for_first_visit_1 all;

run;

Note this program assumes (1) that the data set is sorted by id, and (2) there are no "holes" in the sequence of A for each ID (i.e. A=1, then 2, 3, ...).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 03-08-2017 02:15 PM

If you are looking for something like a report and not a dataset this may get you started:

proc tabulate data=have;
   class class;
   class a;
   table class,
         a='Index count'*n=''
         /misstext='0'
   ;
run;

It helps to post data in the form of a datastep. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

appleorange · Posted 03-21-2017 03:00 PM

I was not able to get to this project until now but I did use this and on double checking the numbers, it does not give counts of mutually exclusive conditions.

mkeintz · Posted 03-08-2017 08:24 PM

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

Yes. That is if I understand your goal correctly. You want frequencies, by class, of

Nvisits per id
Level of A for the first occurance of visit=1, A_for_first_visit_1

To do that you need to make one observation per id, with the variables CLASS, NVISITS, and A_for_first_visit_1

data need (keep=id class nvisits A_for_first_visit_1);

set have;

by id;

if last.id then nvisits=a;

retain a_for_first_visit_1;

if first.id then A_for_first_visit_1=.;

if A_for_first_visit_1=. and visit=1 then A_for_first_visit_1=A;

if last.id;

run;

proc tabulate data=need noseps;

class class nvisits A_for_first_visit_1;

table class all,nvisits all;

table class all,a_for_first_visit_1 all;

run;

Note this program assumes (1) that the data set is sorted by id, and (2) there are no "holes" in the sequence of A for each ID (i.e. A=1, then 2, 3, ...).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

appleorange · Posted 03-21-2017 04:58 PM

This definitely gets closer to the solution I was looking for, thank you.

Ksharp · Posted 03-08-2017 11:16 PM


data have;
input ID   $  Certain_Date  : mmddyy10.   A      Visit   NextVisit     Class $;
format certain_date mmddyy10.;
cards;
A      1/1/2015            1         1        1             001
A      3/4/2015            2         1        0             001
B      2/17/2015          1         0        0             002
B      3/01/2015          2         1        1             002
B      5/03/2015          3         0        0             002
B      9/14/2015          4         1        0             002
C      5/25/2015          1         1        1             001
D      2/22/2015          1         1        0             003
D      10/21/2015        2         0        0             003
;
run;

proc sql;
create table temp as
select class,id,count(*) as n
 from have
  group by class,id;
  
create table temp1 as
select class,n,count(*) as count
 from temp
  group by class,n;
quit;

proc transpose data=temp1 out=temp2 prefix=only_;
by class;
var count;
id n;
run;
proc stdize data=temp2 out=want missing=0 reponly;
run;

Classroom Training Available!