DATA Step, Macro, Functions and more

count select observations based on conditions across rows by group

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

count select observations based on conditions across rows by group

[ Edited ]

I have data related to visits within a time period after a certain date.  I've set up flags to indicate when a visit and nextvisit occurs.  The certain date is random and is associated with an ID.  So for example:

 

ID     Certain_Date     A      Visit   NextVisit     Class

A      1/1/2015            1         1        1             001

A      3/4/2015            2         1        0             001

B      2/17/2015          1         0        0             002

B      3/01/2015          2         1        1             002

B      5/03/2015          3         0        0             002

B      9/14/2015          4         1        0             002

C      5/25/2015          1         1        1             001

D      2/22/2015          1         1        0             003

D      10/21/2015        2         0        0             003

 

Each certain date is actually an 'index' episode per ID, and A is a flag for how many index episodes.  So A has 2, B has 4, C has 1 and D has 2.  Further, each ID belongs to a class.

 

The goal is to count how many IDs, by class has how many index episodes, mutually exclusive visits and next visits, where A=1 is the first index, A=2 is the second index and A=3 is the third index, and so on.

 

Class   One_index_only  Two_index_only  Three_index_only  Four_index_only

001           1                               1                         0                               0

002           0                               0                         0                               1

003           0                               1                         0                               0

 

And additionally, how many IDs, by class had a first visit with the first index, a first visit with the second index, or 2 or more first visits by any index (mutually exclusive).

 

I ended up transposing the data horizontally as such:

 

ID      CLASS    A    Visit    NextVisit   A2    Visit2   NextVisit2   A3    Visit3   NextVisit3   A4   Visit4   NextVisit4

A       001          1       1           1           1         1         0              0        0           0            0      0          0

B       002          1       0           0           1         1         1              1        0            0           1      1           0

C       001          1       1           1           0          0        0              0        0            0           0      0           0

D       003          1       1           0           1          0        0              0        0            0           0      0           0 

 

And from there I counted using conditions that go across the columns, for example class 001 has one ID (D) who has only one Visit and no Visit2, Visit3, Visit4. (If Visit=1 and Visit2=0 and Visit3=0 and Visit4=0)

It gets cumbersome however, when there are multiple index episodes (when A = 5 or more), and I need to count for example, between 2-3 valid visits.  It would require listing out the combinations: Visit=1 and Visit2=1 and Visit3=0 and Visit4=0 and Visit5=0,

Visit=1 and Visit2=0 and Visit3=1 and Visit4=0 and Visit5=0, etc).

 

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

 

 

 


Accepted Solutions
Solution
‎03-21-2017 05:00 PM
Trusted Advisor
Posts: 1,018

Re: count select observations based on conditions across rows by group

Posted in reply to appleorange

 

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

 

Yes.   That is if I understand your goal correctly.  You want frequencies, by class, of

  1. Nvisits per id
  2. Level of A for the first occurance of visit=1,   A_for_first_visit_1

To do that you need to make one observation per id, with the variables CLASS, NVISITS, and A_for_first_visit_1

 

data need (keep=id class nvisits A_for_first_visit_1);

  set have;

  by id;

  if last.id then nvisits=a;

  retain a_for_first_visit_1;

  if first.id then A_for_first_visit_1=.;

  if A_for_first_visit_1=. and visit=1 then A_for_first_visit_1=A;

  if last.id;

run;

 

proc tabulate data=need noseps;

  class class nvisits  A_for_first_visit_1;

  table    class all,nvisits all;

  table    class all,a_for_first_visit_1 all;

run;

 

 

Note this program assumes (1) that the data set is sorted by id, and (2) there are no "holes" in the sequence of A for each ID  (i.e. A=1, then 2, 3, ...).

View solution in original post


All Replies
Super User
Posts: 11,343

Re: count select observations based on conditions across rows by group

Posted in reply to appleorange

If you are looking for something like a report and not a dataset this may get you started:

 

proc tabulate data=have;
   class class;
   class a;
   table class,
         a='Index count'*n=''
         /misstext='0'
   ;
run;

It helps to post data in the form of a datastep. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Occasional Contributor
Posts: 14

Re: count select observations based on conditions across rows by group

I was not able to get to this project until now but I did use this and on double checking the numbers, it does not give counts of mutually exclusive conditions.
Solution
‎03-21-2017 05:00 PM
Trusted Advisor
Posts: 1,018

Re: count select observations based on conditions across rows by group

Posted in reply to appleorange

 

Basically, is there an approach that is able to count these conditions without having to transform to a horizontal format per ID, or if transforming is the best method, if there's a better non manual way to count IDs, categorized by class?

 

Yes.   That is if I understand your goal correctly.  You want frequencies, by class, of

  1. Nvisits per id
  2. Level of A for the first occurance of visit=1,   A_for_first_visit_1

To do that you need to make one observation per id, with the variables CLASS, NVISITS, and A_for_first_visit_1

 

data need (keep=id class nvisits A_for_first_visit_1);

  set have;

  by id;

  if last.id then nvisits=a;

  retain a_for_first_visit_1;

  if first.id then A_for_first_visit_1=.;

  if A_for_first_visit_1=. and visit=1 then A_for_first_visit_1=A;

  if last.id;

run;

 

proc tabulate data=need noseps;

  class class nvisits  A_for_first_visit_1;

  table    class all,nvisits all;

  table    class all,a_for_first_visit_1 all;

run;

 

 

Note this program assumes (1) that the data set is sorted by id, and (2) there are no "holes" in the sequence of A for each ID  (i.e. A=1, then 2, 3, ...).

Occasional Contributor
Posts: 14

Re: count select observations based on conditions across rows by group

[ Edited ]

This definitely gets closer to the solution I was looking for, thank you.

Super User
Posts: 10,020

Re: count select observations based on conditions across rows by group

Posted in reply to appleorange

data have;
input ID   $  Certain_Date  : mmddyy10.   A      Visit   NextVisit     Class $;
format certain_date mmddyy10.;
cards;
A      1/1/2015            1         1        1             001
A      3/4/2015            2         1        0             001
B      2/17/2015          1         0        0             002
B      3/01/2015          2         1        1             002
B      5/03/2015          3         0        0             002
B      9/14/2015          4         1        0             002
C      5/25/2015          1         1        1             001
D      2/22/2015          1         1        0             003
D      10/21/2015        2         0        0             003
;
run;

proc sql;
create table temp as
select class,id,count(*) as n
 from have
  group by class,id;
  
create table temp1 as
select class,n,count(*) as count
 from temp
  group by class,n;
quit;

proc transpose data=temp1 out=temp2 prefix=only_;
by class;
var count;
id n;
run;
proc stdize data=temp2 out=want missing=0 reponly;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 183 views
  • 0 likes
  • 4 in conversation