BookmarkSubscribeRSS Feed
Rcrowder
Calcite | Level 5

I have data where subjects come in for three separate visits during a study to track health issues.

 

I need to flag data that repeats the same issues for the same visit date of each subject.

 

EX:

Sub       Visit    Issue

1            1         A

1            2         B

1            2         C

1            3         A

 

2            1         A

2            2         C

2            2         B

2            2         C

2            3         D

 

Subject 2's second visit should be flagged because there are 2 issues C's posted in the data set.

 

3 REPLIES 3
novinosrin
Tourmaline | Level 20

Can you please post the expected output sample for the input sample? It leads me/perhaps some of us to assumptions that we can avoid

ballardw
Super User

@Rcrowder wrote:

I have data where subjects come in for three separate visits during a study to track health issues.

 

I need to flag data that repeats the same issues for the same visit date of each subject.

 

EX:

Sub       Visit    Issue

1            1         A

1            2         B

1            2         C

1            3         A

 

2            1         A

2            2         C

2            2         B

2            2         C

2            3         D

 

Subject 2's second visit should be flagged because there are 2 issues C's posted in the data set.

 


Best is to provide example data in the form of a data step so we test code. Example:

data work.have;
input Sub       Visit    Issue  $;
datalines;
1            1         A
1            2         B
1            2         C
1            3         A
2            1         A
2            2         C
2            2         B
2            2         C
2            3         D
;

Paste into a code box opened with the forum's {I} or "running man" icons to preserve formatting. the message windows will reformat text and sometimes the reformatting will change the data enough that the data step will not run.

 

 

One easy way to get just the problem records:

proc freq data=work.have noprint;
   tables sub*visit*issue/out=work.issuecount(where=(count>1));
run;

Or to set a flag, 1=True 0=False for duplicates:

proc sort data=work.have;
   by sub visit issue;
run;
data work.want;
   set work.have;
   by sub visit issue;
   flag=  not first.issue ;
run;

A By statement in the data set creates a number of Boolean variables named First. and Last. for each variable on the by statement. If the record is the first at a given level then it is True (or 1). So if something is not the first record with the same Issue value then negating that variable gets the flag.

hashman
Ammonite | Level 13

@Rcrowder:

Methinks the simplest is to use the SAS SQL's self-merge facility:

data have ;                                               
  input Sub Visit Issue :$1. ;                            
  cards ;                                                 
1  1  A                                                   
1  2  B                                                   
1  2  C                                                   
1  3  A                                                   
2  1  A                                                   
2  2  C                                                   
2  2  B                                                   
2  2  C                                                   
2  3  D                                                   
;                                                         
run ;                                                     
                                                          
proc sql ;                                                
  create table want as                                       
  select *, count(issue) > count(distinct issue) as flag
  from   have                                             
  group  sub, visit                                       
  ;                                                       
quit ;                                                    

Note: The original order of the records within (sub,visit) may change, for example:

Sub    Visit    Issue    flag
-----------------------------
 1       1        A        0 
 1       2        C        0 
 1       2        B        0 
 1       3        A        0 
 2       1        A        0 
 2       2        B        1 
 2       2        C        1 
 2       2        C        1 
 2       3        D        0 

But since it's SQL, this is an expected behavior.

 

OTOH, since your data appear to be sorted by (sub,visit), you can get what you want and preserve the original record order using a simple double DoW loop:

data want (drop = _:) ;                               
  do _n_ = 1 by 1 until (last.visit) ;    
    set have ;                            
    by sub visit ;                        
    if _n_ = 1 then _issue = issue ;      
    else if issue = _issue then flag = 1 ;
  end ;                                   
  do _n_ = 1 to _n_ ;                     
    set have ;                            
    flag = ^^ flag ;                      
    output ;                              
  end ;                                   
run ;                                     

What you get as a result in the output is:

Sub    Visit    Issue    flag
-----------------------------
 1       1        A        0 
 1       2        B        0 
 1       2        C        0 
 1       3        A        0 
 2       1        A        0 
 2       2        C        1 
 2       2        B        1 
 2       2        C        1 
 2       3        D        0 

However, your specs are rather vague, so there's a chance you want to flag only the records with the duplicate values of ISSUE rather than the entire (sub,visit) group where they are found. If this is the case, the hash object comes in handy, especially since its use doesn't assume that the input file is sorted - and it also preserves the original record order:

data want (drop = _:) ;                     
  if _n_ = 1 then do ;                      
    dcl hash h () ;                         
    h.definekey  ("sub", "visit", "issue") ;
    h.definedata ("_q") ;                   
    h.definedone () ;                       
    do until (z) ;                          
      set have end = z ;                    
      if h.find() ne 0 then _q = 1 ;        
      else _q + 1 ;                         
      h.replace() ;                         
    end ;                                   
  end ;                                     
  set have ;                                
  h.find() ;                                
  flag = _q > 1 ;                           
run ;                                       

As a result, you'll get:

Sub    Visit    Issue    flag
-----------------------------
 1       1        A        0 
 1       2        B        0 
 1       2        C        0 
 1       3        A        0 
 2       1        A        0 
 2       2        C        1 
 2       2        B        0 
 2       2        C        1 
 2       3        D        0 

Kind regards

Paul D. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1578 views
  • 0 likes
  • 4 in conversation