BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Amanda3
Fluorite | Level 6

Hi, my dataset includes multiple diagnostic codes from multiple facilities for each patient. I want to only keep observations (ID) for the FACILITY with the greatest number of diagnostic CODEs. Is there an easy way to do this?

 

For example, this:

ID FACILITY CODE

1 A 416000.1

1 A 751101.2

1 B 416000.1

2 A 416000.1

2 A 751101.2

2 B 416000.1

2 B 102000.1

2 B 102146.2

 

would become this:

ID FACILITY CODE

1 A 416000.1

1 A 751101.2

2 B 416000.1

2 B 102000.1

2 B 102146.2

 

Thanks,

 

Amanda

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @Amanda3  Assuming I understand your requirement

 


data have;
input ID FACILITY $ CODE;
cards;
1 A 416000.1
1 A 751101.2
1 B 416000.1
2 A 416000.1
2 A 751101.2
2 B 416000.1
2 B 102000.1
2 B 102146.2
;
 
proc sql;
create table want(drop=c) as
select *
from (select *,count(*) as c from have group id,facility)
group by id
having max(c)=c;
quit;

View solution in original post

18 REPLIES 18
novinosrin
Tourmaline | Level 20

Hi @Amanda3  Assuming I understand your requirement

 


data have;
input ID FACILITY $ CODE;
cards;
1 A 416000.1
1 A 751101.2
1 B 416000.1
2 A 416000.1
2 A 751101.2
2 B 416000.1
2 B 102000.1
2 B 102146.2
;
 
proc sql;
create table want(drop=c) as
select *
from (select *,count(*) as c from have group id,facility)
group by id
having max(c)=c;
quit;
Amanda3
Fluorite | Level 6
@novinosrin, Thank you! It worked. I should have asked sooner. I cannot believe how much time I wasted.
Amanda3
Fluorite | Level 6

One problem, how do I select only one facility for each ID when facilities have the same max?

 

For example:

ID FACILITY CODE

3 A 416000.1

3 A 751101.2

3 B 416000.1

3 B 416000.1

 

novinosrin
Tourmaline | Level 20

Good question @Amanda3 , So it;s a business question as to how to deal with ties? 

 

You got to tell us, which ones do you want to keep where there is a tie? How would you want to prioritize?

Amanda3
Fluorite | Level 6
Thanks for being so nice. There is a fourth variable SEVERITY (ranging from 1-6, where 1 is least severe and 6 is most severe). I would want to keep patients with the most severe injuries, but I suspect there will be many ties (again). In the case of another tie, it wouldn't matter which facility was kept.
novinosrin
Tourmaline | Level 20

Okay @Amanda3   Take your time, examine the variables and your business requirement thoroughly and once you are clear, please post your modified input sample and the expected output for the new sample. Briefly add the logic notes why as in like your "Severity" variable.  Once we have this squeaky clean, coding is 5 mins work. 

Amanda3
Fluorite | Level 6

@novinosrin, thank you and sorry for the confusion. My dataset includes multiple diagnostic codes from multiple facilities for each patient. I want to only keep observations (ID) for the FACILITY with the greatest number of diagnostic CODEs.

 

data have;

input ID FACILITY $ CODE SEVERITY;

cards;

1 A 416000.1 1

1 A 751101.2 2

1 B 416000.1 1

2 A 416000.1 1

2 A 751101.2 2

2 B 416000.1 1

2 B 102000.1 1

2 B 102146.2 2

3 A 416000.1 1

3 A 751101.2 2

3 B 416000.1 1

3 B 416000.1 1

4 A 416000.1 1

4 A 751101.2 1

4 B 416000.1 1

4 B 416000.1 1;

 

You have already helped me determine how to use SAS to select FACILITY 1A and 2B (since they have the most CODEs), but FACILITY 3 A/B and 4 A/B both have two CODEs. I would like to select FACILITY 3A because it has a greater total SEVERITY (1+2). The SEVERITY scores for facility 4 A/B are tied, so it doesn’t matter which I include. In the WANT dataset I just included first FACILITY (4A).

 

WANT:

1 A 416000.1 1

1 A 751101.2 2

2 B 416000.1 1

2 B 102000.1 1

2 B 102146.2 2

3 A 416000.1 1

3 A 751101.2 2

4 A 416000.1 1

4 A 751101.1 1

 

Thank you again for sparing your valuable time to help!

novinosrin
Tourmaline | Level 20

Hi @Amanda3  Keeping it simple

 

/*Modified the same SQL with a sum of severity and 
descending severity into a temp table for a rank order*/
proc sql;
create table temp(drop=c) as
select *
from (select *,count(*) as c,sum(severity) as s from have group id,facility)
group by id
having max(c)=c
order by id,facility,s desc; 
quit;
/*Using the rank order subsetting the ID facility that falls into the 
top order i.e n=1 and finally dropping all temp logic variables*/
data want;
set temp;
by id facility;
if first.id then n=1;
else if first.facility then  n+1;
if n=1;
drop n s ;
run;
PaigeMiller
Diamond | Level 26

@Amanda3 wrote:

Hi, my dataset includes multiple diagnostic codes from multiple facilities for each patient. I want to only keep observations (ID) for the FACILITY with the greatest number of diagnostic CODEs. Is there an easy way to do this?


I don't understand the logic that gets your from input to output. Why isn't the third line of the input (1 B 416000.1) included in the output? Why aren't the rows that begin with 2 A in the output?

 

 

--
Paige Miller
Amanda3
Fluorite | Level 6

 

@PaigeMiller Sorry for the confusion. The logic that got me from the input to output was that ID#1 went to facility A (2 codes) and facility B (1 code), while ID#2 went to facility A (2 codes) and B (3 codes). I want to keep the facility with the most codes.

 

hashman
Ammonite | Level 13

@Amanda3:

It can't get more concise than @novinosrin's SQL where all procedural details are hidden behind the brain of the SQL optimizer. But the task offers a good proving ground for a nice exercise of making use of BY processing coupled with a hash table. Your sample data are sorted by ID FACILITY but not by CODE. Hence, we can take advantage of the sorted order and use a hash table to find the maximal number of distinct CODE values per FACILITY:

data have ;                                
  input id facility $ code $ ;             
  cards ;                                  
1 A 416000.1                               
1 A 751101.2                               
1 B 416000.1                               
2 A 416000.1                               
2 A 751101.2                               
2 B 416000.1                               
2 B 102000.1                               
2 B 102146.2                               
;                                          
run ;                                      
                                           
data want (drop = _:) ;                    
  if _n_ = 1 then do ;                     
    dcl hash q () ;                        
    q.definekey ("code") ;                 
    q.definedone () ;                      
  end ;                                    
  do _n_ = q.clear() by 0 until (last.id) ;
    do until (last.facility) ;             
      set have ;                           
      by id facility ;                     
      q.ref() ;                            
    end ;                                  
    if q.num_items <= _qmax then continue ;
    _qmax = q.num_items ;                  
    _fmax = facility ;                     
  end ;                                    
  do until (last.id) ;                     
    set have ;                             
    by id ;                                
    if facility = _fmax then output ;      
  end ;                                    
run ;                                      

Now if the input file were sorted by ID FACILITY CODE, BY processing alone would more than suffice:

proc sort data = have out = shave ;                                                                                                     
  by id facility code ;                                                                                                                 
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  do until (last.id) ;                                                                                                                  
    do _q = 0 by 0 until (last.facility) ;                                                                                              
      set shave ;                                                                                                                       
      by id facility code ;                                                                                                             
      _q + first.code ;                                                                                                                 
    end ;                                                                                                                               
    if _q <= _qmax then continue ;                                                                                                      
    _qmax = _q ;                                                                                                                        
    _fmax = facility ;                                                                                                                  
  end ;                                                                                                                                 
  do until (last.id) ;                                                                                                                  
    set shave ;                                                                                                                         
    by id ;                                                                                                                             
    if facility = _fmax then output ;                                                                                                   
  end ;                                                                                                                                 
run ;                    

Kind regards

Paul D.   

novinosrin
Tourmaline | Level 20

Good morning Guru @hashman  , Indeed classic and slick. However, it seems it's OP's very first post and tagged as "NewUser", so without taking away the potential of the prospect of great ability of new users, hmm nevertheless your level of Classics might be too overwhelming?  I recall reading your magnificent "DO" paper 3 years ago at a time when my experience was 2.5 years and that was mind boggling.  

 

Having said that, I know Paige miller and I are certainly benefited. So Thank you and Kudos 🙂

hashman
Ammonite | Level 13

@novinosrin: Thanks 1E6 for all the kind words. 

To your "new user" point, methinks it merely indicates a new user of this discussion forum and not necessarily a new SAS user.

Secondly, IMO, one can learn a lot from struggling to understand code (written by a solid programmer) that at first glance seems incomprehensible. Finally, what we post here is intended not solely for the OPs but for anyone who cares to read them. 

 

Kind regards

Paul D.

    

Amanda3
Fluorite | Level 6
@hashman Thanks for the response. I would have to agree with @novinosrin that you are at a level above. I will have to revisit your response when my skills improve.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1025 views
  • 4 likes
  • 6 in conversation