BookmarkSubscribeRSS Feed
acrowther
Calcite | Level 5

Hi all:

 

The question I need to answer is: for people who have a defined DX1 value (in this case, 5990), what are the most frequent codes that exist across DX2 through DX30?  I'm trying to build out a profile of the "most common case" based on these codes.  In other words, for people with 5990, the ten most frequent codes are "A (freq, x%) ,B (freq, x%) etc..."   I think I need two steps:

 

  1. Create a dataset containing observations with 5990 in DX1.  This is done already.
  2. For DX2 - DX 30 (the dataset extends through 30 but I trucated to make it easier to see), determine the most frequent values across all 29 variables and across all other observations.  

Many thanks for your help!  (I use 9.4, if that helps.)

 

DX1DX2DX3DX4DX5DX6DX7DX8
599028591302V5869    
5990       
599027684278949390250002720V5869V5866
59904149250004019V5866V5869  
5990       
59904019571545621    
599078814019272053081V5869V4589V5866
59904273125000V5869    
599027650V58695308140192720  
59907870178791     
599040192720V5869    
5990788207809729420V1309   
5990413426478607V125433829  
5990403915856V4511V4573V10511970V1011
10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, the simplest way would be to normalise your data and run a proc freq on it:

data have;
  infile datalines dlm="," dsd;
  input dx1 $ dx2 $ dx3 $ dx4 $ dx5 $ dx6 $ dx7 $ dx8 $;
datalines;
5990,2859,1302,V5869,,,,
5990,,,,,,,,
5990,2768,42789,49390,25000,2720,V5869,V5866
;
run;

data inter (keep=dx1 v);
  set have;
  array vals{7} dx2-dx8;
  do i=1 to 7;
    if vals{i} ne "" then do;
      v=vals{i};
      output;
    end;
  end;
run;

/* Note ouptu is basic and output window only, so modify to your needs */ proc freq data=inter; run;

And for future reference please post test data in the form of a datastep so we don't have to guess at your data's structure.

KachiM
Rhodochrosite | Level 12

"determine the most frequent values across all 29 variables and across all other observations.  "

 

If you show the required output for the sample data, it is possible to visualize what you want as in the above line. Then a better way can be shown.

 

acrowther
Calcite | Level 5

Thanks for your replies.  Below is the datastep (if that helps; I can provide more detailed data if needed) I used to create my data and also the chart I'd like to be able to populate.  DX1 is used as the initial filter to create the derrivative dataset.  From that secondary dataset (workdata.discharge_5990), I'd like to be able to determine the most frequent DX codes that appear in DX2 - DX30 for all observations, rather than just a single row.  

 

The dataset "workdata.discharge_5990" has more than 100,000 observations, if that helps guide the approach.  It does not have a variable that functions as a unique identifier though all the observations are unique.

 

 Top Ten Most Common DX for Paients with DX1 of 5990
 N%
DX A  
DX B  
DX C  
DX D  
DX E  
DX F  
DX G  
DX H  
DX I  
DX J  

 

 

 

 

data workdata.discharge_5990;
	set workdata.emergency_department;
	where dx1 = '5990';
run;

 

KachiM
Rhodochrosite | Level 12

Do the Top 10 DX refers to all DXs(amalgamated)? In other words, can DX2 to DX30 be clubbed for counting purpose?

acrowther
Calcite | Level 5

Great question.  Yes, the Top 10 DX refers to counts from all DXs combined (DX2-DX30).  They can definitely be clubbed for counting purposes.  

Astounding
PROC Star

Must "5990" appear only in DX1, not in any of the other codes?

 

The very first reply you received contains the right approach.  It might still need a little work, along the lines of:

 

  • Expanding to 30 codes instead of 8
  • Changing the order of the reporting rows
  • Moving the results from a report to a data set

But it's a very straightforward, accurate way to approach the problem.  Did you attempt to use it?

Shmuel
Garnet | Level 18

Run next short code and check output dataset. 

You may find it easy to find in what you looking for:

 

proc freq data=have;  /* or   data=have(where=(dx1=5990)); */

      table dx1 * (dx2 - dx30);   /* try also, depending sas version:  dx1 * (dx: ) */

      output  out=freq;

run;

 

      

acrowther
Calcite | Level 5

Thanks for replying.  I ran this code - seems to give a distribution of the most frequent DX2 codes across all observations (and so on for DX3-30).   I think the next step is figuring out which DX values are the mostcommon across DX2-30 (not just within each DX).  Does that make sense?

KachiM
Rhodochrosite | Level 12

A simple way,  but I am not sure that this will meet your ultimate goal.

 

Break  DX2 to DX30 as 30 single STRING variables and save it to a data set. You may ignore the missing values. Sort the data set by STRING in descending order and take first 10 rows.

Shmuel
Garnet | Level 18

Does next code helps you ?

data have;
    infile datalines truncover;
    input dx1 $ dx2 $ dx3 $ dx4 $ dx5 $ dx6 $ dx7 $  dx8 $ ;
/*DX1     DX2     DX3     DX4     DX5     DX6     DX7     DX8 */
DATALINES;
5990    2859    1302    V5869                
5990                             
5990    2768    42789   49390   25000   2720    V5869   V5866
5990    4149    25000   4019    V5866   V5869        
5990                             
5990    4019    5715    45621                
5990    7881    4019    2720    53081   V5869   V4589   V5866
5990    42731   25000   V5869                
5990    27650   V5869   53081   4019    2720         
5990    78701   78791                    
5990    4019    2720    V5869                
5990    78820   78097   29420   V1309            
5990    413     4264    78607   V1254   33829        
5990    40391   5856    V4511   V4573   V1051   1970    V1011
; run;

data temp;
 set have;
   by dx1;
     array dx $ dx2-dx8;
     do i=1 to 7;
        if dx(i) ne ' ' then do;
        dxn = dx(i);
        output;
     end; end;
     keep dx1 dxn;
run;
proc sql;
     create table freq as
     select dx1, dxn, count(dxn) as freq
     from temp
     group by dx1, dxn;
quit;
proc sort data=freq; by descending freq; run;
proc print data=freq(obs=10); run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 4307 views
  • 0 likes
  • 5 in conversation