How do I count observations across variables?

acrowther · Posted 02-03-2017 09:19 AM

Hi all:

The question I need to answer is: for people who have a defined DX1 value (in this case, 5990), what are the most frequent codes that exist across DX2 through DX30? I'm trying to build out a profile of the "most common case" based on these codes. In other words, for people with 5990, the ten most frequent codes are "A (freq, x%) ,B (freq, x%) etc..." I think I need two steps:

Create a dataset containing observations with 5990 in DX1. This is done already.
For DX2 - DX 30 (the dataset extends through 30 but I trucated to make it easier to see), determine the most frequent values across all 29 variables and across all other observations.

Many thanks for your help! (I use 9.4, if that helps.)

DX1	DX2	DX3	DX4	DX5	DX6	DX7	DX8
5990	2859	1302	V5869
5990
5990	2768	42789	49390	25000	2720	V5869	V5866
5990	4149	25000	4019	V5866	V5869
5990
5990	4019	5715	45621
5990	7881	4019	2720	53081	V5869	V4589	V5866
5990	42731	25000	V5869
5990	27650	V5869	53081	4019	2720
5990	78701	78791
5990	4019	2720	V5869
5990	78820	78097	29420	V1309
5990	413	4264	78607	V1254	33829
5990	40391	5856	V4511	V4573	V1051	1970	V1011

RW9 · Posted 02-03-2017 09:36 AM

Well, the simplest way would be to normalise your data and run a proc freq on it:

data have;
  infile datalines dlm="," dsd;
  input dx1 $ dx2 $ dx3 $ dx4 $ dx5 $ dx6 $ dx7 $ dx8 $;
datalines;
5990,2859,1302,V5869,,,,
5990,,,,,,,,
5990,2768,42789,49390,25000,2720,V5869,V5866
;
run;

data inter (keep=dx1 v);
  set have;
  array vals{7} dx2-dx8;
  do i=1 to 7;
    if vals{i} ne "" then do;
      v=vals{i};
      output;
    end;
  end;
run;

/* Note ouptu is basic and output window only, so modify to your needs */
proc freq data=inter;  
run;

And for future reference please post test data in the form of a datastep so we don't have to guess at your data's structure.

KachiM · Posted 02-03-2017 09:52 AM

"determine the most frequent values across all 29 variables and across all other observations. "

If you show the required output for the sample data, it is possible to visualize what you want as in the above line. Then a better way can be shown.

acrowther · Posted 02-03-2017 10:08 AM

Thanks for your replies. Below is the datastep (if that helps; I can provide more detailed data if needed) I used to create my data and also the chart I'd like to be able to populate. DX1 is used as the initial filter to create the derrivative dataset. From that secondary dataset (workdata.discharge_5990), I'd like to be able to determine the most frequent DX codes that appear in DX2 - DX30 for all observations, rather than just a single row.

The dataset "workdata.discharge_5990" has more than 100,000 observations, if that helps guide the approach. It does not have a variable that functions as a unique identifier though all the observations are unique.

	Top Ten Most Common DX for Paients with DX1 of 5990
	N	%
DX A
DX B
DX C
DX D
DX E
DX F
DX G
DX H
DX I
DX J

data workdata.discharge_5990;
	set workdata.emergency_department;
	where dx1 = '5990';
run;

KachiM · Posted 02-03-2017 10:27 AM

Do the Top 10 DX refers to all DXs(amalgamated)? In other words, can DX2 to DX30 be clubbed for counting purpose?

acrowther · Posted 02-03-2017 10:31 AM

Great question. Yes, the Top 10 DX refers to counts from all DXs combined (DX2-DX30). They can definitely be clubbed for counting purposes.

Astounding · Posted 02-03-2017 04:48 PM

Must "5990" appear only in DX1, not in any of the other codes?

The very first reply you received contains the right approach. It might still need a little work, along the lines of:

Expanding to 30 codes instead of 8
Changing the order of the reporting rows
Moving the results from a report to a data set

But it's a very straightforward, accurate way to approach the problem. Did you attempt to use it?

Shmuel · Posted 02-03-2017 10:11 AM

Run next short code and check output dataset.

You may find it easy to find in what you looking for:

proc freq data=have; /* or data=have(where=(dx1=5990)); */

table dx1 * (dx2 - dx30); /* try also, depending sas version: dx1 * (dx: ) */

output out=freq;

run;

acrowther · Posted 02-03-2017 10:55 AM

Thanks for replying. I ran this code - seems to give a distribution of the most frequent DX2 codes across all observations (and so on for DX3-30). I think the next step is figuring out which DX values are the mostcommon across DX2-30 (not just within each DX). Does that make sense?

KachiM · Posted 02-03-2017 11:19 AM

A simple way, but I am not sure that this will meet your ultimate goal.

Break DX2 to DX30 as 30 single STRING variables and save it to a data set. You may ignore the missing values. Sort the data set by STRING in descending order and take first 10 rows.

Shmuel · Posted 02-04-2017 02:05 AM

Does next code helps you ?

data have;
    infile datalines truncover;
    input dx1 $ dx2 $ dx3 $ dx4 $ dx5 $ dx6 $ dx7 $  dx8 $ ;
/*DX1     DX2     DX3     DX4     DX5     DX6     DX7     DX8 */
DATALINES;
5990    2859    1302    V5869                
5990                             
5990    2768    42789   49390   25000   2720    V5869   V5866
5990    4149    25000   4019    V5866   V5869        
5990                             
5990    4019    5715    45621                
5990    7881    4019    2720    53081   V5869   V4589   V5866
5990    42731   25000   V5869                
5990    27650   V5869   53081   4019    2720         
5990    78701   78791                    
5990    4019    2720    V5869                
5990    78820   78097   29420   V1309            
5990    413     4264    78607   V1254   33829        
5990    40391   5856    V4511   V4573   V1051   1970    V1011
; run;

data temp;
 set have;
   by dx1;
     array dx $ dx2-dx8;
     do i=1 to 7;
        if dx(i) ne ' ' then do;
        dxn = dx(i);
        output;
     end; end;
     keep dx1 dxn;
run;
proc sql;
     create table freq as
     select dx1, dxn, count(dxn) as freq
     from temp
     group by dx1, dxn;
quit;
proc sort data=freq; by descending freq; run;
proc print data=freq(obs=10); run;

How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Re: How do I count observations across variables?

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!