## How do I count observations across variables?

Occasional Contributor
Posts: 6

# How do I count observations across variables?

Hi all:

The question I need to answer is: for people who have a defined DX1 value (in this case, 5990), what are the most frequent codes that exist across DX2 through DX30?  I'm trying to build out a profile of the "most common case" based on these codes.  In other words, for people with 5990, the ten most frequent codes are "A (freq, x%) ,B (freq, x%) etc..."   I think I need two steps:

1. Create a dataset containing observations with 5990 in DX1.  This is done already.
2. For DX2 - DX 30 (the dataset extends through 30 but I trucated to make it easier to see), determine the most frequent values across all 29 variables and across all other observations.

Many thanks for your help!  (I use 9.4, if that helps.)

 DX1 DX2 DX3 DX4 DX5 DX6 DX7 DX8 5990 2859 1302 V5869 5990 5990 2768 42789 49390 25000 2720 V5869 V5866 5990 4149 25000 4019 V5866 V5869 5990 5990 4019 5715 45621 5990 7881 4019 2720 53081 V5869 V4589 V5866 5990 42731 25000 V5869 5990 27650 V5869 53081 4019 2720 5990 78701 78791 5990 4019 2720 V5869 5990 78820 78097 29420 V1309 5990 413 4264 78607 V1254 33829 5990 40391 5856 V4511 V4573 V1051 1970 V1011
Super User
Posts: 9,599

## Re: How do I count observations across variables?

Well, the simplest way would be to normalise your data and run a proc freq on it:

```data have;
infile datalines dlm="," dsd;
input dx1 \$ dx2 \$ dx3 \$ dx4 \$ dx5 \$ dx6 \$ dx7 \$ dx8 \$;
datalines;
5990,2859,1302,V5869,,,,
5990,,,,,,,,
5990,2768,42789,49390,25000,2720,V5869,V5866
;
run;

data inter (keep=dx1 v);
set have;
array vals{7} dx2-dx8;
do i=1 to 7;
if vals{i} ne "" then do;
v=vals{i};
output;
end;
end;
run;
/* Note ouptu is basic and output window only, so modify to your needs */
proc freq data=inter;
run;```

And for future reference please post test data in the form of a datastep so we don't have to guess at your data's structure.

Super Contributor
Posts: 324

## Re: How do I count observations across variables?

"determine the most frequent values across all 29 variables and across all other observations.  "

If you show the required output for the sample data, it is possible to visualize what you want as in the above line. Then a better way can be shown.

Occasional Contributor
Posts: 6

## Re: How do I count observations across variables?

Thanks for your replies.  Below is the datastep (if that helps; I can provide more detailed data if needed) I used to create my data and also the chart I'd like to be able to populate.  DX1 is used as the initial filter to create the derrivative dataset.  From that secondary dataset (workdata.discharge_5990), I'd like to be able to determine the most frequent DX codes that appear in DX2 - DX30 for all observations, rather than just a single row.

The dataset "workdata.discharge_5990" has more than 100,000 observations, if that helps guide the approach.  It does not have a variable that functions as a unique identifier though all the observations are unique.

 Top Ten Most Common DX for Paients with DX1 of 5990 N % DX A DX B DX C DX D DX E DX F DX G DX H DX I DX J

``````data workdata.discharge_5990;
set workdata.emergency_department;
where dx1 = '5990';
run;``````

Super Contributor
Posts: 324

## Re: How do I count observations across variables?

Do the Top 10 DX refers to all DXs(amalgamated)? In other words, can DX2 to DX30 be clubbed for counting purpose?

Occasional Contributor
Posts: 6

## Re: How do I count observations across variables?

Great question.  Yes, the Top 10 DX refers to counts from all DXs combined (DX2-DX30).  They can definitely be clubbed for counting purposes.

Super User
Posts: 6,754

## Re: How do I count observations across variables?

Must "5990" appear only in DX1, not in any of the other codes?

The very first reply you received contains the right approach.  It might still need a little work, along the lines of:

• Expanding to 30 codes instead of 8
• Changing the order of the reporting rows
• Moving the results from a report to a data set

But it's a very straightforward, accurate way to approach the problem.  Did you attempt to use it?

Posts: 1,837

## Re: How do I count observations across variables?

[ Edited ]

Run next short code and check output dataset.

You may find it easy to find in what you looking for:

proc freq data=have;  /* or   data=have(where=(dx1=5990)); */

table dx1 * (dx2 - dx30);   /* try also, depending sas version:  dx1 * (dx: ) */

output  out=freq;

run;

Occasional Contributor
Posts: 6

## Re: How do I count observations across variables?

Thanks for replying.  I ran this code - seems to give a distribution of the most frequent DX2 codes across all observations (and so on for DX3-30).   I think the next step is figuring out which DX values are the mostcommon across DX2-30 (not just within each DX).  Does that make sense?

Super Contributor
Posts: 324

## Re: How do I count observations across variables?

A simple way,  but I am not sure that this will meet your ultimate goal.

Break  DX2 to DX30 as 30 single STRING variables and save it to a data set. You may ignore the missing values. Sort the data set by STRING in descending order and take first 10 rows.

Posts: 1,837

## Re: How do I count observations across variables?

Does next code helps you ?

``````data have;
infile datalines truncover;
input dx1 \$ dx2 \$ dx3 \$ dx4 \$ dx5 \$ dx6 \$ dx7 \$  dx8 \$ ;
/*DX1     DX2     DX3     DX4     DX5     DX6     DX7     DX8 */
DATALINES;
5990    2859    1302    V5869
5990
5990    2768    42789   49390   25000   2720    V5869   V5866
5990    4149    25000   4019    V5866   V5869
5990
5990    4019    5715    45621
5990    7881    4019    2720    53081   V5869   V4589   V5866
5990    42731   25000   V5869
5990    27650   V5869   53081   4019    2720
5990    78701   78791
5990    4019    2720    V5869
5990    78820   78097   29420   V1309
5990    413     4264    78607   V1254   33829
5990    40391   5856    V4511   V4573   V1051   1970    V1011
; run;

data temp;
set have;
by dx1;
array dx \$ dx2-dx8;
do i=1 to 7;
if dx(i) ne ' ' then do;
dxn = dx(i);
output;
end; end;
keep dx1 dxn;
run;
proc sql;
create table freq as
select dx1, dxn, count(dxn) as freq
from temp
group by dx1, dxn;
quit;
proc sort data=freq; by descending freq; run;proc print data=freq(obs=10); run;

``````
Discussion stats
• 10 replies
• 1051 views
• 0 likes
• 5 in conversation