How to count frequencies between multiple variables

Accepted Solution Solved
Reply
Occasional Contributor LLW
Occasional Contributor
Posts: 15
Accepted Solution

How to count frequencies between multiple variables

Hi everyone,

I am trying to generate a table of total count of values across multiple values (top 100 most frequent values).  It will be a count of a couple of multiple variables. For example, this could be an example of the data:

IDDx1Dx2Dx3Dx4Dx5Dx6
1ABCD
2BAD
3CDA
4EF AD
5GBCDEF

I would like to tally up all values between Dx1 - Dx6, and I would like to get the following frequency table:

DxFrequency %
D525
A 420
B315
C315
F210
E210
G15

Thank you in advance!

LW


Accepted Solutions
Solution
‎07-07-2015 02:25 AM
Super Contributor
Posts: 336

Re: How to count frequencies between multiple variables

You could also try:

data wide;
infile datalines missover ;
input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;
datalines;
1 A B C D
2 B A D
3 C D A
4 E F A D
5 G B C D E F
;
run;

Data Want (Keep=ID Dx);
  Set wide;
  Array DA Dx:;
  Do over DA;
    Dx=DA;
If not Missing (Dx);
Output;
  End;
Run;

Proc Freq Data=Want;
Tables Dx/NoPrint Out=Want;
Run;

View solution in original post


All Replies
Regular Contributor
Posts: 213

Re: How to count frequencies between multiple variables

perhaps you could try this?

data wide;

infile datalines missover ;

input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;

datalines;

1 A B C D

2 B A D

3 C D A

4 E F A D

5 G B C D E F

;

run;

DATA long;

  SET wide; 

  ARRAY adx{*} dx1-dx6 ;

  DO i = 1 to 6 ;

  if missing(adx(i)) then leave;

  else do;

  dx = adx(i);

  OUTPUT;

  END;

  END;

  DROP dx1 - dx6 ;

RUN;

proc freq data=long; tables dx; run;

Esteemed Advisor
Esteemed Advisor
Posts: 7,222

Re: How to count frequencies between multiple variables

An alternative to the array:

proc transpose data=have out=inter;

     var dx1-dx6;

run;

This would give you a long list of the dx variables, you can then use freq or sql counts to get the required output table.

Occasional Contributor LLW
Occasional Contributor
Posts: 15

Re: How to count frequencies between multiple variables

Hello to both of you,

Thank you so much for your help. Is there a way to do it without having to transpose? This is a play set, but my actual dataset is over 300,000 with over 21 columns per entry to transpose.

Thank you so much!

Grand Advisor
Posts: 17,360

Re: How to count frequencies between multiple variables

I think you'll find that many of the diagnosis will be empty so transposing isn't actually much of an issue. SAS should process 1 million rows in a few seconds on a crappy system.

Here's a way to do it with proc freq. Run proc freq with all variables and capture that table to run a proc freq on again. The final table should be what you're after.

data wide;

infile datalines missover ;

input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;

datalines;

1 A B C D

2 B A D

3 C D A

4 E F A D

5 G B C D E F

;

run;

ods table onewayfreqs=summary1;

proc freq data=wide;

table dx:;

run;

data summary2;

set summary1;

diagnosis=coalescec(of dx1-dx6);

keep diagnosis frequency;

run;

proc freq data=summary2 order=freq;

table diagnosis/out=want;

weight frequency;

run;

proc print data=want;

run;

Solution
‎07-07-2015 02:25 AM
Super Contributor
Posts: 336

Re: How to count frequencies between multiple variables

You could also try:

data wide;
infile datalines missover ;
input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;
datalines;
1 A B C D
2 B A D
3 C D A
4 E F A D
5 G B C D E F
;
run;

Data Want (Keep=ID Dx);
  Set wide;
  Array DA Dx:;
  Do over DA;
    Dx=DA;
If not Missing (Dx);
Output;
  End;
Run;

Proc Freq Data=Want;
Tables Dx/NoPrint Out=Want;
Run;

Esteemed Advisor
Esteemed Advisor
Posts: 7,222

Re: How to count frequencies between multiple variables

As mentioned above, 300,00 records is nothing.  Also factor into any approach you take what you want to do with the data later.  For instance if you want to do any further summarizing, expanding of the data etc. as the normalised form is generally easier to work with.

Occasional Contributor LLW
Occasional Contributor
Posts: 15

Re: How to count frequencies between multiple variables

Thank you so much to everyone for your help and for the different ways to solve this problem. Thank you again!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 718 views
  • 8 likes
  • 5 in conversation