BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hi everyone,

I am trying to generate a table of total count of values across multiple values (top 100 most frequent values).  It will be a count of a couple of multiple variables. For example, this could be an example of the data:

IDDx1Dx2Dx3Dx4Dx5Dx6
1ABCD
2BAD
3CDA
4EF AD
5GBCDEF

I would like to tally up all values between Dx1 - Dx6, and I would like to get the following frequency table:

DxFrequency %
D525
A 420
B315
C315
F210
E210
G15

Thank you in advance!

LW

1 ACCEPTED SOLUTION

Accepted Solutions
user24feb
Barite | Level 11

You could also try:

data wide;
infile datalines missover ;
input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;
datalines;
1 A B C D
2 B A D
3 C D A
4 E F A D
5 G B C D E F
;
run;

Data Want (Keep=ID Dx);
  Set wide;
  Array DA Dx:;
  Do over DA;
    Dx=DA;
If not Missing (Dx);
Output;
  End;
Run;

Proc Freq Data=Want;
Tables Dx/NoPrint Out=Want;
Run;

View solution in original post

7 REPLIES 7
Miracle
Barite | Level 11

perhaps you could try this?

data wide;

infile datalines missover ;

input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;

datalines;

1 A B C D

2 B A D

3 C D A

4 E F A D

5 G B C D E F

;

run;

DATA long;

  SET wide; 

  ARRAY adx{*} dx1-dx6 ;

  DO i = 1 to 6 ;

  if missing(adx(i)) then leave;

  else do;

  dx = adx(i);

  OUTPUT;

  END;

  END;

  DROP dx1 - dx6 ;

RUN;

proc freq data=long; tables dx; run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

An alternative to the array:

proc transpose data=have out=inter;

     var dx1-dx6;

run;

This would give you a long list of the dx variables, you can then use freq or sql counts to get the required output table.

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Hello to both of you,

Thank you so much for your help. Is there a way to do it without having to transpose? This is a play set, but my actual dataset is over 300,000 with over 21 columns per entry to transpose.

Thank you so much!

Reeza
Super User

I think you'll find that many of the diagnosis will be empty so transposing isn't actually much of an issue. SAS should process 1 million rows in a few seconds on a crappy system.

Here's a way to do it with proc freq. Run proc freq with all variables and capture that table to run a proc freq on again. The final table should be what you're after.

data wide;

infile datalines missover ;

input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;

datalines;

1 A B C D

2 B A D

3 C D A

4 E F A D

5 G B C D E F

;

run;

ods table onewayfreqs=summary1;

proc freq data=wide;

table dx:;

run;

data summary2;

set summary1;

diagnosis=coalescec(of dx1-dx6);

keep diagnosis frequency;

run;

proc freq data=summary2 order=freq;

table diagnosis/out=want;

weight frequency;

run;

proc print data=want;

run;

user24feb
Barite | Level 11

You could also try:

data wide;
infile datalines missover ;
input ID  Dx1 $ Dx2 $ Dx3 $ Dx4 $ Dx5 $ Dx6 $;
datalines;
1 A B C D
2 B A D
3 C D A
4 E F A D
5 G B C D E F
;
run;

Data Want (Keep=ID Dx);
  Set wide;
  Array DA Dx:;
  Do over DA;
    Dx=DA;
If not Missing (Dx);
Output;
  End;
Run;

Proc Freq Data=Want;
Tables Dx/NoPrint Out=Want;
Run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

As mentioned above, 300,00 records is nothing.  Also factor into any approach you take what you want to do with the data later.  For instance if you want to do any further summarizing, expanding of the data etc. as the normalised form is generally easier to work with.

LLW
Fluorite | Level 6 LLW
Fluorite | Level 6

Thank you so much to everyone for your help and for the different ways to solve this problem. Thank you again!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 5253 views
  • 8 likes
  • 5 in conversation