Solved: Sumarizing Frequencies

DMP · Posted 02-26-2013 09:11 AM

I have 4 variables that have the same values array. I need to count the sum of frequencies of certain values in all 4 variables when all variables are equal "a".

For example:

I have variables A, B, C, D and they can have a,b,c,d,e,f values. I need to get the sum of frequencies "b", "c", "d", "f" over the A, B, C, D when A="a" and B="a" and C="a" and D="a".

Can anybody help?

Thank you.

Reeza · Posted 02-26-2013 12:05 PM

Arthur solution is the correct methodology.

Step 1 - Identify patients who have alcohol use (flag=1 for alcohol use, flag=0 for no alcohol use)

Step 2 - Transpose data (proc transpose)

Step 3 - Run a proc freq on the transpose data- diagnosis by flag ( should probably also run a chi-sq to check if its different between patients who don't have alcohol abuse).

View solution in original post

art297 · Posted 02-26-2013 09:23 AM

Do you want to create a variable or just do an analysis? In either case, create a format and apply it to all of the variables of concern.

DMP · Posted 02-26-2013 09:35 AM

I just wan to do the analysis. What to you mean by creating a format and apply it to all vb?

art297 · Posted 02-26-2013 09:42 AM

Take a look at: Base SAS(R) 9.2 Procedures Guide

You would just use 'a'=1

other=0

in creating the format, then apply it like they do in the example.

DMP · Posted 02-26-2013 10:07 AM

Sorry, but this does not do what I need...I need the sum of frequencies separately for "b", ,"c"...

I mean I need:

A="b"+ B="b"+ C="b" + D="b"

and

A="c"+ B="c"+ C="c" + D="c"

and

A="d"+ B="d"+ C="d" + D="d"

separately,

and sorry about my first statement it is actually when any of the variables A or B or C or D is ="a" (not all at the same time like I said above)

thank you so much!

art297 · Posted 02-26-2013 10:50 AM

Would something like the following do what you want?:

data have (drop=all);

input (a b c d) ($);

all=catt(a,b,c,d);

count_a=count(all,'a','i');

count_b=count(all,'b','i');

count_c=count(all,'c','i');

count_d=count(all,'d','i');

cards;

a a b a

b b b b

c b c a

d a b b

;

DMP · Posted 02-26-2013 11:43 AM

Tried and it doesn't work.

Maybe if I explain what exactly I am trying to calculate: I have data that contains medical records on patients and they have main diagnostic + 3 other diagnostics. these are my A, B, C, D variables. they have values from ICD-10. I want to find out how many cases of each of the major condition from ICD-10 happens ( frequencies for the b,c,d,e,....up to 100 conditions) when people have as one of the diagnostic alcohol use. I mean, I need to calculate the frequencies for the comorbidities of alcohol use, by geographic region.

I used in the past something very basic but it is for one condition at the time and that will imply re-running the same code for each comorbidity of the each medical condition of interest (couple of hundred times).

art297 · Posted 02-26-2013 12:00 PM

: Then couldn't you use PROC TRANSPOSE to make the file tall (i.e., one record for each of the four codes, with all of the codes in the same column) and then just do a simple frequency distribution?

Reeza · Posted 02-26-2013 12:05 PM

Arthur solution is the correct methodology.

Step 1 - Identify patients who have alcohol use (flag=1 for alcohol use, flag=0 for no alcohol use)

Step 2 - Transpose data (proc transpose)

Step 3 - Run a proc freq on the transpose data- diagnosis by flag ( should probably also run a chi-sq to check if its different between patients who don't have alcohol abuse).

DMP · Posted 02-26-2013 12:16 PM

i think will work this way...just need to do a little data prep b4.

thank you!

SASJedi · Posted 02-26-2013 11:47 AM

Here is one approach:

/* Just making some test data */
data have ;
  input (a b c d) ($);
  cards;
a a a a
a b c d
b b b b
c b c a
c c c c
d b b b
c c c c
;
/* Create an informat for the patterns you want to count */
proc format;
  invalue $allonechar 
        'aaaa'="a"
        'bbbb'="b"
        'cccc'="c"
        'dddd'="d"
        'eeee'="e"
        'ffff'="f"
        other="*" 
  ;
run;
/* Use the informat in PROC SQL to get your counts in a single pass */
proc sql;
  select INPUT(CATT(a,b,c,d),$allonechar.) "All Characters" as AllChar
       , Count(*) as Count
    from have
    group by AllChar
    having AllChar <> "*"
    order by AllChar
    ;
quit;

Output:

The SAS System

A	1
B	1
C	2

Check out my Jedi SAS Tricks for SAS Users

Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

Re: Sumarizing Frequencies

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away