Solved: Re: How can I create a variable of reporting frequency and of the most...

Amihic · Posted 09-06-2018 04:57 PM

Hi everyone!

I have data that looks like this:

ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998

1 M M 14 14 M M 15

2 12 12 M I 13 M M

3 M M M M M M M

4 15 M M M M I I

5 M M 14 M 15 M M

and I would like my data to look like this:

ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998 freq MC

1 M M 14 14 M M 15 3 14

2 12 12 M I 13 M M 3 12

3 M M M M M M M 0 M

4 15 M M M M I I 1 15

5 M M 14 M 15 M M 2 14

Where "freq" is the frequency of YA and CA values combined, and "MC" is the most common answer reported. Note that in the event that there is no most common answer reported (i.e. observation 5, where there are two values, 14 and 15), MC will be set equal to the first number reported.

Note: M and I are two different types of missing. M is missing, and I is invalid missing. In the event that having M and I values create a problem for creating the code, I can change these values to "."

Thanks for the help!

AMIHIC

FreelanceReinh · Posted 09-07-2018 05:43 AM

I'm sorry, but this time (as an exception) it's too elegant to be correct.

if c > mc then mc = v{i};

This would eventually compare counts of values with values and thus it would fail in cases like this:

data have;
input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
6 14 14 15 15 15 15 15
;

I think a quick correction could be:

if c > maxc & v{i} then do;
    maxc = c;
    mc=v{i};
    end;

(and adding maxc to the DROP statement).*

For ID=3 this would yield mc=. rather than mc=.M, though, but maybe this is acceptable.

*Edit: If 0 is a valid answer, the IF condition should read: c > maxc & v{i} > .z.

View solution in original post

novinosrin · Posted 09-06-2018 05:35 PM

the below code does the freq. For the mode, you can scan, transpose and do a proc freq/proc univariate and merge it back, else you can store the counts in hash table or multi dim array and alike other techniques and the get the max of the count. Honestly feeling too lazy, but i am sure somebody will give you the mode as well.

data have;
input ID  (YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998) ($);
cards;
1         M          M           14          14         M             M         15
2         12         12           M          I            13            M          M
3          M         M            M          M          M             M          M 
4          15         M           M          M          M              I            I
5          M          M          14          M         15             M           M
;

data want;
set have;
array t(*) YA1998--CA1998;
temp=compress(catx(' ',of t(*)),' ','kd');
freq=countw(temp);
run;

ChrisNZ · Posted 09-06-2018 05:47 PM

Freq is easy. There's a function for this. Mode is trickier. Here's FREQ while I ponder the best way for mode.

data HAVE;
 input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
cards;
1         M          M           14          14         M             M         15
2         12         12           M          I            13            M          M
3         M         M            M          M          M             M          M 
4         15         M           M          M          M              I            I
5          M          M          14          M         15             M           M
run;
data WANT;
  set HAVE;
  FREQ=n(of YA1998 -- CA1998);
run;

High-Performance SAS Coding - Third Edition

novinosrin · Posted 09-06-2018 05:50 PM

@ChrisNZ Sir i am not sure N function will work for char values, although one could assume M and I are just special missing values, which is probably right.

ChrisNZ · Posted 09-06-2018 05:52 PM

@novinosrin I understand from the post that M & I are (numeric) missing values.

@Amihic No need to think for too long about the mode: this question has already been answered here and here (in 2 very similar answers).

High-Performance SAS Coding - Third Edition

PGStats · Posted 09-06-2018 06:09 PM

For once I get to use the MISSING statement!

data have;
missing M I;
input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
1 M M 14 14 M M 15
2 12 12 M I 13 M M
3 M M M M M M M 
4 15 M M M M I I
5 M M 14 M 15 M M
;

data want;
set have;
array v YA1998 -- CA1998;
freq = n(of v{*});

do i = 1 to dim(v);
    c = 0;
    do j = i+1 to dim(v);
        if v{i} = v{j} then c = c + 1;
        end;
    if c > mc then mc = v{i};
    end;
drop i j c;
run;

PG

novinosrin · Posted 09-06-2018 06:47 PM

@PGStats aka ProdigyGeniusStats aka Pierre has no peers. You will make people cry. why are you so brilliant??.

PGStats · Posted 09-06-2018 10:32 PM

PG

FreelanceReinh · Posted 09-07-2018 05:43 AM

I'm sorry, but this time (as an exception) it's too elegant to be correct.

if c > mc then mc = v{i};

This would eventually compare counts of values with values and thus it would fail in cases like this:

data have;
input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
6 14 14 15 15 15 15 15
;

I think a quick correction could be:

if c > maxc & v{i} then do;
    maxc = c;
    mc=v{i};
    end;

(and adding maxc to the DROP statement).*

For ID=3 this would yield mc=. rather than mc=.M, though, but maybe this is acceptable.

*Edit: If 0 is a valid answer, the IF condition should read: c > maxc & v{i} > .z.

PGStats · Posted 09-07-2018 12:00 PM

You are perfectly right @FreelanceReinh. You've got a sharp eye. Thanks.

PG

How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Re: How can I create a variable of reporting frequency and of the most common answer?

Classroom Training Available!