BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Amihic
Fluorite | Level 6

Hi everyone!

 

I have data that looks like this:

 

ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998

1         M          M           14          14         M             M         15

2         12         12           M          I            13            M          M

3          M         M            M          M          M             M          M 

4          15         M           M          M          M              I            I

5          M          M          14          M         15             M           M

 

and I would like my data to look like this:

 

ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998  freq  MC

1         M          M           14         14         M             M         15        3      14

2         12         12           M          I            13           M          M        3      12

3          M         M            M          M          M            M          M        0       M

4          15         M           M          M          M            I            I          1       15

5          M          M           14         M          15          M           M        2       14

 

Where "freq" is the frequency of YA and CA values combined, and "MC" is the most common answer reported. Note that in the event that there is no most common answer reported (i.e. observation 5, where there are two values, 14 and 15), MC will be set equal to the first number reported.  

Note: M and I are two different types of missing. M is missing, and I is invalid missing. In the event that having M and I values create a problem for creating the code, I can change these values to "." 

 

Thanks for the help!

 

AMIHIC

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

I'm sorry, but this time (as an exception) it's too elegant to be correct.

if c > mc then mc = v{i};

This would eventually compare counts of values with values and thus it would fail in cases like this:

data have;
input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
6 14 14 15 15 15 15 15
;

I think a quick correction could be:

if c > maxc & v{i} then do;
    maxc = c;
    mc=v{i};
    end;

(and adding maxc to the DROP statement).*

For ID=3 this would yield mc=. rather than mc=.M, though, but maybe this is acceptable.

 

*Edit: If 0 is a valid answer, the IF condition should read: c > maxc & v{i} > .z.

View solution in original post

9 REPLIES 9
novinosrin
Tourmaline | Level 20

the below code does the freq. For the mode, you can scan, transpose and do a proc freq/proc univariate and merge it back, else you can store the counts in hash table or multi dim array and alike other techniques and the get the max of the count. Honestly feeling too lazy, but i am sure somebody will give you the mode as well. 

 

data have;
input ID  (YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998) ($);
cards;
1         M          M           14          14         M             M         15
2         12         12           M          I            13            M          M
3          M         M            M          M          M             M          M 
4          15         M           M          M          M              I            I
5          M          M          14          M         15             M           M
;

data want;
set have;
array t(*) YA1998--CA1998;
temp=compress(catx(' ',of t(*)),' ','kd');
freq=countw(temp);
run;
ChrisNZ
Tourmaline | Level 20

Freq is easy. There's a function for this. Mode is trickier. Here's FREQ while I ponder the best way for mode.

data HAVE;
 input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
cards;
1         M          M           14          14         M             M         15
2         12         12           M          I            13            M          M
3         M         M            M          M          M             M          M 
4         15         M           M          M          M              I            I
5          M          M          14          M         15             M           M
run;
data WANT;
  set HAVE;
  FREQ=n(of YA1998 -- CA1998);
run;

 

novinosrin
Tourmaline | Level 20

@ChrisNZ Sir i am not sure N function will work for char values, although one could assume M and I are just special missing values, which is probably right. 

ChrisNZ
Tourmaline | Level 20

@novinosrin  I understand from the post that M & I are (numeric) missing values.

 

@Amihic  No need to think for too long about the mode: this question has already been answered here and here (in 2 very similar answers).

 

PGStats
Opal | Level 21

For once I get to use the MISSING statement!

 

data have;
missing M I;
input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
1 M M 14 14 M M 15
2 12 12 M I 13 M M
3 M M M M M M M 
4 15 M M M M I I
5 M M 14 M 15 M M
;

data want;
set have;
array v YA1998 -- CA1998;
freq = n(of v{*});

do i = 1 to dim(v);
    c = 0;
    do j = i+1 to dim(v);
        if v{i} = v{j} then c = c + 1;
        end;
    if c > mc then mc = v{i};
    end;
drop i j c;
run;
PG
novinosrin
Tourmaline | Level 20

@PGStats  aka ProdigyGeniusStats aka Pierre has no peers.  You will make people cry. why are you so brilliant??. 

FreelanceReinh
Jade | Level 19

I'm sorry, but this time (as an exception) it's too elegant to be correct.

if c > mc then mc = v{i};

This would eventually compare counts of values with values and thus it would fail in cases like this:

data have;
input ID  YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
6 14 14 15 15 15 15 15
;

I think a quick correction could be:

if c > maxc & v{i} then do;
    maxc = c;
    mc=v{i};
    end;

(and adding maxc to the DROP statement).*

For ID=3 this would yield mc=. rather than mc=.M, though, but maybe this is acceptable.

 

*Edit: If 0 is a valid answer, the IF condition should read: c > maxc & v{i} > .z.

PGStats
Opal | Level 21

You are perfectly right @FreelanceReinh. You've got a sharp eye. Thanks.

PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1655 views
  • 11 likes
  • 5 in conversation