Hi everyone!
I have data that looks like this:
ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998
1 M M 14 14 M M 15
2 12 12 M I 13 M M
3 M M M M M M M
4 15 M M M M I I
5 M M 14 M 15 M M
and I would like my data to look like this:
ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998 freq MC
1 M M 14 14 M M 15 3 14
2 12 12 M I 13 M M 3 12
3 M M M M M M M 0 M
4 15 M M M M I I 1 15
5 M M 14 M 15 M M 2 14
Where "freq" is the frequency of YA and CA values combined, and "MC" is the most common answer reported. Note that in the event that there is no most common answer reported (i.e. observation 5, where there are two values, 14 and 15), MC will be set equal to the first number reported.
Note: M and I are two different types of missing. M is missing, and I is invalid missing. In the event that having M and I values create a problem for creating the code, I can change these values to "."
Thanks for the help!
AMIHIC
I'm sorry, but this time (as an exception) it's too elegant to be correct.
if c > mc then mc = v{i};
This would eventually compare counts of values with values and thus it would fail in cases like this:
data have;
input ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
6 14 14 15 15 15 15 15
;
I think a quick correction could be:
if c > maxc & v{i} then do;
maxc = c;
mc=v{i};
end;
(and adding maxc to the DROP statement).*
For ID=3 this would yield mc=. rather than mc=.M, though, but maybe this is acceptable.
*Edit: If 0 is a valid answer, the IF condition should read: c > maxc & v{i} > .z.
the below code does the freq. For the mode, you can scan, transpose and do a proc freq/proc univariate and merge it back, else you can store the counts in hash table or multi dim array and alike other techniques and the get the max of the count. Honestly feeling too lazy, but i am sure somebody will give you the mode as well.
data have;
input ID (YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998) ($);
cards;
1 M M 14 14 M M 15
2 12 12 M I 13 M M
3 M M M M M M M
4 15 M M M M I I
5 M M 14 M 15 M M
;
data want;
set have;
array t(*) YA1998--CA1998;
temp=compress(catx(' ',of t(*)),' ','kd');
freq=countw(temp);
run;
Freq is easy. There's a function for this. Mode is trickier. Here's FREQ while I ponder the best way for mode.
data HAVE;
input ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
cards;
1 M M 14 14 M M 15
2 12 12 M I 13 M M
3 M M M M M M M
4 15 M M M M I I
5 M M 14 M 15 M M
run;
data WANT;
set HAVE;
FREQ=n(of YA1998 -- CA1998);
run;
@ChrisNZ Sir i am not sure N function will work for char values, although one could assume M and I are just special missing values, which is probably right.
@novinosrin I understand from the post that M & I are (numeric) missing values.
@Amihic No need to think for too long about the mode: this question has already been answered here and here (in 2 very similar answers).
For once I get to use the MISSING statement!
data have;
missing M I;
input ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
1 M M 14 14 M M 15
2 12 12 M I 13 M M
3 M M M M M M M
4 15 M M M M I I
5 M M 14 M 15 M M
;
data want;
set have;
array v YA1998 -- CA1998;
freq = n(of v{*});
do i = 1 to dim(v);
c = 0;
do j = i+1 to dim(v);
if v{i} = v{j} then c = c + 1;
end;
if c > mc then mc = v{i};
end;
drop i j c;
run;
@PGStats aka ProdigyGeniusStats aka Pierre has no peers. You will make people cry. why are you so brilliant??.
I'm sorry, but this time (as an exception) it's too elegant to be correct.
if c > mc then mc = v{i};
This would eventually compare counts of values with values and thus it would fail in cases like this:
data have;
input ID YA1998 YA2000 YA2002 YA2004 CA1994 CA1996 CA1998;
datalines;
6 14 14 15 15 15 15 15
;
I think a quick correction could be:
if c > maxc & v{i} then do;
maxc = c;
mc=v{i};
end;
(and adding maxc to the DROP statement).*
For ID=3 this would yield mc=. rather than mc=.M, though, but maybe this is acceptable.
*Edit: If 0 is a valid answer, the IF condition should read: c > maxc & v{i} > .z.
You are perfectly right @FreelanceReinh. You've got a sharp eye. Thanks.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.