Hi all,
I am very new and have no idea where to start on this question. I have a data set a sample of which I have placed below. It has two columns with chromosomes and genetic positions, and then each following column belongs to a different ID. For all columns after the first two, the desired values are 0, 1, or 2. However, I have missing data. How do I impute the missing values with the most common value in that row (i.e., for the first row, the missing value would be changed to 0, and for the second row, it would be changed to two, and so on). Thanks for any guidance anyone can provide. I am working in SAS 9.4.
Chrom | Pos | ID1 | ID2 | ID3 | ID4 | ID5 | ID6 | ID7 | ID8 |
1 | 8456 | 0 | 0 | 0 | 0 | 0 | . | 1 | 1 |
1 | 9948 | 0 | 0 | 2 | . | 1 | 2 | 2 | 2 |
1 | 10058 | 1 | 1 | 1 | 1 | 1 | 1 | . | 1 |
2 | 6535 | 0 | . | 0 | 0 | 0 | 0 | 1 | 1 |
2 | 8967 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | . |
2 | 10234 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
3 | 1023 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 |
3 | 9978 | 1 | 2 | 0 | . | 2 | 2 | 2 | 2 |
3 | 15032 | 0 | 0 | 0 | 1 | 1 | 2 | 1 | 1 |
If you are going to use built-in SAS functions, you could use PROC MEANS/SUMMARY or PROC UNIVARIATE to get the mode from a column of data. Or you can write your own function to get the mode from a row (since I don't think SAS provides a way to do this from a row).
I would do it via PROC SUMMARY, after changing the data to long format via PROC TRANSPOSE.
UNTESTED CODE
proc transpose data=have out=have_t;
by pos notsorted;
var id1-id8;
run;
proc summary data=have_t nway;
class pos;
var col1;
output out=_mode_ mode=mode_value;
run;
From there you can merge the mode into the original data set and replace missings with the mode
proc sort data=have;
by pos;
run;
data want;
merge have _mode_;
by pos;
array id id1-id8;
do i=1 to dim(id);
if missing(id(i)) then id(i)=mode_value;
end;
drop i;
run;
If you want tested code, please provide the original data as SAS data step code, which you can type in yourself, or via these instructions.
If you are going to use built-in SAS functions, you could use PROC MEANS/SUMMARY or PROC UNIVARIATE to get the mode from a column of data. Or you can write your own function to get the mode from a row (since I don't think SAS provides a way to do this from a row).
I would do it via PROC SUMMARY, after changing the data to long format via PROC TRANSPOSE.
UNTESTED CODE
proc transpose data=have out=have_t;
by pos notsorted;
var id1-id8;
run;
proc summary data=have_t nway;
class pos;
var col1;
output out=_mode_ mode=mode_value;
run;
From there you can merge the mode into the original data set and replace missings with the mode
proc sort data=have;
by pos;
run;
data want;
merge have _mode_;
by pos;
array id id1-id8;
do i=1 to dim(id);
if missing(id(i)) then id(i)=mode_value;
end;
drop i;
run;
If you want tested code, please provide the original data as SAS data step code, which you can type in yourself, or via these instructions.
While @PaigeMiller 's solution is direct and easy to understand, here's another option that uses a single data step.
data have;
infile cards dlm='09'x;
input Chrom Pos ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8;
cards;
1 8456 0 0 0 0 0 . 1 1
1 9948 0 0 2 . 1 2 2 2
1 10058 1 1 1 1 1 1 . 1
2 6535 0 . 0 0 0 0 1 1
2 8967 1 1 0 1 1 1 0 .
2 10234 1 1 1 0 0 1 1 1
3 1023 0 0 1 2 0 0 0 0
3 9978 1 2 0 . 2 2 2 2
3 15032 0 0 0 1 1 2 1 1
;
run;
data want;
set have;
array freqs(0:2) _temporary_; * to capture freq of each value 0-2;
array ids(8) id1-id8;
* calc freqs of id values;
do _i = 1 to dim(ids);
if not missing(ids(_i)) then freqs(ids(_i)) + 1;
end;
_mode = whichn(max(of freqs(*)), of freqs(*)) - 1; * mode is the index with highest freq;
put _mode= freqs(0)= freqs(1)= freqs(2)=;
* set missing value(s) to the mode;
do _i = 1 to dim(ids);
if missing(ids(_i)) then ids(_i) = _mode;
end;
call missing(of freqs(*));
drop _:;
run;
Nice job, @average_joe
My reluctance to write my own code to compute the mode is that SAS has already done so, and SAS has debugged it carefully and it is proven in dozens (or maybe millions) of real world applications. So my general advice, especially to less experienced programmers like @SAS49, is to use those built-in SAS functions as much as possible. We have seen, here in the SAS communities, people write their own version of code for simple statistics, and get it wrong. I have even seen people insist on writing their own calculations of the mean, and get it wrong.
This applies less to more experienced programmers (such as @average_joe ) and to anyone who knows how to use the WHICHN function. But even though I would say this advice doesn't apply to me, I still try to follow this advice as much as I can. And so you won't find me writing my own data step code for mode or average or kurtosis or any similar statistics. Plus, writing my own code for these statistics takes a lot more time than using the built-in functions, and it requires me to spend time to convince myself that it is providing the right answers in the use cases that I have, and to debug and fix it if needed.
joe's code is not in general .If there are some value other than 0 1 2, or ID1 ID2 ...... all are character variables ?
If you have SAS/IML .
data have;
infile cards expandtabs;
input Chrom Pos ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8;
cards;
1 8456 0 0 0 0 0 . 1 1
1 9948 0 0 2 . 1 2 2 2
1 10058 1 1 1 1 1 1 . 1
2 6535 0 . 0 0 0 0 1 1
2 8967 1 1 0 1 1 1 0 .
2 10234 1 1 1 0 0 1 1 1
3 1023 0 0 1 2 0 0 0 0
3 9978 1 2 0 . 2 2 2 2
3 15032 0 0 0 1 1 2 1 1
;
run;
proc iml;
use have(keep=ID:);
read all var _all_ into x[c=vname];
close;
do i=1 to nrow(x);
row=x[i,];
call tabulate(level,freq,row);
if countmiss(row) then do;
idx=loc(row=.);
row[idx]=level[freq[<:>]];
x[i,]=row;
end;
end;
create temp from x[c=vname];
append from x;
close;
quit;
data want;
merge have(drop=ID:) temp;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.