BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS49
Obsidian | Level 7

Hi all,

 

I am very new and have no idea where to start on this question.  I have a data set a sample of which I have placed below.  It has two columns with chromosomes and genetic positions, and then each following column belongs to a different ID.  For all columns after the first two, the desired values are 0, 1, or 2.  However, I have missing data.  How do I impute the missing values with the most common value in that row (i.e., for the first row, the missing value would be changed to 0, and for the second row, it would be changed to two, and so on).  Thanks for any guidance anyone can provide.  I am working in SAS 9.4.

 

ChromPosID1ID2ID3ID4ID5ID6ID7ID8
1845600000.11
19948002.1222
110058111111.1
265350.000011
289671101110.
21023411100111
3102300120000
39978120.2222
31503200011211

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

If you are going to use built-in SAS functions, you could use PROC MEANS/SUMMARY or PROC UNIVARIATE to get the mode from a column of data. Or you can write your own function to get the mode from a row (since I don't think SAS provides a way to do this from a row).

 

I would do it via PROC SUMMARY, after changing the data to long format via PROC TRANSPOSE.

 

UNTESTED CODE

 

proc transpose data=have out=have_t;
    by pos notsorted;
    var id1-id8;
run;
proc summary data=have_t nway;
    class pos;
    var col1;
    output out=_mode_ mode=mode_value;
run;

 

From there you can merge the mode into the original data set and replace missings with the mode

 

proc sort data=have;
    by pos;
run;
data want;
    merge have _mode_;
    by pos;
    array id id1-id8;
    do i=1 to dim(id);
        if missing(id(i)) then id(i)=mode_value;
    end;
    drop i;
run;

 

If you want tested code, please provide the original data as SAS data step code, which you can type in yourself, or via these instructions.

 

 

 

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

If you are going to use built-in SAS functions, you could use PROC MEANS/SUMMARY or PROC UNIVARIATE to get the mode from a column of data. Or you can write your own function to get the mode from a row (since I don't think SAS provides a way to do this from a row).

 

I would do it via PROC SUMMARY, after changing the data to long format via PROC TRANSPOSE.

 

UNTESTED CODE

 

proc transpose data=have out=have_t;
    by pos notsorted;
    var id1-id8;
run;
proc summary data=have_t nway;
    class pos;
    var col1;
    output out=_mode_ mode=mode_value;
run;

 

From there you can merge the mode into the original data set and replace missings with the mode

 

proc sort data=have;
    by pos;
run;
data want;
    merge have _mode_;
    by pos;
    array id id1-id8;
    do i=1 to dim(id);
        if missing(id(i)) then id(i)=mode_value;
    end;
    drop i;
run;

 

If you want tested code, please provide the original data as SAS data step code, which you can type in yourself, or via these instructions.

 

 

 

--
Paige Miller
average_joe
Obsidian | Level 7

While @PaigeMiller 's solution is direct and easy to understand, here's another option that uses a single data step.

 

data have;
infile cards dlm='09'x;
input Chrom	Pos	ID1	ID2	ID3	ID4	ID5	ID6	ID7	ID8;
cards;
1	8456	0	0	0	0	0	.	1	1
1	9948	0	0	2	.	1	2	2	2
1	10058	1	1	1	1	1	1	.	1
2	6535	0	.	0	0	0	0	1	1
2	8967	1	1	0	1	1	1	0	.
2	10234	1	1	1	0	0	1	1	1
3	1023	0	0	1	2	0	0	0	0
3	9978	1	2	0	.	2	2	2	2
3	15032	0	0	0	1	1	2	1	1
;
run;


data want;
set have;

array freqs(0:2) _temporary_;  * to capture freq of each value 0-2;
array ids(8) id1-id8;

* calc freqs of id values;
do _i = 1 to dim(ids);
    if not missing(ids(_i)) then freqs(ids(_i)) + 1; 
end;
_mode = whichn(max(of freqs(*)), of freqs(*)) - 1; * mode is the index with highest freq;
put _mode= freqs(0)=  freqs(1)=  freqs(2)=;

* set missing value(s) to the mode;
do _i = 1 to dim(ids);
    if missing(ids(_i)) then ids(_i) = _mode; 
end;

call missing(of freqs(*));
drop _:;
run;
PaigeMiller
Diamond | Level 26

Nice job, @average_joe 

 

My reluctance to write my own code to compute the mode is that SAS has already done so, and SAS has debugged it carefully and it is proven in dozens (or maybe millions) of real world applications. So my general advice, especially to less experienced programmers like @SAS49, is to use those built-in SAS functions as much as possible. We have seen, here in the SAS communities, people write their own version of code for simple statistics, and get it wrong. I have even seen people insist on writing their own calculations of the mean, and get it wrong.

 

This applies less to more experienced programmers (such as @average_joe ) and to anyone who knows how to use the WHICHN function. But even though I would say this advice doesn't apply to me, I still try to follow this advice as much as I can. And so you won't find me writing my own data step code for mode or average or kurtosis or any similar statistics. Plus, writing my own code for these statistics takes a lot more time than using the built-in functions, and it requires me to spend time to convince myself that it is providing the right answers in the use cases that I have, and to debug and fix it if needed.

--
Paige Miller
Ksharp
Super User

joe's code is not in general .If there are some value other than 0 1 2, or ID1 ID2 ...... all are character variables ?

If you have SAS/IML .

 

data have;
infile cards expandtabs;
input Chrom	Pos	ID1	ID2	ID3	ID4	ID5	ID6	ID7	ID8;
cards;
1	8456	0	0	0	0	0	.	1	1
1	9948	0	0	2	.	1	2	2	2
1	10058	1	1	1	1	1	1	.	1
2	6535	0	.	0	0	0	0	1	1
2	8967	1	1	0	1	1	1	0	.
2	10234	1	1	1	0	0	1	1	1
3	1023	0	0	1	2	0	0	0	0
3	9978	1	2	0	.	2	2	2	2
3	15032	0	0	0	1	1	2	1	1
;
run;
proc iml;
use have(keep=ID:);
read all var _all_ into x[c=vname]; 
close;
do i=1 to nrow(x);
 row=x[i,];
 call tabulate(level,freq,row); 
 if countmiss(row) then do;
  idx=loc(row=.); 
  row[idx]=level[freq[<:>]];
   x[i,]=row;
 end;
end;
create temp from x[c=vname];
append from x;
close;
quit;
data want;
 merge have(drop=ID:) temp;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1076 views
  • 1 like
  • 4 in conversation