BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RedBishop
Fluorite | Level 6

Dear community,

 

I was hoping for a bit of help with a simple question. I have a dataset with two columns such as this:

Column1 Column2

A 123

A .

A 123

A 456

B .

B 789

 

All I'm looking for is to fill in the missings of Column2 with the mode by Column 1, like this:

Column1 Column2

A 123

A 123

A 123

A 456

B 789

B 789

 

I know that it's possible with a select distinct and left join in proc sql, but since I am an R user, I'd highly appreciate a loop solution.

In R, one solution is this:

for(X in unique(Data[,"Column1"]))

{

   Data[Data[,"Column1"]==X&is.na(Data[,"Column2"]),"Column2"] <- which.max(table(Data[Data[,"Column1"]==X,"Column2"]))

}

 

Surely there's some equivalent in SAS?

Thanks a lot!

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input Column1 $ Column2;
datalines;
A 123
A .
A 123
A 456
B .
B 789
;

data want (drop=c);
   do _N_=1 by 1 until (last.Column1);
      set have;
      by Column1;
      c = max(c, Column2);
   end;
   do _N_=1 to _N_;
      set have;
      Column2 = c;
      output;
   end;
run;

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

Please explain the algorithm that replaces the missing A with 123 and the missing B with 789. What if the missing A was underneath a 123 and above a 456, what would the algorithm do in that case?

--
Paige Miller
RedBishop
Fluorite | Level 6
It takes the mode (most common observation) of Column2, grouped by Column1.
PaigeMiller
Diamond | Level 26

Most common observation is the mode, but if there are ties (for example two records with 123 and two records with 456) SAS will pick one arbitrarily.

 

proc summary data=have;
    class column1;
   var columns;
    output out=mode mode=mode;
run;
data want;
    merge have mode;
    if missing(column2) then column2=mode;
run;

Adding: I see you have marked an answer as correct, but it does not seem to compute the "most common observation". So which is it?

--
Paige Miller
RedBishop
Fluorite | Level 6
The answer of draycut uses "max" instead of "mode", but correct enough for me.
PeterClemmensen
Tourmaline | Level 20

Try this

 

data have;
input Column1 $ Column2;
datalines;
A 123
A .
A 123
A 456
B .
B 789
;

data want (drop=c);
   do _N_=1 by 1 until (last.Column1);
      set have;
      by Column1;
      c = max(c, Column2);
   end;
   do _N_=1 to _N_;
      set have;
      Column2 = c;
      output;
   end;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 770 views
  • 2 likes
  • 3 in conversation