BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
m1986MM
Obsidian | Level 7

Hello everyone,

In the following table I want to fill the missing value of MSA with non-missing values of those observations whose CBSA values are equal, so that the column MSA_filled be created. I tried retain but it didn't work. Do I need to use coalesce?

I appreciate your suggestions.

CBSAMSAMSA_filled
111.

45

1114545
111.45
111.45
2226767
222.67
3338989
333.89
3338989
5558888
5558888
555.88
555.88
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

View solution in original post

10 REPLIES 10
Reeza
Super User

Retain won't work on  as there is already a field.

Sort descending on MSA, create a new variable that is retained and fill down from there.

m1986MM
Obsidian | Level 7

The problem is that if there isn't any non-missing value for a specific CBSA then the cell will be filled with the previous values, but I want it to be missing, like the table below.

CBSAMSAMSA_filled
3338989
333.89
3338989
5558888
5558888
555.88
555.88
222..
222..
111.

45

1114545
111.45
111.45
Reeza
Super User

Use By Group processing on CBSA - if first.CBSA then set it to missing

Uditg_ucla
Calcite | Level 5

There are plenty of ways to achieve this. However, the one that seems most intuitive to me is the following:

*Create a master list;

data Master; set Table;

  if missing(MSA) then delete;

run;

*Make sure Master has just one entry for each CBSA;

proc sort data=Master nodupkey; by CBSA; run;

*Create new table as needed;

proc sql;

  create table Table2 as

  select t1.*, t2.MSA as MSA_filled

  from Table as t1

  left join Master as t2

  on t1.CBSA = t2.CBSA

;quit;

Ksharp
Super User

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

Haikuo
Onyx | Level 15

Actually IMHO the simplest approach would be like this,

proc sql;

create table want as

select cbsa, max(msa,max(msa)) /*min(msa)*/ as msa

from have

group by cbsa;

quit;

min or max doesn't matter since they both ignore missing values.

m1986MM
Obsidian | Level 7

So, what's the point of using retain?

Ksharp
Super User

If there is only one unique MSA for each group . You can do this .

Code: Program

data have;
input CBSA   MSA   ;
cards;
111 . 
111 45 
111 . 
111 . 
222 67 
222 . 
333 89 
333 . 
333 89 
555 88 
555 88
555 .
555 .
;
run;
data want;
merge have have(keep=CBSA MSA rename=(MSA=MSA_filled) where=(MSA_filled is not missing));
by CBSA;
run;

Haikuo
Onyx | Level 15

I don't quite understand your question. 'RETAIN' is a concept for Date Step, where data is processed Sequentially. SQL does NOT use/need 'RETAIN'. As you may wonder in my code: max(msa, max(msa)), the second max() is the summary function to get the max of the group, it can be replaced by min(), which is also a summary function. The first max() is not a summary function, it is to compare msa of certain row with the group max, it can also be replaced by min() and coalesce(). 

m1986MM
Obsidian | Level 7

Thank you. Actually I wanted to know what's the main difference between SQL code and RETAIN. It seems that in my case both methods work.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2319 views
  • 9 likes
  • 5 in conversation