Help using Base SAS procedures

Filling the missing value

Accepted Solution Solved
Reply
Contributor
Posts: 65
Accepted Solution

Filling the missing value

Hello everyone,

In the following table I want to fill the missing value of MSA with non-missing values of those observations whose CBSA values are equal, so that the column MSA_filled be created. I tried retain but it didn't work. Do I need to use coalesce?

I appreciate your suggestions.

CBSAMSAMSA_filled
111.

45

1114545
111.45
111.45
2226767
222.67
3338989
333.89
3338989
5558888
5558888
555.88
555.88

Accepted Solutions
Solution
‎08-22-2015 02:34 AM
Super User
Posts: 10,020

Re: Filling the missing value

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

View solution in original post


All Replies
Super User
Posts: 19,772

Re: Filling the missing value

Retain won't work on  as there is already a field.

Sort descending on MSA, create a new variable that is retained and fill down from there.

Contributor
Posts: 65

Re: Filling the missing value

The problem is that if there isn't any non-missing value for a specific CBSA then the cell will be filled with the previous values, but I want it to be missing, like the table below.

CBSAMSAMSA_filled
3338989
333.89
3338989
5558888
5558888
555.88
555.88
222..
222..
111.

45

1114545
111.45
111.45
Super User
Posts: 19,772

Re: Filling the missing value

Use By Group processing on CBSA - if first.CBSA then set it to missing

New Contributor
Posts: 2

Re: Filling the missing value

There are plenty of ways to achieve this. However, the one that seems most intuitive to me is the following:

*Create a master list;

data Master; set Table;

  if missing(MSA) then delete;

run;

*Make sure Master has just one entry for each CBSA;

proc sort data=Master nodupkey; by CBSA; run;

*Create new table as needed;

proc sql;

  create table Table2 as

  select t1.*, t2.MSA as MSA_filled

  from Table as t1

  left join Master as t2

  on t1.CBSA = t2.CBSA

;quit;

Solution
‎08-22-2015 02:34 AM
Super User
Posts: 10,020

Re: Filling the missing value

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

Respected Advisor
Posts: 3,156

Re: Filling the missing value

Actually IMHO the simplest approach would be like this,

proc sql;

create table want as

select cbsa, max(msa,max(msa)) /*min(msa)*/ as msa

from have

group by cbsa;

quit;

min or max doesn't matter since they both ignore missing values.

Contributor
Posts: 65

Re: Filling the missing value

So, what's the point of using retain?

Super User
Posts: 10,020

Re: Filling the missing value

If there is only one unique MSA for each group . You can do this .

Code: Program

data have;
input CBSA   MSA   ;
cards;
111 . 
111 45 
111 . 
111 . 
222 67 
222 . 
333 89 
333 . 
333 89 
555 88 
555 88
555 .
555 .
;
run;
data want;
merge have have(keep=CBSA MSA rename=(MSA=MSA_filled) where=(MSA_filled is not missing));
by CBSA;
run;

Respected Advisor
Posts: 3,156

Re: Filling the missing value

I don't quite understand your question. 'RETAIN' is a concept for Date Step, where data is processed Sequentially. SQL does NOT use/need 'RETAIN'. As you may wonder in my code: max(msa, max(msa)), the second max() is the summary function to get the max of the group, it can be replaced by min(), which is also a summary function. The first max() is not a summary function, it is to compare msa of certain row with the group max, it can also be replaced by min() and coalesce(). 

Contributor
Posts: 65

Re: Filling the missing value

Thank you. Actually I wanted to know what's the main difference between SQL code and RETAIN. It seems that in my case both methods work.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 600 views
  • 9 likes
  • 5 in conversation