Solved: Re: Filling the missing value

m1986MM · Posted 08-21-2015 04:47 PM

Hello everyone,

In the following table I want to fill the missing value of MSA with non-missing values of those observations whose CBSA values are equal, so that the column MSA_filled be created. I tried retain but it didn't work. Do I need to use coalesce?

I appreciate your suggestions.

CBSA	MSA	MSA_filled
111	.	45
111	45	45
111	.	45
111	.	45
222	67	67
222	.	67
333	89	89
333	.	89
333	89	89
555	88	88
555	88	88
555	.	88
555	.	88

Ksharp · Posted 08-22-2015 02:34 AM

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

View solution in original post

Reeza · Posted 08-21-2015 04:59 PM

Retain won't work on as there is already a field.

Sort descending on MSA, create a new variable that is retained and fill down from there.

m1986MM · Posted 08-21-2015 05:20 PM

The problem is that if there isn't any non-missing value for a specific CBSA then the cell will be filled with the previous values, but I want it to be missing, like the table below.

CBSA	MSA	MSA_filled
333	89	89
333	.	89
333	89	89
555	88	88
555	88	88
555	.	88
555	.	88
222	.	.
222	.	.
111	.	45
111	45	45
111	.	45
111	.	45

Reeza · Posted 08-21-2015 05:23 PM

Use By Group processing on CBSA - if first.CBSA then set it to missing

Uditg_ucla · Posted 08-21-2015 06:09 PM

There are plenty of ways to achieve this. However, the one that seems most intuitive to me is the following:

*Create a master list;

data Master; set Table;

if missing(MSA) then delete;

run;

*Make sure Master has just one entry for each CBSA;

proc sort data=Master nodupkey; by CBSA; run;

*Create new table as needed;

proc sql;

create table Table2 as

select t1.*, t2.MSA as MSA_filled

from Table as t1

left join Master as t2

on t1.CBSA = t2.CBSA

;quit;

Ksharp · Posted 08-22-2015 02:34 AM

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

Haikuo · Posted 08-22-2015 03:26 PM

Actually IMHO the simplest approach would be like this,

proc sql;

create table want as

select cbsa, max(msa,max(msa)) /*min(msa)*/ as msa

from have

group by cbsa;

quit;

min or max doesn't matter since they both ignore missing values.

m1986MM · Posted 08-22-2015 05:13 PM

So, what's the point of using retain?

Ksharp · Posted 08-22-2015 11:48 PM

If there is only one unique MSA for each group . You can do this .

Code: Program

data have;
input CBSA   MSA   ;
cards;
111 .  
111 45  
111 .  
111 .  
222 67  
222 .  
333 89  
333 .  
333 89  
555 88  
555 88
555 .
555 .
;
run;
data want;
 merge have have(keep=CBSA MSA rename=(MSA=MSA_filled) where=(MSA_filled is not missing));
 by CBSA;
run;

Haikuo · Posted 08-23-2015 12:46 PM

I don't quite understand your question. 'RETAIN' is a concept for Date Step, where data is processed Sequentially. SQL does NOT use/need 'RETAIN'. As you may wonder in my code: max(msa, max(msa)), the second max() is the summary function to get the max of the group, it can be replaced by min(), which is also a summary function. The first max() is not a summary function, it is to compare msa of certain row with the group max, it can also be replaced by min() and coalesce().

m1986MM · Posted 08-23-2015 12:50 PM

Thank you. Actually I wanted to know what's the main difference between SQL code and RETAIN. It seems that in my case both methods work.

CBSA	MSA	MSA_filled
111	.	45
111	45	45
111	.	45
111	.	45
222	67	67
222	.	67
333	89	89
333	.	89
333	89	89
555	88	88
555	88	88
555	.	88
555	.	88

CBSA	MSA	MSA_filled
111	.	45
111	45	45
111	.	45
111	.	45
222	67	67
222	.	67
333	89	89
333	.	89
333	89	89
555	88	88
555	88	88
555	.	88
555	.	88

Code: Program

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away

CBSA	MSA	MSA_filled
111	.	45
111	45	45
111	.	45
111	.	45
222	67	67
222	.	67
333	89	89
333	.	89
333	89	89
555	88	88
555	88	88
555	.	88
555	.	88