BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
m1986MM
Obsidian | Level 7

Hello everyone,

In the following table I want to fill the missing value of MSA with non-missing values of those observations whose CBSA values are equal, so that the column MSA_filled be created. I tried retain but it didn't work. Do I need to use coalesce?

I appreciate your suggestions.

CBSAMSAMSA_filled
111.

45

1114545
111.45
111.45
2226767
222.67
3338989
333.89
3338989
5558888
5558888
555.88
555.88
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

View solution in original post

10 REPLIES 10
Reeza
Super User

Retain won't work on  as there is already a field.

Sort descending on MSA, create a new variable that is retained and fill down from there.

m1986MM
Obsidian | Level 7

The problem is that if there isn't any non-missing value for a specific CBSA then the cell will be filled with the previous values, but I want it to be missing, like the table below.

CBSAMSAMSA_filled
3338989
333.89
3338989
5558888
5558888
555.88
555.88
222..
222..
111.

45

1114545
111.45
111.45
Reeza
Super User

Use By Group processing on CBSA - if first.CBSA then set it to missing

Uditg_ucla
Calcite | Level 5

There are plenty of ways to achieve this. However, the one that seems most intuitive to me is the following:

*Create a master list;

data Master; set Table;

  if missing(MSA) then delete;

run;

*Make sure Master has just one entry for each CBSA;

proc sort data=Master nodupkey; by CBSA; run;

*Create new table as needed;

proc sql;

  create table Table2 as

  select t1.*, t2.MSA as MSA_filled

  from Table as t1

  left join Master as t2

  on t1.CBSA = t2.CBSA

;quit;

Ksharp
Super User

How about :

data have;
input CBSA     MSA      ;
cards;
111     .     
111     45     
111     .     
111     .     
222     67     
222     .     
333     89     
333     .     
333     89     
555     88     
555     88
555     .
555     .
;
run;
data want; 
retain MSA_filled _CBSA;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
 end;
 if CBSA ne _CBSA then call missing(MSA_filled);
 if not missing(MSA) then MSA_filled=MSA ;
 do until(not missing(MSA) or last.CBSA);
  set have;
  by CBSA;
  output;
 end;
_CBSA=CBSA;
drop _CBSA;
run;

Xia Keshan

Haikuo
Onyx | Level 15

Actually IMHO the simplest approach would be like this,

proc sql;

create table want as

select cbsa, max(msa,max(msa)) /*min(msa)*/ as msa

from have

group by cbsa;

quit;

min or max doesn't matter since they both ignore missing values.

m1986MM
Obsidian | Level 7

So, what's the point of using retain?

Ksharp
Super User

If there is only one unique MSA for each group . You can do this .

Code: Program

data have;
input CBSA   MSA   ;
cards;
111 . 
111 45 
111 . 
111 . 
222 67 
222 . 
333 89 
333 . 
333 89 
555 88 
555 88
555 .
555 .
;
run;
data want;
merge have have(keep=CBSA MSA rename=(MSA=MSA_filled) where=(MSA_filled is not missing));
by CBSA;
run;

Haikuo
Onyx | Level 15

I don't quite understand your question. 'RETAIN' is a concept for Date Step, where data is processed Sequentially. SQL does NOT use/need 'RETAIN'. As you may wonder in my code: max(msa, max(msa)), the second max() is the summary function to get the max of the group, it can be replaced by min(), which is also a summary function. The first max() is not a summary function, it is to compare msa of certain row with the group max, it can also be replaced by min() and coalesce(). 

m1986MM
Obsidian | Level 7

Thank you. Actually I wanted to know what's the main difference between SQL code and RETAIN. It seems that in my case both methods work.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1552 views
  • 9 likes
  • 5 in conversation