Hello everyone,
In the following table I want to fill the missing value of MSA with non-missing values of those observations whose CBSA values are equal, so that the column MSA_filled be created. I tried retain but it didn't work. Do I need to use coalesce?
I appreciate your suggestions.
CBSA | MSA | MSA_filled |
---|---|---|
111 | . | 45 |
111 | 45 | 45 |
111 | . | 45 |
111 | . | 45 |
222 | 67 | 67 |
222 | . | 67 |
333 | 89 | 89 |
333 | . | 89 |
333 | 89 | 89 |
555 | 88 | 88 |
555 | 88 | 88 |
555 | . | 88 |
555 | . | 88 |
How about :
data have; input CBSA MSA ; cards; 111 . 111 45 111 . 111 . 222 67 222 . 333 89 333 . 333 89 555 88 555 88 555 . 555 . ; run; data want; retain MSA_filled _CBSA; do until(not missing(MSA) or last.CBSA); set have; by CBSA; end; if CBSA ne _CBSA then call missing(MSA_filled); if not missing(MSA) then MSA_filled=MSA ; do until(not missing(MSA) or last.CBSA); set have; by CBSA; output; end; _CBSA=CBSA; drop _CBSA; run;
Xia Keshan
Retain won't work on as there is already a field.
Sort descending on MSA, create a new variable that is retained and fill down from there.
The problem is that if there isn't any non-missing value for a specific CBSA then the cell will be filled with the previous values, but I want it to be missing, like the table below.
CBSA | MSA | MSA_filled |
333 | 89 | 89 |
333 | . | 89 |
333 | 89 | 89 |
555 | 88 | 88 |
555 | 88 | 88 |
555 | . | 88 |
555 | . | 88 |
222 | . | . |
222 | . | . |
111 | . | 45 |
111 | 45 | 45 |
111 | . | 45 |
111 | . | 45 |
Use By Group processing on CBSA - if first.CBSA then set it to missing
There are plenty of ways to achieve this. However, the one that seems most intuitive to me is the following:
*Create a master list;
data Master; set Table;
if missing(MSA) then delete;
run;
*Make sure Master has just one entry for each CBSA;
proc sort data=Master nodupkey; by CBSA; run;
*Create new table as needed;
proc sql;
create table Table2 as
select t1.*, t2.MSA as MSA_filled
from Table as t1
left join Master as t2
on t1.CBSA = t2.CBSA
;quit;
How about :
data have; input CBSA MSA ; cards; 111 . 111 45 111 . 111 . 222 67 222 . 333 89 333 . 333 89 555 88 555 88 555 . 555 . ; run; data want; retain MSA_filled _CBSA; do until(not missing(MSA) or last.CBSA); set have; by CBSA; end; if CBSA ne _CBSA then call missing(MSA_filled); if not missing(MSA) then MSA_filled=MSA ; do until(not missing(MSA) or last.CBSA); set have; by CBSA; output; end; _CBSA=CBSA; drop _CBSA; run;
Xia Keshan
Actually IMHO the simplest approach would be like this,
proc sql;
create table want as
select cbsa, max(msa,max(msa)) /*min(msa)*/ as msa
from have
group by cbsa;
quit;
min or max doesn't matter since they both ignore missing values.
So, what's the point of using retain?
If there is only one unique MSA for each group . You can do this .
data have;
input CBSA MSA ;
cards;
111 .
111 45
111 .
111 .
222 67
222 .
333 89
333 .
333 89
555 88
555 88
555 .
555 .
;
run;
data want;
merge have have(keep=CBSA MSA rename=(MSA=MSA_filled) where=(MSA_filled is not missing));
by CBSA;
run;
I don't quite understand your question. 'RETAIN' is a concept for Date Step, where data is processed Sequentially. SQL does NOT use/need 'RETAIN'. As you may wonder in my code: max(msa, max(msa)), the second max() is the summary function to get the max of the group, it can be replaced by min(), which is also a summary function. The first max() is not a summary function, it is to compare msa of certain row with the group max, it can also be replaced by min() and coalesce().
Thank you. Actually I wanted to know what's the main difference between SQL code and RETAIN. It seems that in my case both methods work.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.