Hello everyone,
I have a large dataset containing many missing values. Below I provide a simple example of dataset:
Institution_Name | Id | Unique_Numbr | date |
A | 111 | 8 | 3/14/14 |
A | 111 | 3/15/14 | |
A | 112 | 9 | 3/16/14 |
A | 113 | 3/17/14 | |
B | 211 | 1 | 3/18/14 |
B | 211 | 3/19/14 | |
C | 311 | 4 | 3/20/14 |
C | 311 | 3/21/14 | |
C | 311 | 3/22/14 | |
C | 312 | 2 | 3/23/14 |
I need to replace the missing values with the unique number that has exactly the same Institution_name and id, otherwise let it to be missing value. So the output table will be like below:
Institution_Name | Id | Unique_Numbr | date |
A | 111 | 8 | 3/14/14 |
A | 111 | 8 | 3/15/14 |
A | 112 | 9 | 3/16/14 |
A | 113 | 3/17/14 | |
B | 211 | 1 | 3/18/14 |
B | 211 | 1 | 3/19/14 |
C | 311 | 4 | 3/20/14 |
C | 311 | 4 | 3/21/14 |
C | 311 | 4 | 3/22/14 |
C | 312 | 2 | 3/23/14 |
I appreciate your suggestions.
data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;
Here is an SQL solution:
proc sql ;
create table want as
select a.institution_name, a.id, b.unique_numbr, a.date
from have a
left join
(select distinct institution_name, id, unique_numbr
from have
where unique_numbr is not missing) b
on a.institution_name = b.institution_name
and a.id = b.id;
And here's a solution using hash object - which would be more efficient.
data want2;
if _n_ = 0 then set have;
declare hash h(dataset:'have(where=(unique_numbr is not missing))');
h.definekey('institution_name', 'id');
h.definedata('unique_numbr');
h.definedone();
do until (the_end);
set have end=the_end;
if unique_numbr = . then
rc = h.find();
output;
end;
stop;
drop rc;
run;
data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.