Hello everyone,
I have a large dataset containing many missing values. Below I provide a simple example of dataset:
Institution_Name | Id | Unique_Numbr | date |
A | 111 | 8 | 3/14/14 |
A | 111 | 3/15/14 | |
A | 112 | 9 | 3/16/14 |
A | 113 | 3/17/14 | |
B | 211 | 1 | 3/18/14 |
B | 211 | 3/19/14 | |
C | 311 | 4 | 3/20/14 |
C | 311 | 3/21/14 | |
C | 311 | 3/22/14 | |
C | 312 | 2 | 3/23/14 |
I need to replace the missing values with the unique number that has exactly the same Institution_name and id, otherwise let it to be missing value. So the output table will be like below:
Institution_Name | Id | Unique_Numbr | date |
A | 111 | 8 | 3/14/14 |
A | 111 | 8 | 3/15/14 |
A | 112 | 9 | 3/16/14 |
A | 113 | 3/17/14 | |
B | 211 | 1 | 3/18/14 |
B | 211 | 1 | 3/19/14 |
C | 311 | 4 | 3/20/14 |
C | 311 | 4 | 3/21/14 |
C | 311 | 4 | 3/22/14 |
C | 312 | 2 | 3/23/14 |
I appreciate your suggestions.
data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;
Here is an SQL solution:
proc sql ;
create table want as
select a.institution_name, a.id, b.unique_numbr, a.date
from have a
left join
(select distinct institution_name, id, unique_numbr
from have
where unique_numbr is not missing) b
on a.institution_name = b.institution_name
and a.id = b.id;
And here's a solution using hash object - which would be more efficient.
data want2;
if _n_ = 0 then set have;
declare hash h(dataset:'have(where=(unique_numbr is not missing))');
h.definekey('institution_name', 'id');
h.definedata('unique_numbr');
h.definedone();
do until (the_end);
set have end=the_end;
if unique_numbr = . then
rc = h.find();
output;
end;
stop;
drop rc;
run;
data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.