Hello everyone,
I have a large dataset containing many missing values. Below I provide a simple example of dataset:
Institution_Name | Id | Unique_Numbr | date |
A | 111 | 8 | 3/14/14 |
A | 111 | 3/15/14 | |
A | 112 | 9 | 3/16/14 |
A | 113 | 3/17/14 | |
B | 211 | 1 | 3/18/14 |
B | 211 | 3/19/14 | |
C | 311 | 4 | 3/20/14 |
C | 311 | 3/21/14 | |
C | 311 | 3/22/14 | |
C | 312 | 2 | 3/23/14 |
I need to replace the missing values with the unique number that has exactly the same Institution_name and id, otherwise let it to be missing value. So the output table will be like below:
Institution_Name | Id | Unique_Numbr | date |
A | 111 | 8 | 3/14/14 |
A | 111 | 8 | 3/15/14 |
A | 112 | 9 | 3/16/14 |
A | 113 | 3/17/14 | |
B | 211 | 1 | 3/18/14 |
B | 211 | 1 | 3/19/14 |
C | 311 | 4 | 3/20/14 |
C | 311 | 4 | 3/21/14 |
C | 311 | 4 | 3/22/14 |
C | 312 | 2 | 3/23/14 |
I appreciate your suggestions.
data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;
Here is an SQL solution:
proc sql ;
create table want as
select a.institution_name, a.id, b.unique_numbr, a.date
from have a
left join
(select distinct institution_name, id, unique_numbr
from have
where unique_numbr is not missing) b
on a.institution_name = b.institution_name
and a.id = b.id;
And here's a solution using hash object - which would be more efficient.
data want2;
if _n_ = 0 then set have;
declare hash h(dataset:'have(where=(unique_numbr is not missing))');
h.definekey('institution_name', 'id');
h.definedata('unique_numbr');
h.definedone();
do until (the_end);
set have end=the_end;
if unique_numbr = . then
rc = h.find();
output;
end;
stop;
drop rc;
run;
data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.