BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
m1986MM
Obsidian | Level 7

Hello everyone,

I have a large dataset containing many missing values. Below I provide a simple example of dataset:

Institution_NameIdUnique_Numbrdate
A11183/14/14
A1113/15/14
A11293/16/14
A1133/17/14
B21113/18/14
B2113/19/14
C31143/20/14
C3113/21/14
C3113/22/14
C31223/23/14

I need to replace the missing values with the unique number that has exactly the same Institution_name and id, otherwise let it to be missing value. So the output table will be like below:

Institution_NameIdUnique_Numbrdate
A11183/14/14
A11183/15/14
A11293/16/14
A1133/17/14
B21113/18/14
B21113/19/14
C31143/20/14
C31143/21/14
C31143/22/14
C3122

3/23/14

I appreciate your suggestions.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Code: Program

data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;

View solution in original post

4 REPLIES 4
JerryLeBreton
Pyrite | Level 9

Here is an SQL solution:

proc sql ;

  create table want as

  select a.institution_name, a.id, b.unique_numbr, a.date

  from have a

    left join

  (select distinct institution_name, id, unique_numbr

     from have

     where unique_numbr is not missing)  b

  on a.institution_name = b.institution_name

  and a.id = b.id;

JerryLeBreton
Pyrite | Level 9

And here's a solution using hash object - which would be more efficient.

data want2;
  if _n_ = 0 then  set have;

  declare hash h(dataset:'have(where=(unique_numbr is not missing))');
    h.definekey('institution_name', 'id');
    h.definedata('unique_numbr');
    h.definedone();

  do until (the_end);
     set have end=the_end;
     if unique_numbr = . then
        rc = h.find();
     output;
  end;
  stop;
  drop rc;
  run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

There are many topics on here like this, if you want to update the dataset inplace then use the update method. 

Ksharp
Super User

Code: Program

data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2049 views
  • 6 likes
  • 4 in conversation