BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
m1986MM
Obsidian | Level 7

Hello everyone,

I have a large dataset containing many missing values. Below I provide a simple example of dataset:

Institution_NameIdUnique_Numbrdate
A11183/14/14
A1113/15/14
A11293/16/14
A1133/17/14
B21113/18/14
B2113/19/14
C31143/20/14
C3113/21/14
C3113/22/14
C31223/23/14

I need to replace the missing values with the unique number that has exactly the same Institution_name and id, otherwise let it to be missing value. So the output table will be like below:

Institution_NameIdUnique_Numbrdate
A11183/14/14
A11183/15/14
A11293/16/14
A1133/17/14
B21113/18/14
B21113/19/14
C31143/20/14
C31143/21/14
C31143/22/14
C3122

3/23/14

I appreciate your suggestions.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Code: Program

data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;

View solution in original post

4 REPLIES 4
JerryLeBreton
Pyrite | Level 9

Here is an SQL solution:

proc sql ;

  create table want as

  select a.institution_name, a.id, b.unique_numbr, a.date

  from have a

    left join

  (select distinct institution_name, id, unique_numbr

     from have

     where unique_numbr is not missing)  b

  on a.institution_name = b.institution_name

  and a.id = b.id;

JerryLeBreton
Pyrite | Level 9

And here's a solution using hash object - which would be more efficient.

data want2;
  if _n_ = 0 then  set have;

  declare hash h(dataset:'have(where=(unique_numbr is not missing))');
    h.definekey('institution_name', 'id');
    h.definedata('unique_numbr');
    h.definedone();

  do until (the_end);
     set have end=the_end;
     if unique_numbr = . then
        rc = h.find();
     output;
  end;
  stop;
  drop rc;
  run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

There are many topics on here like this, if you want to update the dataset inplace then use the update method. 

Ksharp
Super User

Code: Program

data have;
infile cards expandtabs;
input (Institution_Name Id Unique_Numbr date) ($);
cards;
A 111 8 3/14/14
A 111 . 3/15/14
A 112 9 3/16/14
A 113 . 3/17/14
B 211 1 3/18/14
B 211 . 3/19/14
C 311 4 3/20/14
C 311 . 3/21/14
C 311 . 3/22/14
C 312 2 3/23/14
;
run;
data want;
set have;
by Institution_Name Id;
retain _Unique_Numbr;
if first.Id then _Unique_Numbr=Unique_Numbr;
if not missing(Unique_Numbr) then _Unique_Numbr=Unique_Numbr;
drop Unique_Numbr ;
run;

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 827 views
  • 6 likes
  • 4 in conversation