DATA Step, Macro, Functions and more

Substituting reference values

Accepted Solution Solved
Reply
Contributor
Posts: 43
Accepted Solution

Substituting reference values

Hi,

I have two datasets. Ref contains refrence values, with ID and 1000 variables but just one row.

B dataset contains ID, DOB, and same 1000 variables. Now Some of those 1000 variables in dataset B are populated and some are missing. Wherever those vallues are missing I need to pick the value of that variable from Ref dataset .

Do anyone has a short way of doing it. I dont want to rename 1000 variables in ref dataset and then after merging substitute the value.

I was thinking of using coalesce function in SQL, but that also needs naming all the variables. Any help would be appreciated.

Thanks!


Accepted Solutions
Solution
‎11-14-2012 06:13 PM
PROC Star
Posts: 7,487

Re: Substituting reference values

Posted in reply to RichardinOz

Richard: Actually, I was just looking at using the update statement for this in the case where the reference file only contains one record.

The following was what I was/am proposing:

data reference;

  input x1-x3;

  cards;

1 2 3

;

data have;

  informat dob date9.;

  input id dob x1-x3;

  cards;

1 14nov2012 2 4 6

2 14nov2012 3 . 7

3 14nov2012 . . 8

4 14nov2012 4 6 8

5 14nov2012 . . .

;

data expanded_reference;

  point=1;

  set have (keep=id dob);

  set reference point=point;

run;

data want;

  update expanded_reference have;

  by id;

run;

View solution in original post


All Replies
Super User
Posts: 19,851

Re: Substituting reference values

Posted in reply to maggi2410

Transpose your data to have one variable but 1000 rows per ID, merge them by ID, variable name, and coalesce the one variable. Then re-transpose.

Or look at the update statement but I'm not sure how that works...

Super Contributor
Posts: 644

Re: Substituting reference values

Update is simple enough, and might be the right answer.  You can update into a new table, so your original data is still available.

data want ;

     update B (in = in_B)

               Ref

               ;

     by     Id ;

     If     in_B ;

run ;

Caveat - data has to be sorted by the by variables.  Nulls in Ref will not overwrite non null data in B.  However, non-null data in Ref will update whatever is in B so if some data in B is more up to date you might need a more complex solution.

I was puzzled by the phrase "but just one row" in your description.  Does this mean just one row per ID? if so, does this imply more than one row in B?  If so, update is not what you want.

Richard in Oz.

Solution
‎11-14-2012 06:13 PM
PROC Star
Posts: 7,487

Re: Substituting reference values

Posted in reply to RichardinOz

Richard: Actually, I was just looking at using the update statement for this in the case where the reference file only contains one record.

The following was what I was/am proposing:

data reference;

  input x1-x3;

  cards;

1 2 3

;

data have;

  informat dob date9.;

  input id dob x1-x3;

  cards;

1 14nov2012 2 4 6

2 14nov2012 3 . 7

3 14nov2012 . . 8

4 14nov2012 4 6 8

5 14nov2012 . . .

;

data expanded_reference;

  point=1;

  set have (keep=id dob);

  set reference point=point;

run;

data want;

  update expanded_reference have;

  by id;

run;

Contributor
Posts: 43

Re: Substituting reference values

@art: U r awesome. Your solution is just so unique and perfect.Thanku so much. Update is really helpful. I should read more about it.

@richard: Many thanks for introducing update!

PROC Star
Posts: 7,487

Re: Substituting reference values

Posted in reply to maggi2410

: Don't even THINK about posting what I KNOW you are thinking of posting!

Respected Advisor
Posts: 3,156

Re: Substituting reference values

Art, I ain't saying anything. LOL.

Haikuo

PROC Star
Posts: 7,487

Re: Substituting reference values

Posted in reply to maggi2410

For the 1,000 variables, are they all numeric or all character, or some mixture of the two?

Respected Advisor
Posts: 3,156

Re: Substituting reference values

Posted in reply to maggi2410

Here is one using more "conventional" approach, raw data was stolen from Art's post:

data reference;

  input x1-x3;

  cards;

1 2 3

;

data have;

  informat dob date9.;

  input id dob x1-x3;

  cards;

1 14nov2012 2 4 6

2 14nov2012 3 . 7

3 14nov2012 . . 8

4 14nov2012 4 6 8

5 14nov2012 . . .

;

data want;

if _n_=1 then set reference (rename=x1-x3=_x1-_x3);

set have;

array have x1-x3;

array ref _x1-_x3;

do i=1 to dim(have);

have(i)=coalesce(have(i),ref(i));

end;

drop _: i;

run;

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 259 views
  • 3 likes
  • 5 in conversation