Solved: Re: Substituting reference values

maggi2410 · Posted 11-14-2012 05:31 PM

Hi,

I have two datasets. Ref contains refrence values, with ID and 1000 variables but just one row.

B dataset contains ID, DOB, and same 1000 variables. Now Some of those 1000 variables in dataset B are populated and some are missing. Wherever those vallues are missing I need to pick the value of that variable from Ref dataset .

Do anyone has a short way of doing it. I dont want to rename 1000 variables in ref dataset and then after merging substitute the value.

I was thinking of using coalesce function in SQL, but that also needs naming all the variables. Any help would be appreciated.

Thanks!

art297 · Posted 11-14-2012 06:13 PM

Richard: Actually, I was just looking at using the update statement for this in the case where the reference file only contains one record.

The following was what I was/am proposing:

data reference;

input x1-x3;

cards;

1 2 3

;

data have;

informat dob date9.;

input id dob x1-x3;

cards;

1 14nov2012 2 4 6

2 14nov2012 3 . 7

3 14nov2012 . . 8

4 14nov2012 4 6 8

5 14nov2012 . . .

;

data expanded_reference;

point=1;

set have (keep=id dob);

set reference point=point;

run;

data want;

update expanded_reference have;

by id;

run;

View solution in original post

Reeza · Posted 11-14-2012 05:35 PM

Transpose your data to have one variable but 1000 rows per ID, merge them by ID, variable name, and coalesce the one variable. Then re-transpose.

Or look at the update statement but I'm not sure how that works...

RichardinOz · Posted 11-14-2012 06:05 PM

Update is simple enough, and might be the right answer. You can update into a new table, so your original data is still available.

data want ;

update B (in = in_B)

Ref

;

by Id ;

If in_B ;

run ;

Caveat - data has to be sorted by the by variables. Nulls in Ref will not overwrite non null data in B. However, non-null data in Ref will update whatever is in B so if some data in B is more up to date you might need a more complex solution.

I was puzzled by the phrase "but just one row" in your description. Does this mean just one row per ID? if so, does this imply more than one row in B? If so, update is not what you want.

Richard in Oz.

art297 · Posted 11-14-2012 06:13 PM

Richard: Actually, I was just looking at using the update statement for this in the case where the reference file only contains one record.

The following was what I was/am proposing:

data reference;

input x1-x3;

cards;

1 2 3

;

data have;

informat dob date9.;

input id dob x1-x3;

cards;

1 14nov2012 2 4 6

2 14nov2012 3 . 7

3 14nov2012 . . 8

4 14nov2012 4 6 8

5 14nov2012 . . .

;

data expanded_reference;

point=1;

set have (keep=id dob);

set reference point=point;

run;

data want;

update expanded_reference have;

by id;

run;

maggi2410 · Posted 11-14-2012 07:52 PM

@art: U r awesome. Your solution is just so unique and perfect.Thanku so much. Update is really helpful. I should read more about it.

@richard: Many thanks for introducing update!

art297 · Posted 11-14-2012 07:57 PM

: Don't even THINK about posting what I KNOW you are thinking of posting!

Haikuo · Posted 11-14-2012 08:25 PM

Art, I ain't saying anything. LOL.

Haikuo

art297 · Posted 11-14-2012 05:55 PM

For the 1,000 variables, are they all numeric or all character, or some mixture of the two?

Haikuo · Posted 11-14-2012 09:02 PM

Here is one using more "conventional" approach, raw data was stolen from Art's post:

data reference;

input x1-x3;

cards;

1 2 3

;

data have;

informat dob date9.;

input id dob x1-x3;

cards;

1 14nov2012 2 4 6

2 14nov2012 3 . 7

3 14nov2012 . . 8

4 14nov2012 4 6 8

5 14nov2012 . . .

;

data want;

if _n_=1 then set reference (rename=x1-x3=_x1-_x3);

set have;

array have x1-x3;

array ref _x1-_x3;

do i=1 to dim(have);

have(i)=coalesce(have(i),ref(i));

end;

drop _: i;

run;

Haikuo

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away