Hi,
I have two datasets. Ref contains refrence values, with ID and 1000 variables but just one row.
B dataset contains ID, DOB, and same 1000 variables. Now Some of those 1000 variables in dataset B are populated and some are missing. Wherever those vallues are missing I need to pick the value of that variable from Ref dataset .
Do anyone has a short way of doing it. I dont want to rename 1000 variables in ref dataset and then after merging substitute the value.
I was thinking of using coalesce function in SQL, but that also needs naming all the variables. Any help would be appreciated.
Thanks!
Richard: Actually, I was just looking at using the update statement for this in the case where the reference file only contains one record.
The following was what I was/am proposing:
data reference;
input x1-x3;
cards;
1 2 3
;
data have;
informat dob date9.;
input id dob x1-x3;
cards;
1 14nov2012 2 4 6
2 14nov2012 3 . 7
3 14nov2012 . . 8
4 14nov2012 4 6 8
5 14nov2012 . . .
;
data expanded_reference;
point=1;
set have (keep=id dob);
set reference point=point;
run;
data want;
update expanded_reference have;
by id;
run;
Transpose your data to have one variable but 1000 rows per ID, merge them by ID, variable name, and coalesce the one variable. Then re-transpose.
Or look at the update statement but I'm not sure how that works...
Update is simple enough, and might be the right answer. You can update into a new table, so your original data is still available.
data want ;
update B (in = in_B)
Ref
;
by Id ;
If in_B ;
run ;
Caveat - data has to be sorted by the by variables. Nulls in Ref will not overwrite non null data in B. However, non-null data in Ref will update whatever is in B so if some data in B is more up to date you might need a more complex solution.
I was puzzled by the phrase "but just one row" in your description. Does this mean just one row per ID? if so, does this imply more than one row in B? If so, update is not what you want.
Richard in Oz.
Richard: Actually, I was just looking at using the update statement for this in the case where the reference file only contains one record.
The following was what I was/am proposing:
data reference;
input x1-x3;
cards;
1 2 3
;
data have;
informat dob date9.;
input id dob x1-x3;
cards;
1 14nov2012 2 4 6
2 14nov2012 3 . 7
3 14nov2012 . . 8
4 14nov2012 4 6 8
5 14nov2012 . . .
;
data expanded_reference;
point=1;
set have (keep=id dob);
set reference point=point;
run;
data want;
update expanded_reference have;
by id;
run;
@art: U r awesome. Your solution is just so unique and perfect.Thanku so much. Update is really helpful. I should read more about it.
@richard: Many thanks for introducing update!
Art, I ain't saying anything. LOL.
Haikuo
For the 1,000 variables, are they all numeric or all character, or some mixture of the two?
Here is one using more "conventional" approach, raw data was stolen from Art's post:
data reference;
input x1-x3;
cards;
1 2 3
;
data have;
informat dob date9.;
input id dob x1-x3;
cards;
1 14nov2012 2 4 6
2 14nov2012 3 . 7
3 14nov2012 . . 8
4 14nov2012 4 6 8
5 14nov2012 . . .
;
data want;
if _n_=1 then set reference (rename=x1-x3=_x1-_x3);
set have;
array have x1-x3;
array ref _x1-_x3;
do i=1 to dim(have);
have(i)=coalesce(have(i),ref(i));
end;
drop _: i;
run;
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.