Hello,
I have two datasets that look like the following:
Dataset 1:
| sbdv | num | loc |
| 150 | 0.876 | 1 |
| 160 | 0.993 | 1 |
| 171 | 0.937 | 1 |
Dataset 2:
| sbdv | num | loc |
| 120 | 0.987 | 0 |
| 121 | 0.965 | 0 |
| 123 | 0.945 | 0 |
| 125 | 0.912 | 0 |
I want to create a new dataset that contains all of dataset 1 plus a match of observations in dataset 2 of the observations with the minimum difference in 'num' between each obs in dataset 1 and all the obs in dataset 2.
So for sbdv = 150, I want to calculate the difference between its 'num' (0.876) and sbdv 120, 121, 123, and 125's respective 'num' values (0.987, 0.965, 0.945, 0.912). Then I want to save each 'match' to a new dataset containing all of dataset 1 and the matches for each observation.
So for this example, the new dataset would be:
| sbdv | num | loc |
| 150 | 0.876 | 1 |
| 160 | 0.993 | 1 |
| 171 | 0.937 | 1 |
| 125 | 0.912 | 0 |
| 120 | 0.987 | 0 |
| 123 | 0.945 | 0 |
I am guessing there is a straight forward solution, but I haven't been able to find anything that explicitly will do this.
Assuming the pairing has to be one-to-one, this can be expressed as a linear assignment problem and solved easily with PROC OPTNET (a proc of SAS/OR):
data ds1;
input sbdv num loc;
datalines;
150 0.876 1
160 0.993 1
171 0.937 1
;
data ds2;
input sbdv num loc;
datalines;
120 0.987 0
121 0.965 0
123 0.945 0
125 0.912 0
;
proc sql;
create table pairs as
select
ds1.sbdv as from,
ds2.sbdv as to,
abs(ds1.num-ds2.num) as weight
from ds1, ds2;
quit;
proc optnet data_links=pairs direction=directed;
linear_assignment out=selected_pairs;
run;
proc sql;
create table want as
select sbdv, num, loc
from
(select sbdv, num, loc from selected_pairs, ds1 where from=sbdv)
union all
(select sbdv, num, loc from selected_pairs, ds2 where to=sbdv);
quit;
You don't need to know which observation from ds 2 match each observation from ds 1?
No, just match ds 1 obs to ds 2 based on ds1 num - ds2 num
When you say "minimum difference" do you mean absolute value of difference?
Assuming the pairing has to be one-to-one, this can be expressed as a linear assignment problem and solved easily with PROC OPTNET (a proc of SAS/OR):
data ds1;
input sbdv num loc;
datalines;
150 0.876 1
160 0.993 1
171 0.937 1
;
data ds2;
input sbdv num loc;
datalines;
120 0.987 0
121 0.965 0
123 0.945 0
125 0.912 0
;
proc sql;
create table pairs as
select
ds1.sbdv as from,
ds2.sbdv as to,
abs(ds1.num-ds2.num) as weight
from ds1, ds2;
quit;
proc optnet data_links=pairs direction=directed;
linear_assignment out=selected_pairs;
run;
proc sql;
create table want as
select sbdv, num, loc
from
(select sbdv, num, loc from selected_pairs, ds1 where from=sbdv)
union all
(select sbdv, num, loc from selected_pairs, ds2 where to=sbdv);
quit;
data ds1;
input sbdv num loc;
datalines;
150 0.876 1
160 0.993 1
171 0.937 1
;
data ds2;
input _sbdv _num _loc;
datalines;
120 0.987 0
121 0.965 0
123 0.945 0
125 0.912 0
;
data want;
if _n_=1 then do;
if 0 then set ds2;
declare hash h(dataset:'ds2');
declare hiter hi('h');
h.definekey('_sbdv');
h.definedata(all:'y');
h.definedone();
end;
set ds1;
min=999999;
do while(hi.next()=0);
dis=abs(num-_num);
if dis<min then do;min=dis;sbdv1=_sbdv;num1=_num;loc1=_loc;end;
end;
h.remove(key:sbdv1);
drop _: min dis;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.