SAS Programming

Finding the nearest value and using it for a dataset

Dregerator · Posted 11-08-2021 03:32 PM

Hi,

I have two datasets that have different b values. For each row of the first dataset, I want to pick up the c value from the row of the second dataset (within the same a) whose b is nearest to that row's b.

For example:

Dataset 1

a	b
1	702
1	788
1	1148
2	417
2	441
2	458
2	465
2	495
2	502
2	707
2	766
2	863
3	1547
4	977
4	1547

Dataset 2

a	b	c
1	598	13.1
1	652	12.9
1	701	12.5
1	766	12.3
1	807	14.6
1	1095	10.1
1	1142	.
1	1219	9.7
2	295	16.1
2	305	18.1
2	362	16.4
2	565	16.9
2	642	18.4
2	740	22.5
2	834	15.5
2	922	17.8
2	1007	27.6
2	1090	13.9
2	1286	14.7
2	1568	15.1
2	1561	11.9
3	903	10.5
3	1011	7.8
3	1099	8
3	1561	.
3	1656	5.4
4	862	1.7
4	906	1.1
4	990	1.8

Final Dataset:

a	b	c
1	702	12.5
1	788	14.6
1	1148	10.1
2	417	16.4
2	441	16.4
2	458	16.4
2	465	16.9
2	495	16.9
2	502	16.9
2	707	22.5
2	766	22.5
2	863	15.5
3	1547	5.4
4	977	1.8
4	1547	1.8

Any suggestions as to how I can create something similar to the final dataset given the values in dataset 1 and dataset 2?

Thank you in advance

Reeza · Posted 11-08-2021 03:47 PM

I want the nearest c value from the second dataset to be used for my first dataset when using b between two dataset

What are the rules for nearest? Does it need to be above or below, or does it not matter? If it doesn't matter, what happens when there's a tie, with one value the same distance below and another the same distance above?

Depending on the criteria this could be a simple merge or something more complicated.

Dregerator · Posted 11-08-2021 03:52 PM

It does not matter. I would just like the closest value of b in dataset 2, because I'm trying to superimpose the values from dataset 1 on a graph where dataset 2 is already displayed, and I need the nearest c values so that the points sit on top of each other. Not sure if this answers your question.

Reeza · Posted 11-08-2021 04:30 PM

Unfortunately not. What happens if there's a tie for the smallest difference and one is above and one is below? Then which value do you take?

Dregerator · Posted 11-08-2021 04:51 PM

Can we do one above for example purposes?

Reeza · Posted 11-08-2021 05:41 PM

Dataset 1

a	b
1	702

Dataset 2

a	b	c
1	598	13.1
1	652	12.9
1	701	12.5
1	703	12.3

Now what should the final dataset look like?

Dregerator · Posted 11-08-2021 05:43 PM

It will be 12.5

a	b	c
1	702	12.5

Reeza · Posted 11-08-2021 05:44 PM

So on a tie you want the value below, then: 701 is chosen even though 703 is also only 1 away.

Reeza · Posted 11-08-2021 08:05 PM

This can get you started, but it doesn't handle ties yet. Unfortunately I don't have time to resolve that, but I think you could figure it out, or hopefully someone else can chime in. It only looks at the absolute difference, so it works for exactly the question asked, but if you have ties you'll end up with duplicates in the output.

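proc sql;
   create table want as
   /* diff = distance between the data1 b and each candidate data2 b */
   select t1.a, t1.b, t2.c, abs(t1.b - t2.b) as diff
   from data1 as t1
   left join data2 as t2
      on t1.a = t2.a
      and not missing(t2.c)   /* ignore data2 rows that have no c value */
   group by t1.a, t1.b
   /* keep only the closest candidate(s); a tie leaves more than one row */
   having diff = min(diff)
   ;
quit;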
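
A minimal sketch of one way to resolve the ties, assuming the rule settled on earlier in the thread (on a tie, take the value below, i.e. the smaller data2 b) and assuming data1 has no repeated (a, b) pairs. The names candidates, b2 and want_no_ties are made up for illustration, and the input is the small tie example from above rather than the full data.

```sas
/* The tie example from earlier in the thread */
data data1;
   input a b;
   datalines;
1 702
;

data data2;
   input a b c;
   datalines;
1 598 13.1
1 652 12.9
1 701 12.5
1 703 12.3
;

/* Step 1: list every candidate pairing with its distance.
   b2 (hypothetical name) keeps the data2 b so it can break ties. */
proc sql;
   create table candidates as
   select t1.a, t1.b, t2.b as b2, t2.c,
          abs(t1.b - t2.b) as diff
   from data1 as t1
   left join data2 as t2
      on t1.a = t2.a
      and not missing(t2.c);
quit;

/* Step 2: closest first; on an equal diff the smaller b2 (the value below) sorts first */
proc sort data=candidates;
   by a b diff b2;
run;

/* Step 3: keep one row per (a, b) */
data want_no_ties(drop = b2 diff);
   set candidates;
   by a b;
   if first.b;
run;
```

For the tie example this keeps the 701 row, so c comes out as 12.5, which matches the answer given above.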

PeterClemmensen · Posted 11-09-2021 02:40 AM

Here is an alternative, using a hash object lookup in a DATA step.


data one;
input a b;
datalines;
1 702
1 788
1 1148
2 417
2 441
2 458
2 465
2 495
2 502
2 707
2 766
2 863
3 1547
4 977
4 1547
;

data two;
input a b c;
datalines;
1 598 13.1
1 652 12.9
1 701 12.5
1 766 12.3
1 807 14.6
1 1095 10.1
1 1142 .
1 1219 9.7
2 295 16.1
2 305 18.1
2 362 16.4
2 565 16.9
2 642 18.4
2 740 22.5
2 834 15.5
2 922 17.8
2 1007 27.6
2 1090 13.9
2 1286 14.7
2 1568 15.1
2 1561 11.9
3 903 10.5
3 1011 7.8
3 1099 8
3 1561 .
3 1656 5.4
4 862 1.7
4 906 1.1
4 990 1.8
;

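data want(keep = a b c);

   /* Load dataset two into a hash object once, keyed by a. */
   /* where = (c) drops rows whose c is missing (or zero).   */
   if _N_ = 1 then do;
      dcl hash h(dataset : "two(rename = (b = bb) where = (c))", multidata : "Y");
      h.definekey("a");
      h.definedata("bb", "c");
      h.definedone();
   end;

   set one;
   /* Reset the lookup variables; con holds the smallest distance so far */
   bb = .; c = .; con = constant("big");

   /* Walk through every row of two that has the same a */
   do while (h.do_over() = 0);
      if abs(bb - b) < con then do;   /* strict <: the first candidate found wins a tie */
         con = abs(bb - b);
         cc = c;                      /* remember c of the nearest row so far */
      end;
   end;

   c = cc;
run;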
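
To check the result, one option is to compare it with the expected table from the original question. This is only a sketch: it assumes the want dataset from the step above already exists, and the names expected and want_sorted are made up here.

```sas
/* Expected result, copied from the "Final Dataset" table in the question */
data expected;
   input a b c;
   datalines;
1 702 12.5
1 788 14.6
1 1148 10.1
2 417 16.4
2 441 16.4
2 458 16.4
2 465 16.9
2 495 16.9
2 502 16.9
2 707 22.5
2 766 22.5
2 863 15.5
3 1547 5.4
4 977 1.8
4 1547 1.8
;

/* PROC COMPARE with an ID statement expects both inputs sorted by the ID variables */
proc sort data=expected;
   by a b;
run;

proc sort data=want out=want_sorted;
   by a b;
run;

/* Match rows on (a, b) and report any c values that differ */
proc compare base=expected compare=want_sorted;
   id a b;
   var c;
run;
```

If the two tables agree, the PROC COMPARE report should show no unequal values for c.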
