BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GeetVish
Calcite | Level 5

 I have two datasets A and B with a common column severity. I need to identify which row of B has the severity closest to the severity of the each row of A. It is called the nearest neighbor problem.

I am  reading in both A and B and then for each row of A search for the nearest neighbor in B using a do loop. However, I am not sure how to create index for B so that a do loop can be done. 

 


Please NOTE : There are no common variables  and the subjects are in different studies. The purpose of the analysis is to figure out who in B is the closest subject to each one in A. . 

In terms of disease severity.

 

The final output should be below:-- please HELP AND URGENT :-

Severity_A

subject_A

Severity_B

subject_B

1.2

1

1.5

6

1.2

1

2.6

7

1.2

1

4.5

8

1.2

1

5.1

9

1.2

1

5.7

10

2.3

2

1.5

6

2.3

2

2.6

7

2.3

2

4.5

8

2.3

2

5.1

9

2.3

2

5.7

10

 

 

 

Any help will be greatly appreciated. Thank you very much!

1 ACCEPTED SOLUTION

Accepted Solutions
DanielLangley
Quartz | Level 8

Without knowing your expected output we can only have our best guess. Try this.

To get to the final results uncomment the commented code.

 

Data A;
input Severity subject;
cards;
1.2 1
2.3 2
3.4 3
5.3 4
6.7 5
;
run;
   
Data B;
input Severity subject;
cards;
1.5 6
2.6 7
4.5 8
5.1 9
5.7 10
;
run;

proc sql;
create table c as
	select 	a.Severity as A_Severity
	,		a.Subject as A_Subject
	,		b.Severity as B_Severity
	,		b.Subject as B_Subject
	,		abs(A.Severity - B.Severity) as Distance
	from 	work.A as A
	,		work.B as B

/*	group by a.Subject*/
/*	having calculated Distance = min(Calculated Distance)*/
	order by A_Subject, Calculated Distance
;
quit;

View solution in original post

9 REPLIES 9
HB
Barite | Level 11 HB
Barite | Level 11

Actual data to go with your desired results would be helpful.

 

Do you need unique nearest neighbors?  Meaning that if in Set A we have severity values of 1.3 and 1.5, and in Set B we have a 1.4 and a 2.7.  Can the 1.3 and the 1.5 both be matched to the 1.4 or do we need to use unique values?

 

  

GeetVish
Calcite | Level 5

 They can both be matched since we want to know whom in B are similar to those in A.

GeetVish
Calcite | Level 5

can you help me ?

DanielLangley
Quartz | Level 8

Without knowing your expected output we can only have our best guess. Try this.

To get to the final results uncomment the commented code.

 

Data A;
input Severity subject;
cards;
1.2 1
2.3 2
3.4 3
5.3 4
6.7 5
;
run;
   
Data B;
input Severity subject;
cards;
1.5 6
2.6 7
4.5 8
5.1 9
5.7 10
;
run;

proc sql;
create table c as
	select 	a.Severity as A_Severity
	,		a.Subject as A_Subject
	,		b.Severity as B_Severity
	,		b.Subject as B_Subject
	,		abs(A.Severity - B.Severity) as Distance
	from 	work.A as A
	,		work.B as B

/*	group by a.Subject*/
/*	having calculated Distance = min(Calculated Distance)*/
	order by A_Subject, Calculated Distance
;
quit;
GeetVish
Calcite | Level 5

Thanks a lot , I will let you know if this is matching with requirement. But really appreciate your response and help 

 

GeetVish
Calcite | Level 5

the answer is perfect, Thanks 

ballardw
Super User

Please clarify the apparent inconsistency in your text:


@GeetVish wrote:

 I have two datasets A and B with a common column severity. I need to identify which row of B has the severity closest to the severity of the each row of A. It is called the nearest neighbor problem.

I am  reading in both A and B and then for each row of A search for the nearest neighbor in B using a do loop. However, I am not sure how to create index for B so that a do loop can be done. 

 


Please NOTE : There are no common variables  and the subjects are in different studies. The purpose of the analysis is to figure out who in B is the closest subject to each one in A. . 

In terms of disease severity.


Please provide examples of appropriate variables from BOTH input data sets, best as data step code to ensure we have the correct names and types. Then provide example output expected from that example input. Your example output as shown also does not show a "nearest neighbor" result but a "five nearest neighbors" result. If the desire is 5 nearest then please say so as that is a significantly different problem.

 

Since Proc Sql does not support do loops I am having a hard time imagining what you may be contemplating.

 

GeetVish
Calcite | Level 5

Dataset A:
Severity subject
1.2 1
2.3 2
3.4 3
5.3 4
6.7 5
   
Dataset B:
Severity subject
1.5 6
2.6 7
4.5 8
5.1 9
5.7 10

GeetVish
Calcite | Level 5

can you help me ?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2347 views
  • 0 likes
  • 4 in conversation