Pyrite | Level 9

## Earth mover's distance (EMD)

Has anyone used PROC KDE, or any other procedure, to perform "Earth Mover's Distance" (EMD) calculations?

There is a 2013 post on "The DO Loop" blog about this topic, but I cannot find anything else on it since then.

While the blog post is helpful and the procedure is relatively straightforward, some of the nuances are application dependent, and I'm hoping to hear from others who have performed EMD calculations.

Thank you.

SAS Super FREQ

## Re: Earth mover's distance (EMD)

I'm not sure what metric you are using, but perhaps it is another name for the L1 or "city block" metric.

What sort of calculations are you trying to compute? PROC KDE is for density estimation. It estimates the density with a Gaussian kernel function, which depends on the squared Euclidean distance between two points. In theory you could compute the density by using another metric, but I haven't seen that done. It's also not clear how you would select an optimal bandwidth, since the automated bandwidth-selection algorithms in PROC KDE are based on the Gaussian kernel.
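To make the distance point concrete, here is a minimal one-dimensional Gaussian kernel density estimate in Python. It illustrates the general technique only and is not PROC KDE's implementation: the bandwidth is passed in by hand (PROC KDE can select it automatically), and the function name `gaussian_kde_1d` and the sample values are hypothetical. The term `(x - xi) ** 2` in the exponent is the squared Euclidean distance mentioned above.

```python
import math


def gaussian_kde_1d(data, x, bandwidth):
    """Gaussian kernel density estimate at the point x (hypothetical helper).

    f(x) = 1 / (n * h * sqrt(2*pi)) * sum_i exp(-(x - x_i)^2 / (2 * h^2))
    The kernel sees the data only through the squared Euclidean distance
    (x - x_i)^2. The bandwidth h is supplied by the caller, not selected.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2)) for xi in data
    )


if __name__ == "__main__":
    sample = [1.2, 1.9, 2.1, 2.4, 3.8]  # hypothetical measurements
    grid = [0.5 * k for k in range(11)]  # 0.0, 0.5, ..., 5.0
    for g in grid:
        print(f"{g:4.1f}  {gaussian_kde_1d(sample, g, bandwidth=0.5):.4f}")
```

Swapping the exponent for `-abs(x - xi) / h` (with normalizing constant `1 / (2 * n * h)`) would give a Laplace kernel built on the L1 distance, but then the Gaussian-based bandwidth selectors would no longer apply, which is the difficulty described in the reply.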

If you say more about the "nuances ... that are application dependent," perhaps we can help further.

Pyrite | Level 9

## Re: Earth mover's distance (EMD)

Hi Rick, thanks for the reply.

I'm preparing for the calculations and don't have all the details yet, so I may need to follow up again in a few days. I'm just trying to find anything I can on the topic in preparation.

My understanding is that the peaks to be compared are densities of "globules" in a solution that have been sorted by size and density via an analytical method.

There will be multiple peaks to compare. I expect a reference peak will be chosen, and then the other peaks will be compared to it. I see how that is possible in PROC KDE.
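For the comparison described above (several peaks against one reference), here is a minimal Python sketch. It is not SAS and is not taken from the DO Loop post. It assumes each peak has already been binned into a histogram on the same equally spaced grid; the names `emd_1d` and `compare_to_reference` and the example data are hypothetical. In one dimension, the EMD between two normalized histograms equals the L1 distance between their cumulative distributions times the bin width, which is one concrete link between EMD and the L1 metric raised earlier in the thread.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b, bin_width=1.0):
    """Earth mover's distance between two 1-D histograms on the same grid.

    Assumes equally spaced bins. Each histogram is normalized to total mass 1,
    so only the shapes are compared. In 1-D, EMD equals the L1 distance
    between the two cumulative distributions, scaled by the bin width.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must use the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_a = accumulate(x / total_a for x in hist_a)
    cdf_b = accumulate(y / total_b for y in hist_b)
    return bin_width * sum(abs(fa - fb) for fa, fb in zip(cdf_a, cdf_b))


def compare_to_reference(reference, peaks, bin_width=1.0):
    """Return {name: EMD to the reference} for each named peak histogram."""
    return {name: emd_1d(reference, hist, bin_width) for name, hist in peaks.items()}


if __name__ == "__main__":
    reference = [0, 1, 4, 1, 0]  # hypothetical binned reference peak
    peaks = {
        "same_shape": [0, 2, 8, 2, 0],  # scaled copy of the reference: EMD 0
        "shifted": [0, 0, 1, 4, 1],     # shifted one bin to the right: EMD about 1
    }
    print(compare_to_reference(reference, peaks))
```

Normalizing first means the comparison ignores overall peak height; if the total amount of material matters, that is one of the application-dependent choices, and the sketch would need to change. It also covers only one variable at a time: comparing joint (size, density) distributions requires solving a two-dimensional transport problem, which this shortcut does not handle.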

I have not settled the bandwidth question yet. Thanks again!

SAS Super FREQ

## Re: Earth mover's distance (EMD)

1. Please provide references/citations for the L1 density computations so that we can understand what you are trying to do.

2. If your data represent (size, density) pairs, it seems like you can estimate the centers of concentration (="peaks") in a conventional way. Why do you think you need to use an L1 distance for these data?

Pyrite | Level 9

## Re: Earth mover's distance (EMD)

Rick,

Thank you. I will get back to you with those answers. Since that may take a day or two, I will mark this as resolved in the meantime.

Best,

Robert
