rmacarthur
Pyrite | Level 9

Has anyone used PROC KDE, or any other procedure to perform "Earth Mover's Distance" calculations?

There is a "Do Loop" blog on the topic from 2013, and I cannot find anything else on the topic since that time.

While the blog is helpful and the procedure relatively straightforward, there are application-dependent nuances, and I'm hoping to hear from others who have performed EMD calculations.

Thank you  

 

Rick_SAS
SAS Super FREQ

I'm not sure what metric you are using, but perhaps it is another name for the L1 or "city block" metric.

 

What sort of calculations are you trying to perform? PROC KDE is for density estimation. It uses a Gaussian kernel function, which uses the squared Euclidean distance between two points to estimate the density. Although you could, in theory, compute the density by using another metric, I haven't seen that done. It's also not clear how you would select an optimal bandwidth, since the automated bandwidth selection algorithms in PROC KDE are based on the Gaussian kernel.
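
For readers who want to see what a Gaussian kernel density estimate involves, here is a minimal sketch in Python (not SAS, and not PROC KDE's implementation). The sample values, the evaluation grid, and the fixed bandwidth of 0.5 are all hypothetical; the sketch does not reproduce PROC KDE's automated bandwidth selection.

```python
import math

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point.

    Each sample contributes a normal curve centered on itself; the kernel
    depends on the squared distance between the grid point and the sample.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        density.append(norm * total)
    return density

# Hypothetical measurements with two clusters (assumption, not from the thread).
samples = [1.0, 1.2, 1.9, 2.1, 2.3, 4.8, 5.0, 5.3]
step = 0.1
grid = [i * step for i in range(-50, 121)]  # evenly spaced from -5.0 to 12.0

# The bandwidth is fixed by hand here purely for illustration.
density = gaussian_kde(samples, grid, bandwidth=0.5)

# A density should integrate to about 1 over a grid that covers the data.
area = sum(density) * step
```

A smaller bandwidth gives a bumpier estimate and a larger one smooths the two clusters together, which is why the bandwidth choice matters for any later comparison of peaks.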

 

If you say more about the "nuances ...that are application dependent," perhaps we can say more.

rmacarthur
Pyrite | Level 9

Hi Rick, thanks for the reply.

I'm preparing for the calculations and don't have all the details yet, so I may need to follow up again in a few days. I'm just trying to find anything I can on the topic in preparation.

My understanding is that the peaks to be compared are densities of "globules" in a solution that have been sorted by size and density via an analytical method.

There will be multiple peaks to compare. I expect a reference peak will be chosen and the other peaks will then be compared to it. I see that this is possible in PROC KDE.

I have not settled the bandwidth question yet. Thanks again!
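
The idea of comparing each peak to a reference peak can be sketched in Python for the one-dimensional case. This is only an illustration under stated assumptions: both densities are evaluated on the same evenly spaced grid, each is rescaled to total mass 1, and the function name `emd_1d` and the example values are hypothetical. In one dimension the earth mover's distance equals the area between the two cumulative distribution functions, which is what the loop accumulates. It does not cover the two-dimensional (size, density) case.

```python
def emd_1d(p, q, step):
    """Earth mover's distance between two 1-D distributions on a shared grid.

    p, q : nonnegative weights (e.g., density values) at the same grid points
    step : spacing between adjacent grid points
    """
    if len(p) != len(q):
        raise ValueError("p and q must be evaluated on the same grid")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total mass")

    cdf_gap = 0.0   # running difference between the two CDFs
    distance = 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i / total_p - q_i / total_q
        distance += abs(cdf_gap)
    return distance * step

# Hypothetical example: all mass sits in one bin, shifted by two bins.
reference = [0.0, 1.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0]
d = emd_1d(reference, shifted, step=1.0)  # moving unit mass two bins gives 2.0
```

With several peaks, the same function could be called once per peak against the chosen reference, keeping the grid and bandwidth identical across all of them so the distances are comparable.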

Rick_SAS
SAS Super FREQ

1. Please provide references/citations for the L1 density computations so that we can understand what you are trying to do.

2. If your data represent (size, density) pairs, it seems like you can estimate the centers of concentration (="peaks") in a conventional way. Why do you think you need to use an L1 distance for these data?

 

rmacarthur
Pyrite | Level 9

Rick, 

Thank you.  I will get back to you with those answers, and in the interim will mark this as resolved, since it may take a day or two.

Best, 

Robert
