A simple distribution problem

DowChang · Posted 07-15-2012 09:01 AM

I open my second challenge problem.

Yes, we can make the SAS proc; such as FREQ or TABULATE, to make the distribution.

For example, here is a table as attached.

Yes, it is easy to make the frequency distribution by the State, County, or other administration units in this table.

Then, by compare the number of observations, we can find the unit; such as State, with the highest density.

What if you want to find the 2 sq miles; any, with the highest density by using the table, how you get it?

Thanks for your interests,

Dow

Doc_Duke · Posted 07-15-2012 03:21 PM

I don't use SAS Communities to play games. Please put something in your title to indicate that this is not an actual problem that you are trying to solve. That way I can save time and not look at it.

Some people may be interested in brain teasers, bot others of us are not. I'd appreciate you adding a pre-filter for them.

Doc Muhlbaier

Duke

PGStats · Posted 07-15-2012 03:57 PM

I get decent results with the following :

/* transform the Long and Lat into cartesian coordinates centered
roughly on the median location. Use state_case as a unique case No. */

%let latRef=36;
%let lonRef=-87;

proc sql;
create table cases as
select yr,
     catx("_",state,'case'n) as caseNo,
     geodist(latitude,longitud,&latref,longitud,"M")*sign(latitude-(&latref)) as X_Mi,
     geodist(latitude,longitud,latitude,&lonref,"M")*sign(longitud-(&lonref)) as Y_Mi
from Table
order by yr;
quit;

/* Cluster the locations within circles of 2 mi radius */
proc fastclus data=cases radius=2 strict out=fo outseed=fos maxiter=50 noprint;
by yr;
var X_Mi Y_Mi;
id caseNo;
run;

proc sql;
/* Extract the largest cluster for each year */
create table fdense as
select yr, CLUSTER, _FREQ_ from fos where _FREQ_>1 group by yr
having _FREQ_=max(_FREQ_);

/* Get the cases included in each of the largest clusters */
create table fdenseCases as
select fo.Yr, fDense.CLUSTER, fo.caseNo from fo inner join fdense
on fo.yr=fdense.yr and fo.CLUSTER=fdense.CLUSTER;

/* List the clusters */
select T.Yr, D.CLUSTER, T.State, T.case, T.LATITUDE, T.LONGITUD
from table as T inner join fdenseCases as D
on T.yr=D.yr and catx("_",T.state,T.case)=D.caseNo
order by T.yr, D.CLUSTER, T.state, T.case;
quit;

PG

DowChang · Posted 07-15-2012 07:53 PM

Thanks for provide with codes, PG

I see you have not only skills but also attitude.

Your SAS codes are very good, yet I have two questions.

First, I want to 2 sq miles area. So, the radius is one miles.

I see you use 2 miles radius in your codes.

Second, would you offer me 2 pieces information when you get a chance to run your codes?

the number of observations in the highest density area that you found
and their case number that has been provided in the table

Good Luck,

Dow

Ksharp · Posted 07-15-2012 10:45 PM

Or Do you check PROC KDE

DowChang · Posted 07-16-2012 08:19 PM

Ksharp,

Thanks for your comments.

The 2 pieces of information can show the problem is solved.

the number of observations in the highest density area that you found
and their case number that has been provided in the table

I am waiting.

Dow

PGStats · Posted 07-16-2012 08:45 PM

Ksharp, I don't see how to find fixed dimension clusters with KDE. I'd like to learn if there's a way.

Dow, I didn't know exactly what you meant by 2 square miles. It could mean a 2x2mi square (area=4sqMi), a circle of .7979mi radius (area=2sqMi), a 1.4142x1.4142mi square (area=2sqMi), etc. The solution I proposed only works for circular areas. So, here is part of my answer for a circlular area of radius=1.0mi :

Yr CLUSTER _FREQ_

2001 59 2

2001 24 2

2001 65 2

2002 33 3

2003 4 3

2003 36 3

2004 49 3

2005 25 2

2005 21 2

2005 20 2

2005 13 2

2005 8 2

2005 97 2

2005 65 2

2005 34 2

2006 95 2

2006 82 2

2006 68 2

2006 64 2

2006 9 2

2007 16 4

2008 39 5

The CLUSTER numbers are arbitrary numbers. Most years contain many clusters of equal size (number of cases). There is no cluster in 2000. None of those clusters crosses a state boundary. As for the exact ID of the cases in each cluster, I will let you find out by yourself :smileysilly:!

PG

Ksharp · Posted 07-16-2012 10:18 PM

PG,

Sorry, I didn't complete the OP's post. If OP want the frequency of a plot , your proc fastclus is a good idea.

Ksharp

DowChang · Posted 07-17-2012 09:30 AM

PG,

Highly appreciated!

The maximum observations in any one mile x one mile area may be not in one particular STATE.

It may be took place in an area among two or three STATES.

So, you do not use the STATE and YEAR in your codes.

Thanks,

Dow

PGStats · Posted 07-17-2012 01:57 PM

If I combine years, as I had done states, and radius=1mi then the code simplifies to :

/* transform the Long and Lat into cartesian coordinates centered
roughly on the median location. Use yr_state_case as a unique case No. */

%let latRef=36;
%let lonRef=-87;

proc sql;
create table cases as
select
     catx("_",yr,state,'case'n) as caseNo length=12,
     geodist(latitude,longitud,&latref,longitud,"M")*sign(latitude-(&latref)) as X_Mi,
     geodist(latitude,longitud,latitude,&lonref,"M")*sign(longitud-(&lonref)) as Y_Mi
from Table;
quit;

/* Cluster the locations within circles of 2 mi radius */
proc fastclus data=cases radius=1 strict out=fo outseed=fos maxiter=50 noprint;
var X_Mi Y_Mi;
id caseNo;
run;

proc sql;
/* Extract the largest cluster for each year */
create table fdense as
select CLUSTER, _FREQ_ from fos where _FREQ_>1
having _FREQ_=max(_FREQ_);

/* Get the cases included in each of the largest clusters */
create table fdenseCases as
select fDense.CLUSTER, fo.caseNo from fo inner join fdense
on fo.CLUSTER=fdense.CLUSTER;

/* List the cluster members */
select D.CLUSTER, T.yr, T.State, T.case, T.LATITUDE, T.LONGITUD
from table as T inner join fdenseCases as D
on catx("_",T.yr,T.state,T.case)=D.caseNo
order by D.CLUSTER, T.yr, T.state, T.case;
quit;

and my answer is

Cluster Yr State Case LATITUDE LONGITUD

20 2001 12 47 28.18873 -82.7401

20 2001 12 1468 28.19213 -82.7402

20 2001 12 2052 28.18811 -82.7401

20 2001 12 2345 28.19519 -82.74

20 2001 12 2412 28.18806 -82.7401

20 2002 12 104 28.19811 -82.7399

20 2003 12 336 28.20081 -82.7399

20 2004 12 979 28.18789 -82.7402

20 2005 12 202 28.20157 -82.7399

20 2005 12 356 28.20545 -82.7398

20 2005 12 2553 28.20546 -82.7398

20 2005 12 2851 28.20809 -82.7395

20 2006 12 3072 28.20179 -82.7399

20 2007 12 848 28.18428 -82.74

PG

DowChang · Posted 07-17-2012 09:07 PM

PG,

Great efforts, but the output is incorrect.

In my table, each case is a pedestrian crash fatality on the roadway that took place between 1999 and 2008.

It is impossible more than 100 pedestrians killed in any one mile x one mile area.

Try again but don't use the "cluster",

Good Luck,

Dow

PGStats · Posted 07-17-2012 09:33 PM

The table above lists the 14 cases that occured over the years in a circle of one mile radius. There is only one such cluster (arbitrary number 20) with 14 cases, all of which occured in state number 12. You should be able to trace the cases back to your table.

PG

DowChang · Posted 07-18-2012 12:29 AM

PG,

Thanks for your answer,

There are 6 cases have to include in your output for the miles from them to case 47 is less than one.

Case	Yr	State	LATITUDE	LONGITUD	Miles to Case 47
166	2001	12	28.17922778	-82.74007778	0.66
503	2002	12	28.17723889	-82.74036667	0.79
739	2008	12	28.17531667	-82.74039722	0.93
1098	2008	12	28.18067778	-82.74034444	0.56
1490	2003	12	28.17856389	-82.74029722	0.7
2489	2004	12	28.18131389	-82.74025278	0.51

I would like to confirm one thing.

Since you use one mile radius, so it is a 2 miles x 2 miles area. Am I right?

In your output, the miles from case 739 to case 356 are 2.1, to case 2553 are 2.1, and to case 2851 are 2.3.

So, these three cases are not in a 2 miles x 2 miles area.

You can verify my data if you like to.

The highest density for 2 miles x 2 miles area is in State 36, not in State 12.

Dow

Ksharp · Posted 07-17-2012 11:33 PM

I will take it. I found the max count of it is 11 , but for different zero(id).

NOTE: Mine is also a circle .

proc import datafile='c:\Table 071612.xls' out=have dbms=excel replace;getnames=yes;run;

data have;set have; id+1;run;

proc sql;
create table distance(where=( distance le 1)) as
select a.id,
     geodist(a.latitude,a.longitud,b.latitude,b.longitud ,"M") as distance
from have as a,have as b
where a.yr=b.yr  ;
quit;

proc sql;
create table want as
 select id,count(*) as freq
  from distance
   group by id
    order by freq desc;
quit;

Ksharp

DowChang · Posted 07-18-2012 12:39 AM

Thank you, Ksharp

You are expert.

But, the maximum is not 11 in a 2 miles x 2 miles area.

The highest density for 2 miles x 2 miles area is in State 36, not in State 12.

Dow

A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

Re: A simple distribution problem

SAS Innovate 2026 Registration is Open

SAS Training: Just a Click Away