## A simple distribution problem

Occasional Contributor
Posts: 18

# A simple distribution problem

I open my second challenge problem.

Yes, we can make the SAS proc; such as FREQ or TABULATE, to make the distribution.

For example, here is a table as attached.

Yes, it is easy to make the frequency distribution by the State, County, or other administration units in this table.

Then, by compare the number of observations, we can find the unit; such as State, with the highest density.

What if you want to find the 2 sq miles; any, with the highest density by using the table, how you get it?

Dow

Posts: 2,125

## Re: A simple distribution problem

I don't use SAS Communities to play games.  Please put something in your title to indicate that this is not an actual problem that you are trying to solve.  That way I can save time and not look at it.

Some people may be interested in brain teasers, bot others of us are not.  I'd appreciate you adding a pre-filter for them.

Doc Muhlbaier

Duke

Posts: 5,539

## Re: A simple distribution problem

I get decent results with the following :

/* transform the Long and Lat into cartesian coordinates centered
roughly on the median location. Use state_case as a unique case No. */

%let latRef=36;
%let lonRef=-87;

proc sql;
create table cases as
select yr,
catx("_",state,'case'n) as caseNo,
geodist(latitude,longitud,&latref,longitud,"M")*sign(latitude-(&latref)) as X_Mi,
geodist(latitude,longitud,latitude,&lonref,"M")*sign(longitud-(&lonref)) as Y_Mi
from Table
order by yr;
quit;

/* Cluster the locations within circles of 2 mi radius */
proc fastclus data=cases radius=2 strict out=fo outseed=fos maxiter=50 noprint;
by yr;
var X_Mi Y_Mi;
id caseNo;
run;

proc sql;
/* Extract the largest cluster for each year */
create table fdense as
select yr, CLUSTER, _FREQ_ from fos where _FREQ_>1 group by yr
having _FREQ_=max(_FREQ_);

/* Get the cases included in each of the largest clusters */
create table fdenseCases as
select fo.Yr, fDense.CLUSTER, fo.caseNo from fo inner join fdense
on fo.yr=fdense.yr and fo.CLUSTER=fdense.CLUSTER;

/* List the clusters */
select T.Yr, D.CLUSTER, T.State, T.case, T.LATITUDE, T.LONGITUD
from table as T inner join fdenseCases as D
on T.yr=D.yr and catx("_",T.state,T.case)=D.caseNo
order by T.yr, D.CLUSTER, T.state, T.case;
quit;

PG

PG
Occasional Contributor
Posts: 18

## Re: A simple distribution problem

Thanks for provide with codes, PG

I see you have not only skills but also attitude.

Your SAS codes are very good, yet I have two questions.

First, I want to 2 sq miles area. So, the radius is one miles.

Second, would you offer me 2 pieces information when you get a chance to run your codes?

• the number of observations in the highest density area that you found
• and their case number that has been provided in the table

Good Luck,

Dow

Super User
Posts: 10,784

## Re: A simple distribution problem

Or Do you check PROC KDE

Occasional Contributor
Posts: 18

## Re: A simple distribution problem

Ksharp,

The 2 pieces of information can show the problem is solved.

• the number of observations in the highest density area that you found
• and their case number that has been provided in the table

I am waiting.

Dow

Posts: 5,539

## Re: A simple distribution problem

Ksharp, I don't see how to find fixed dimension clusters with KDE. I'd like to learn if there's a way.

Dow, I didn't know exactly what you meant by 2 square miles. It could mean a 2x2mi square (area=4sqMi), a  circle of .7979mi radius (area=2sqMi),  a 1.4142x1.4142mi square (area=2sqMi), etc. The solution I proposed only works for circular areas. So, here is part of my answer for a circlular area of radius=1.0mi :

Yr CLUSTER _FREQ_

2001 59 2

2001 24 2

2001 65 2

2002 33 3

2003  4 3

2003 36 3

2004 49 3

2005 25 2

2005 21 2

2005 20 2

2005 13 2

2005  8 2

2005 97 2

2005 65 2

2005 34 2

2006 95 2

2006 82 2

2006 68 2

2006 64 2

2006  9 2

2007 16 4

2008 39 5

The CLUSTER numbers are arbitrary numbers. Most years contain many clusters of equal size (number of cases). There is no cluster in 2000. None of those clusters crosses a state boundary. As for the exact ID of the cases in each cluster, I will let you find out by yourself :smileysilly:!

PG

PG
Super User
Posts: 10,784

## Re: A simple distribution problem

PG,

Sorry, I didn't complete the OP's post. If OP want the frequency of a plot , your proc fastclus is a good idea.

Ksharp

Occasional Contributor
Posts: 18

## Re: A simple distribution problem

PG,

Highly appreciated!

The maximum observations in any one mile x one mile area may be not in one particular STATE.

It may be took place in an area among two or three STATES.

So, you do not use the STATE and YEAR in your codes.

Thanks,

Dow

Posts: 5,539

## Re: A simple distribution problem

If I combine years, as I had done states, and radius=1mi then the code simplifies to :

/* transform the Long and Lat into cartesian coordinates centered
roughly on the median location. Use yr_state_case as a unique case No. */

%let latRef=36;
%let lonRef=-87;

proc sql;
create table cases as
select
catx("_",yr,state,'case'n) as caseNo length=12,
geodist(latitude,longitud,&latref,longitud,"M")*sign(latitude-(&latref)) as X_Mi,
geodist(latitude,longitud,latitude,&lonref,"M")*sign(longitud-(&lonref)) as Y_Mi
from Table;
quit;

/* Cluster the locations within circles of 2 mi radius */
proc fastclus data=cases radius=1 strict out=fo outseed=fos maxiter=50 noprint;
var X_Mi Y_Mi;
id caseNo;
run;

proc sql;
/* Extract the largest cluster for each year */
create table fdense as
select CLUSTER, _FREQ_ from fos where _FREQ_>1
having _FREQ_=max(_FREQ_);

/* Get the cases included in each of the largest clusters */
create table fdenseCases as
select fDense.CLUSTER, fo.caseNo from fo inner join fdense
on fo.CLUSTER=fdense.CLUSTER;

/* List the cluster members */
select D.CLUSTER, T.yr, T.State, T.case, T.LATITUDE, T.LONGITUD
from table as T inner join fdenseCases as D
on catx("_",T.yr,T.state,T.case)=D.caseNo
order by D.CLUSTER, T.yr, T.state, T.case;
quit;

Cluster        Yr     State      Case  LATITUDE  LONGITUD

20      2001        12        47  28.18873  -82.7401

20      2001        12      1468  28.19213  -82.7402

20      2001        12      2052  28.18811  -82.7401

20      2001        12      2345  28.19519    -82.74

20      2001        12      2412  28.18806  -82.7401

20      2002        12       104  28.19811  -82.7399

20      2003        12       336  28.20081  -82.7399

20      2004        12       979  28.18789  -82.7402

20      2005        12       202  28.20157  -82.7399

20      2005        12       356  28.20545  -82.7398

20      2005        12      2553  28.20546  -82.7398

20      2005        12      2851  28.20809  -82.7395

20      2006        12      3072  28.20179  -82.7399

20      2007        12       848  28.18428    -82.74

PG

PG
Occasional Contributor
Posts: 18

## Re: A simple distribution problem

PG,

Great efforts, but the output is incorrect.

In my table, each case is a pedestrian crash fatality on the roadway that took place between 1999 and 2008.

It is impossible more than 100 pedestrians killed in any one mile x one mile area.

Try again but don't use the "cluster",

Good Luck,

Dow

Posts: 5,539

## Re: A simple distribution problem

The table above lists the 14 cases that occured over the years in a circle of one mile radius. There is only one such cluster (arbitrary number 20) with 14 cases, all of which occured in state number 12. You should be able to trace the cases back to your table.

PG

PG
Occasional Contributor
Posts: 18

## Re: A simple distribution problem

PG,

There are  6 cases have to include in your output for the miles from them to case 47 is less than one.

 Case Yr State LATITUDE LONGITUD Miles to Case 47 166 2001 12 28.17922778 -82.74007778 0.66 503 2002 12 28.17723889 -82.74036667 0.79 739 2008 12 28.17531667 -82.74039722 0.93 1098 2008 12 28.18067778 -82.74034444 0.56 1490 2003 12 28.17856389 -82.74029722 0.7 2489 2004 12 28.18131389 -82.74025278 0.51

I would like to confirm one thing.

Since you use one mile radius, so it is a 2 miles x 2 miles area. Am I right?

In your output, the miles from case 739 to case 356 are 2.1, to case 2553 are 2.1, and to case 2851 are 2.3.

So, these three cases are not in a 2 miles x 2 miles area.

You can verify my data if you like to.

The highest density for 2 miles x 2 miles area is in State 36, not in State 12.

Dow

Super User
Posts: 10,784

## Re: A simple distribution problem

I will take it. I found the max count of it is 11 , but for different zero(id).

NOTE: Mine is also a circle .

```proc import datafile='c:\Table 071612.xls' out=have dbms=excel replace;getnames=yes;run;

data have;set have; id+1;run;

proc sql;
create table distance(where=( distance le 1)) as
select a.id,
geodist(a.latitude,a.longitud,b.latitude,b.longitud ,"M") as distance
from have as a,have as b
where a.yr=b.yr  ;
quit;

proc sql;
create table want as
select id,count(*) as freq
from distance
group by id
order by freq desc;
quit;

```

Ksharp

Occasional Contributor
Posts: 18

## Re: A simple distribution problem

Thank you, Ksharp

You are expert.

But, the maximum is not 11 in a 2 miles x 2 miles area.

The highest density for 2 miles x 2 miles area is in State 36, not in State 12.

Dow

Discussion stats
• 31 replies
• 678 views
• 0 likes
• 6 in conversation