I open my second challenge problem.
Yes, we can make the SAS proc; such as FREQ or TABULATE, to make the distribution.
For example, here is a table as attached.
Yes, it is easy to make the frequency distribution by the State, County, or other administration units in this table.
Then, by compare the number of observations, we can find the unit; such as State, with the highest density.
What if you want to find the 2 sq miles; any, with the highest density by using the table, how you get it?
Thanks for your interests,
Dow
I don't use SAS Communities to play games. Please put something in your title to indicate that this is not an actual problem that you are trying to solve. That way I can save time and not look at it.
Some people may be interested in brain teasers, bot others of us are not. I'd appreciate you adding a pre-filter for them.
Doc Muhlbaier
Duke
I get decent results with the following :
/* transform the Long and Lat into cartesian coordinates centered
roughly on the median location. Use state_case as a unique case No. */
%let latRef=36;
%let lonRef=-87;
proc sql;
create table cases as
select yr,
catx("_",state,'case'n) as caseNo,
geodist(latitude,longitud,&latref,longitud,"M")*sign(latitude-(&latref)) as X_Mi,
geodist(latitude,longitud,latitude,&lonref,"M")*sign(longitud-(&lonref)) as Y_Mi
from Table
order by yr;
quit;
/* Cluster the locations within circles of 2 mi radius */
proc fastclus data=cases radius=2 strict out=fo outseed=fos maxiter=50 noprint;
by yr;
var X_Mi Y_Mi;
id caseNo;
run;
proc sql;
/* Extract the largest cluster for each year */
create table fdense as
select yr, CLUSTER, _FREQ_ from fos where _FREQ_>1 group by yr
having _FREQ_=max(_FREQ_);
/* Get the cases included in each of the largest clusters */
create table fdenseCases as
select fo.Yr, fDense.CLUSTER, fo.caseNo from fo inner join fdense
on fo.yr=fdense.yr and fo.CLUSTER=fdense.CLUSTER;
/* List the clusters */
select T.Yr, D.CLUSTER, T.State, T.case, T.LATITUDE, T.LONGITUD
from table as T inner join fdenseCases as D
on T.yr=D.yr and catx("_",T.state,T.case)=D.caseNo
order by T.yr, D.CLUSTER, T.state, T.case;
quit;
PG
Thanks for provide with codes, PG
I see you have not only skills but also attitude.
Your SAS codes are very good, yet I have two questions.
First, I want to 2 sq miles area. So, the radius is one miles.
I see you use 2 miles radius in your codes.
Second, would you offer me 2 pieces information when you get a chance to run your codes?
Good Luck,
Dow
Or Do you check PROC KDE
Ksharp,
Thanks for your comments.
The 2 pieces of information can show the problem is solved.
I am waiting.
Dow
Ksharp, I don't see how to find fixed dimension clusters with KDE. I'd like to learn if there's a way.
Dow, I didn't know exactly what you meant by 2 square miles. It could mean a 2x2mi square (area=4sqMi), a circle of .7979mi radius (area=2sqMi), a 1.4142x1.4142mi square (area=2sqMi), etc. The solution I proposed only works for circular areas. So, here is part of my answer for a circlular area of radius=1.0mi :
Yr CLUSTER _FREQ_
2001 59 2
2001 24 2
2001 65 2
2002 33 3
2003 4 3
2003 36 3
2004 49 3
2005 25 2
2005 21 2
2005 20 2
2005 13 2
2005 8 2
2005 97 2
2005 65 2
2005 34 2
2006 95 2
2006 82 2
2006 68 2
2006 64 2
2006 9 2
2007 16 4
2008 39 5
The CLUSTER numbers are arbitrary numbers. Most years contain many clusters of equal size (number of cases). There is no cluster in 2000. None of those clusters crosses a state boundary. As for the exact ID of the cases in each cluster, I will let you find out by yourself :smileysilly:!
PG
PG,
Sorry, I didn't complete the OP's post. If OP want the frequency of a plot , your proc fastclus is a good idea.
Ksharp
PG,
Highly appreciated!
The maximum observations in any one mile x one mile area may be not in one particular STATE.
It may be took place in an area among two or three STATES.
So, you do not use the STATE and YEAR in your codes.
Thanks,
Dow
If I combine years, as I had done states, and radius=1mi then the code simplifies to :
/* transform the Long and Lat into cartesian coordinates centered
roughly on the median location. Use yr_state_case as a unique case No. */
%let latRef=36;
%let lonRef=-87;
proc sql;
create table cases as
select
catx("_",yr,state,'case'n) as caseNo length=12,
geodist(latitude,longitud,&latref,longitud,"M")*sign(latitude-(&latref)) as X_Mi,
geodist(latitude,longitud,latitude,&lonref,"M")*sign(longitud-(&lonref)) as Y_Mi
from Table;
quit;
/* Cluster the locations within circles of 2 mi radius */
proc fastclus data=cases radius=1 strict out=fo outseed=fos maxiter=50 noprint;
var X_Mi Y_Mi;
id caseNo;
run;
proc sql;
/* Extract the largest cluster for each year */
create table fdense as
select CLUSTER, _FREQ_ from fos where _FREQ_>1
having _FREQ_=max(_FREQ_);
/* Get the cases included in each of the largest clusters */
create table fdenseCases as
select fDense.CLUSTER, fo.caseNo from fo inner join fdense
on fo.CLUSTER=fdense.CLUSTER;
/* List the cluster members */
select D.CLUSTER, T.yr, T.State, T.case, T.LATITUDE, T.LONGITUD
from table as T inner join fdenseCases as D
on catx("_",T.yr,T.state,T.case)=D.caseNo
order by D.CLUSTER, T.yr, T.state, T.case;
quit;
and my answer is
Cluster Yr State Case LATITUDE LONGITUD
20 2001 12 47 28.18873 -82.7401
20 2001 12 1468 28.19213 -82.7402
20 2001 12 2052 28.18811 -82.7401
20 2001 12 2345 28.19519 -82.74
20 2001 12 2412 28.18806 -82.7401
20 2002 12 104 28.19811 -82.7399
20 2003 12 336 28.20081 -82.7399
20 2004 12 979 28.18789 -82.7402
20 2005 12 202 28.20157 -82.7399
20 2005 12 356 28.20545 -82.7398
20 2005 12 2553 28.20546 -82.7398
20 2005 12 2851 28.20809 -82.7395
20 2006 12 3072 28.20179 -82.7399
20 2007 12 848 28.18428 -82.74
PG
PG,
Great efforts, but the output is incorrect.
In my table, each case is a pedestrian crash fatality on the roadway that took place between 1999 and 2008.
It is impossible more than 100 pedestrians killed in any one mile x one mile area.
Try again but don't use the "cluster",
Good Luck,
Dow
The table above lists the 14 cases that occured over the years in a circle of one mile radius. There is only one such cluster (arbitrary number 20) with 14 cases, all of which occured in state number 12. You should be able to trace the cases back to your table.
PG
PG,
Thanks for your answer,
There are 6 cases have to include in your output for the miles from them to case 47 is less than one.
Case | Yr | State | LATITUDE | LONGITUD | Miles to Case 47 |
166 | 2001 | 12 | 28.17922778 | -82.74007778 | 0.66 |
503 | 2002 | 12 | 28.17723889 | -82.74036667 | 0.79 |
739 | 2008 | 12 | 28.17531667 | -82.74039722 | 0.93 |
1098 | 2008 | 12 | 28.18067778 | -82.74034444 | 0.56 |
1490 | 2003 | 12 | 28.17856389 | -82.74029722 | 0.7 |
2489 | 2004 | 12 | 28.18131389 | -82.74025278 | 0.51 |
I would like to confirm one thing.
Since you use one mile radius, so it is a 2 miles x 2 miles area. Am I right?
In your output, the miles from case 739 to case 356 are 2.1, to case 2553 are 2.1, and to case 2851 are 2.3.
So, these three cases are not in a 2 miles x 2 miles area.
You can verify my data if you like to.
The highest density for 2 miles x 2 miles area is in State 36, not in State 12.
Dow
I will take it. I found the max count of it is 11 , but for different zero(id).
NOTE: Mine is also a circle .
proc import datafile='c:\Table 071612.xls' out=have dbms=excel replace;getnames=yes;run; data have;set have; id+1;run; proc sql; create table distance(where=( distance le 1)) as select a.id, geodist(a.latitude,a.longitud,b.latitude,b.longitud ,"M") as distance from have as a,have as b where a.yr=b.yr ; quit; proc sql; create table want as select id,count(*) as freq from distance group by id order by freq desc; quit;
Ksharp
Thank you, Ksharp
You are expert.
But, the maximum is not 11 in a 2 miles x 2 miles area.
The highest density for 2 miles x 2 miles area is in State 36, not in State 12.
Dow
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.