SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Etoo12121
Obsidian | Level 7

Hi all,

 

I’m working on a bit of a complicated project. I have school information data that shows the total number of math students in a given year for multiple schools. What I want to do is look at each school for each year and determine the number of math students within a 50 mile and 100 mile radius of the school. I can use the geocode function to attain the longitude and latitude of each school. Geodist will also work to add in the distance. Geodist is limited though because it only works using 1 starting point and end point at a time. I’m not sure how to go about this as I have multiple starting points and so many possible end points.

 

So ideally, in order to do this, I would have to calculate the distance from school A to school B&C. Then do the same for school B to all the other schools. Then school C to every school. Once I've gone through every possible combination, I can create a field that keeps only the schools with distances that are less than or equal to 50 miles. Once I have the schools that are within 50 miles of school A, I can then sum all their students to get the number of math students within 50 miles. This seems tedious to do for each school since I have so many and would have to redo it for the 100 mile radius.

 

Is there a better approach?

 

 Attached is sample data and below is a sample of my desired output.

 

Etoo12121_1-1622653374872.png

 

 

 

Any help is highly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
Ok so you're really looking for the distance between each set of schools first? Have any of the school locations changed over time? If not, then first make a distinct list of the locations and then do a cross join in SQL to get your distances. Something like this:

proc sql;
create table distance_between_schools as
select t1.schoolname as SchoolStart, t2.schoolName as SchoolEnd, geodist(t1.lat, t2.lat, t1.long, t2.long) as distance
from school_list as t1
cross join school_list as t2
where t1.school_name > t2.school_name
order by 1,2,3;
quit;

View solution in original post

5 REPLIES 5
Reeza
Super User
Where's the student locations? How do you know that the school has x students within 50miles?
Etoo12121
Obsidian | Level 7
I don't. That's what i'm trying to figure out. I have dummy data in the output just to show what I want. I have sample data attached in test2 and my desired outcome in the screen grab. In order for me to address the students, I first need to determine which schools are within the specified radius of each school. That's where I'm having issues
Reeza
Super User
Ok so you're really looking for the distance between each set of schools first? Have any of the school locations changed over time? If not, then first make a distinct list of the locations and then do a cross join in SQL to get your distances. Something like this:

proc sql;
create table distance_between_schools as
select t1.schoolname as SchoolStart, t2.schoolName as SchoolEnd, geodist(t1.lat, t2.lat, t1.long, t2.long) as distance
from school_list as t1
cross join school_list as t2
where t1.school_name > t2.school_name
order by 1,2,3;
quit;
ballardw
Super User

Proc GINSIDE to the rescue, if you have access to it.

Description from the documentation:

The GINSIDE procedure compares a data set of X and Y coordinates to a map data set containing map polygons. The procedure determines whether the X and Y points fall inside or outside of the map polygons.

So the data set of X and Y coordinates would be the location of the students (or of the contributing school, it isn't very clear from your description)

The map data set would be a set of coordinates around the school of interest. Some trig and data step to cycle through a do loop calculating the boundaries of a circle of the desired radius should work. Two circles would be two different boundaries with a different Identifier.

 

However, if you would have multiple schools that would have the radius overlap you may have some additional work.

I suspect this might work best with a separate data set for each school / year combination for the "center" school.

 

I don't open XLSX files from unknown sources. So won't go into any code.

 

 

Reeza
Super User
I don't think OP has map polygons though - a GIS tool can easily do this in a few minutes though - QGIS would allow you to select all schools within 50/100 miles and create that file for you easily.
If you have SAS VA and ArcGIS option it may also do some of this.
QGIST
https://www.qgistutorials.com/en/docs/nearest_neighbor_analysis.html

The step you're on now would be referred to as "nearest neighbor analysis" IMO.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2245 views
  • 0 likes
  • 3 in conversation