BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
uwabe3
Calcite | Level 5

I am no sure how to describe this, but I have a data set of home sales including foreclosures. I want to count the number of foreclosures near each observation based on distance and time as well as if the foreclosure meets one other, dummy criteria.

 

Below is the data.

 

h is a dummy variable for quality

f is a dummy variable for foreclosure sale

X and Y are lat/long.

t is the month the foreclosure sale

 

Count all the other observations that meet these critera:

Within  distance: say 1 mile (I know how to use GEODIST to find this)

Within time: say 30 days so abs(t_subject - t_obs) < 30)

Where f=1

 

I also want to split this count between obs where h=0 and h=1

 

So for the first datapoint. I want to find all other datapoints that are within 1 mile of 

-113.219

37.18241

 

And within 30 days of 20239 or 20209 to 20269

 

Where f=1.

 

I want a count for when h=0 and h=1.

 

Here's the data:

hfxyt
01-113.21937.1824120239
10-113.75137.7372820404
10-113.12437.6956820428
01-113.86237.7072920194
10-113.37337.1416920200
10-113.48537.5213420724
10-114.05136.8975320395
10-113.66637.4560120410
10-113.42136.8634820511
11-113.5937.0861920652
11-113.90437.4006320098
00-113.99737.1375320050
00-113.45837.2751920590
00-113.45737.4581520403
00-113.73237.1478420252
01-113.67337.0396120818
11-113.86137.7537620859
00-113.79336.8502120730
10-113.48837.459320451
10-113.41537.2676920074
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

My 2 cents. Should be reasonably fast as well.

 

data have;
input h f x y t;
datalines;
0 1 -113.219 37.18241 20239
1 0 -113.751 37.73728 20404
1 0 -113.124 37.69568 20428
0 1 -113.862 37.70729 20194
1 0 -113.373 37.14169 20200
1 0 -113.485 37.52134 20724
1 0 -114.051 36.89753 20395
1 0 -113.666 37.45601 20410
1 0 -113.421 36.86348 20511
1 1 -113.59  37.08619 20652
1 1 -113.904 37.40063 20098
0 0 -113.997 37.13753 20050
0 0 -113.458 37.27519 20590
0 0 -113.457 37.45815 20403
0 0 -113.732 37.14784 20252
0 1 -113.673 37.03961 20818
1 1 -113.861 37.75376 20859
0 0 -113.793 36.85021 20730
1 0 -113.488 37.4593  20451
1 0 -113.415 37.26769 20074
;

data want(drop = xx yy tt);
   if _N_ = 1 then do;
      dcl hash hh(dataset : "have(rename = (x = xx y = yy t = tt) where = (f = 1))");
      hh.definekey("h");
      hh.definedata("xx", "yy", "tt");
      hh.definedone();
      dcl hiter i("hh");
   end;

   set have;
   xx = .; yy = .; tt = .; c = 0;

   do while (i.next() = 0);
      if 0 < geodist(y, x, yy, xx, 'm') < 1 & abs(t - tt) < 30 then c + 1;
   end;

run;

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

If that's the entire data set, 20-30 observations, I would use a Cartesian join to get all possible combinations in a data set and then do the 1 miles and 30 days comparison on this data set. I would also add some sort of ID variable to each row, it could be as simple as sequence number.

 

UNTESTED CODE

proc sql;
     create table cartesian as select a.id,a.h,a.f,a.x,a.y,a.t,
         b.id as id2,b.h as h2,b.f as f2,b.x as x2,b.y as y2,b.t as t2
      from have as a,have as b
      order by a.id,b.id;
quit;

Now if you really have thousands of records, this probably won't work because maybe your computer can't handle this large problem, or is going to take a long time.

--
Paige Miller
uwabe3
Calcite | Level 5

Yes, I actually have 250k observations. I have used the Cartesian join and other methods, but they all consume days worth of resources. I am trying to find an easier solution. I was wondering if there was a solution with do loops.

PaigeMiller
Diamond | Level 26

Well, there are methods that might work when you have 250k observations, such as a double-do loop, and hash objects, but I will leave that type of programming up to others.

--
Paige Miller
PeterClemmensen
Tourmaline | Level 20

My 2 cents. Should be reasonably fast as well.

 

data have;
input h f x y t;
datalines;
0 1 -113.219 37.18241 20239
1 0 -113.751 37.73728 20404
1 0 -113.124 37.69568 20428
0 1 -113.862 37.70729 20194
1 0 -113.373 37.14169 20200
1 0 -113.485 37.52134 20724
1 0 -114.051 36.89753 20395
1 0 -113.666 37.45601 20410
1 0 -113.421 36.86348 20511
1 1 -113.59  37.08619 20652
1 1 -113.904 37.40063 20098
0 0 -113.997 37.13753 20050
0 0 -113.458 37.27519 20590
0 0 -113.457 37.45815 20403
0 0 -113.732 37.14784 20252
0 1 -113.673 37.03961 20818
1 1 -113.861 37.75376 20859
0 0 -113.793 36.85021 20730
1 0 -113.488 37.4593  20451
1 0 -113.415 37.26769 20074
;

data want(drop = xx yy tt);
   if _N_ = 1 then do;
      dcl hash hh(dataset : "have(rename = (x = xx y = yy t = tt) where = (f = 1))");
      hh.definekey("h");
      hh.definedata("xx", "yy", "tt");
      hh.definedone();
      dcl hiter i("hh");
   end;

   set have;
   xx = .; yy = .; tt = .; c = 0;

   do while (i.next() = 0);
      if 0 < geodist(y, x, yy, xx, 'm') < 1 & abs(t - tt) < 30 then c + 1;
   end;

run;

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2492 views
  • 0 likes
  • 3 in conversation