BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
uwabe3
Calcite | Level 5

I am no sure how to describe this, but I have a data set of home sales including foreclosures. I want to count the number of foreclosures near each observation based on distance and time as well as if the foreclosure meets one other, dummy criteria.

 

Below is the data.

 

h is a dummy variable for quality

f is a dummy variable for foreclosure sale

X and Y are lat/long.

t is the month the foreclosure sale

 

Count all the other observations that meet these critera:

Within  distance: say 1 mile (I know how to use GEODIST to find this)

Within time: say 30 days so abs(t_subject - t_obs) < 30)

Where f=1

 

I also want to split this count between obs where h=0 and h=1

 

So for the first datapoint. I want to find all other datapoints that are within 1 mile of 

-113.219

37.18241

 

And within 30 days of 20239 or 20209 to 20269

 

Where f=1.

 

I want a count for when h=0 and h=1.

 

Here's the data:

hfxyt
01-113.21937.1824120239
10-113.75137.7372820404
10-113.12437.6956820428
01-113.86237.7072920194
10-113.37337.1416920200
10-113.48537.5213420724
10-114.05136.8975320395
10-113.66637.4560120410
10-113.42136.8634820511
11-113.5937.0861920652
11-113.90437.4006320098
00-113.99737.1375320050
00-113.45837.2751920590
00-113.45737.4581520403
00-113.73237.1478420252
01-113.67337.0396120818
11-113.86137.7537620859
00-113.79336.8502120730
10-113.48837.459320451
10-113.41537.2676920074
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

My 2 cents. Should be reasonably fast as well.

 

data have;
input h f x y t;
datalines;
0 1 -113.219 37.18241 20239
1 0 -113.751 37.73728 20404
1 0 -113.124 37.69568 20428
0 1 -113.862 37.70729 20194
1 0 -113.373 37.14169 20200
1 0 -113.485 37.52134 20724
1 0 -114.051 36.89753 20395
1 0 -113.666 37.45601 20410
1 0 -113.421 36.86348 20511
1 1 -113.59  37.08619 20652
1 1 -113.904 37.40063 20098
0 0 -113.997 37.13753 20050
0 0 -113.458 37.27519 20590
0 0 -113.457 37.45815 20403
0 0 -113.732 37.14784 20252
0 1 -113.673 37.03961 20818
1 1 -113.861 37.75376 20859
0 0 -113.793 36.85021 20730
1 0 -113.488 37.4593  20451
1 0 -113.415 37.26769 20074
;

data want(drop = xx yy tt);
   if _N_ = 1 then do;
      dcl hash hh(dataset : "have(rename = (x = xx y = yy t = tt) where = (f = 1))");
      hh.definekey("h");
      hh.definedata("xx", "yy", "tt");
      hh.definedone();
      dcl hiter i("hh");
   end;

   set have;
   xx = .; yy = .; tt = .; c = 0;

   do while (i.next() = 0);
      if 0 < geodist(y, x, yy, xx, 'm') < 1 & abs(t - tt) < 30 then c + 1;
   end;

run;

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

If that's the entire data set, 20-30 observations, I would use a Cartesian join to get all possible combinations in a data set and then do the 1 miles and 30 days comparison on this data set. I would also add some sort of ID variable to each row, it could be as simple as sequence number.

 

UNTESTED CODE

proc sql;
     create table cartesian as select a.id,a.h,a.f,a.x,a.y,a.t,
         b.id as id2,b.h as h2,b.f as f2,b.x as x2,b.y as y2,b.t as t2
      from have as a,have as b
      order by a.id,b.id;
quit;

Now if you really have thousands of records, this probably won't work because maybe your computer can't handle this large problem, or is going to take a long time.

--
Paige Miller
uwabe3
Calcite | Level 5

Yes, I actually have 250k observations. I have used the Cartesian join and other methods, but they all consume days worth of resources. I am trying to find an easier solution. I was wondering if there was a solution with do loops.

PaigeMiller
Diamond | Level 26

Well, there are methods that might work when you have 250k observations, such as a double-do loop, and hash objects, but I will leave that type of programming up to others.

--
Paige Miller
PeterClemmensen
Tourmaline | Level 20

My 2 cents. Should be reasonably fast as well.

 

data have;
input h f x y t;
datalines;
0 1 -113.219 37.18241 20239
1 0 -113.751 37.73728 20404
1 0 -113.124 37.69568 20428
0 1 -113.862 37.70729 20194
1 0 -113.373 37.14169 20200
1 0 -113.485 37.52134 20724
1 0 -114.051 36.89753 20395
1 0 -113.666 37.45601 20410
1 0 -113.421 36.86348 20511
1 1 -113.59  37.08619 20652
1 1 -113.904 37.40063 20098
0 0 -113.997 37.13753 20050
0 0 -113.458 37.27519 20590
0 0 -113.457 37.45815 20403
0 0 -113.732 37.14784 20252
0 1 -113.673 37.03961 20818
1 1 -113.861 37.75376 20859
0 0 -113.793 36.85021 20730
1 0 -113.488 37.4593  20451
1 0 -113.415 37.26769 20074
;

data want(drop = xx yy tt);
   if _N_ = 1 then do;
      dcl hash hh(dataset : "have(rename = (x = xx y = yy t = tt) where = (f = 1))");
      hh.definekey("h");
      hh.definedata("xx", "yy", "tt");
      hh.definedone();
      dcl hiter i("hh");
   end;

   set have;
   xx = .; yy = .; tt = .; c = 0;

   do while (i.next() = 0);
      if 0 < geodist(y, x, yy, xx, 'm') < 1 & abs(t - tt) < 30 then c + 1;
   end;

run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 736 views
  • 0 likes
  • 3 in conversation