Solved: Count Number of Observations Meeting Criteria Compared To Each Observa...

uwabe3 · Posted 04-13-2021 05:06 PM

I am no sure how to describe this, but I have a data set of home sales including foreclosures. I want to count the number of foreclosures near each observation based on distance and time as well as if the foreclosure meets one other, dummy criteria.

Below is the data.

h is a dummy variable for quality

f is a dummy variable for foreclosure sale

X and Y are lat/long.

t is the month the foreclosure sale

Count all the other observations that meet these critera:

Within distance: say 1 mile (I know how to use GEODIST to find this)

Within time: say 30 days so abs(t_subject - t_obs) < 30)

Where f=1

I also want to split this count between obs where h=0 and h=1

So for the first datapoint. I want to find all other datapoints that are within 1 mile of

-113.219

37.18241

And within 30 days of 20239 or 20209 to 20269

Where f=1.

I want a count for when h=0 and h=1.

Here's the data:

h	f	x	y	t
0	1	-113.219	37.18241	20239
1	0	-113.751	37.73728	20404
1	0	-113.124	37.69568	20428
0	1	-113.862	37.70729	20194
1	0	-113.373	37.14169	20200
1	0	-113.485	37.52134	20724
1	0	-114.051	36.89753	20395
1	0	-113.666	37.45601	20410
1	0	-113.421	36.86348	20511
1	1	-113.59	37.08619	20652
1	1	-113.904	37.40063	20098
0	0	-113.997	37.13753	20050
0	0	-113.458	37.27519	20590
0	0	-113.457	37.45815	20403
0	0	-113.732	37.14784	20252
0	1	-113.673	37.03961	20818
1	1	-113.861	37.75376	20859
0	0	-113.793	36.85021	20730
1	0	-113.488	37.4593	20451
1	0	-113.415	37.26769	20074

PeterClemmensen · Posted 04-15-2021 03:32 PM

My 2 cents. Should be reasonably fast as well.

data have;
input h f x y t;
datalines;
0 1 -113.219 37.18241 20239
1 0 -113.751 37.73728 20404
1 0 -113.124 37.69568 20428
0 1 -113.862 37.70729 20194
1 0 -113.373 37.14169 20200
1 0 -113.485 37.52134 20724
1 0 -114.051 36.89753 20395
1 0 -113.666 37.45601 20410
1 0 -113.421 36.86348 20511
1 1 -113.59  37.08619 20652
1 1 -113.904 37.40063 20098
0 0 -113.997 37.13753 20050
0 0 -113.458 37.27519 20590
0 0 -113.457 37.45815 20403
0 0 -113.732 37.14784 20252
0 1 -113.673 37.03961 20818
1 1 -113.861 37.75376 20859
0 0 -113.793 36.85021 20730
1 0 -113.488 37.4593  20451
1 0 -113.415 37.26769 20074
;

data want(drop = xx yy tt);
   if _N_ = 1 then do;
      dcl hash hh(dataset : "have(rename = (x = xx y = yy t = tt) where = (f = 1))");
      hh.definekey("h");
      hh.definedata("xx", "yy", "tt");
      hh.definedone();
      dcl hiter i("hh");
   end;

   set have;
   xx = .; yy = .; tt = .; c = 0;

   do while (i.next() = 0);
      if 0 < geodist(y, x, yy, xx, 'm') < 1 & abs(t - tt) < 30 then c + 1;
   end;

run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PaigeMiller · Posted 04-13-2021 05:17 PM

If that's the entire data set, 20-30 observations, I would use a Cartesian join to get all possible combinations in a data set and then do the 1 miles and 30 days comparison on this data set. I would also add some sort of ID variable to each row, it could be as simple as sequence number.

UNTESTED CODE

proc sql;
     create table cartesian as select a.id,a.h,a.f,a.x,a.y,a.t,
         b.id as id2,b.h as h2,b.f as f2,b.x as x2,b.y as y2,b.t as t2
      from have as a,have as b
      order by a.id,b.id;
quit;

Now if you really have thousands of records, this probably won't work because maybe your computer can't handle this large problem, or is going to take a long time.

--
Paige Miller

uwabe3 · Posted 04-13-2021 05:29 PM

Yes, I actually have 250k observations. I have used the Cartesian join and other methods, but they all consume days worth of resources. I am trying to find an easier solution. I was wondering if there was a solution with do loops.

PaigeMiller · Posted 04-13-2021 05:36 PM

Well, there are methods that might work when you have 250k observations, such as a double-do loop, and hash objects, but I will leave that type of programming up to others.

--
Paige Miller

PeterClemmensen · Posted 04-15-2021 03:32 PM

My 2 cents. Should be reasonably fast as well.

data have;
input h f x y t;
datalines;
0 1 -113.219 37.18241 20239
1 0 -113.751 37.73728 20404
1 0 -113.124 37.69568 20428
0 1 -113.862 37.70729 20194
1 0 -113.373 37.14169 20200
1 0 -113.485 37.52134 20724
1 0 -114.051 36.89753 20395
1 0 -113.666 37.45601 20410
1 0 -113.421 36.86348 20511
1 1 -113.59  37.08619 20652
1 1 -113.904 37.40063 20098
0 0 -113.997 37.13753 20050
0 0 -113.458 37.27519 20590
0 0 -113.457 37.45815 20403
0 0 -113.732 37.14784 20252
0 1 -113.673 37.03961 20818
1 1 -113.861 37.75376 20859
0 0 -113.793 36.85021 20730
1 0 -113.488 37.4593  20451
1 0 -113.415 37.26769 20074
;

data want(drop = xx yy tt);
   if _N_ = 1 then do;
      dcl hash hh(dataset : "have(rename = (x = xx y = yy t = tt) where = (f = 1))");
      hh.definekey("h");
      hh.definedata("xx", "yy", "tt");
      hh.definedone();
      dcl hiter i("hh");
   end;

   set have;
   xx = .; yy = .; tt = .; c = 0;

   do while (i.next() = 0);
      if 0 < geodist(y, x, yy, xx, 'm') < 1 & abs(t - tt) < 30 then c + 1;
   end;

run;

The DATA to DATA Step Macro
Blog: SASnrd

Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Re: Count Number of Observations Meeting Criteria Compared To Each Observation

Click image to register for webinar

Classroom Training Available!