BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dwsmith
Obsidian | Level 7

I have a data set that has a list of serial numbers. For each serial number, I want to run the same calculation without using a macro to loop through each serial number and create a table. There are 37k serial numbers.

 

The following code works if I specify one serial number:

data test;
 infile datalines delimiter = ',';
 input serial lat long;
 datalines;
 1, 58.6732, 118.2569
 1, 58.9852, 118.2568
 1, 58.9865, 118.3156
 1, 75.259, -52.0023
 1, 75.2689, -52.125
 2, 10.123, 15.256
 2, 10.2365, 14.2698
 2, 11.3569, 15.3489
 3, 23.2569, 0.2659
 3, 22.6789, 0.2788
 3, 12.3569, 35.126
 3, 12.3789, 34.679
 ;
run;

%let rad = constant('pi') / 180; /*degrees to radians*/
%let r_km = 6371; /*radius of the earth in km (3959 miles)*/

proc fcmp outlib = funcs.haversine;
 function haversine(lat1, long1, lat2, long2); /*function name*/
 /*function definition*/
 dist = round(&r_km * 2 * arsin(min(1, sqrt(sin((lat2 - lat1) * &rad / 2)**2 + 
 cos(lat1 * &rad) * cos(lat2 * &rad) * sin((long2 - long1) * &rad / 2)**2))), 0.000001);
 return(dist); /*what to return*/
 endsub;
run;
options cmplib = funcs; /*needed to access the function*/

data test;
 set test;
 cnt = _n_;

	/*pairwise compare distance*/
	data results (drop = _rc dt);
		/*set up hash table*/
		if _n_ = 1 then do;
			if 0 then 
				set test(rename = (lat = lat1 long = long2));
			declare hash h1(dataset:"test(rename = (lat = lat1 long = long2, multidata: 'y');
			_rc = h1.definekey('cnt', 'serial');
			_rc = h1.definedata('lat1', 'long1');
			_rc = h1.definedone();
		end;
		if serial = /*first serial number*/ then accum + 1;/*but then the serial number needs to change, I think a macro wouldn't be ideal though*/
		do _i = (cnt + 1) to (cnt + accum);
			_rc = h1.find(key: _i, key: serial);
			hdist = haversine(lat, long, lat1, long1);
			output;
		end;
	run;
 

Can this be generalized to run on whole dataset comparing distance only for the specific serial numbers, recognizing a new number, and starting the process again. Otherwise, I need a macro to subset the major data set 37k times which seems inefficient.

 

The gist is, I need to find the distance between gps location 1 with 2, 2 with 3, ... for serial number 1. Then when I get to serial number 2, I need to repeat this process and so on.

 

This is a rough sketch of the what I am trying to achieve. I added comments where I explained the intent but haven't figured out the process to achieve the desired results.

 

/*Tried setting up a macro for the serial loop but this doesn't work
  Here is what I tried for that*/
%let rad = constant('pi') / 180; /*degrees to radians*/
%let r_km = 6371; /*radius of the earth in km (3959 miles)*/

proc fcmp outlib = funcs.haversine;
 function haversine(lat1, long1, lat2, long2); /*function name*/
 /*function definition*/
 dist = round(&r_km * 2 * arsin(min(1, sqrt(sin((lat2 - lat1) * &rad / 2)**2 + 
 cos(lat1 * &rad) * cos(lat2 * &rad) * sin((long2 - long1) * &rad / 2)**2))), 0.000001);
 return(dist); /*what to return*/
 endsub;
run;
options cmplib = funcs; /*needed to access the function*/

data test;
 infile datalines delimiter = ',';
 input serial lat long;
 label serial = 'serial'
 lat = 'lat'
 long = 'long';
 datalines;
 1, 58.6732, 118.2569
 1, 58.9852, 118.2568
 1, 58.9865, 118.3156
 1, 75.259, -52.0023
 1, 75.2689, -52.125
 2, 10.123, 15.256
 2, 10.2365, 14.2698
 2, 11.3569, 15.3489
 3, 23.2569, 0.2659
 3, 22.6789, 0.2788
 3, 12.3569, 35.126
 3, 12.3789, 34.679
 ;
run;

data test;
 set test;
 input cnt;
 label cnt = 'cnt';
 cnt = _n_;
run;

data serials (keep = serial);
 set test;
run;

proc sort nodupkey data = serials;
 by serial;
run;

data _null_;
 set serials nobs = n;
 call symput ('num', n);
run;

options mprint; /*full output for macro execution so you know something is happening*/

%macro serial_loop;
%local i sn;
%do i = 1 %to #
 data _null_;
 set serials (obs = &i firstobs = &i);
 call symputx ('sn', serial);
 run;
 if serial = "&sn" then
 accum + 1;
 /*find the distance between the ith and ith + 1 coordinates for each gps location*/
 do _i = (cnt + 1) to (cnt + accum);
 _rc = h1.find(key: _i, key: ser_no);
 if _rc = 0 then do;
 hdist = haversine(lat, long, lat1, long1);
 output;
 end;
 end;
%end;
%mend serial_loop;
data results (drop = _rc);
 /*set up hash table*/
 set test;
 if _n_ = 1 then do;
 if 0 then 
 set test(rename = (lat = lat1 long = long1));
 declare hash h1(dataset: "test(rename = (lat = lat1 
 long = long1))",
 multidata: 'y');
 _rc = h1.definekey('cnt', 'serial');
 _rc = h1.definedata('lat', 'long');
 _rc = h1.definedone();
 end;
 %serial_loop;
run;
 
ERROR: Variable lat is not on file WORK.TEST.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
dwsmith
Obsidian | Level 7

I have answered my own question. I used SAS' geodist for the example solution.

 

data test;
 infile datalines delimiter = ',';
 input serial lat long date;
 format date date9.;
 datalines;
 1, 58.6732, 118.2569, 20225
 1, 58.9852, 118.2568, 20226
 1, 58.9865, 118.3156, 20227
 1, 75.259, -52.0023,  20228
 1, 75.2689, -52.125,  20229
 2, 10.123, 15.256,    22111
 2, 10.2365, 14.2698,  22112
 2, 11.3569, 15.3489,  22113
 3, 23.2569, 0.2659,   21256
 3, 22.6789, 0.2788,   21257
 3, 12.3569, 35.126,   21258
 3, 12.3789, 34.679,   21259
 ;
run;

data test;
	set test;
	i = _n_;
run;

proc sql _method;
	create table results (drop = sn date1 j) 
	as select
		a.*,
		b.serial as sn label = 'sn',
		b.lat as lat1 label = 'lat1',
		b.long as long1 label = 'long1',
		b.date as date1 label = 'date1',
		b.i as j,
		geodist(a.lat, a.long, b.lat, b.long) as gdist
	from test as a
	inner join test as b
	on a.serial = b.serial
	where a.i > b.i;
quit; 

proc sort results;
	by serial date;
run;

View solution in original post

7 REPLIES 7
data_null__
Jade | Level 19

Guessing I would say yes.  I think you will need to supply data to go with your code get anyone interested.  Also show what the output might look like for a "new" number where process is started over.

dwsmith
Obsidian | Level 7

I added a sample data set

ballardw
Super User

@dwsmith wrote:

I have a data set that has a list of serial numbers. For each serial number, I want to run the same calculation without using a macro to loop through each serial number and create a table. There are 37k serial numbers.

 


Are you comparing this only to other items with the same serial number?

 

I am not sure what this means at all:

recognizing a new number, and starting the process again. Otherwise, I need a macro to subset the major data set 37k times which seems inefficient.

It will really help if you can provide an example data set with a small number of records, say 5 to 10 and the minimum number of variables to work with and what the output should look like. What little I think I understand might be done with a reflexive join in Proc Sql since it looks like you are only using one data set.

 

dwsmith
Obsidian | Level 7

I added a sample data set.

dwsmith
Obsidian | Level 7

I have answered my own question. I used SAS' geodist for the example solution.

 

data test;
 infile datalines delimiter = ',';
 input serial lat long date;
 format date date9.;
 datalines;
 1, 58.6732, 118.2569, 20225
 1, 58.9852, 118.2568, 20226
 1, 58.9865, 118.3156, 20227
 1, 75.259, -52.0023,  20228
 1, 75.2689, -52.125,  20229
 2, 10.123, 15.256,    22111
 2, 10.2365, 14.2698,  22112
 2, 11.3569, 15.3489,  22113
 3, 23.2569, 0.2659,   21256
 3, 22.6789, 0.2788,   21257
 3, 12.3569, 35.126,   21258
 3, 12.3789, 34.679,   21259
 ;
run;

data test;
	set test;
	i = _n_;
run;

proc sql _method;
	create table results (drop = sn date1 j) 
	as select
		a.*,
		b.serial as sn label = 'sn',
		b.lat as lat1 label = 'lat1',
		b.long as long1 label = 'long1',
		b.date as date1 label = 'date1',
		b.i as j,
		geodist(a.lat, a.long, b.lat, b.long) as gdist
	from test as a
	inner join test as b
	on a.serial = b.serial
	where a.i > b.i;
quit; 

proc sort results;
	by serial date;
run;
ballardw
Super User

And that solution uses a reflexive join ...

dwsmith
Obsidian | Level 7

I am not sure if this solution is good for a data set with 15million rows though.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1265 views
  • 0 likes
  • 3 in conversation