BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mspak
Quartz | Level 8

Dear all,

I understand that proc sql can create the possible pairs from the data. But I am unclear on how this to be done by year basis.

I attached 2 files as example:

comp - is the company data: zipcode, financial year(fyear) and company identification code (tic)

major_bank is major bank data: ban identification code (bank_id), financial year (fyear) and zipcode (bankzip)

My task is to calculate the distance in miles to the closets large bank. I understand the ways to calculate the distance, but I have no idea on how to generate the possible pairs of origin and destination points between the company and the major banks for a given year from years 2003 to 2010. I have to get the possible pairs and then calculate distances. Lastly, i have to select the lowest distance between the company and the closest bank.

Some banks might available in some years but not in other years. There might be relocation of banks to another places (ie. different zip codes). My query is how to get the possible pairs for every year from 2003 to 2010?

Thank you.

Regards,

mspak

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I  will use sum() instead of count(), sometimes, count() could be tricky.

Good Luck,

Haikuo

View solution in original post

9 REPLIES 9
Haikuo
Onyx | Level 15

After I import your data into library 'dist', here is one try base on SQL:

proc sql;

create table dist as

select a.fyear, catx('-',tic,bank_id) as pairs, min(zipcitydistance(zipcode,zipbank)) as min_dist

from dist.comp a , dist.major_bank b

where a.fyear=b.fyear 

group by a.fyear,tic

having not missing(calculated min_dist);

quit;

Regards,

Haikuo

mspak
Quartz | Level 8

Thank you very much Smiley Happy. Everything can be solved with a very short program.

mspak
Quartz | Level 8

Hi,

If I were to calculate number of banks which are within 60 miles from the company, is the following program correct?

proc sql;

create table bankdist as

select a.fyear, tic, bank_id, count(zipcitydistance(zipcode,zipbank)<=60) as banknum

from dist.comp a , dist.major_banks b

where a.fyear=b.fyear

group by a.fyear,tic

having not missing(calculated banknum);

quit;

But I feel the answer is strange (based on the program above), as almost all the answers are the same of the different companies in the same year?

thank you.

Smiley Happy

Haikuo
Onyx | Level 15

one thing I can suggest is that using:  count(.< zipcitydistance(zipcode,zipbank)<=60), and removing 'having' statement.

Regards,

Haikuo

mspak
Quartz | Level 8

Hi and thanks again,

I created a libname geog. I tried your suggested method and try the following as well:

proc sql;

create table geog.bankbank1 as

select a.fyear, tic, bank_id,zipcitydistance(zipcode, bankzip)as bankdist

from geog.comp a , geog.us_banks_unique b

where a.fyear=b.fyear

group by a.fyear,tic;

having not missing(calculated bankdist);

quit;

proc sql;

create table geog.banknum as

select fyear, tic, count(0<=bankdist<=60) as banknum

from geog.bankbank1

group by tic, fyear;

quit; run;

I found the similar answer. I attached the files if you need them. I feel it is a bit strange as the number of banks within 60 miles radius is similar for different firms in the same year.

Thanks.

mspak

Haikuo
Onyx | Level 15

I have seen some issues with your code.  The first pass, when you generate 'geog.bankbank1', if you include  bank_id, the 'group by' will be turning into 'order by'only. So you are not going to have the results you want. Try to remove it and see want happens. And when you testing your code, use options like 'obs=1000' to speed it up instead of running against the whole table.

Regards,

Haikuo

mspak
Quartz | Level 8

Hi,

The same answer appeared even if I remove bank_id.  for answer. The following on the SAS log:

A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function.

What does the log mean?

Thank you,

mspak

Haikuo
Onyx | Level 15

The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I  will use sum() instead of count(), sometimes, count() could be tricky.

Good Luck,

Haikuo

mspak
Quartz | Level 8

Thanks Haikuo,

Using sum instead of count ---- leading to answer Smiley Happy

Regards,

MSPAK

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 908 views
  • 3 likes
  • 2 in conversation