## find out possible pairs by years

Solved
Regular Contributor
Posts: 162

# find out possible pairs by years

Dear all,

I understand that proc sql can create the possible pairs from the data. But I am unclear on how this to be done by year basis.

I attached 2 files as example:

comp - is the company data: zipcode, financial year(fyear) and company identification code (tic)

major_bank is major bank data: ban identification code (bank_id), financial year (fyear) and zipcode (bankzip)

My task is to calculate the distance in miles to the closets large bank. I understand the ways to calculate the distance, but I have no idea on how to generate the possible pairs of origin and destination points between the company and the major banks for a given year from years 2003 to 2010. I have to get the possible pairs and then calculate distances. Lastly, i have to select the lowest distance between the company and the closest bank.

Some banks might available in some years but not in other years. There might be relocation of banks to another places (ie. different zip codes). My query is how to get the possible pairs for every year from 2003 to 2010?

Thank you.

Regards,

mspak

Accepted Solutions
Solution
‎03-02-2012 10:08 PM
Posts: 3,167

## find out possible pairs by years

The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I  will use sum() instead of count(), sometimes, count() could be tricky.

Good Luck,

Haikuo

All Replies
Posts: 3,167

## find out possible pairs by years

After I import your data into library 'dist', here is one try base on SQL:

proc sql;

create table dist as

select a.fyear, catx('-',tic,bank_id) as pairs, min(zipcitydistance(zipcode,zipbank)) as min_dist

from dist.comp a , dist.major_bank b

where a.fyear=b.fyear

group by a.fyear,tic

having not missing(calculated min_dist);

quit;

Regards,

Haikuo

Regular Contributor
Posts: 162

## find out possible pairs by years

Thank you very much . Everything can be solved with a very short program.

Regular Contributor
Posts: 162

## Re: find out possible pairs by years

Hi,

If I were to calculate number of banks which are within 60 miles from the company, is the following program correct?

proc sql;

create table bankdist as

select a.fyear, tic, bank_id, count(zipcitydistance(zipcode,zipbank)<=60) as banknum

from dist.comp a , dist.major_banks b

where a.fyear=b.fyear

group by a.fyear,tic

having not missing(calculated banknum);

quit;

But I feel the answer is strange (based on the program above), as almost all the answers are the same of the different companies in the same year?

thank you.

Posts: 3,167

## Re: find out possible pairs by years

one thing I can suggest is that using:  count(.< zipcitydistance(zipcode,zipbank)<=60), and removing 'having' statement.

Regards,

Haikuo

Regular Contributor
Posts: 162

## Re: find out possible pairs by years

Hi and thanks again,

I created a libname geog. I tried your suggested method and try the following as well:

proc sql;

create table geog.bankbank1 as

select a.fyear, tic, bank_id,zipcitydistance(zipcode, bankzip)as bankdist

from geog.comp a , geog.us_banks_unique b

where a.fyear=b.fyear

group by a.fyear,tic;

having not missing(calculated bankdist);

quit;

proc sql;

create table geog.banknum as

select fyear, tic, count(0<=bankdist<=60) as banknum

from geog.bankbank1

group by tic, fyear;

quit; run;

I found the similar answer. I attached the files if you need them. I feel it is a bit strange as the number of banks within 60 miles radius is similar for different firms in the same year.

Thanks.

mspak

Posts: 3,167

## Re: find out possible pairs by years

I have seen some issues with your code.  The first pass, when you generate 'geog.bankbank1', if you include  bank_id, the 'group by' will be turning into 'order by'only. So you are not going to have the results you want. Try to remove it and see want happens. And when you testing your code, use options like 'obs=1000' to speed it up instead of running against the whole table.

Regards,

Haikuo

Regular Contributor
Posts: 162

## Re: find out possible pairs by years

Hi,

The same answer appeared even if I remove bank_id.  for answer. The following on the SAS log:

A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function.

What does the log mean?

Thank you,

mspak

Solution
‎03-02-2012 10:08 PM
Posts: 3,167

## find out possible pairs by years

The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I  will use sum() instead of count(), sometimes, count() could be tricky.

Good Luck,

Haikuo

Regular Contributor
Posts: 162

## Re: find out possible pairs by years

Thanks Haikuo,

Regards,

MSPAK

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 9 replies
• 244 views
• 3 likes
• 2 in conversation