Solved: Re: find out possible pairs by years

mspak · Posted 03-01-2012 11:03 PM

Dear all,

I understand that proc sql can create the possible pairs from the data. But I am unclear on how this to be done by year basis.

I attached 2 files as example:

comp - is the company data: zipcode, financial year(fyear) and company identification code (tic)

major_bank is major bank data: ban identification code (bank_id), financial year (fyear) and zipcode (bankzip)

My task is to calculate the distance in miles to the closets large bank. I understand the ways to calculate the distance, but I have no idea on how to generate the possible pairs of origin and destination points between the company and the major banks for a given year from years 2003 to 2010. I have to get the possible pairs and then calculate distances. Lastly, i have to select the lowest distance between the company and the closest bank.

Some banks might available in some years but not in other years. There might be relocation of banks to another places (ie. different zip codes). My query is how to get the possible pairs for every year from 2003 to 2010?

Thank you.

Regards,

mspak

Haikuo · Posted 03-02-2012 10:08 PM

The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I will use sum() instead of count(), sometimes, count() could be tricky.

Good Luck,

Haikuo

View solution in original post

Haikuo · Posted 03-02-2012 12:11 AM

After I import your data into library 'dist', here is one try base on SQL:

proc sql;

create table dist as

select a.fyear, catx('-',tic,bank_id) as pairs, min(zipcitydistance(zipcode,zipbank)) as min_dist

from dist.comp a , dist.major_bank b

where a.fyear=b.fyear

group by a.fyear,tic

having not missing(calculated min_dist);

quit;

Regards,

Haikuo

mspak · Posted 03-02-2012 01:43 AM

Thank you very much . Everything can be solved with a very short program.

mspak · Posted 03-02-2012 06:37 AM

Hi,

If I were to calculate number of banks which are within 60 miles from the company, is the following program correct?

proc sql;

create table bankdist as

select a.fyear, tic, bank_id, count(zipcitydistance(zipcode,zipbank)<=60) as banknum

from dist.comp a , dist.major_banks b

where a.fyear=b.fyear

group by a.fyear,tic

having not missing(calculated banknum);

quit;

But I feel the answer is strange (based on the program above), as almost all the answers are the same of the different companies in the same year?

thank you.

Haikuo · Posted 03-02-2012 08:47 AM

one thing I can suggest is that using: count(.< zipcitydistance(zipcode,zipbank)<=60), and removing 'having' statement.

Regards,

Haikuo

mspak · Posted 03-02-2012 10:32 AM

Hi and thanks again,

I created a libname geog. I tried your suggested method and try the following as well:

proc sql;

create table geog.bankbank1 as

select a.fyear, tic, bank_id,zipcitydistance(zipcode, bankzip)as bankdist

from geog.comp a , geog.us_banks_unique b

where a.fyear=b.fyear

group by a.fyear,tic;

having not missing(calculated bankdist);

quit;

proc sql;

create table geog.banknum as

select fyear, tic, count(0<=bankdist<=60) as banknum

from geog.bankbank1

group by tic, fyear;

quit; run;

I found the similar answer. I attached the files if you need them. I feel it is a bit strange as the number of banks within 60 miles radius is similar for different firms in the same year.

Thanks.

mspak

Haikuo · Posted 03-02-2012 12:53 PM

I have seen some issues with your code. The first pass, when you generate 'geog.bankbank1', if you include bank_id, the 'group by' will be turning into 'order by'only. So you are not going to have the results you want. Try to remove it and see want happens. And when you testing your code, use options like 'obs=1000' to speed it up instead of running against the whole table.

Regards,

Haikuo

mspak · Posted 03-02-2012 09:20 PM

Hi,

The same answer appeared even if I remove bank_id. for answer. The following on the SAS log:

A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function.

What does the log mean?

Thank you,

mspak

Haikuo · Posted 03-02-2012 10:08 PM

The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I will use sum() instead of count(), sometimes, count() could be tricky.

Good Luck,

Haikuo

mspak · Posted 03-06-2012 08:59 AM

Thanks Haikuo,

Using sum instead of count ---- leading to answer

Regards,

MSPAK

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away