Dear all,
I understand that proc sql can create the possible pairs from the data. But I am unclear on how this to be done by year basis.
I attached 2 files as example:
comp - is the company data: zipcode, financial year(fyear) and company identification code (tic)
major_bank is major bank data: ban identification code (bank_id), financial year (fyear) and zipcode (bankzip)
My task is to calculate the distance in miles to the closets large bank. I understand the ways to calculate the distance, but I have no idea on how to generate the possible pairs of origin and destination points between the company and the major banks for a given year from years 2003 to 2010. I have to get the possible pairs and then calculate distances. Lastly, i have to select the lowest distance between the company and the closest bank.
Some banks might available in some years but not in other years. There might be relocation of banks to another places (ie. different zip codes). My query is how to get the possible pairs for every year from 2003 to 2010?
Thank you.
Regards,
mspak
The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I will use sum() instead of count(), sometimes, count() could be tricky.
Good Luck,
Haikuo
After I import your data into library 'dist', here is one try base on SQL:
proc sql;
create table dist as
select a.fyear, catx('-',tic,bank_id) as pairs, min(zipcitydistance(zipcode,zipbank)) as min_dist
from dist.comp a , dist.major_bank b
where a.fyear=b.fyear
group by a.fyear,tic
having not missing(calculated min_dist);
quit;
Regards,
Haikuo
Thank you very much . Everything can be solved with a very short program.
Hi,
If I were to calculate number of banks which are within 60 miles from the company, is the following program correct?
proc sql;
create table bankdist as
select a.fyear, tic, bank_id, count(zipcitydistance(zipcode,zipbank)<=60) as banknum
from dist.comp a , dist.major_banks b
where a.fyear=b.fyear
group by a.fyear,tic
having not missing(calculated banknum);
quit;
But I feel the answer is strange (based on the program above), as almost all the answers are the same of the different companies in the same year?
thank you.
one thing I can suggest is that using: count(.< zipcitydistance(zipcode,zipbank)<=60), and removing 'having' statement.
Regards,
Haikuo
Hi and thanks again,
I created a libname geog. I tried your suggested method and try the following as well:
proc sql;
create table geog.bankbank1 as
select a.fyear, tic, bank_id,zipcitydistance(zipcode, bankzip)as bankdist
from geog.comp a , geog.us_banks_unique b
where a.fyear=b.fyear
group by a.fyear,tic;
having not missing(calculated bankdist);
quit;
proc sql;
create table geog.banknum as
select fyear, tic, count(0<=bankdist<=60) as banknum
from geog.bankbank1
group by tic, fyear;
quit; run;
I found the similar answer. I attached the files if you need them. I feel it is a bit strange as the number of banks within 60 miles radius is similar for different firms in the same year.
Thanks.
mspak
I have seen some issues with your code. The first pass, when you generate 'geog.bankbank1', if you include bank_id, the 'group by' will be turning into 'order by'only. So you are not going to have the results you want. Try to remove it and see want happens. And when you testing your code, use options like 'obs=1000' to speed it up instead of running against the whole table.
Regards,
Haikuo
Hi,
The same answer appeared even if I remove bank_id. for answer. The following on the SAS log:
A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function.
What does the log mean?
Thank you,
mspak
The log means that you don't have functions like min(), max(), count(), sum() as such in your select statement, so group by will be recognized as order by. It is fine. The second pass, I will use sum() instead of count(), sometimes, count() could be tricky.
Good Luck,
Haikuo
Thanks Haikuo,
Using sum instead of count ---- leading to answer
Regards,
MSPAK
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.