I would like to calculate the number of loansa company had for the past 5 years. What is the SAS SQL code to do this?
Here is the have table:
CompanyID | LoanID | FYEAR |
1004 | 5006 | 1989 |
1004 | 5006 | 1990 |
1004 | 6460 | 1990 |
1004 | 5006 | 1991 |
1004 | 6460 | 1991 |
1004 | 7970 | 1991 |
1004 | 5006 | 1992 |
1004 | 6460 | 1992 |
1004 | 7970 | 1992 |
1004 | 5006 | 1993 |
1004 | 7970 | 1993 |
1004 | 5006 | 1994 |
1004 | 7970 | 1994 |
1004 | 5006 | 1995 |
1004 | 7970 | 1995 |
1004 | 7970 | 1996 |
1004 | 34433 | 1996 |
1004 | 34441 | 1996 |
1004 | 7970 | 1997 |
1004 | 34433 | 1997 |
1004 | 34441 | 1997 |
1004 | 45856 | 1997 |
1004 | 46572 | 1997 |
1004 | 49136 | 1997 |
1004 | 7970 | 1998 |
1004 | 34433 | 1998 |
1004 | 34441 | 1998 |
1004 | 45856 | 1998 |
1004 | 46572 | 1998 |
1004 | 49136 | 1998 |
1004 | 34433 | 1999 |
1004 | 34441 | 1999 |
1004 | 46572 | 1999 |
1004 | 49136 | 1999 |
1004 | 34433 | 2000 |
1004 | 34441 | 2000 |
1004 | 100472 | 2000 |
1004 | 34433 | 2001 |
1004 | 34441 | 2001 |
1004 | 111500 | 2001 |
1004 | 34433 | 2002 |
1004 | 34441 | 2002 |
1004 | 109872 | 2002 |
1004 | 119544 | 2002 |
1004 | 34433 | 2003 |
1004 | 34441 | 2003 |
1004 | 109872 | 2003 |
1004 | 119544 | 2003 |
1004 | 109872 | 2004 |
1004 | 119544 | 2004 |
1004 | 153686 | 2006 |
1004 | 153686 | 2007 |
1004 | 165317 | 2007 |
1004 | 153686 | 2008 |
1004 | 165317 | 2008 |
1004 | 153686 | 2009 |
1004 | 165317 | 2009 |
1004 | 187804 | 2009 |
1004 | 165317 | 2010 |
1004 | 197270 | 2010 |
1004 | 187804 | 2010 |
1004 | 197270 | 2011 |
1004 | 187804 | 2011 |
1004 | 197270 | 2012 |
1004 | 187804 | 2012 |
1004 | 197270 | 2013 |
1004 | 187804 | 2013 |
1004 | 197270 | 2014 |
1004 | 241601 | 2014 |
Do you have to use SQL ?
data have;
infile cards truncover expandtabs;
input CompanyID LoanID FYEAR;
cards;
1004 5006 1989
1004 5006 1990
1004 6460 1990
1004 5006 1991
1004 6460 1991
1004 7970 1991
1004 5006 1992
1004 6460 1992
1004 7970 1992
1004 5006 1993
1004 7970 1993
1004 5006 1994
1004 7970 1994
1004 5006 1995
1004 7970 1995
1004 7970 1996
1004 34433 1996
1004 34441 1996
1004 7970 1997
1004 34433 1997
1004 34441 1997
1004 45856 1997
1004 46572 1997
1004 49136 1997
1004 7970 1998
1004 34433 1998
1004 34441 1998
1004 45856 1998
1004 46572 1998
1004 49136 1998
1004 34433 1999
1004 34441 1999
1004 46572 1999
1004 49136 1999
1004 34433 2000
1004 34441 2000
1004 100472 2000
1004 34433 2001
1004 34441 2001
1004 111500 2001
1004 34433 2002
1004 34441 2002
1004 109872 2002
1004 119544 2002
1004 34433 2003
1004 34441 2003
1004 109872 2003
1004 119544 2003
1004 109872 2004
1004 119544 2004
1004 153686 2006
1004 153686 2007
1004 165317 2007
1004 153686 2008
1004 165317 2008
1004 153686 2009
1004 165317 2009
1004 187804 2009
1004 165317 2010
1004 197270 2010
1004 187804 2010
1004 197270 2011
1004 187804 2011
1004 197270 2012
1004 187804 2012
1004 197270 2013
1004 187804 2013
1004 197270 2014
1004 241601 2014
;
run;
proc SQL;
create table temp as
select CompanyID,FYEAR,count(distinct LoanID) as N_LOANS_1YEAR
from have
group by CompanyID,FYEAR;
create table want as
select *,case when
(select count(*) from temp where CompanyID=a.CompanyID and FYEAR le a.FYEAR) lt 5 then .
else (select avg(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID
and FYEAR between a.FYEAR-4 and a.FYEAR) end as AVG_LOANS_5YEAR,
(select max(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID
and FYEAR between a.FYEAR-4 and a.FYEAR) as MAX_LOANS_5YEAR
from temp as a;
quit;
See example 12 + a WHERE clause.
http://www.listendata.com/2014/04/proc-sql-select-statement.html
If you run into issues post the code and log and explain your issue in detail.
I tried this code but I am not sure if it is right?
proc SQL;
create table want as
select a.* , count(unique b.LoanID) as NUMLOANS
from have as a
left join have as b
on a.CompanyID=b.CompanyID and a.FYEAR=b.FYEAR+5; quit;
You join a table when you have information in another table or rows that you need.
Nothing in your question of indicates a need for a join. Remove the join and use a where. I would recommend hardcoding the years first and then making it dynamic after you sort out the rest of the code.
Unless you were looking for a moving average? Nothing in your question mentions this though.
No I do not want a moving average. I would like a moving count of the number of loans over the past 5 years. Thank you.
Is this correct?
proc SQL;
create table want as
select * , count(unique LoanID) as LOANFREQ
from have where FYEAR<=FYEAR+5
group by LoanID,CompanyID, fyear
order by CompanyID,fyear,LoanID; quit;
LoanID can't be in your group statement since you want a count for it..
If you want a moving count (my mistake in average vs count) you do need a join. If you search through my posts in last week there's a good example of the three different methods in someone else's question.
If I'm interpreting your question correctly, this is too complex an approach. When you say the past 5 years, don't you want:
If so, that's much easier. SQL can do this in one step. PROC FREQ can do it in two steps:
proc freq data=have;
tables CompanyID * LoanID / noprint out=counts;
where fyear >= 2011;
run;
proc freq data=counts;
tables CompanyID;
run;
Rolling counts through (ugh!) SQL. Assuming you have one record per loan in dataset HAVE, and each loan is a five year loan, orginating in year FYEAR:
Edit: forget the companyid:
proc sql;
create table want as
select companyid, year, count(*)
from (
(select companyid, fyear+0 as year from have) outer union corresponding
(select companyid, fyear+1 as year from have) outer union corresponding
(select companyid, fyear+2 as year from have) outer union corresponding
(select companyid, fyear+3 as year from have) outer union corresponding
(select companyid, fyear+4 as year from have)
)
group by companyid, year;
quit;
A far better way would be:
data need (keep=year)/view=need;
set have;
do year=fyear to fyear+4;
output;
end;
run;
proc freq data=need;
tables companyid *year /list;
run;
Since you want a moving count, I would modify what you started doing:
proc SQL;
create table want as
select a.* , count(unique b.LoanID) as NUMLOANS
from have as a
left join have as b
on a.CompanyID=b.CompanyID and a.FYEAR=b.FYEAR+5; quit;
quit;
The first fix, is what gets counted. (You'll have to check my syntax on this since my SQL is suspect.)
count(distinct b.CompanyID, b.LoanID)
Secondly, change to a WHERE condition instead of a join:
where a.CompanyID=b.CompanyID and (a.FYEAR >= b.FYEAR) and (a.FYEAR - 4 <= b.FYEAR)
You can adjust the time periods if I didn't get the 5-year window correctly.
But you didn't post the output yet.
data have;
infile cards truncover expandtabs;
input CompanyID LoanID FYEAR;
cards;
1004 5006 1989
1004 5006 1990
1004 6460 1990
1004 5006 1991
1004 6460 1991
1004 7970 1991
1004 5006 1992
1004 6460 1992
1004 7970 1992
1004 5006 1993
1004 7970 1993
1004 5006 1994
1004 7970 1994
1004 5006 1995
1004 7970 1995
1004 7970 1996
1004 34433 1996
1004 34441 1996
1004 7970 1997
1004 34433 1997
1004 34441 1997
1004 45856 1997
1004 46572 1997
1004 49136 1997
1004 7970 1998
1004 34433 1998
1004 34441 1998
1004 45856 1998
1004 46572 1998
1004 49136 1998
1004 34433 1999
1004 34441 1999
1004 46572 1999
1004 49136 1999
1004 34433 2000
1004 34441 2000
1004 100472 2000
1004 34433 2001
1004 34441 2001
1004 111500 2001
1004 34433 2002
1004 34441 2002
1004 109872 2002
1004 119544 2002
1004 34433 2003
1004 34441 2003
1004 109872 2003
1004 119544 2003
1004 109872 2004
1004 119544 2004
1004 153686 2006
1004 153686 2007
1004 165317 2007
1004 153686 2008
1004 165317 2008
1004 153686 2009
1004 165317 2009
1004 187804 2009
1004 165317 2010
1004 197270 2010
1004 187804 2010
1004 197270 2011
1004 187804 2011
1004 197270 2012
1004 187804 2012
1004 197270 2013
1004 187804 2013
1004 197270 2014
1004 241601 2014
;
run;
proc SQL;
create table want as
select b.* ,(select count(unique a.LoanID)
from have as a where a.CompanyID=b.CompanyID and
a.FYEAR between b.FYEAR and b.FYEAR+5) as NUMLOANS
from have as b;
quit;
Want table.
CompanyID | LoanID | FYEAR | N_LOANS_1YEAR | AVG_LOANS_5YEAR | MAX_LOANS_5YEAR |
1004 | 5006 | 1989 | 1 | 1 | |
1004 | 5006 | 1990 | 2 | 2 | |
1004 | 5006 | 1991 | 3 | 3 | |
1004 | 5006 | 1992 | 3 | 3 | |
1004 | 5006 | 1993 | 2 | 2.2 | 3 |
1004 | 5006 | 1994 | 2 | 2.4 | 3 |
1004 | 5006 | 1995 | 2 | 2.4 | 3 |
1004 | 7970 | 1996 | 3 | 2.4 | 3 |
1004 | 7970 | 1997 | 6 | 3 | 6 |
1004 | 7970 | 1998 | 6 | 3.8 | 6 |
1004 | 34433 | 1999 | 4 | 4.2 | 6 |
1004 | 34433 | 2000 | 3 | 4.4 | 6 |
1004 | 34433 | 2001 | 3 | 4.4 | 6 |
1004 | 34433 | 2002 | 4 | 4 | 6 |
1004 | 34433 | 2003 | 4 | 3.6 | 4 |
1004 | 109872 | 2004 | 2 | 3.2 | 4 |
1004 | 153686 | 2006 | 1 | 2.8 | 4 |
1004 | 153686 | 2007 | 2 | 2.6 | 4 |
1004 | 153686 | 2008 | 2 | 2.2 | 4 |
1004 | 153686 | 2009 | 3 | 2 | 3 |
1004 | 165317 | 2010 | 3 | 2.2 | 3 |
1004 | 197270 | 2011 | 2 | 2.4 | 3 |
1004 | 197270 | 2012 | 2 | 2.4 | 3 |
1004 | 197270 | 2013 | 2 | 2.4 | 3 |
1004 | 197270 | 2014 | 2 | 2.2 | 3 |
Do you have to use SQL ?
data have;
infile cards truncover expandtabs;
input CompanyID LoanID FYEAR;
cards;
1004 5006 1989
1004 5006 1990
1004 6460 1990
1004 5006 1991
1004 6460 1991
1004 7970 1991
1004 5006 1992
1004 6460 1992
1004 7970 1992
1004 5006 1993
1004 7970 1993
1004 5006 1994
1004 7970 1994
1004 5006 1995
1004 7970 1995
1004 7970 1996
1004 34433 1996
1004 34441 1996
1004 7970 1997
1004 34433 1997
1004 34441 1997
1004 45856 1997
1004 46572 1997
1004 49136 1997
1004 7970 1998
1004 34433 1998
1004 34441 1998
1004 45856 1998
1004 46572 1998
1004 49136 1998
1004 34433 1999
1004 34441 1999
1004 46572 1999
1004 49136 1999
1004 34433 2000
1004 34441 2000
1004 100472 2000
1004 34433 2001
1004 34441 2001
1004 111500 2001
1004 34433 2002
1004 34441 2002
1004 109872 2002
1004 119544 2002
1004 34433 2003
1004 34441 2003
1004 109872 2003
1004 119544 2003
1004 109872 2004
1004 119544 2004
1004 153686 2006
1004 153686 2007
1004 165317 2007
1004 153686 2008
1004 165317 2008
1004 153686 2009
1004 165317 2009
1004 187804 2009
1004 165317 2010
1004 197270 2010
1004 187804 2010
1004 197270 2011
1004 187804 2011
1004 197270 2012
1004 187804 2012
1004 197270 2013
1004 187804 2013
1004 197270 2014
1004 241601 2014
;
run;
proc SQL;
create table temp as
select CompanyID,FYEAR,count(distinct LoanID) as N_LOANS_1YEAR
from have
group by CompanyID,FYEAR;
create table want as
select *,case when
(select count(*) from temp where CompanyID=a.CompanyID and FYEAR le a.FYEAR) lt 5 then .
else (select avg(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID
and FYEAR between a.FYEAR-4 and a.FYEAR) end as AVG_LOANS_5YEAR,
(select max(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID
and FYEAR between a.FYEAR-4 and a.FYEAR) as MAX_LOANS_5YEAR
from temp as a;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.