BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Agent1592
Pyrite | Level 9

I would like to calculate the number of loansa company had for the past 5 years. What is the SAS SQL code to do this?

Here is the have table:

CompanyIDLoanIDFYEAR
100450061989
100450061990
100464601990
100450061991
100464601991
100479701991
100450061992
100464601992
100479701992
100450061993
100479701993
100450061994
100479701994
100450061995
100479701995
100479701996
1004344331996
1004344411996
100479701997
1004344331997
1004344411997
1004458561997
1004465721997
1004491361997
100479701998
1004344331998
1004344411998
1004458561998
1004465721998
1004491361998
1004344331999
1004344411999
1004465721999
1004491361999
1004344332000
1004344412000
10041004722000
1004344332001
1004344412001
10041115002001
1004344332002
1004344412002
10041098722002
10041195442002
1004344332003
1004344412003
10041098722003
10041195442003
10041098722004
10041195442004
10041536862006
10041536862007
10041653172007
10041536862008
10041653172008
10041536862009
10041653172009
10041878042009
10041653172010
10041972702010
10041878042010
10041972702011
10041878042011
10041972702012
10041878042012
10041972702013
10041878042013
10041972702014
10042416012014
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Do you have to use SQL ?

 

data have;
infile cards truncover expandtabs;
input CompanyID	LoanID	FYEAR;
cards;
1004	5006	1989
1004	5006	1990
1004	6460	1990
1004	5006	1991
1004	6460	1991
1004	7970	1991
1004	5006	1992
1004	6460	1992
1004	7970	1992
1004	5006	1993
1004	7970	1993
1004	5006	1994
1004	7970	1994
1004	5006	1995
1004	7970	1995
1004	7970	1996
1004	34433	1996
1004	34441	1996
1004	7970	1997
1004	34433	1997
1004	34441	1997
1004	45856	1997
1004	46572	1997
1004	49136	1997
1004	7970	1998
1004	34433	1998
1004	34441	1998
1004	45856	1998
1004	46572	1998
1004	49136	1998
1004	34433	1999
1004	34441	1999
1004	46572	1999
1004	49136	1999
1004	34433	2000
1004	34441	2000
1004	100472	2000
1004	34433	2001
1004	34441	2001
1004	111500	2001
1004	34433	2002
1004	34441	2002
1004	109872	2002
1004	119544	2002
1004	34433	2003
1004	34441	2003
1004	109872	2003
1004	119544	2003
1004	109872	2004
1004	119544	2004
1004	153686	2006
1004	153686	2007
1004	165317	2007
1004	153686	2008
1004	165317	2008
1004	153686	2009
1004	165317	2009
1004	187804	2009
1004	165317	2010
1004	197270	2010
1004	187804	2010
1004	197270	2011
1004	187804	2011
1004	197270	2012
1004	187804	2012
1004	197270	2013
1004	187804	2013
1004	197270	2014
1004	241601	2014
;
run;


proc SQL;
create table temp as
select CompanyID,FYEAR,count(distinct LoanID) as N_LOANS_1YEAR
from have 
group by CompanyID,FYEAR;

create table want  as
select *,case when 
(select count(*) from temp where CompanyID=a.CompanyID and FYEAR le a.FYEAR) lt 5 then .
 else (select avg(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID 
 and FYEAR between a.FYEAR-4 and a.FYEAR) end as AVG_LOANS_5YEAR,
(select max(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID 
 and FYEAR between a.FYEAR-4 and a.FYEAR) as MAX_LOANS_5YEAR
 
from temp as a;
quit;

View solution in original post

16 REPLIES 16
Reeza
Super User

 See example 12 + a WHERE clause. 

http://www.listendata.com/2014/04/proc-sql-select-statement.html

 

 

If you run into issues post the code and log and explain your issue in detail. 

Agent1592
Pyrite | Level 9

I tried this code but I am not sure if it is right?

 

proc SQL;
create table want  as
select a.* , count(unique b.LoanID) as NUMLOANS
from have as a
left join have as b
on a.CompanyID=b.CompanyID and a.FYEAR=b.FYEAR+5; quit;

 

Reeza
Super User

You join a table when you have information in another table or rows that you need. 

Nothing in your question of indicates a need for a join. Remove the join and use a where. I would recommend hardcoding the years first and then making it dynamic after you sort out the rest of the code. 

 

Unless you were looking for a moving average? Nothing in your question mentions this though. 

Agent1592
Pyrite | Level 9

No I do not want a moving average. I would like a moving count of the number of loans over the past 5 years. Thank you.

Is this correct?

 

proc SQL;
create table want as
select * , count(unique LoanID) as LOANFREQ
from have where FYEAR<=FYEAR+5
group by LoanID,CompanyID, fyear
order by CompanyID,fyear,LoanID; quit;

Reeza
Super User

LoanID can't be in your group statement since you want a count for it..

 

If you want a moving count (my mistake in average vs count) you do need a join. If you search through my posts in last week there's a good example of the three different methods in someone else's question. 

Astounding
PROC Star

If I'm interpreting your question correctly, this is too complex an approach.  When you say the past 5 years, don't you want:

 

  • One observation per company
  • A count of number of loans in the time period FYEAR >= 2011

If so, that's much easier.  SQL can do this in one step.  PROC FREQ can do it in two steps:

 

proc freq data=have;

tables CompanyID * LoanID / noprint out=counts;

where fyear >= 2011;

run;

proc freq data=counts;

tables CompanyID;

run;

 

Agent1592
Pyrite | Level 9
Thanks, I am looking for a moving count though. Not only 2011-2016 but since inception.
mkeintz
PROC Star

Rolling counts through (ugh!) SQL.  Assuming you have one record per loan in dataset HAVE, and each loan is a five year loan, orginating in year FYEAR:

Edit: forget the companyid:

 

proc sql;
  create table want as 
    select companyid, year, count(*) 
    from (
       (select companyid, fyear+0 as year from have) outer union corresponding
       (select companyid, fyear+1 as year from have) outer union corresponding
       (select companyid, fyear+2 as year from have) outer union corresponding
       (select companyid, fyear+3 as year from have) outer union corresponding
       (select companyid, fyear+4 as year from have)
        )
    group by companyid, year;
quit;

 

 

 

A far better way would be:

 

data need (keep=year)/view=need;
  set have;
  do year=fyear to fyear+4;
    output;
  end;
run;
proc freq data=need;
  tables companyid *year /list;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Astounding
PROC Star

Since you want a moving count, I would modify what you started doing:

 

proc SQL;
create table want  as
select a.* , count(unique b.LoanID) as NUMLOANS
from have as a
left join have as b
on a.CompanyID=b.CompanyID and a.FYEAR=b.FYEAR+5; quit;

quit;

 

 

The first fix, is what gets counted.  (You'll have to check my syntax on this since my SQL is suspect.)

 

count(distinct b.CompanyID, b.LoanID)

 

Secondly, change to a WHERE condition instead of a join:

 

where a.CompanyID=b.CompanyID and (a.FYEAR >= b.FYEAR) and (a.FYEAR - 4 <= b.FYEAR)

 

You can adjust the time periods if I didn't get the 5-year window correctly.

Ksharp
Super User

But you didn't post the output yet.

 

data have;
infile cards truncover expandtabs;
input CompanyID	LoanID	FYEAR;
cards;
1004	5006	1989
1004	5006	1990
1004	6460	1990
1004	5006	1991
1004	6460	1991
1004	7970	1991
1004	5006	1992
1004	6460	1992
1004	7970	1992
1004	5006	1993
1004	7970	1993
1004	5006	1994
1004	7970	1994
1004	5006	1995
1004	7970	1995
1004	7970	1996
1004	34433	1996
1004	34441	1996
1004	7970	1997
1004	34433	1997
1004	34441	1997
1004	45856	1997
1004	46572	1997
1004	49136	1997
1004	7970	1998
1004	34433	1998
1004	34441	1998
1004	45856	1998
1004	46572	1998
1004	49136	1998
1004	34433	1999
1004	34441	1999
1004	46572	1999
1004	49136	1999
1004	34433	2000
1004	34441	2000
1004	100472	2000
1004	34433	2001
1004	34441	2001
1004	111500	2001
1004	34433	2002
1004	34441	2002
1004	109872	2002
1004	119544	2002
1004	34433	2003
1004	34441	2003
1004	109872	2003
1004	119544	2003
1004	109872	2004
1004	119544	2004
1004	153686	2006
1004	153686	2007
1004	165317	2007
1004	153686	2008
1004	165317	2008
1004	153686	2009
1004	165317	2009
1004	187804	2009
1004	165317	2010
1004	197270	2010
1004	187804	2010
1004	197270	2011
1004	187804	2011
1004	197270	2012
1004	187804	2012
1004	197270	2013
1004	187804	2013
1004	197270	2014
1004	241601	2014
;
run;


proc SQL;
create table want  as
select b.* ,(select count(unique a.LoanID) 
from have as a where a.CompanyID=b.CompanyID and 
a.FYEAR between b.FYEAR and b.FYEAR+5) as NUMLOANS
from have as b;
quit;
Agent1592
Pyrite | Level 9
Thanks you. Sorry this is the want table.
Agent1592
Pyrite | Level 9

Want table.


CompanyID
LoanID FYEARN_LOANS_1YEARAVG_LOANS_5YEARMAX_LOANS_5YEAR
1004500619891 1
1004500619902 2
1004500619913 3
1004500619923 3
10045006199322.23
10045006199422.43
10045006199522.43
10047970199632.43
100479701997636
10047970199863.86
100434433199944.26
100434433200034.46
100434433200134.46
1004344332002446
100434433200343.64
1004109872200423.24
1004153686200612.84
1004153686200722.64
1004153686200822.24
10041536862009323
1004165317201032.23
1004197270201122.43
1004197270201222.43
1004197270201322.43
1004197270201422.23
Ksharp
Super User

Do you have to use SQL ?

 

data have;
infile cards truncover expandtabs;
input CompanyID	LoanID	FYEAR;
cards;
1004	5006	1989
1004	5006	1990
1004	6460	1990
1004	5006	1991
1004	6460	1991
1004	7970	1991
1004	5006	1992
1004	6460	1992
1004	7970	1992
1004	5006	1993
1004	7970	1993
1004	5006	1994
1004	7970	1994
1004	5006	1995
1004	7970	1995
1004	7970	1996
1004	34433	1996
1004	34441	1996
1004	7970	1997
1004	34433	1997
1004	34441	1997
1004	45856	1997
1004	46572	1997
1004	49136	1997
1004	7970	1998
1004	34433	1998
1004	34441	1998
1004	45856	1998
1004	46572	1998
1004	49136	1998
1004	34433	1999
1004	34441	1999
1004	46572	1999
1004	49136	1999
1004	34433	2000
1004	34441	2000
1004	100472	2000
1004	34433	2001
1004	34441	2001
1004	111500	2001
1004	34433	2002
1004	34441	2002
1004	109872	2002
1004	119544	2002
1004	34433	2003
1004	34441	2003
1004	109872	2003
1004	119544	2003
1004	109872	2004
1004	119544	2004
1004	153686	2006
1004	153686	2007
1004	165317	2007
1004	153686	2008
1004	165317	2008
1004	153686	2009
1004	165317	2009
1004	187804	2009
1004	165317	2010
1004	197270	2010
1004	187804	2010
1004	197270	2011
1004	187804	2011
1004	197270	2012
1004	187804	2012
1004	197270	2013
1004	187804	2013
1004	197270	2014
1004	241601	2014
;
run;


proc SQL;
create table temp as
select CompanyID,FYEAR,count(distinct LoanID) as N_LOANS_1YEAR
from have 
group by CompanyID,FYEAR;

create table want  as
select *,case when 
(select count(*) from temp where CompanyID=a.CompanyID and FYEAR le a.FYEAR) lt 5 then .
 else (select avg(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID 
 and FYEAR between a.FYEAR-4 and a.FYEAR) end as AVG_LOANS_5YEAR,
(select max(N_LOANS_1YEAR) from temp where CompanyID=a.CompanyID 
 and FYEAR between a.FYEAR-4 and a.FYEAR) as MAX_LOANS_5YEAR
 
from temp as a;
quit;
Agent1592
Pyrite | Level 9
No i don't necessarily need SQL. If there is an easier SAS code I can use that. Thanks

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2722 views
  • 2 likes
  • 5 in conversation