Hi,
I am working on a credit risk database using SAS Enterprise Guide 4.3. I have 40 monthly files with the same variables, of which two are of interest for this issue: one variable is a binary one called 'bad', who takes the value 1 when an account is in default and 0 otherwise, and another one called 'account-exposure'. What I need to do is retain the value of the account_exposure for each account_number at the moment of default, i.e. when the bad variable turns from 0 to 1 from one month to the next. The primary key in all the datasets is the account_number, and I have compiled a dataset with all the bad=1 from the last available month. (these are the accounts that need to be checked)
I am a bit stuck as I would basically need to look at two consecutive datasets at a time to compare the values of 'bad' and retain the account_exposure for each account_number if bad(dataset_month1)=1 and bad(dataset_month0)=0 but I am not sure how to do that going through all 40 datasets. The name of all datasets are something like blablabla_yyyymm, and have tought of something like below but I am not too familiar with macros:
%macro default(data);
Proc sql;
Create table defaultflag as
Select &data.account_number
&data.account_exposure
Substr(put(&data), length(put(&data))-3, 4) as default_date /*how do I transform the dataset name into a character string?*/
from &data
Where &data.bad=0 and &data.bad=1; /*this condition should be formulated in another way*/
Quit;
%mend default;
Any help on the above or any other ideas would be much appreciated!!
For your data and using SAS datasets I prefer to have monthly files over one big table. What you could do when creating a new monthly file is to also (re-) create a view over all the existing tables.
I believe I haven't fully understood how you want to create your new variable but I believe a variation of below could still allow for a "straight forward" solution approach without the need to create a big table or sorting of all the data.
/* create sample data */
data crl_201606;
input account_number account_exposure bad account_limit;
datalines;
574476 56.8 0 23
245245 67.5 0 24
252450 45.9 0 25
252457 90.5 1 26
252456 45.6 1 76
252351 23.3 0 5
568576 45.9 1 3
324553 0 1 32
857843 34.7 0 2
;
run;
data crl_201607;
input account_number account_exposure bad account_limit;
datalines;
574476 53.5 1 56
245245 67.8 0 45
252450 43.78 0 34
252457 74.7 1 56
252456 89.4 1 456
252351 21.5 0 23
568576 39.6 1 4
324553 2.5 1 53
857843 37.9 1 4
;
run;
data crl_201608;
input account_number account_exposure bad account_limit;
datalines;
574476 67.5 1 56
245245 68 1 45
252450 43.2 0 67
252457 74.7 1 4
252456 89.4 1 34
252351 21.5 0 23
568576 39.9 1 4
324553 2.9 1 53
857843 37.9 1 4
;
run;
/* create macro variable with list of available source tables */
proc sql noprint;
select catt(libname,'.',memname,' (keep=account_number account_exposure bad account_limit)') into :source_tbl_list separated by ' '
from dictionary.tables
where libname='WORK' and memname like 'CRL^_%' escape '^'
order by memname
;
quit;
/* analyse source tables and create new variable "want" */
data want /*(drop=_:)*/ ;
length source_tbl $ 41;
set &source_tbl_list indsname=_source_tbl;
source_tbl=_source_tbl;
if _n_=1 then
do;
length _account_exposure_time_of_bad 8;
dcl hash h1();
_rc=h1.defineKey('account_number');
_rc=h1.defineData('_account_exposure_time_of_bad');
_rc=h1.defineDone();
end;
if h1.check() ne 0 then
do;
if bad=1 then
do;
_account_exposure_time_of_bad=account_exposure;
_rc=h1.add();
end;
end;
else if h1.find()=0 then
do;
want=account_limit/_account_exposure_time_of_bad;
end;
run;
What I wasn't sure about:
1. Do you want to use the exposure value the first time things go "bad" or should it be the most recent "bad" one.
2. You talk about a selected list of accounts you need to look at (only the ones with a "bad" in the latest month or so). I haven't implemented this so the current code looks at all accounts. It wouldn't be to hard though to implement such an account selection as well. What you could do: create and load another hash table with only the accounts you need to look at in the very first iteration of the "want" data step. Then in every iteration of the data step look up the account_number in the hash (using check() method) and if not found in the hash go straight to the next iteration (e.g. by using "return" statement).
Adding a variation for table selection in case you have other tables or views starting with CRL_ in your library.
/* create macro variable with list of available source tables */
%let source_tbl_list=;
proc sql noprint;
select catt(libname,'.',memname,' (keep=account_number account_exposure bad account_limit)') into :source_tbl_list separated by ' '
from dictionary.tables
where libname='WORK' and memtype='DATA' and prxmatch('/^crl_\d{6}\s*$/oi',memname)>0
order by memname
;
quit;
%put &=source_tbl_list;
Put all your data together, then you can sort it, retain values etc.
Example:
data temp_201608; id=1;var1=0;output; id=3;var1=1;output; run; data temp_201607; id=2;var1=1;output; id=3;var1=0;output; run; data have; set temp: indsname=n; dte=scan(n,2,"_"); run; proc sort data=have; by id dte; run;
Its never a good idea to work with many datasets each with "data" in the name.
My first question as an architect would be if it's possible to store all files in the same data set/table? The would minimize advanced (unnecassary) macro coding.
That said, please attach some sample data of what you have, and what you want?
Thanks,
I was avoinding a concatenation or append procedure as the datasets are many (40) and pretty massive (about 3m entries each). However, in case this solution is the easiest one I am happy to listen. I guess it might work by creating a new variable with the data (yyyymm) into the concatenation and comparing the bad variable for each consecutive months given the same account number? I cannot give any sample of the data as it is confidential, however I have attached a very raw example in Excel- hope it is sufficiently clear.
For your data and using SAS datasets I prefer to have monthly files over one big table. What you could do when creating a new monthly file is to also (re-) create a view over all the existing tables.
I believe I haven't fully understood how you want to create your new variable but I believe a variation of below could still allow for a "straight forward" solution approach without the need to create a big table or sorting of all the data.
/* create sample data */
data crl_201606;
input account_number account_exposure bad account_limit;
datalines;
574476 56.8 0 23
245245 67.5 0 24
252450 45.9 0 25
252457 90.5 1 26
252456 45.6 1 76
252351 23.3 0 5
568576 45.9 1 3
324553 0 1 32
857843 34.7 0 2
;
run;
data crl_201607;
input account_number account_exposure bad account_limit;
datalines;
574476 53.5 1 56
245245 67.8 0 45
252450 43.78 0 34
252457 74.7 1 56
252456 89.4 1 456
252351 21.5 0 23
568576 39.6 1 4
324553 2.5 1 53
857843 37.9 1 4
;
run;
data crl_201608;
input account_number account_exposure bad account_limit;
datalines;
574476 67.5 1 56
245245 68 1 45
252450 43.2 0 67
252457 74.7 1 4
252456 89.4 1 34
252351 21.5 0 23
568576 39.9 1 4
324553 2.9 1 53
857843 37.9 1 4
;
run;
/* create macro variable with list of available source tables */
proc sql noprint;
select catt(libname,'.',memname,' (keep=account_number account_exposure bad account_limit)') into :source_tbl_list separated by ' '
from dictionary.tables
where libname='WORK' and memname like 'CRL^_%' escape '^'
order by memname
;
quit;
/* analyse source tables and create new variable "want" */
data want /*(drop=_:)*/ ;
length source_tbl $ 41;
set &source_tbl_list indsname=_source_tbl;
source_tbl=_source_tbl;
if _n_=1 then
do;
length _account_exposure_time_of_bad 8;
dcl hash h1();
_rc=h1.defineKey('account_number');
_rc=h1.defineData('_account_exposure_time_of_bad');
_rc=h1.defineDone();
end;
if h1.check() ne 0 then
do;
if bad=1 then
do;
_account_exposure_time_of_bad=account_exposure;
_rc=h1.add();
end;
end;
else if h1.find()=0 then
do;
want=account_limit/_account_exposure_time_of_bad;
end;
run;
What I wasn't sure about:
1. Do you want to use the exposure value the first time things go "bad" or should it be the most recent "bad" one.
2. You talk about a selected list of accounts you need to look at (only the ones with a "bad" in the latest month or so). I haven't implemented this so the current code looks at all accounts. It wouldn't be to hard though to implement such an account selection as well. What you could do: create and load another hash table with only the accounts you need to look at in the very first iteration of the "want" data step. Then in every iteration of the data step look up the account_number in the hash (using check() method) and if not found in the hash go straight to the next iteration (e.g. by using "return" statement).
Adding a variation for table selection in case you have other tables or views starting with CRL_ in your library.
/* create macro variable with list of available source tables */
%let source_tbl_list=;
proc sql noprint;
select catt(libname,'.',memname,' (keep=account_number account_exposure bad account_limit)') into :source_tbl_list separated by ' '
from dictionary.tables
where libname='WORK' and memtype='DATA' and prxmatch('/^crl_\d{6}\s*$/oi',memname)>0
order by memname
;
quit;
%put &=source_tbl_list;
Thanks a lot, that sounds like a good alternative! Computing the variable is easy, the below is the part I was struggling with.
Regarding your two point, yes, I need the first instance of 'bad', while the second one I belive I can solve with either a where subsetting of the most recent dataset or with yout idea, to save some processing time.
Just not sure what the if _n_=1 stands for?
Will try it tomorrow on the database to see how it goes 🙂
A hash table gets instantiated during execution time but we want to only instantiate the hash table once. "_n_" is an automatic variable which contains the count of the iteration of the data step (so 1 in the first iteration, 2 in the 2nd itereation and so on).
"if _n_=1" lets us do things only once in the very first iteration of the data step (here: instantiate the hash table).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.