Good morning friends,
I need help summarizing the following table:
data test;
input account_number time loan_amount default_status;
datalines;
410 201601 5000 0
410 201602 5000 0
410 201603 5000 1
411 201601 3000 0
411 201602 3000 0
411 201603 3000 0
412 201601 2500 1
412 201601 2500 0
412 201601 2500 0
;
Using the table above, I want to generate a table that returns account_number, loan_amount, and whether the account defaulted during the three-month evaluation period. For example, account 410 has one default, so it should get defaulted=1. I want to obtain the values below:
data result;
input account_number loan_amount defaulted;
datalines;
410 5000 1
411 3000 0
412 2500 1
;
First, thank you for providing a usable data step for the sample data. It makes my heart sing.
You appear to have only 3 monthly records per id, sorted by month within id. So any instance of default_status=1 means you assign default_status=1 to the id.
Here is a minimalist programming solution.
data test;
input account_number time loan_amount default_status;
datalines;
410 201601 5000 0
410 201602 5000 0
410 201603 5000 1
411 201601 3000 0
411 201602 3000 0
411 201603 3000 0
412 201601 2500 1
412 201601 2500 0
412 201601 2500 0
;
data want (drop=time);
set test test (where=(default_status=1));
by account_number;
if last.account_number;
run;
The "trick" here is that a SET statement followed by a BY statement interleaves observations by account_number (both operands must already be sorted by account_number). Within each account, all records from the first SET operand (i.e., all of test) precede all matching observations from the second SET operand (only the defaults). So if an account has any defaults, its last record will have default_status=1. If it has none, the account ends with a default_status=0 record. The "if last.account_number;" is a subsetting IF, telling SAS to keep only the final incoming record for each account.
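To see the interleaving for yourself, you can keep every row instead of subsetting and print the FIRST./LAST. flags. This is a diagnostic sketch, assuming the TEST data set above already exists; INTERLEAVED, FIRST_FLAG and LAST_FLAG are illustrative names, not part of the original solution.

```sas
/* Same SET/BY as the solution, but without the subsetting IF, */
/* so every interleaved row is kept for inspection.             */
data interleaved;
   set test test (where=(default_status=1));
   by account_number;
   /* Copy the automatic BY-group flags into ordinary variables */
   first_flag = first.account_number;
   last_flag  = last.account_number;
run;

proc print data=interleaved noobs;
   var account_number time default_status first_flag last_flag;
run;
```

For account 412, the extra default row from the second operand lands after the three original rows, so it is the one flagged as last.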
Note: SET with BY is usually used with different data sets (e.g. SET BOYS GIRLS; BY ID;), but it can be very useful for interleaving a data set with itself, as above.
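A minimal sketch of the more common case, interleaving two different data sets. BOYS and GIRLS here are made-up sample data, each already sorted by ID:

```sas
/* Hypothetical sample data, each sorted by ID */
data boys;
   input id name $;
   datalines;
1 Al
3 Bob
;
run;

data girls;
   input id name $;
   datalines;
2 Cat
3 Dee
;
run;

/* Interleave: output is ordered by ID; within a tie, BOYS rows come first */
data all_kids;
   set boys girls;
   by id;
run;
```

ALL_KIDS comes out in ID order (1, 2, 3, 3), and for the tied ID 3 the BOYS row precedes the GIRLS row because BOYS is listed first in the SET statement.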
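PROC SQL with GROUP BY and the MAX() summary function. Because default_status is 0 or 1, its maximum within an account is 1 if the account defaulted in any month and 0 otherwise: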
proc sql;
create table want as
select
account_number,
loan_amount,
max(default_status) as default_status
from test
group by
account_number,
loan_amount
;
quit;
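To check a result against the expected table from the question, here is a sketch that names the summary column defaulted (matching the RESULT layout) and runs PROC COMPARE. It assumes both TEST and the RESULT data set from the question have already been created; WANT_SQL is a hypothetical name chosen so it does not overwrite other outputs.

```sas
/* Same query, but name the column DEFAULTED to match RESULT.      */
/* WANT_SQL is an illustrative name; assumes TEST and RESULT exist. */
proc sql;
   create table want_sql as
   select account_number,
          loan_amount,
          max(default_status) as defaulted
   from test
   group by account_number, loan_amount
   order by account_number;
quit;

/* Match rows by ACCOUNT_NUMBER; both tables are sorted by it. */
proc compare base=result compare=want_sql;
   id account_number;
run;
```

If the tables agree, PROC COMPARE reports no unequal values and sets the automatic macro variable SYSINFO to 0.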
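PROC SORT might work for you. The first sort puts each account's defaults (if any) at the top of its group; the second sort, with NODUPKEY, then keeps only the first observation per account_number/loan_amount, because the default EQUALS option preserves the existing order within each BY group.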
proc sort data=test ;
by account_number loan_amount descending default_status;
run;
proc sort data=test out=want nodupkey;
by account_number loan_amount;
run;
I'd rather use a data step as the second step:
proc sort data=test;
by account_number loan_amount descending default_status;
run;
data want;
set test;
by account_number loan_amount;
if first.loan_amount;
run;
Thanks for the solution and the detailed explanation; I was able to get the intended result.