Solved: Table transformation

samface · Posted 09-24-2018 10:12 AM

Good morning friends,

I need help reconstructing the following table:

data test;
input account_number time loan_amount default_status;
datalines;
410 201601 5000 0
410 201602 5000 0
410 201603 5000 1
411 201601 3000 0
411 201602 3000 0
411 201603 3000 0
412 201601 2500 1
412 201601 2500 0
412 201601 2500 0
;

Using the above table, I want to generate a table that returns account_number, loan_amount, and a flag showing whether the account defaulted at any point during the three-month evaluation period. For example, since account 410 has one default, I want to obtain the values below:

data result;
input account_number loan_amount defaulted;
datalines;
410 5000 1
411 3000 0
412 2500 1
;

mkeintz · Posted 09-24-2018 10:45 AM

First, thank you for providing a usable data step for the sample data. It makes my heart sing.

You appear to have only 3 monthly records per id, sorted by month within id. So any instance of default_status=1 means you assign default_status=1 to the id.

Here is a minimalist programming solution.

data test;
input account_number time loan_amount default_status;
datalines;
410 201601 5000 0
410 201602 5000 0
410 201603 5000 1
411 201601 3000 0
411 201602 3000 0
411 201603 3000 0
412 201601 2500 1
412 201601 2500 0
412 201601 2500 0
;
data want (drop=time);
  set test  test (where=(default_status=1));
  by account_number;
  if last.account_number;
run;

The "trick" here is that a SET statement with a BY statement interleaves observations sorted by account_number. That is, within each account_number, all records from the first SET operand (i.e. all of test) precede all matching observations from the second SET operand (only defaults). So if there are defaults, the last record for an account will have default_status=1. If there are no such records, the account ends with a default_status=0 record. The "if last.account_number;" statement is a subsetting IF, telling SAS to keep only the final incoming record for each account.

Note: Using SET with BY is usually done with different datasets (e.g. SET BOYS GIRLS; BY ID;), but it can be very useful for interleaving a dataset with itself, as above.
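To see the interleave order for yourself, a small diagnostic step like the one below may help. It is only a sketch: the dataset name interleaved and the variables from_defaults, source, and is_last are made up for illustration, and it assumes test is already sorted by account_number, as in the sample data.

```sas
/* Diagnostic sketch: keep every interleaved record instead of      */
/* subsetting, and label which SET operand each record came from.   */
data interleaved;
  length source $ 8;
  set test
      test (in=from_defaults where=(default_status=1));
  by account_number;
  /* from_defaults is 1 only for records read from the second operand */
  source  = ifc(from_defaults, 'defaults', 'all');
  /* copy the temporary LAST. flag into a variable that is kept */
  is_last = last.account_number;
run;

proc print data=interleaved;
run;
```

Within each account, the rows labelled all are listed first and any rows labelled defaults follow them, so is_last=1 falls on a default record whenever the account has one.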


Kurt_Bremser · Posted 09-24-2018 10:26 AM

Proc sql with group by and the max() summary function:

proc sql;
create table want as
select
  account_number,
  loan_amount,
  max(default_status) as default_status
from test
group by
  account_number,
  loan_amount
;
quit;


SuryaKiran · Posted 09-24-2018 10:33 AM

PROC SORT might work for you.

proc sort data=test  ;
by account_number loan_amount descending default_status;
run; 
proc sort data=test out=want nodupkey;
by account_number loan_amount;
run;

Thanks,
Suryakiran

Kurt_Bremser · Posted 09-24-2018 10:37 AM

I'd rather use a data step as the second step:

proc sort data=test;
by account_number loan_amount descending default_status;
run;

data want;
set test;
by account_number loan_amount;
if first.loan_amount;
run;


samface · Posted 09-24-2018 11:24 AM

Thanks, your code also works! 🙂

samface · Posted 09-24-2018 11:25 AM

Thanks, this code also solved my problem.

samface · Posted 09-24-2018 11:24 AM

Thanks for the solution and the detailed explanation. I was able to get the intended result.
