samface
Calcite | Level 5

Good morning friends,

 

I need help reconstructing the following table:

 

data test;
input account_number time loan_amount default_status;
datalines;
410 201601 5000 0
410 201602 5000 0
410 201603 5000 1
411 201601 3000 0
411 201602 3000 0
411 201603 3000 0
412 201601 2500 1
412 201601 2500 0
412 201601 2500 0
;

 

Using the above table, I want to generate a table that returns account_number, loan_amount, and a flag showing whether the account defaulted at any point during the three-month evaluation period. For example, account 410 has one default, so its flag should be 1. I want to obtain the values below:

 

data result;
input account_number loan_amount defaulted;
datalines;
410 5000 1
411 3000 0
412 2500 1
;

Accepted Solutions
mkeintz
PROC Star

First, thank you for providing a usable data step for the sample data.  It makes my heart sing.

 

You appear to have only 3 monthly records per id, sorted by month within id.  So any instance of default status=1 means you assign default status=1 to the id.

 

Here is a minimalist programming solution.

 

data test;
input account_number time loan_amount default_status;
datalines;
410 201601 5000 0
410 201602 5000 0
410 201603 5000 1
411 201601 3000 0
411 201602 3000 0
411 201603 3000 0
412 201601 2500 1
412 201601 2500 0
412 201601 2500 0
;
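data want (drop=time);
  /* Interleave every record with a second pass that holds only the default records */
  set test  test (where=(default_status=1));
  by account_number;
  /* Keep the last record per account: a default record whenever one exists */
  if last.account_number;
run;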

 

 

The "trick" here is that a SET statement with a BY statement interleaves observations from its operands in account_number order, so test must already be sorted by account_number. Within each account, all records from the first SET operand (i.e. all of test) precede the matching records from the second SET operand (defaults only). So if an account has any defaults, its last record will have default_status=1. If it has none, its last record will have default_status=0. The "if last.account_number;" statement is a subsetting IF, telling SAS to keep only the final incoming record for each account.

 

Note: Using SET with BY is usually done with different datasets (e.g. SET BOYS GIRLS; BY id;), but it can be very useful for interleaving a dataset with itself, as above.
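
To make the interleaving order visible, here is a minimal sketch that keeps every interleaved record instead of subsetting. The dataset name interleaved and the variables from_all, from_defaults, source and is_last are illustrative names that are not part of the original solution. The sample data is repeated so the block runs on its own.

```sas
/* Sample data, copied from the question */
data test;
  input account_number time loan_amount default_status;
  datalines;
410 201601 5000 0
410 201602 5000 0
410 201603 5000 1
411 201601 3000 0
411 201602 3000 0
411 201603 3000 0
412 201601 2500 1
412 201601 2500 0
412 201601 2500 0
;

/* Same SET/BY interleave as the solution, but without the subsetting IF. */
/* The IN= variables are temporary flags that show which SET operand      */
/* supplied the current record.                                           */
data interleaved;
  length source $8;
  set test (in=from_all)
      test (in=from_defaults where=(default_status=1));
  by account_number;
  if from_defaults then source = 'defaults';
  else source = 'all';
  /* 1 on the record that the subsetting IF would keep */
  is_last = last.account_number;
run;

proc print data=interleaved noobs;
  var account_number time default_status source is_last;
run;
```

For accounts 410 and 412, the listing should show the three records from the full dataset followed by one record from the defaults-only pass, with is_last=1 on that final record. Account 411 has no defaults, so its last record comes from the full dataset.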


7 REPLIES 7
Kurt_Bremser
Super User

Proc sql with group by and the max() summary function:

proc sql;
create table want as
select
  account_number,
  loan_amount,
  max(default_status) as default_status
from test
group by
  account_number,
  loan_amount
;
quit;
SuryaKiran
Meteorite | Level 14

PROC SORT might work for you.

proc sort data=test  ;
by account_number loan_amount descending default_status;
run; 
proc sort data=test out=want nodupkey;
by account_number loan_amount;
run; 
Thanks,
Suryakiran
Kurt_Bremser
Super User

I'd rather use a data step as the second step:

proc sort data=test;
by account_number loan_amount descending default_status;
run;

data want;
set test;
by account_number loan_amount;
if first.loan_amount;
run;
samface
Calcite | Level 5
Thanks, your code also works! 🙂
samface
Calcite | Level 5
Thanks, this code also solved my problem.
samface
Calcite | Level 5

Thanks for the solution and the detailed explanation. I was able to get the intended result.
