BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Bankshot
Obsidian | Level 7

Hi,

 

I'm trying to calculate a value (final_bal) using a sorted dataset I have. The dataset exposure has a structure like the following:

 

Accno   Date   Def_type   Adj_bal

1001   2008/01   Loss   $3500

1001   2008/06   Loss   $1100

1001   2008/09   Fee    $260

1002   2008/03   Fee    $2200

1002   2008/09   Loss   $800

1002   2009/01    Fee   $260

1003   2016/11   Fee    $15000

1003   2017/01   Loss   8200

1003   2017/03   Loss  40

 

The dataset is sorted by account-number, then by date. What I'm to do is to come up with a new column that's called final_bal, and it will be the adj_bal value dependent on the first Def_type is. Here is the logic I'm trying to implement into my code:

If the first Def_type of the account is called 'Loss', then I'll pick the value of that date (ex. $3500 for account 1001) regardless what status the later dates have. However if the first value of the account is called 'Fee', then I'll pick the last value (ex. $40 for account 1003) regardless what status the later dates have. How do I get this done? I tried following approaches:

data select;
	set exposure;
	retain final_bal;
	if first.accno then do;
		if def_type = 'Loss' then final_bal = first.Adj_Bal;
		else if def_type = 'Fee' then final_bal = last.Adj_Bal;
	end;
run;

That didn't work. Also I tried something like the following but it doesn't work (desperate attempt) the logic also failed.

data test (keep = accno date def_type Adj_Bal);
set exposure;
by accno date;
if first.accno or last.accno;
run;

data test2;
set test;
retain EAD;
Bal_prev = lag(Adj_Bal);
Type_prev = lag(def_type);
if first.accno and def_type = 'Loss'
then final_bal = Adj_Bal;
if not first.accno and Type_prev = 'Loss'
then final_bal = Bal_prev;
else if not first.accno and Type_prev = 'Fee'
then final_bal = Adj_Bal;
else if first.accno and def_type = 'Fee'
then final_bal = .;
run;

 

What am I doing wrong? Most importantly, what's the best way to get this done, whether in a data step or proc sql? 

I hope I made this explanation clear. Thank you for your contribution to this question I have.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I would do this:

 

data HAVE;
  input ACCNO   DATE $  DEF_TYPE  $ ADJ_BAL dollar8.;
cards;
1001   2008/01   Loss   $3500
1001   2008/06   Loss   $1100
1001   2008/09   Fee    $260
1002   2008/03   Fee    $2200
1002   2008/09   Loss   $800
1002   2009/01   Fee   $260
1003   2016/11   Fee    $15000
1003   2017/01   Loss   8200
1003   2017/03   Loss  40
;

data want;
do until (last.accno);
    set have; by accno;
    if first.accno then first_type = def_type;
    if first_type = "Loss" and first.accno then final_bal = adj_bal;
    if first_type = "Fee" and last.accno then final_bal = adj_bal;
    end;
do until (last.accno);
    set have; by accno;
    output;
    end;
drop first_type;
run;
PG

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

I suspect that's not what you want, but since you haven't given the output you want, here is what you asked for.

 

data HAVE;
  input ACCNO   DATE $  DEF_TYPE  $ ADJ_BAL dollar8.;
cards;
1001   2008/01   Loss   $3500
1001   2008/06   Loss   $1100
1001   2008/09   Fee    $260
1002   2008/03   Fee    $2200
1002   2008/09   Loss   $800
1002   2009/01   Fee   $260
1003   2016/11   Fee    $15000
1003   2017/01   Loss   8200
1003   2017/03   Loss  40
run;
data WANT;
  set HAVE;
  by ACCNO;
  retain FINAL_BAL;
  if first.ACCNO then FINAL_BAL=ifn(DEF_TYPE='Loss', ADJ_BAL,.);
  if last.ACCNO & FINAL_BAL=. then FINAL_BAL=ADJ_BAL;
run;

 

ACCNO DATE DEF_TYPE ADJ_BAL FINAL_BAL
1001 2008/01 Loss 3500 3500
1001 2008/06 Loss 1100 3500
1001 2008/09 Fee 260 3500
1002 2008/03 Fee 2200 .
1002 2008/09 Loss 800 .
1002 2009/01 Fee 260 260
1003 2016/11 Fee 1500 .
1003 2017/01 Loss 8200 .
1003 2017/03 Loss 40 40

 

 

 

Shmuel
Garnet | Level 18

To use first.<variable> or last.<variable> yo need a BY statement.

The outcome value of either first.<var> or last.<var> is either 1 or 0, depending on the observation in group,

it does not contain the value of the variable.

 

In your code you defined retain EAD statement but never used it. 

What was your intention by it ? Didn't you got EAD missing in all observations?

 

In the 2nd step there are at most 2 obs per ACCNO, so any obs may be either first.accno or last.accno or both at same time

in case there is only 1 obs per accno.

data test (keep = accno date def_type Adj_Bal);
 set exposure;
 by accno date;
 if first.accno or last.accno;
run;

/* the outcome of 1st step is the most 2 obs per accno */

data test2; 
 set test;
  by accno;         /* <<< line added */

 retain EAD;
 Bal_prev = lag(Adj_Bal);
 Type_prev = lag(def_type);

 if first.accno and def_type = 'Loss'
 then final_bal = Adj_Bal;

 if last.accno and Type_prev = 'Loss'
 then final_bal = Bal_prev;
 else if last.accno and Type_prev = 'Fee' 
 then final_bal = Adj_Bal;
 else if first.accno and def_type = 'Fee' 
 then final_bal = .;
run;

 

 

 

shahparth260
Quartz | Level 8
 /** GROUP BASED ON VALUES **/ 

data have;
	set have;
	by accno;

	if first.accno and substr(def_type,1,1) eq "L" then grp="F"; /** L denotes Loss**/
	else if first.accno and substr(def_type,1,1) eq "F" then grp="L"; /** F detones Fees **/
	else grp=" ";

run;

/** LOCF CONCEPT ***/ 

data have;
	set have;
	retain locf;/** LAST OBS CARRY FORWARD **/
	by accno;

	if first.accno then locf=grp;
	if locf eq  " " then grp=locf;

	drop grp;
	rename locf=grp;
run;

/** DIVIDED IN TO TWO SEPARATE DATASET FOR FURTHER PROCESS **/ 

data first last;
	set have;
	if grp eq "F" then output first;
	else output last;
run;

/** AS PER NEED FIRST.ACCNO**/ 

data first;
	set first;
	by accno;
	if first.accno;
run;

/** AS PER NEED LAST.ACCNO ***/ 

data last;
	set last;
	by accno;
	if last.accno;
run;

/** SET ALL TOGHER AND FINAL OUTPUT **/ 

data final;
	set first last;
	by accno;
run;

@Shmuel wrote:

To use first.<variable> or last.<variable> yo need a BY statement.

The outcome value of either first.<var> or last.<var> is either 1 or 0, depending on the observation in group,

it does not contain the value of the variable.

 

In your code you defined retain EAD statement but never used it. 

What was your intention by it ? Didn't you got EAD missing in all observations?

 

In the 2nd step there are at most 2 obs per ACCNO, so any obs may be either first.accno or last.accno or both at same time

in case there is only 1 obs per accno.

data test (keep = accno date def_type Adj_Bal);
 set exposure;
 by accno date;
 if first.accno or last.accno;
run;

/* the outcome of 1st step is the most 2 obs per accno */

data test2; 
 set test;
  by accno;         /* <<< line added */

 retain EAD;
 Bal_prev = lag(Adj_Bal);
 Type_prev = lag(def_type);

 if first.accno and def_type = 'Loss'
 then final_bal = Adj_Bal;

 if last.accno and Type_prev = 'Loss'
 then final_bal = Bal_prev;
 else if last.accno and Type_prev = 'Fee' 
 then final_bal = Adj_Bal;
 else if first.accno and def_type = 'Fee' 
 then final_bal = .;
run;

 

 

 


There should many methods do it but this seems fair easy and very quick to understand at any level your experience. 

I hope it helps 

PS
PGStats
Opal | Level 21

I would do this:

 

data HAVE;
  input ACCNO   DATE $  DEF_TYPE  $ ADJ_BAL dollar8.;
cards;
1001   2008/01   Loss   $3500
1001   2008/06   Loss   $1100
1001   2008/09   Fee    $260
1002   2008/03   Fee    $2200
1002   2008/09   Loss   $800
1002   2009/01   Fee   $260
1003   2016/11   Fee    $15000
1003   2017/01   Loss   8200
1003   2017/03   Loss  40
;

data want;
do until (last.accno);
    set have; by accno;
    if first.accno then first_type = def_type;
    if first_type = "Loss" and first.accno then final_bal = adj_bal;
    if first_type = "Fee" and last.accno then final_bal = adj_bal;
    end;
do until (last.accno);
    set have; by accno;
    output;
    end;
drop first_type;
run;
PG
Bankshot
Obsidian | Level 7

Thank you PGStats! This worked like a charm!

One more follow-up question, just for the sake of my personal understanding, why did you put

set test; by accno;

within the do loop rather than outside of the loop? Also, why divide the work into two loops? In another word, how come combining the two doesn't work?

data test; 
do until (last.accno);
	set have; by accno;
	if first.accno then first_type = def_type;
	if first_type = 'Loss' and first.accno then final_bal= adj_bal;
	if first_type = 'Waiver' and last.accno then final_bal = adj_bal;
	output; 
	end;
drop first_type;
run;
PGStats
Opal | Level 21

@Bankshot, The do until(last...) loop is a very useful technique for gathering information about by-groups. If you only need the summary, you can use a single loop. If you want to redistribute the summary information on the original data, you need a second loop, the first one to gather the summary, the second to associate it with the original data.

 

The set...; by...; statements must reside inside the loop to change the value of the last.xxx variable before it is tested at the end of the loop against the until() condition.

PG
Bankshot
Obsidian | Level 7

This is very helpful. Thanks again

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2116 views
  • 2 likes
  • 5 in conversation