brulard
Pyrite | Level 9

hello,

I've been struggling to solve the following. I have a large dataset ordered by account ID, from date, to date (sorted descending), and amount (I added the last two dummy columns shown below).

 

** Edited the table for clarity.

The first record will always contain a negative amount, with a to date [eff_to_dt0] of 31dec9999.

I'm trying to obtain, for each customer ID, only the consecutive record(s) where the amount is negative.

A record counts as consecutive when its TO DATE is exactly 1 day prior to the preceding record's FROM DATE.

 

Data have:

 

ID         eff_from_dt0  eff_to_dt0  crnt_bal_amt0  GROUP_ID  ROW
000000090  22-Jun-08     31-Dec-99        -3.62        8      1731
000000079  22-Jun-08     31-Dec-99     -1926.52        9      1732
000000002  4-Oct-18      31-Dec-99       -17.11       10      1733
000000002  4-Sep-18      3-Oct-18        -18.57       10      1734
000000002  4-Aug-18      3-Sep-18        -20.03       10      1735
000000002  4-Jul-18      3-Aug-18        -21.49       10      1736
000000002  28-Dec-17     3-Jul-18        -22.95       10      1737
000000002  27-Dec-17     27-Dec-17       -56.19       10      1738
000000002  18-Nov-17     26-Dec-17       -51.19       10      1739
000000002  24-Apr-17     17-Nov-17       -49.38       10      1740
000000002  27-Mar-17     23-Apr-17       -79.38       10      1741
000000002  18-Mar-17     26-Mar-17       101.5        10      1742
000000002  6-Mar-17      17-Mar-17        99.38       10      1743
000000002  9-Feb-17      5-Mar-17         79.38       10      1744
000000002  30-Jan-17     8-Feb-17         47.7        10      1745
000000069  20-Jun-09     31-Dec-99     -6819.53       11      1888
000000069  22-Jun-08     19-Jun-09      6164.97       11      1889
000000013  12-Nov-17     31-Dec-99       -23.96       12      1890
000000013  12-Oct-17     11-Nov-17       -16.62       12      1891
000000013  28-Sep-17     11-Oct-17       -20          12      1892
000000013  12-Sep-17     27-Sep-17       460.74       12      1893
000000013  29-Aug-17     11-Sep-17       449.34       12      1894
000000013  17-Aug-17     28-Aug-17       949.34       12      1895

 

Data want:

 

ID         eff_from_dt0  eff_to_dt0  crnt_bal_amt0  GROUP_ID  ROW
000000090  22-Jun-08     31-Dec-99        -3.62        8      1731
000000079  22-Jun-08     31-Dec-99     -1926.52        9      1732
000000002  4-Oct-18      31-Dec-99       -17.11       10      1733
000000002  4-Sep-18      3-Oct-18        -18.57       10      1734
000000002  4-Aug-18      3-Sep-18        -20.03       10      1735
000000002  4-Jul-18      3-Aug-18        -21.49       10      1736
000000002  28-Dec-17     3-Jul-18        -22.95       10      1737
000000002  27-Dec-17     27-Dec-17       -56.19       10      1738
000000002  18-Nov-17     26-Dec-17       -51.19       10      1739
000000002  24-Apr-17     17-Nov-17       -49.38       10      1740
000000002  27-Mar-17     23-Apr-17       -79.38       10      1741
000000069  20-Jun-09     31-Dec-99     -6819.53       11      1888
000000013  12-Nov-17     31-Dec-99       -23.96       12      1890
000000013  12-Oct-17     11-Nov-17       -16.62       12      1891
000000013  28-Sep-17     11-Oct-17       -20          12      1892

 


*** As suggested, I'm adding a large sample in the attached csv file, already sorted. The variable GROUP_ID marks each new distinct ID; the first record of each ID always has a negative balance and a TO DATE of 31dec9999 (i.e., current). Thanks

 

We tried using LAG, but as indicated elsewhere it led to missing records. Any help is appreciated. Thanks,
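
For reference, a minimal sketch of the selection described above, assuming the data are already sorted by ID and descending eff_to_dt0 and that the goal is to keep only the leading, uninterrupted run of negative-balance records within each ID (the date check is ignored here, and the dataset name HAVE is hypothetical):

data want_flag;
   set have;
   by id;
   retain keep_flag;                            /* 1 while still inside the leading negative run */
   if first.id then keep_flag = 1;              /* each new ID starts a potential run            */
   if crnt_bal_amt0 >= 0 then keep_flag = 0;    /* the first non-negative amount ends the run    */
   if keep_flag then output;
   drop keep_flag;
run;

Once the flag is switched off it stays off for the rest of that ID, so later negative records that follow a positive one are not picked up.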

brulard
Pyrite | Level 9

Hi novinosrin, thanks for the tip. I put an asterisk next to each desired record:

 

 Id amount toDate;

*001 -1 9999
*001 -2 9998
001 3 9997
001 -3 9996
*002 -3 9999
*002 -3 9998
*003 -2 9997
003 2 9996
*004 -2 9999
*004 -1 9998
*004 -3 9997
004 3 9996
004 -3 9995
*005 -3 9999
*005 -1 9998
*005 -2 9997
005 0 9996
005 -2 9995

novinosrin
Tourmaline | Level 20
data have;
input Id amount toDate;
datalines;
001 -1 9999
001 -2 9998
001 3 9997
001 -3 9996
002 -3 9999
002 -3 9998
003 -2 9997
003 2 9996
004 -2 9999
004 -1 9998
004 -3 9997
004 3 9996
004 -3 9995
005 -3 9999
005 -1 9998
005 -2 9997
005 0 9996
005 -2 9995
;
run;
data want;
   set have;
   by id;
   /* keep the record if it starts a new id with a negative amount, or if it is
      negative and the previous record for the same id was also negative */
   if (first.id and amount < 0) or (amount < 0 and lag(amount) < 0 and id = lag(id));
run;
brulard
Pyrite | Level 9
Thank you, sir... this looks promising; I will just test it out on my expanded dataset...
novinosrin
Tourmaline | Level 20

@brulard  Thank you for clarifying 

 

OK, test it and let me know, sir/ma'am.

brulard
Pyrite | Level 9

Hi,

This is similar to our own attempt using LAG: testing on a small sample worked, but applying it to a larger sample didn't.

For example, I've adapted your code to my real data:

data temp;
   set a.all (obs=60000);
   by SRC_SYS_ACCT_ID /*descending eff_to_dt0*/;
   if first.SRC_SYS_ACCT_ID and crnt_bal_amt0<0
      or crnt_bal_amt0<0 and lag(crnt_bal_amt0)<0 and SRC_SYS_ACCT_ID=lag(SRC_SYS_ACCT_ID);
run;

data tt1; set temp  (where=(SRC_SYS_ACCT_ID='000000xxxxxx')); run;
data tt2; set a.all (where=(SRC_SYS_ACCT_ID='000000xxxxxx')); run;

Below is a visual of the output:

 

Here is the real data, where only the first record should be output.

Note that the eff_from_dt0 of each record is one day after the eff_to_dt0 of the next record.

 

Table TT2

SRC_SYS_ACCT_ID  eff_from_dt0  eff_to_dt0  crnt_bal_amt0
000000xxxxxx     21MAR2018     31DEC9999        -0.110
000000xxxxxx     06MAR2018     20MAR2018       128.890

Here is the output using your suggestion. In total 6 records were generated; note the break between one record's eff_from_dt0 and the next record's eff_to_dt0 (they are no longer one day apart).

 

Table TT1

SRC_SYS_ACCT_ID  eff_from_dt0  eff_to_dt0  crnt_bal_amt0
000000xxxxxx     21MAR2018     31DEC9999        -0.110
000000xxxxxx     29JAN2013     07FEB2013      -120.910

 

 

Any thoughts or suggestions? Apologies if my issue description was not clear; if you have a tip, please let me know.

Thank you

novinosrin
Tourmaline | Level 20

Hmm, the logic I wrote didn't have anything to do with the dates at all, and your real data has an effective from date and an effective to date.

 

So what concerns me is whether those dates have to be given serious consideration in the logic. Can you verify the business requirement and put together a better, representative sample (good mock data) that I can play with?

 

Your real data reminds me of some transaction data I worked with when I was part of a transaction banking analytics team a couple of years ago, which makes me think other variables may have to be taken into account in the logic. Well, at this point, let me think through whether there is something I can simulate and test with some reasoning; in the meantime, I am sure somebody else will perhaps give us the much-needed solution.

 

Also, if you could get more details, that can only help. (IMHO, maybe this is all one needs and my dumb brain just isn't grasping it, lol.) Let's see.
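
For what it's worth, an untested sketch of one way to bring the dates into the logic: carry the previous record's eff_from_dt0 in a retained variable and require that the current record's eff_to_dt0 be exactly one day earlier, in addition to the negative balance. This assumes the data are sorted by SRC_SYS_ACCT_ID and descending eff_to_dt0, that eff_from_dt0/eff_to_dt0 are SAS date values, and reuses the dataset name a.all from the earlier post:

data want_dates;
   set a.all;
   by SRC_SYS_ACCT_ID;
   retain keep_flag prev_from_dt;
   if first.SRC_SYS_ACCT_ID then
      keep_flag = (crnt_bal_amt0 < 0);          /* a run starts only if the first record is negative */
   else if keep_flag then do;
      /* the run continues only while the balance stays negative and the dates stay contiguous */
      if not (crnt_bal_amt0 < 0 and eff_to_dt0 = prev_from_dt - 1) then keep_flag = 0;
   end;
   if keep_flag then output;
   prev_from_dt = eff_from_dt0;                 /* remember this record's from-date for the next record */
   drop keep_flag prev_from_dt;
run;

Any break in the date chain (or a non-negative balance) turns the flag off, so only the current, contiguous run of negative balances at the top of each account survives.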

brulard
Pyrite | Level 9
No no, lol, you are very smart, and I appreciate the posts and the effort you put into helping folks such as myself with less experience learn (and being patient in the process)...
Let me see tomorrow if I can put a larger sample dataset together... Also, perhaps the LAG approach is not the ticket home.
novinosrin
Tourmaline | Level 20

Lol, luckily we have many people who answer here. 🙂 Anyway, something should work out, and I will stay focused until you close this thread.

r_behata
Barite | Level 11
data have;
input Id amount toDate;
datalines;
001 -1 9999
001 -2 9998
001 3 9997
001 -3 9996
002 -3 9999
002 -3 9998
003 -2 9997
003 2 9996
004 -2 9999
004 -1 9998
004 -3 9997
004 3 9996
004 -3 9995
005 -3 9999
005 -1 9998
005 -2 9997
005 0 9996
005 -2 9995
;
run;

PROC SORT DATA=have out=have2; 
BY id descending todate; 
RUN;


data want;
	set have2;
	by id;
	
	tmp_=lag(amount);
	if first.id then	tmp=amount;
		else tmp=tmp_;

	if amount < 0 and tmp < 0 then output;

	drop tmp_ tmp;
run;

brulard
Pyrite | Level 9
Hi r_behata, and thanks for the suggested code...
When applied to a larger dataset, it doesn't produce the proper output, similar to my answer to novinosrin.
I will edit my post tomorrow to add an extra column and attempt to provide a larger dataset.

