DATA Step, Macro, Functions and more

Do Loop / Array to output based on sequentisl rows of a dataset

Reply
New Contributor
Posts: 4

Do Loop / Array to output based on sequentisl rows of a dataset

Hi,

I am new to sas array and loops. I was wondering if someone can help me with my query.

Please find below sample of my dataset. Basically I want to output only those rows where the difference betweeen reportdate of 2 records is more than 8.

so for example fromw below records - i would want to output row1, row4, row5,.....so on

i was looking at writing something like below - but its not working. I want a way to tell that reportdate of observation _n_ is....

data chk;

set test;

array reportarray {1}ReportDate ;

do until (EndofFile);

Daysdiff = reportarray[i+1] - reportarray;

if

Daysdiff > 8 then output;

end;

run;

RecordReportDateAcctNumber
107/06/09ABC123456
213/06/09ABC123456
321/06/09ABC123456
428/06/09ABC123456
502/03/10ABC123456
607/03/10ABC123456
730/04/10ABC123456
802/05/10ABC123456
930/06/10ABC123456
1004/07/10ABC123456
1111/07/10ABC123456
1218/07/10ABC123456
1330/09/10ABC123456
1403/10/10ABC123456
1530/04/11ABC123456
1601/05/11ABC123456
1730/07/11ABC123456
1831/07/11ABC123456
1913/06/09DEF567890
2021/06/09DEF567890
2128/06/09DEF567890
2205/07/09DEF567890
2302/02/10DEF567890
2409/02/10DEF567890
2516/02/10DEF567890
PROC Star
Posts: 7,473

Do Loop / Array to output based on sequentisl rows of a dataset

Posted in reply to VibhaChordiya

Given your example which rows do you want to output?

New Contributor
Posts: 4

Do Loop / Array to output based on sequentisl rows of a dataset

Hi,

Thanks for looking into this. I want to output below rows:

RecordReportDateAcctNumber
107/06/09ABC123456
228/06/09ABC123456
302/03/10ABC123456
530/04/10ABC123456
730/06/10ABC123456
1130/09/10ABC123456
1330/04/11ABC123456
1530/07/11ABC123456
1713/06/09DEF567890
2102/02/10DEF567890
Super User
Posts: 10,028

Re: Do Loop / Array to output based on sequentisl rows of a dataset

Posted in reply to VibhaChordiya

It is so weird. Are you sure row4 is what you need?

data temp;
input Record     ReportDate : ddmmyy10.     AcctNumber : $10.;
format  ReportDate ddmmyy10.;
datalines;
1     07/06/09     ABC123456
2     13/06/09     ABC123456
3     21/06/09     ABC123456
4     28/06/09     ABC123456
5     02/03/10     ABC123456
6     07/03/10     ABC123456
7     30/04/10     ABC123456
8     02/05/10     ABC123456
9     30/06/10     ABC123456
10     04/07/10     ABC123456
11     11/07/10     ABC123456
12     18/07/10     ABC123456
13     30/09/10     ABC123456
14     03/10/10     ABC123456
15     30/04/11     ABC123456
16     01/05/11     ABC123456
17     30/07/11     ABC123456
18     31/07/11     ABC123456
19     13/06/09     DEF567890
20     21/06/09     DEF567890
21     28/06/09     DEF567890
22     05/07/09     DEF567890
23     02/02/10     DEF567890
24     09/02/10     DEF567890
25     16/02/10     DEF567890
;
run;
data want;
 set temp;
 dif=dif(ReportDate);
 if  AcctNumber ne lag(AcctNumber) then output;
  else if dif gt 8 then output;
  drop dif;
run;



Ksharp
New Contributor
Posts: 4

Re: Do Loop / Array to output based on sequentisl rows of a dataset

Thanks Ksharp

Valued Guide
Posts: 765

Re: Do Loop / Array to output based on sequentisl rows of a dataset

Posted in reply to VibhaChordiya

Hi ... just a variation on the Ksharp solution (same result) ...

data want;

set temp;

by acctnumber;

if first.acctnumber or (^first.acctnumber and dif(reportdate) gt 8);

run;

Valued Guide
Posts: 2,177

Do Loop / Array to output based on sequentisl rows of a dataset

won't that always select the first row of an account number?

Valued Guide
Posts: 765

Re: Do Loop / Array to output based on sequentisl rows of a dataset

hi ... yes, it matches the response to "Given your example which rows do you want to output?"

Valued Guide
Posts: 2,177

Re: Do Loop / Array to output based on sequentisl rows of a dataset

Posted in reply to VibhaChordiya

requirement stated as

I want to output only those rows where the difference betweeen reportdate of 2 records is more than 8.

has more than 1 interpretation

  1. output pairs with with difference > 8
  2. output later of pair
  3. output earlier

none of these requirements need ARRAY handling when the data are in order

The proposed code that follows produced the rows for Requirement 1 only.

As AcctNumber is shown among the data, I'll assume differences are relevant only within an account

Two "use cases" are not demonstrated in that test data, so I extended the test data by changing the ReportDate on record 12 to 20-jul-2011 (then record 12 will appear twice - once with record 11 and once with record 13), and to check that change in AcctNumber is tested, I added record 18.5

18.5,31/08/2011,ABC123457

I think the neatest solution uses "the read-ahead-merge"

With this data step

 

data requirement1  ;

option mergeNOby = NOwarn ;

   merge test

         test( firstobs=2 keep= ReportDate  AcctNumber

                 rename=( AcctNumber = a2

                          ReportDate = d2 ) ) ;

   if part = '1' then do;

      part = '2' ;

      output ;

      part = ' ' ;

   end ;

   if AcctNumber ne a2 then delete ;

     *********************************

   * finished with this AcctNumber *

   *********************************;  

   ddif =  d2 - reportDate ;

   if ddif > 8 then do ;

      part = '1' ;

      output ;

   end ;  

   retain part ;

   format ReportDate d2 date11.  ;

run ;

option mergeNOby = warn ;

I obtained these rows

The SAS System                                        10:26 Saturday, October 22, 2011

                                  Acct

Obs    record     ReportDate     Number               d2       a2        part    ddif

  1       4      28-JUN-2009    ABC123456    02-MAR-2010    ABC123456     1       247

  2       5      02-MAR-2010    ABC123456    07-MAR-2010    ABC123456     2

  3       6      07-MAR-2010    ABC123456    30-APR-2010    ABC123456     1        54

  4       7      30-APR-2010    ABC123456    02-MAY-2010    ABC123456     2

  5       8      02-MAY-2010    ABC123456    30-JUN-2010    ABC123456     1        59

  6       9      30-JUN-2010    ABC123456    04-JUL-2010    ABC123456     2

  7      11      11-JUL-2010    ABC123456    20-JUL-2010    ABC123456     1         9

  8      12      20-JUL-2010    ABC123456    30-SEP-2010    ABC123456     2

  9      12      20-JUL-2010    ABC123456    30-SEP-2010    ABC123456     1        72

10      13      30-SEP-2010    ABC123456    03-OCT-2010    ABC123456     2

11      14      03-OCT-2010    ABC123456    30-APR-2011    ABC123456     1       209

12      15      30-APR-2011    ABC123456    01-MAY-2011    ABC123456     2

13      16      01-MAY-2011    ABC123456    30-JUL-2011    ABC123456     1        90

14      17      30-JUL-2011    ABC123456    31-JUL-2011    ABC123456     2

15      22      05-JUL-2009    DEF567890    02-FEB-2010    DEF567890     1       212

16      23      02-FEB-2010    DEF567890    09-FEB-2010    DEF567890     2

Message was edited by: Peter Crawford ### was appearing for trailing blanks on some lines of code, so it seemed a tidy idea to clean

New Contributor
Posts: 4

Re: Do Loop / Array to output based on sequentisl rows of a dataset

Thanks Peter C.

Ask a Question
Discussion stats
  • 9 replies
  • 280 views
  • 8 likes
  • 5 in conversation