Hi,
I am new to sas array and loops. I was wondering if someone can help me with my query.
Please find below sample of my dataset. Basically I want to output only those rows where the difference betweeen reportdate of 2 records is more than 8.
so for example fromw below records - i would want to output row1, row4, row5,.....so on
i was looking at writing something like below - but its not working. I want a way to tell that reportdate of observation _n_ is....
data chk;
set test;
array reportarray {1}ReportDate ;
do until (EndofFile);
Daysdiff = reportarray[i+1] - reportarray;
if
Daysdiff > 8 then output;end;
run;
Record | ReportDate | AcctNumber |
1 | 07/06/09 | ABC123456 |
2 | 13/06/09 | ABC123456 |
3 | 21/06/09 | ABC123456 |
4 | 28/06/09 | ABC123456 |
5 | 02/03/10 | ABC123456 |
6 | 07/03/10 | ABC123456 |
7 | 30/04/10 | ABC123456 |
8 | 02/05/10 | ABC123456 |
9 | 30/06/10 | ABC123456 |
10 | 04/07/10 | ABC123456 |
11 | 11/07/10 | ABC123456 |
12 | 18/07/10 | ABC123456 |
13 | 30/09/10 | ABC123456 |
14 | 03/10/10 | ABC123456 |
15 | 30/04/11 | ABC123456 |
16 | 01/05/11 | ABC123456 |
17 | 30/07/11 | ABC123456 |
18 | 31/07/11 | ABC123456 |
19 | 13/06/09 | DEF567890 |
20 | 21/06/09 | DEF567890 |
21 | 28/06/09 | DEF567890 |
22 | 05/07/09 | DEF567890 |
23 | 02/02/10 | DEF567890 |
24 | 09/02/10 | DEF567890 |
25 | 16/02/10 | DEF567890 |
Given your example which rows do you want to output?
Hi,
Thanks for looking into this. I want to output below rows:
Record | ReportDate | AcctNumber |
1 | 07/06/09 | ABC123456 |
2 | 28/06/09 | ABC123456 |
3 | 02/03/10 | ABC123456 |
5 | 30/04/10 | ABC123456 |
7 | 30/06/10 | ABC123456 |
11 | 30/09/10 | ABC123456 |
13 | 30/04/11 | ABC123456 |
15 | 30/07/11 | ABC123456 |
17 | 13/06/09 | DEF567890 |
21 | 02/02/10 | DEF567890 |
It is so weird. Are you sure row4 is what you need?
data temp; input Record ReportDate : ddmmyy10. AcctNumber : $10.; format ReportDate ddmmyy10.; datalines; 1 07/06/09 ABC123456 2 13/06/09 ABC123456 3 21/06/09 ABC123456 4 28/06/09 ABC123456 5 02/03/10 ABC123456 6 07/03/10 ABC123456 7 30/04/10 ABC123456 8 02/05/10 ABC123456 9 30/06/10 ABC123456 10 04/07/10 ABC123456 11 11/07/10 ABC123456 12 18/07/10 ABC123456 13 30/09/10 ABC123456 14 03/10/10 ABC123456 15 30/04/11 ABC123456 16 01/05/11 ABC123456 17 30/07/11 ABC123456 18 31/07/11 ABC123456 19 13/06/09 DEF567890 20 21/06/09 DEF567890 21 28/06/09 DEF567890 22 05/07/09 DEF567890 23 02/02/10 DEF567890 24 09/02/10 DEF567890 25 16/02/10 DEF567890 ; run; data want; set temp; dif=dif(ReportDate); if AcctNumber ne lag(AcctNumber) then output; else if dif gt 8 then output; drop dif; run; Ksharp
Thanks Ksharp
Hi ... just a variation on the Ksharp solution (same result) ...
data want;
set temp;
by acctnumber;
if first.acctnumber or (^first.acctnumber and dif(reportdate) gt 8);
run;
won't that always select the first row of an account number?
hi ... yes, it matches the response to "Given your example which rows do you want to output?"
requirement stated as
I want to output only those rows where the difference betweeen reportdate of 2 records is more than 8.
has more than 1 interpretation
none of these requirements need ARRAY handling when the data are in order
The proposed code that follows produced the rows for Requirement 1 only.
As AcctNumber is shown among the data, I'll assume differences are relevant only within an account
Two "use cases" are not demonstrated in that test data, so I extended the test data by changing the ReportDate on record 12 to 20-jul-2011 (then record 12 will appear twice - once with record 11 and once with record 13), and to check that change in AcctNumber is tested, I added record 18.5
18.5,31/08/2011,ABC123457
I think the neatest solution uses "the read-ahead-merge"
With this data step
data requirement1 ;
option mergeNOby = NOwarn ;
merge test
test( firstobs=2 keep= ReportDate AcctNumber
rename=( AcctNumber = a2
ReportDate = d2 ) ) ;
if part = '1' then do;
part = '2' ;
output ;
part = ' ' ;
end ;
if AcctNumber ne a2 then delete ;
*********************************
* finished with this AcctNumber *
*********************************;
ddif = d2 - reportDate ;
if ddif > 8 then do ;
part = '1' ;
output ;
end ;
retain part ;
format ReportDate d2 date11. ;
run ;
option mergeNOby = warn ;
I obtained these rows
The SAS System 10:26 Saturday, October 22, 2011
Acct
Obs record ReportDate Number d2 a2 part ddif
1 4 28-JUN-2009 ABC123456 02-MAR-2010 ABC123456 1 247
2 5 02-MAR-2010 ABC123456 07-MAR-2010 ABC123456 2
3 6 07-MAR-2010 ABC123456 30-APR-2010 ABC123456 1 54
4 7 30-APR-2010 ABC123456 02-MAY-2010 ABC123456 2
5 8 02-MAY-2010 ABC123456 30-JUN-2010 ABC123456 1 59
6 9 30-JUN-2010 ABC123456 04-JUL-2010 ABC123456 2
7 11 11-JUL-2010 ABC123456 20-JUL-2010 ABC123456 1 9
8 12 20-JUL-2010 ABC123456 30-SEP-2010 ABC123456 2
9 12 20-JUL-2010 ABC123456 30-SEP-2010 ABC123456 1 72
10 13 30-SEP-2010 ABC123456 03-OCT-2010 ABC123456 2
11 14 03-OCT-2010 ABC123456 30-APR-2011 ABC123456 1 209
12 15 30-APR-2011 ABC123456 01-MAY-2011 ABC123456 2
13 16 01-MAY-2011 ABC123456 30-JUL-2011 ABC123456 1 90
14 17 30-JUL-2011 ABC123456 31-JUL-2011 ABC123456 2
15 22 05-JUL-2009 DEF567890 02-FEB-2010 DEF567890 1 212
16 23 02-FEB-2010 DEF567890 09-FEB-2010 DEF567890 2
Message was edited by: Peter Crawford ### was appearing for trailing blanks on some lines of code, so it seemed a tidy idea to clean
Thanks Peter C.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.