- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi ,
I have a Table test1 when values of variables Flow_typ,Region,Account_status and Amount are equal and diff between end date of first record and start date of next record is 1 then collapse all those records as 1 for that group (Flow_typ,Region,Account_status and Amount)
when values of variables Flow_typ,Region,Account_status and Amount are equal and diff between end date of first record and start date of next record is > 1 then treat as different row
Output should be like Test2 .
Test1
Flow_typ | Region | Account_Status | Amount | St_date | end_date |
Internal | South | Open | 100 | 1/1/2020 | 1/31/2020 |
Internal | South | Open | 200 | 2/1/2021 | 2/28/2021 |
Internal | South | Open | 200 | 5/1/2030 | 5/31/2030 |
Internal | South | Open | 200 | 6/1/2030 | 6/30/2030 |
Internal | South | Open | 200 | 7/1/2030 | 7/31/2030 |
External | North | Closed | 50 | 1/1/2024 | 1/31/2024 |
External | North | Closed | 60 | 2/1/2024 | 2/28/2024 |
External | North | Closed | 50 | 3/1/2026 | 3/31/2026 |
External | North | Closed | 50 | 4/1/2026 | 4/30/2026 |
External | North | Closed | 50 | 5/1/2026 | 5/31/2026 |
Can anyone please help me . Thanks in Advance
Test 2
Flow_typ | Region | Account_Status | Amount | St_date | end_date |
Internal | South | Open | 100 | 1/1/2020 | 1/31/2020 |
Internal | South | Open | 200 | 2/1/2021 | 2/28/2021 |
Internal | South | Open | 200 | 5/1/2030 | 7/31/2030 |
External | North | Closed | 50 | 1/1/2024 | 1/31/2024 |
External | North | Closed | 50 | 3/1/2026 | 5/31/2026 |
External | North | Closed | 60 | 2/1/2024 | 2/28/2024 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If I understood what you mean.
data test1;
input Flow_typ $ Region $ Account_Status $ Amount St_date : mmddyy10. end_date : mmddyy10.;
informat St_date mmddyy10. end_date mmddyy10.;
format St_date mmddyy10. end_date mmddyy10.;
cards;
Internal South Open 100 1/1/2020 1/31/2020
Internal South Open 200 2/1/2021 2/28/2021
Internal South Open 200 5/1/2030 5/31/2030
Internal South Open 200 6/1/2030 6/30/2030
Internal South Open 200 7/1/2030 7/31/2030
External North Closed 500 1/1/2024 1/31/2024
External North Closed 600 2/1/2024 2/28/2024
External North Closed 500 3/1/2026 3/31/2026
External North Closed 500 4/1/2026 4/30/2026
External North Closed 500 5/1/2026 5/31/2026
;
run;
proc sort data=test1 out=temp;
by Flow_typ Region Account_Status Amount St_date;
run;
data temp;
set temp;
by Flow_typ Region Account_Status Amount;
if first.amount or st_date-lag(end_date) ne 1 then group+1;
run;
proc sql;
create table want as
select Flow_typ, Region , Account_Status , Amount,
min(st_Date) as st_date format=mmddyy10.,max(end_date) as end_date format=mmddyy10.
from temp
group by Flow_typ, Region , Account_Status , Amount,group;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
That's not so difficult.
Can you give us the source table (i.e. the top one) as a data step with datalines (cards)?
When pasting your data step SAS code in the communities ...
, click the "running man icon" in the toolbar on top and paste your code in the pop-up window.
Thanks,
Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
input Flow_typ$ 1-9 Region$ 10-15 Account_Status$16-22 Amount 24-27 @28 St_date mmddyy10. @37 end_date mmddyy10.;
informat St_date mmddyy10. end_date mmddyy10.;
format St_date mmddyy10. end_date mmddyy10.;
cards;
Internal South Open 100 1/1/2020 1/31/2020
Internal South Open 200 2/1/2021 2/28/2021
Internal South Open 200 5/1/2030 5/31/2030
Internal South Open 200 6/1/2030 6/30/2030
Internal South Open 200 7/1/2030 7/31/2030
External North Closed 500 1/1/2024 1/31/2024
External North Closed 600 2/1/2024 2/28/2024
External North Closed 500 3/1/2026 3/31/2026
External North Closed 500 4/1/2026 4/30/2026
External North Closed 500 5/1/2026 5/31/2026
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is extremely similar to your other question: https://communities.sas.com/t5/SAS-Programming/SAS-Row-by-row-compare/m-p/849017#M335681
The only difference is comparing an additional variable which would be done with a LAGged value to collapse the values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use this code which gives your desired output
Flow_typ | Region | Account_Status | Amount | St_date | end_date |
External | North | Closed | 500 | 1/01/2024 | 01/31/2024 |
External | North | Closed | 500 | 3/01/2026 | 05/31/2026 |
External | North | Closed | 600 | 2/01/2024 | 02/28/2024 |
Internal | South | Open | 100 | 1/01/2020 | 01/31/2020 |
Internal | South | Open | 200 | 2/01/2021 | 02/28/2021 |
Internal | South | Open | 200 | 5/01/2030 | 07/31/2030 |
proc sort data = test1 out = test1;
by Flow_typ Region Account_Status Amount St_date end_date ;
run;
/*array to compare previous vales to identify consecutive days*/
data test2;
set test1;
by Flow_typ Region Account_Status Amount ;
array sdate(1); format sdate1 mmddyy10.; array edate(1); format edate1 mmddyy10.;
if first.amount then do; i=1 ;
sdate(1)=st_date; edate(1)=end_date;
NewGroup=1; end;
if i ne 1 then do;
if st_date ne edate(1)+1 then do;
sdate(1)=st_date; edate(1)=end_date; NewGroup=2; end;
else if i ne 1 and st_date = edate(1)+1 then do; edate(1) = end_date; end;
else NewGroup+1;
end;
i + 1;
retain sdate1 edate1 ;
drop i;
run;
Proc sql;
Create table test3 as
Select distinct
Flow_typ, Region, Account_Status, Amount, min(sdate1) as St_date format mmddyy10., Max(edate1) as end_date format mmddyy10.
from
test2
group by Flow_typ, Region, Account_Status, Amount,NewGroup /*instead of NewGroup sdate1*/
;Quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If I understood what you mean.
data test1;
input Flow_typ $ Region $ Account_Status $ Amount St_date : mmddyy10. end_date : mmddyy10.;
informat St_date mmddyy10. end_date mmddyy10.;
format St_date mmddyy10. end_date mmddyy10.;
cards;
Internal South Open 100 1/1/2020 1/31/2020
Internal South Open 200 2/1/2021 2/28/2021
Internal South Open 200 5/1/2030 5/31/2030
Internal South Open 200 6/1/2030 6/30/2030
Internal South Open 200 7/1/2030 7/31/2030
External North Closed 500 1/1/2024 1/31/2024
External North Closed 600 2/1/2024 2/28/2024
External North Closed 500 3/1/2026 3/31/2026
External North Closed 500 4/1/2026 4/30/2026
External North Closed 500 5/1/2026 5/31/2026
;
run;
proc sort data=test1 out=temp;
by Flow_typ Region Account_Status Amount St_date;
run;
data temp;
set temp;
by Flow_typ Region Account_Status Amount;
if first.amount or st_date-lag(end_date) ne 1 then group+1;
run;
proc sql;
create table want as
select Flow_typ, Region , Account_Status , Amount,
min(st_Date) as st_date format=mmddyy10.,max(end_date) as end_date format=mmddyy10.
from temp
group by Flow_typ, Region , Account_Status , Amount,group;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content