Hi ,
I have a Table test1 when values of variables Flow_typ,Region,Account_status and Amount are equal and diff between end date of first record and start date of next record is 1 then collapse all those records as 1 for that group (Flow_typ,Region,Account_status and Amount)
when values of variables Flow_typ,Region,Account_status and Amount are equal and diff between end date of first record and start date of next record is > 1 then treat as different row
Output should be like Test2 .
Test1
Flow_typ | Region | Account_Status | Amount | St_date | end_date |
Internal | South | Open | 100 | 1/1/2020 | 1/31/2020 |
Internal | South | Open | 200 | 2/1/2021 | 2/28/2021 |
Internal | South | Open | 200 | 5/1/2030 | 5/31/2030 |
Internal | South | Open | 200 | 6/1/2030 | 6/30/2030 |
Internal | South | Open | 200 | 7/1/2030 | 7/31/2030 |
External | North | Closed | 50 | 1/1/2024 | 1/31/2024 |
External | North | Closed | 60 | 2/1/2024 | 2/28/2024 |
External | North | Closed | 50 | 3/1/2026 | 3/31/2026 |
External | North | Closed | 50 | 4/1/2026 | 4/30/2026 |
External | North | Closed | 50 | 5/1/2026 | 5/31/2026 |
Can anyone please help me . Thanks in Advance
Test 2
Flow_typ | Region | Account_Status | Amount | St_date | end_date |
Internal | South | Open | 100 | 1/1/2020 | 1/31/2020 |
Internal | South | Open | 200 | 2/1/2021 | 2/28/2021 |
Internal | South | Open | 200 | 5/1/2030 | 7/31/2030 |
External | North | Closed | 50 | 1/1/2024 | 1/31/2024 |
External | North | Closed | 50 | 3/1/2026 | 5/31/2026 |
External | North | Closed | 60 | 2/1/2024 | 2/28/2024 |
If I understood what you mean.
data test1;
input Flow_typ $ Region $ Account_Status $ Amount St_date : mmddyy10. end_date : mmddyy10.;
informat St_date mmddyy10. end_date mmddyy10.;
format St_date mmddyy10. end_date mmddyy10.;
cards;
Internal South Open 100 1/1/2020 1/31/2020
Internal South Open 200 2/1/2021 2/28/2021
Internal South Open 200 5/1/2030 5/31/2030
Internal South Open 200 6/1/2030 6/30/2030
Internal South Open 200 7/1/2030 7/31/2030
External North Closed 500 1/1/2024 1/31/2024
External North Closed 600 2/1/2024 2/28/2024
External North Closed 500 3/1/2026 3/31/2026
External North Closed 500 4/1/2026 4/30/2026
External North Closed 500 5/1/2026 5/31/2026
;
run;
proc sort data=test1 out=temp;
by Flow_typ Region Account_Status Amount St_date;
run;
data temp;
set temp;
by Flow_typ Region Account_Status Amount;
if first.amount or st_date-lag(end_date) ne 1 then group+1;
run;
proc sql;
create table want as
select Flow_typ, Region , Account_Status , Amount,
min(st_Date) as st_date format=mmddyy10.,max(end_date) as end_date format=mmddyy10.
from temp
group by Flow_typ, Region , Account_Status , Amount,group;
quit;
Hello,
That's not so difficult.
Can you give us the source table (i.e. the top one) as a data step with datalines (cards)?
When pasting your data step SAS code in the communities ...
, click the "running man icon" in the toolbar on top and paste your code in the pop-up window.
Thanks,
Koen
This is extremely similar to your other question: https://communities.sas.com/t5/SAS-Programming/SAS-Row-by-row-compare/m-p/849017#M335681
The only difference is comparing an additional variable which would be done with a LAGged value to collapse the values.
You can use this code which gives your desired output
Flow_typ | Region | Account_Status | Amount | St_date | end_date |
External | North | Closed | 500 | 1/01/2024 | 01/31/2024 |
External | North | Closed | 500 | 3/01/2026 | 05/31/2026 |
External | North | Closed | 600 | 2/01/2024 | 02/28/2024 |
Internal | South | Open | 100 | 1/01/2020 | 01/31/2020 |
Internal | South | Open | 200 | 2/01/2021 | 02/28/2021 |
Internal | South | Open | 200 | 5/01/2030 | 07/31/2030 |
proc sort data = test1 out = test1;
by Flow_typ Region Account_Status Amount St_date end_date ;
run;
/*array to compare previous vales to identify consecutive days*/
data test2;
set test1;
by Flow_typ Region Account_Status Amount ;
array sdate(1); format sdate1 mmddyy10.; array edate(1); format edate1 mmddyy10.;
if first.amount then do; i=1 ;
sdate(1)=st_date; edate(1)=end_date;
NewGroup=1; end;
if i ne 1 then do;
if st_date ne edate(1)+1 then do;
sdate(1)=st_date; edate(1)=end_date; NewGroup=2; end;
else if i ne 1 and st_date = edate(1)+1 then do; edate(1) = end_date; end;
else NewGroup+1;
end;
i + 1;
retain sdate1 edate1 ;
drop i;
run;
Proc sql;
Create table test3 as
Select distinct
Flow_typ, Region, Account_Status, Amount, min(sdate1) as St_date format mmddyy10., Max(edate1) as end_date format mmddyy10.
from
test2
group by Flow_typ, Region, Account_Status, Amount,NewGroup /*instead of NewGroup sdate1*/
;Quit;
If I understood what you mean.
data test1;
input Flow_typ $ Region $ Account_Status $ Amount St_date : mmddyy10. end_date : mmddyy10.;
informat St_date mmddyy10. end_date mmddyy10.;
format St_date mmddyy10. end_date mmddyy10.;
cards;
Internal South Open 100 1/1/2020 1/31/2020
Internal South Open 200 2/1/2021 2/28/2021
Internal South Open 200 5/1/2030 5/31/2030
Internal South Open 200 6/1/2030 6/30/2030
Internal South Open 200 7/1/2030 7/31/2030
External North Closed 500 1/1/2024 1/31/2024
External North Closed 600 2/1/2024 2/28/2024
External North Closed 500 3/1/2026 3/31/2026
External North Closed 500 4/1/2026 4/30/2026
External North Closed 500 5/1/2026 5/31/2026
;
run;
proc sort data=test1 out=temp;
by Flow_typ Region Account_Status Amount St_date;
run;
data temp;
set temp;
by Flow_typ Region Account_Status Amount;
if first.amount or st_date-lag(end_date) ne 1 then group+1;
run;
proc sql;
create table want as
select Flow_typ, Region , Account_Status , Amount,
min(st_Date) as st_date format=mmddyy10.,max(end_date) as end_date format=mmddyy10.
from temp
group by Flow_typ, Region , Account_Status , Amount,group;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.