I have a dataset named Entry with the following columns
Date | Category | Indicator | Total |
01Nov-2021 | Starting | 13 | |
01Nov-2021 | Add | TRUE | 10 |
01Nov-2021 | Add | TRUE | 7 |
01Nov-2021 | Subs | FALSE | 6 |
01Nov-2021 | Subs | FALSE | 5 |
02Nov-2021 | Add | TRUE | 6 |
02Nov-2021 | Add | TRUE | 8 |
02Nov-2021 | Subs | FALSE | 6 |
02Nov-2021 | Subs | FALSE | 2 |
03Nov-2021 | Add | TRUE | 7 |
03Nov-2021 | Add | TRUE | 2 |
03Nov-2021 | Subs | FALSE | 8 |
03Nov-2021 | Subs | FALSE | 1 |
Now, I want to create the following dataset:
Date | Starting | Add_total | Subs_Total | Add_true | Add_False | Subs_True | Subs_False | Nets_Total | Nets_True | Nets_False | Ending |
01stNov2021 | 13 | 17 | 11 | 10 | 7 | 6 | 5 | 6 | 4 | 2 | 19 |
02nd Nov 2021 | 19 | 14 | 8 | 6 | 8 | 6 | 2 | 6 | 0 | 6 | 25 |
03rd Nov 2021 | 25 | 9 | 9 | 7 | 2 | 8 | 1 | 0 | -1 | 1 | 25 |
Starting - Total of category - Starting for the day
Add_Total - Total of Category Adds
Subs_Total - Total of Category Subs
Nets_Total - Adds_Total - Subs_total
Nets_True - Adds_true - Subs_true
Nets_false - Adds_false - Subs_False
Ending - Starting + Nets_Total
Moreover, the ending for the previous day needs to be starting for the next day and so on. Is it possible to do it in one query. I tried
PROC SQL;
CREATE TABLE Test AS
SELECT
Date,
sum(case when Category = 'Starting' then Total end) as Starting,
sum(case when Category = 'Add' then Total end) as AddTotal,
sum(case when Category = 'Subs' then Total end) as SubsTotal,
sum(case when Category = 'Add' and Indicator = 'True' then Total end) as Addtrue
sum(case when Category = 'Add' and Indicator = 'False' then Total end) as Addfalse
sum(case when Category = 'Subs' and Indicator = 'True' then Total end) as Substrue
sum(case when Category = 'Subs' and Indicator = 'False' then Total end) as Subsfalse
calculated Add_Total - calculated Subs_Total as Nets_Total,
calculated Add_true - calculated Subs_true as Nets_True,
calculated Add_false - calculated Subs_false as Nets_false,
calculated Starting + Nets_Total as Ending
FROM Entry
;
QUIT;
But need help in sorting out for the Starting and Ending for each day.
Does not look like you need to do an "iteration". Instead you just need to process the data by date. Something that PROC SQL is not designed to do. So instead just use a data step.
data have;
input Date :date. Category $ Indicator $ Total ;
format date date9.;
cards;
01Nov-2021 Starting . 13
01Nov-2021 Add TRUE 10
01Nov-2021 Add TRUE 7
01Nov-2021 Subs FALSE 6
01Nov-2021 Subs FALSE 5
02Nov-2021 Add TRUE 6
02Nov-2021 Add TRUE 8
02Nov-2021 Subs FALSE 6
02Nov-2021 Subs FALSE 2
03Nov-2021 Add TRUE 7
03Nov-2021 Add TRUE 2
03Nov-2021 Subs FALSE 8
03Nov-2021 Subs FALSE 1
;
data want;
set have ;
by date ;
retain Starting Adds_total Subs_Total
Adds_true Adds_False
Subs_True Subs_False
Nets_Total Nets_True Nets_False
Ending
;
if first.date then do;
Adds_total=0;
Subs_Total=0;
Adds_true=0;
Adds_False=0;
Subs_True=0;
Subs_False=0;
end;
if category='Starting' then do;
starting=total;
end;
else if category='Add' then do;
adds_total+total;
if Indicator='TRUE' then adds_true+total;
else adds_false+total;
end;
else if category='Subs' then do;
subs_total+total;
if Indicator='TRUE' then subs_true+total;
else subs_false+total;
end;
if last.date then do;
Nets_Total = Adds_Total - Subs_total ;
Nets_True = Adds_true - Subs_true ;
Nets_false = Adds_false - Subs_False ;
Ending = Starting + Nets_Total;
output;
starting=ending;
end;
run;
proc print; run;
Does not look like you need to do an "iteration". Instead you just need to process the data by date. Something that PROC SQL is not designed to do. So instead just use a data step.
data have;
input Date :date. Category $ Indicator $ Total ;
format date date9.;
cards;
01Nov-2021 Starting . 13
01Nov-2021 Add TRUE 10
01Nov-2021 Add TRUE 7
01Nov-2021 Subs FALSE 6
01Nov-2021 Subs FALSE 5
02Nov-2021 Add TRUE 6
02Nov-2021 Add TRUE 8
02Nov-2021 Subs FALSE 6
02Nov-2021 Subs FALSE 2
03Nov-2021 Add TRUE 7
03Nov-2021 Add TRUE 2
03Nov-2021 Subs FALSE 8
03Nov-2021 Subs FALSE 1
;
data want;
set have ;
by date ;
retain Starting Adds_total Subs_Total
Adds_true Adds_False
Subs_True Subs_False
Nets_Total Nets_True Nets_False
Ending
;
if first.date then do;
Adds_total=0;
Subs_Total=0;
Adds_true=0;
Adds_False=0;
Subs_True=0;
Subs_False=0;
end;
if category='Starting' then do;
starting=total;
end;
else if category='Add' then do;
adds_total+total;
if Indicator='TRUE' then adds_true+total;
else adds_false+total;
end;
else if category='Subs' then do;
subs_total+total;
if Indicator='TRUE' then subs_true+total;
else subs_false+total;
end;
if last.date then do;
Nets_Total = Adds_Total - Subs_total ;
Nets_True = Adds_true - Subs_true ;
Nets_false = Adds_false - Subs_False ;
Ending = Starting + Nets_Total;
output;
starting=ending;
end;
run;
proc print; run;
Thanks so much. Works like a charm
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.