Hello all,
I have a dataset as below, and for each UPC each week, I need to calculate the sum of units of the last quarter, meaning weeks between last_qtr_start and last_qtr_end. I thought proc sql would work for this, but after hours of trying, I didn't get any luck.
Any help is greatly appreciated!
data one;
input upc week units last_qtr_start last_qtr_end;
datalines;
10001 1150 10 1127 1149
10001 1151 20 1128 1150
10001 1154 15 1141 1153
10001 1156 12 1143 1155
10002 1300 20 1287 1299
10002 1301 25 1288 1300
10002 1305 20 1292 1304
;
run;
And the output should look like:
data one;
input upc week units last_qtr_start last_qtr_end last_qtr_sum;
datalines;
10001 1150 10 1127 1149 0
10001 1151 20 1128 1150 10
10001 1154 15 1141 1153 30
10001 1156 12 1143 1155 45
10002 1300 20 1287 1299 0
10002 1301 25 1288 1300 25
10002 1305 20 1292 1304 45
;
run;
data one;
input upc week units last_qtr_start last_qtr_end;
datalines;
10001 1150 10 1127 1149
10001 1151 20 1128 1150
10001 1154 15 1141 1153
10001 1156 12 1143 1155
10002 1300 20 1287 1299
10002 1301 25 1288 1300
10002 1305 20 1292 1304
10002 1330 20 1317 1329 0
10002 1340 25 1327 1339 20
10002 1350 20 1337 1349 25
;
run;
proc sql;
create table want as
select a.*, coalesce(sum(b.units),0) as last_qtr_sum from one a
left join one b
on a.upc=b.upc
and a.last_qtr_start<=b.week<=a.last_qtr_end
group by a.upc, a.week, a.units, a.last_qtr_start, a.last_qtr_end
;quit;
UPDATE: Oops, Reeza beats me on Proc SQL version. Here is an alternative data step Hash approach:
data want;
if _n_=1 then do;
if 0 then set one (rename=(week=_week units=_units) keep=upc week units);
declare hash h(dataset:'one(rename=(week=_week units=_units) keep=upc week units)', multidata:'y');
h.definekey('upc');
h.definedata(all:'y');
h.definedone();
end;
set one;
do _rc=h.find() by 0 while (_rc=0);
last_qtr_sum=sum(last_qtr_sum,(last_qtr_start <= _week <= last_qtr_end)*_units,0);
_rc=h.find_next();
end;
drop _:;
run;
data one;
input upc week units last_qtr_start last_qtr_end;
datalines;
10001 1150 10 1127 1149
10001 1151 20 1128 1150
10001 1154 15 1141 1153
10001 1156 12 1143 1155
10002 1300 20 1287 1299
10002 1301 25 1288 1300
10002 1305 20 1292 1304
;
run;
data want;
set one;
by upc notsorted;
_var+units;
if first.upc then last_qtr_sum=0;else last_qtr_sum=_var-units;
if last.upc then _var=0;
drop _var;
run;
The code of Hai.Kuo works well, i just gave an alternative one
data want;
set one;
retain val;
by upc;
if first.upc then val=units;
else val+units;
if first.upc then last_qtr_sum=0;
else last_qtr_sum=val-units;
drop val;
run;
a correction in the output, it should be 20 i believe instead of 25, please check
10002 1301 25 1288 1300 25 20
Thanks,
Jagadish
Thank so much Jagadish and Hai.Kuo! Sorry that i didn't explain the question clearly. For the same upc and week, the last_qtr_sum calculation is conditioned by last_qtr_start last_qtr_end, i.e., last_qtr_sum is the sum of units of the same upc but weeks have to be between last_qtr_start and last_qtr_end.
i modified my example by adding 3 rows at the bottom to show the difference.
btw, your correction of the output is correct,Jagadish, thank you!
data one;
input upc week units last_qtr_start last_qtr_end;
datalines;
10001 1150 10 1127 1149
10001 1151 20 1128 1150
10001 1154 15 1141 1153
10001 1156 12 1143 1155
10002 1300 20 1287 1299
10002 1301 25 1288 1300
10002 1305 20 1292 1304
10002 1330 20 1317 1329
10002 1340 25 1327 1339
10002 1350 20 1337 1349
;
run;
And the output should look like:
data two;
input upc week units last_qtr_start last_qtr_end last_qtr_sum;
datalines;
10001 1150 10 1127 1149 0
10001 1151 20 1128 1150 10
10001 1154 15 1141 1153 30
10001 1156 12 1143 1155 45
10002 1300 20 1287 1299 0
10002 1301 25 1288 1300 20
10002 1305 20 1292 1304 45
10002 1330 20 1317 1329 0
10002 1340 25 1327 1339 20
10002 1350 20 1337 1349 25
;
run;
Highly inefficient but this may work:
proc sql;
create table want as
select distinct a.*, coalesce(sum(b.units), 0) as total_units
from one as a
left join one as b
on b.week between a.last_qtr_start and a.last_qtr_end
group by a.upc, a.week;
quit;
Thanks so much Reeza. This is very helpful! In the left join step, we do need to specify a.upc=b.upc, otherwise the last_qtr_sum will be messed up.
Thanks again!
data one;
input upc week units last_qtr_start last_qtr_end;
datalines;
10001 1150 10 1127 1149
10001 1151 20 1128 1150
10001 1154 15 1141 1153
10001 1156 12 1143 1155
10002 1300 20 1287 1299
10002 1301 25 1288 1300
10002 1305 20 1292 1304
10002 1330 20 1317 1329 0
10002 1340 25 1327 1339 20
10002 1350 20 1337 1349 25
;
run;
proc sql;
create table want as
select a.*, coalesce(sum(b.units),0) as last_qtr_sum from one a
left join one b
on a.upc=b.upc
and a.last_qtr_start<=b.week<=a.last_qtr_end
group by a.upc, a.week, a.units, a.last_qtr_start, a.last_qtr_end
;quit;
UPDATE: Oops, Reeza beats me on Proc SQL version. Here is an alternative data step Hash approach:
data want;
if _n_=1 then do;
if 0 then set one (rename=(week=_week units=_units) keep=upc week units);
declare hash h(dataset:'one(rename=(week=_week units=_units) keep=upc week units)', multidata:'y');
h.definekey('upc');
h.definedata(all:'y');
h.definedone();
end;
set one;
do _rc=h.find() by 0 while (_rc=0);
last_qtr_sum=sum(last_qtr_sum,(last_qtr_start <= _week <= last_qtr_end)*_units,0);
_rc=h.find_next();
end;
drop _:;
run;
Thanks so much Hai Kuo! I've applied your sql version and it worked great! You rock!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.