DATA Step, Macro, Functions and more

sum values of one variable depending on values of another variable

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

sum values of one variable depending on values of another variable

Hello all,

I have a dataset as below, and for each UPC each week, I need to calculate the sum of units of the last quarter, meaning weeks between last_qtr_start and last_qtr_end. I thought proc sql would work for this, but after hours of trying, I didn't get any luck.

Any help is greatly appreciated!

data one;

  input upc week units last_qtr_start last_qtr_end;

datalines;

10001 1150 10 1127 1149

10001 1151 20 1128 1150

10001 1154 15 1141 1153

10001 1156 12 1143 1155

10002 1300 20 1287 1299

10002 1301 25 1288 1300

10002 1305 20 1292 1304

;

run;

And the output should look like:

data one;

  input upc week units last_qtr_start last_qtr_end last_qtr_sum;

datalines;

10001 1150 10 1127 1149 0

10001 1151 20 1128 1150 10

10001 1154 15 1141 1153 30

10001 1156 12 1143 1155 45

10002 1300 20 1287 1299 0

10002 1301 25 1288 1300 25

10002 1305 20 1292 1304 45

;

run;



Accepted Solutions
Solution
‎10-18-2013 02:04 PM
Respected Advisor
Posts: 3,124

Re: sum values of one variable depending on values of another variable

data one;

  input upc week units last_qtr_start last_qtr_end;

datalines;

10001 1150 10 1127 1149

10001 1151 20 1128 1150

10001 1154 15 1141 1153

10001 1156 12 1143 1155

10002 1300 20 1287 1299

10002 1301 25 1288 1300

10002 1305 20 1292 1304

10002 1330 20 1317 1329 0

10002 1340 25 1327 1339 20

10002 1350 20 1337 1349 25

;

run;

proc sql;

  create table want as

  select a.*, coalesce(sum(b.units),0) as last_qtr_sum from one a

left join one b

on a.upc=b.upc

  and a.last_qtr_start<=b.week<=a.last_qtr_end

group by a.upc, a.week, a.units, a.last_qtr_start, a.last_qtr_end

;quit;

UPDATE: Oops, Reeza beats me on Proc SQL version. Here is an alternative data step Hash approach:

data want;

  if _n_=1 then do;

  if 0 then set one (rename=(week=_week units=_units) keep=upc week units);

declare hash h(dataset:'one(rename=(week=_week units=_units) keep=upc week units)', multidata:'y');

h.definekey('upc');

h.definedata(all:'y');

h.definedone();

  end;

  set one;

  do _rc=h.find() by 0 while (_rc=0);

  last_qtr_sum=sum(last_qtr_sum,(last_qtr_start <= _week <= last_qtr_end)*_units,0);

_rc=h.find_next();

end;

drop _:;

run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: sum values of one variable depending on values of another variable

data one;

  input upc week units last_qtr_start last_qtr_end;

datalines;

10001 1150 10 1127 1149

10001 1151 20 1128 1150

10001 1154 15 1141 1153

10001 1156 12 1143 1155

10002 1300 20 1287 1299

10002 1301 25 1288 1300

10002 1305 20 1292 1304

;

run;

data want;

  set one;

  by upc notsorted;

  _var+units;

  if first.upc then last_qtr_sum=0;else last_qtr_sum=_var-units;

  if last.upc then _var=0;

  drop _var;

  run;

Trusted Advisor
Posts: 1,128

Re: sum values of one variable depending on values of another variable

The code of Hai.Kuo works well, i just gave an alternative one

data want;

    set one;

    retain val;

    by upc;

    if first.upc then val=units;

    else val+units;

    if first.upc then  last_qtr_sum=0;

    else last_qtr_sum=val-units;

     drop val;

run;

a correction in the output, it should be 20 i believe instead of 25, please check

10002 1301 25 1288 1300 25 20

Thanks,

Jagadish

Thanks,
Jag
New Contributor
Posts: 4

Re: sum values of one variable depending on values of another variable

Thank so much Jagadish and Hai.Kuo! Sorry that i didn't explain the question clearly. For the same upc and week, the last_qtr_sum calculation is conditioned by last_qtr_start last_qtr_end, i.e., last_qtr_sum is the sum of units of the same upc but weeks have to be between last_qtr_start and last_qtr_end.

i modified my example by adding 3 rows at the bottom to show the difference.

btw, your correction of the output is correct,Jagadish, thank you!

data one;

  input upc week units last_qtr_start last_qtr_end;

datalines;

10001 1150 10 1127 1149

10001 1151 20 1128 1150

10001 1154 15 1141 1153

10001 1156 12 1143 1155

10002 1300 20 1287 1299

10002 1301 25 1288 1300

10002 1305 20 1292 1304

10002 1330 20 1317 1329

10002 1340 25 1327 1339

10002 1350 20 1337 1349

;

run;

And the output should look like:

data two;

  input upc week units last_qtr_start last_qtr_end last_qtr_sum;

datalines;

10001 1150 10 1127 1149 0

10001 1151 20 1128 1150 10

10001 1154 15 1141 1153 30

10001 1156 12 1143 1155 45

10002 1300 20 1287 1299 0

10002 1301 25 1288 1300 20

10002 1305 20 1292 1304 45

10002 1330 20 1317 1329 0

10002 1340 25 1327 1339 20

10002 1350 20 1337 1349 25

;

run;

Super User
Posts: 17,784

Re: sum values of one variable depending on values of another variable

Highly inefficient but this may work:

proc sql;

    create table want as

    select distinct a.*, coalesce(sum(b.units), 0) as total_units

    from one as a

    left join one as b

    on b.week between a.last_qtr_start and a.last_qtr_end

    group by a.upc, a.week;

quit;

New Contributor
Posts: 4

Re: sum values of one variable depending on values of another variable

Thanks so much Reeza. This is very helpful! In the left join step, we do need to specify a.upc=b.upc, otherwise the last_qtr_sum will be messed up.

Thanks again!

Solution
‎10-18-2013 02:04 PM
Respected Advisor
Posts: 3,124

Re: sum values of one variable depending on values of another variable

data one;

  input upc week units last_qtr_start last_qtr_end;

datalines;

10001 1150 10 1127 1149

10001 1151 20 1128 1150

10001 1154 15 1141 1153

10001 1156 12 1143 1155

10002 1300 20 1287 1299

10002 1301 25 1288 1300

10002 1305 20 1292 1304

10002 1330 20 1317 1329 0

10002 1340 25 1327 1339 20

10002 1350 20 1337 1349 25

;

run;

proc sql;

  create table want as

  select a.*, coalesce(sum(b.units),0) as last_qtr_sum from one a

left join one b

on a.upc=b.upc

  and a.last_qtr_start<=b.week<=a.last_qtr_end

group by a.upc, a.week, a.units, a.last_qtr_start, a.last_qtr_end

;quit;

UPDATE: Oops, Reeza beats me on Proc SQL version. Here is an alternative data step Hash approach:

data want;

  if _n_=1 then do;

  if 0 then set one (rename=(week=_week units=_units) keep=upc week units);

declare hash h(dataset:'one(rename=(week=_week units=_units) keep=upc week units)', multidata:'y');

h.definekey('upc');

h.definedata(all:'y');

h.definedone();

  end;

  set one;

  do _rc=h.find() by 0 while (_rc=0);

  last_qtr_sum=sum(last_qtr_sum,(last_qtr_start <= _week <= last_qtr_end)*_units,0);

_rc=h.find_next();

end;

drop _:;

run;

New Contributor
Posts: 4

Re: sum values of one variable depending on values of another variable

Thanks so much Hai Kuo! I've applied your sql version and it worked great! You rock!!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 360 views
  • 3 likes
  • 4 in conversation