Hello,
I am trying to make a column that will be added to an existing data set. I need it to Multiply its number by the 6 previous entries (including its own for a total of 7) but only if the previous entries are under the same categories:
I have looked up Rolling lag "by group" but i have had no luck creating my result column.
Category A | Category B | Numbers | **Column I am looking to Add** |
---|---|---|---|
A | B | 2 | 2 |
A | B | 3 | 6 |
A | B | 3 | 18 |
A | B | 4 | 72 |
A | B | 2 | 144 |
A | B | 2 | 288 |
A | B | 2 | 576 |
A | B | 1 | 288 |
B | B | 2 | 2 |
B | B | 3 | 6 |
B | A | 2 | 2 |
B | A | 4 | 8 |
All help is appreciated. Thank you.
Sorry, I missed that part about the last 7 entries only. Here's a modification:
data want;
set have;
by category_a category_b notsorted;
if numbers=. then factor=1;
else factor=numbers;
drop factor;
back7 = lag7(factor);
if first.category_b then do;
new_column=factor;
counter = 1;
end;
else do;
new_column = new_column * factor;
counter + 1;
if counter > 7 then new_column = new_column / back7;
end;
retain new_column;
drop back7;
run;
Could use retain:
data have (drop=lsta lstb lstnum);
attrib CatA catB format=$1. Numbers format=best.;
infile cards;
input cata $ catb $ numbers;
cards;
A B 2
A B 3
A B 3
A B 4
B B 2
B B 3
B A 2
;
run;
data want;
set have;
attrib result format=best.;
retain lsta lstb lstnum;
if _n_=1 or lsta ne cata or lstb ne catb then do;
lsta=cata;
lstb=catb;
lstnum=numbers;
result=numbers;
end;
else do;
result=lstnum * numbers;
lstnum=result;
end;
run;
Thank you for your swift response. A few things,
1) It ended up creating new columns for result, CatA, CatB, 1st number and result, is there a way to avoid this and just have the result be the remaining column?
2) I forgot to mention this, but if there is nothing in the cells, then to treat the value in them as "1" ..... IE
A B 3 3
A B - 3
A B 2 6
Thanks!
You can simplify some of the logic using BY group processing:
data want;
set have;
by category_a category_b notsorted;
if numbers=. then factor=1;
else factor=numbers;
drop factor;
if first.category_b then new_column=factor;
else new_column = new_column * factor;
retain new_column;
run;
Good luck.
Can this be applied to also the last 7 Numbers? Yours worked in the way I wanted to except the new column kept compounding as opposed to keep a "rolling" product of only the last 7 entries.
Sorry, I missed that part about the last 7 entries only. Here's a modification:
data want;
set have;
by category_a category_b notsorted;
if numbers=. then factor=1;
else factor=numbers;
drop factor;
back7 = lag7(factor);
if first.category_b then do;
new_column=factor;
counter = 1;
end;
else do;
new_column = new_column * factor;
counter + 1;
if counter > 7 then new_column = new_column / back7;
end;
retain new_column;
drop back7;
run;
Thank you very much. Works Perfectly. Appreciate everyone else's efforts as well. Great community here.
Yep, just modify as:
data want (drop=lsta lstb lstnum);
set have;
attrib result format=best.;
retain lsta lstb lstnum;
if _n_=1 or lsta ne cata or lstb ne catb then do;
lsta=cata;
lstb=catb;
lstnum=numbers;
result=numbers;
end;
else do;
if numbers=. then result=lstnum;
else result=lstnum * numbers;
lstnum=result;
end;
run;
data have;
input cata $ catb $ numbers;
cards;
A B 2
A B 3
A B .
A B 4
B B 2
B B 3
B A 2
;
run;
data want;
set have;
by cata;
retain temp;
if first.cata then add=numbers;
else add=coalesce(numbers,1)*temp;
temp=add;
drop temp;
run;
The following code works even if you have more then 7 observations in one by group:
data have;
attrib CatA catB format=$1. Numbers format=best.;
infile cards;
input cata $ catb $ numbers;
cards;
A B 2
A B 3
A B 3
A B 4
A B 2
A B 2
A B 2
A B 1
B B 2
B B 3
B A 2
B A 4
;
run;
data want;
set have;
keep catA catB result Numbers; by catA catB notsorted;
array lastNums[7] _temporary_;
retain pointer;
if first.catB then do;
do i=1 to dim(lastNums);
lastNums=1;
end;
pointer=1;
end;
lastNums[pointer]=coalesce(numbers,1);
pointer=mod(pointer,dim(lastNums))+1;
result=1;
do i=1 to dim(lastNums);
result=result*lastNums;
end;
run;
Message was edited by: Gergely Bathó - modified the keep statement.
Try this !!
data y1;
input cat_1 $ cat_2 $ numbers;
cards;
A B 2
A B 3
A B 3
A B 4
A B 2
A B 2
A B 2
A B 1
B B 2
B B 3
B A 2
B A 4
;
run;
proc sort data=y1;by cat_1 cat_2;run;
data y2;
set y1;
by cat_1 cat_2;
if first.cat_2=1 then do;
retain x 1;
x=numbers-1;
end;
x=(numbers)*x;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.