Hi,
I have a dataset that contains a variable for year (2010-2013), a variable for unitID (LT0100 to LT2400), a variable "code" ranging from 0-6, and some other variables (a_1-a_8;b_1-b_8;etc...).
At the moment, what I have is all the columns in the dataset below, except for f_7 which I want to create. f_7 should be calculated using a formula like this: " f_5*(mean(f_6) where Cod=0) ". In words, I want to multiply f_5 by the single value of f_6 (only 1 value, where code=0, for each year and unitId).
The code below, and variations of it, is my first attempts. But it fails to either "lock" the multiplication to each single value, or to find values in column f_6.
proc sql;
create table Dataset1 as
select *, f_5*(mean(f_6)) as f_7
from Dataset
group by unitID ,year
order by year,unitID ,code
;
quit;
UnitId | Year | Code | a_1 | a_2 | a_3 | f_5 | f_6 | f_7 |
LT0100 | 2010 | 0 | 2415 | |||||
LT0100 | 2010 | 1 | 2959139 | 320212 | 1600244 | 0,137518 | 332,106 | |
LT0100 | 2010 | 2 | 4565010 | 142715 | 996557 | 0,06129 | 148,0154 | |
LT0100 | 2010 | 3 | 3301774 | 401540 | 2041513 | 0,172445 | 416,4547 | |
LT0100 | 2010 | 4 | 7384528 | 1163174 | 5855854 | 0,499536 | 1206,379 | |
LT0100 | 2010 | 5 | 1357620 | 33371 | 1125468 | 0,014332 | 34,61178 | |
LT0100 | 2010 | 6 | 1454282 | 267495 | 1110066 | 0,114878 | 277,4304 | |
LT0300 | 2010 | 0 | 1684 | |||||
LT0300 | 2010 | 1 | 649706 | 171904 | 326910 | 0,15413 | 259,5549 | |
LT0300 | 2010 | 2 | 1058760 | 97291 | 167824 | 0,087232 | 146,8987 | |
LT0300 | 2010 | 3 | 1156493 | 348738 | 733397 | 0,31268 | 526,5531 | |
LT0300 | 2010 | 4 | 2047786 | 415344 | 1444201 | 0,372399 | 627,1199 | |
LT0300 | 2010 | 5 | 257839 | 76602 | 174777 | 0,068682 | 115,6605 | |
LT0300 | 2010 | 6 | 248166 | 5440 | 241026 | 0,004878 | 8,214552 | |
LT0400 | 2010 | 0 | 3104 | |||||
LT0400 | 2010 | 1 | 478329 | 37400 | 224111 | 0,091693 | 284,6151 | |
LT0400 | 2010 | 2 | 745501 | 16167 | 65280 | 0,039637 | 123,0332 | |
LT0400 | 2010 | 3 | 705415 | 133909 | 542181 | 0,328304 | 1019,056 | |
LT0400 | 2010 | 4 | 901986 | 182750 | 712861 | 0,448047 | 1390,738 | |
LT0400 | 2010 | 5 | 175015 | 37655 | 112965 | 0,092319 | 286,5582 | |
LT0400 | 2010 | 6 | 185470 | 185470 | ||||
LT0100 | 2011 | 0 | 1235 | |||||
LT0100 | 2011 | 1 | 2959139 | 352233,2 | 1600244 | 0,137518 | 169,8347 | |
LT0100 | 2011 | 2 | 4565010 | 156986,5 | 996557 | 0,06129 | 75,69315 | |
LT0100 | 2011 | 3 | 3301774 | 441694 | 2041513 | 0,172445 | 212,9696 | |
LT0100 | 2011 | 4 | 7384528 | 1279491 | 5855854 | 0,499536 | 616,927 | |
LT0100 | 2011 | 5 | 1357620 | 36708,1 | 1125468 | 0,014332 | 17,70002 | |
LT0100 | 2011 | 6 | 1454282 | 294244,5 | 1110066 | 0,114878 | 141,8743 | |
LT0300 | 2011 | 0 | 2214 | |||||
LT0300 | 2011 | 1 | 649706 | 189094,4 | 326910 | 0,15413 | 341,2438 | |
LT0300 | 2011 | 2 | 1058760 | 107020,1 | 167824 | 0,087232 | 193,1316 | |
LT0300 | 2011 | 3 | 1156493 | 383611,8 | 733397 | 0,31268 | 692,2735 | |
LT0300 | 2011 | 4 | 2047786 | 456878,4 | 1444201 | 0,372399 | 824,4914 | |
LT0300 | 2011 | 5 | 257839 | 84262,2 | 174777 | 0,068682 | 152,0619 | |
LT0300 | 2011 | 6 | 248166 | 5984 | 241026 | 0,004878 | 10,79989 | |
LT0400 | 2011 | 0 | 2610 | |||||
LT0400 | 2011 | 1 | 478329 | 41140 | 224111 | 0,091693 | 239,3187 | |
LT0400 | 2011 | 2 | 745501 | 17783,7 | 65280 | 0,039637 | 103,4526 | |
LT0400 | 2011 | 3 | 705415 | 147299,9 | 542181 | 0,328304 | 856,8734 | |
LT0400 | 2011 | 4 | 901986 | 201025 | 712861 | 0,448047 | 1169,403 | |
LT0400 | 2011 | 5 | 175015 | 41420,5 | 112965 | 0,092319 | 240,9526 | |
LT0400 | 2011 | 6 | 185470 | 185470 |
I can´t find any sources on this problem online, and maybe I need to structure the variable f_6 differently but I don´t know exactly how to, in order to get the multiplication done correctly.
Any suggestions on how to solve this problem?
Best regards,
Hank
A data step is easier, use retain to hold the multiplier and make sure to set the value when code=0. Then drop the multiplier column at the end, I've left it in for display purposes. This assumes your data is sorted as above.
data want;
set have;
by unitid year;
retain multiplier;
if code=0 then multiplier=f_6;
f_7=f_5*multiplier;
*drop mulitplier;
run;
A data step is easier, use retain to hold the multiplier and make sure to set the value when code=0. Then drop the multiplier column at the end, I've left it in for display purposes. This assumes your data is sorted as above.
data want;
set have;
by unitid year;
retain multiplier;
if code=0 then multiplier=f_6;
f_7=f_5*multiplier;
*drop mulitplier;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.