Hi Everyone,
I am trying to create a variable by id in which the value of the variable is calculated based on another variable and is repeated 10 times for each id. Here is an example of my dataset and my desired variables, i.e. var1, var2, and var3, as well as the calculations for each one of them. I have also attached the dataset in SAS format.
id | gwill | Lag_gwill | year | var1 | var2 | var3 |
1 | 100 | 2000 | 10 = 100/10 | |||
1 | 150 | 100 | 2001 | 10 = 100/10 | 5 = (150-100)/10 | |
1 | 130 | 150 | 2002 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 140 | 130 | 2003 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 150 | 140 | 2004 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 120 | 150 | 2005 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 110 | 120 | 2006 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 100 | 110 | 2007 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 140 | 100 | 2008 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 160 | 140 | 2009 | 10 = 100/10 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 180 | 160 | 2010 | 0 | 5 = (150-100)/10 | -2 = (130-150)/10 |
1 | 110 | 180 | 2011 | 0 | 0 | -2 = (130-150)/10 |
1 | 150 | 110 | 2012 | 0 | 0 | 0 |
2 | 200 | 2003 | 20 = 200/10 | |||
2 | 200 | 200 | 2004 | 20 = 200/10 | 0 = (200-200)/10 | |
2 | 180 | 200 | 2005 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 150 | 200 | 2006 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 150 | 150 | 2007 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 140 | 150 | 2008 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 170 | 140 | 2009 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 190 | 170 | 2010 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 215 | 190 | 2011 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 230 | 215 | 2012 | 20 = 200/10 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 165 | 230 | 2013 | 0 | 0 = (200-200)/10 | -2 = (180-200)/10 |
2 | 182 | 165 | 2014 | 0 | 0 | -2 = (180-200)/10 |
Thanks very much for your help.
Can you, please post what you have as text: a data step using datalines? Datasets are binary files and i am not allowed to download them. You could highlight those variables, that you already have in the table, alternatively.
here is what I have. I want to create var1, var2, and var3 in my original post.
data ka.gwill;
input id gwill lag_gwill year;
cards;
1 100 . 2000
1 150 100 2001
1 130 150 2002
1 140 130 2003
1 150 140 2004
1 120 150 2005
1 110 120 2006
1 100 110 2007
1 140 100 2008
1 160 140 2009
1 180 160 2010
1 110 180 2011
1 150 110 2012
2 200 . 2003
2 200 200 2004
2 180 200 2005
2 150 200 2006
2 150 150 2007
2 140 150 2008
2 170 140 2009
2 190 170 2010
2 215 190 2011
2 230 215 2012
2 165 230 2013
2 182 165 2014
;
run;
thanks
Surely not the most elegant solution:
data want;
set work.gwill;
by id;
retain start_year var1-var3;
if first.id then do;
start_year = year;
end;
select (year);
when (start_year) var1 = gwill / 10;
when (start_year + 1) var2 = (gwill - lag_gwill) / 10;
when (start_year + 2) var3 = (gwill - lag_gwill) / 10;
when (start_year + 11) var1 = 0;
when (start_year + 12) var2 = 0;
when (start_year + 13) var3 = 0;
otherwise;
end;
drop start_year;
run;
Thanks, andreas_Ids for your solution. It does work for my example here. However, when I dropped the last observation for id=1, the value of var3 from id=1 is copied over to id=2. I am not sure why the first.id statement is ignored in this case. If you run the following code, you can see that the value of var3 id=1 is copied to var3 id=2 for the first two observations which are supposed to be zero.
data gwill;
input id gwill lag_gwill year;
cards;
1 100 . 2000
1 150 100 2001
1 130 150 2002
1 140 130 2003
1 150 140 2004
1 120 150 2005
1 110 120 2006
1 100 110 2007
1 140 100 2008
1 160 140 2009
1 180 160 2010
1 110 180 2011
2 200 . 2003
2 200 200 2004
2 130 200 2005
2 150 200 2006
2 150 150 2007
2 140 150 2008
2 170 140 2009
;
run;
data want;
set gwill;
by id;
retain start_year var1-var3;
if first.id then do;
start_year = year;
end;
select (year);
when (start_year) var1 = gwill / 10;
when (start_year + 1) var2 = (gwill - lag_gwill) / 10;
when (start_year + 2) var3 = (gwill - lag_gwill) / 10;
when (start_year + 10) var1 = 0;
when (start_year + 11) var2 = 0;
when (start_year + 12) var3 = 0;
otherwise;
end;
drop start_year;
run;
Any ideas why this is happenings?
Thanks!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.