I have the following dataset. Year, city and variable. The variable changes infrequently. I would like to create another variable (variable_2) that tracks variable. Prior to a change in variable, the value of variable_2 will be 0 and after the change the value of variable_2 changes to 1. Here is the original data.
year | city | variable |
2000 | Allen | 5000 |
2001 | Allen | 5000 |
2002 | Allen | 5000 |
2003 | Allen | 5100 |
2004 | Allen | 5100 |
2005 | Allen | 5100 |
2000 | Frisco | 6300 |
2001 | Frisco | 6300 |
2002 | Frisco | 6000 |
2003 | Frisco | 6000 |
2004 | Frisco | 6000 |
2005 | Frisco | 6000 |
After the transformation
year | city | variable | Variable_2 |
2000 | Allen | 5000 | 0 |
2001 | Allen | 5000 | 0 |
2002 | Allen | 5000 | 0 |
2003 | Allen | 5100 | 1 |
2004 | Allen | 5100 | 1 |
2005 | Allen | 5100 | 1 |
2000 | Frisco | 6300 | 0 |
2001 | Frisco | 6300 | 0 |
2002 | Frisco | 6000 | 1 |
2003 | Frisco | 6000 | 1 |
2004 | Frisco | 6000 | 1 |
2005 | Frisco | 6000 | 1 |
Please provide data in the form of a data step. That way we don't have to make guesses about the actual types of variables.
One way, note the data step of what I am guessing are the variable types for your data.
data have; input year city $ variable ; datalines; 2000 Allen 5000 2001 Allen 5000 2002 Allen 5000 2003 Allen 5100 2004 Allen 5100 2005 Allen 5100 2000 Frisco 6300 2001 Frisco 6300 2002 Frisco 6000 2003 Frisco 6000 2004 Frisco 6000 2005 Frisco 6000 ; proc sort data=have; by city year ; run; data want; set have; by city year ; retain var_2; lv=lag(variable); if first.city then var_2=0; else if variable ne lv then var_2+1; drop lv; run;
I am sorting the data by City and year just in case your data is not in order and to allow use of BY group processing in the follow up data step that creates what you request.
The BY statement creates automatic variables First. and Last. for each variable on the by statement that are 1/0 valued for true/false indicating if the current observation is the first or last of the by group. These values can be used to conditionally execute code such as resetting the value of Var_2 to 0. RETAIN tells SAS to keep the values of the variable across boundaries of the data step so can accumulate values. The LAG function, when used correctly, has the value of the previous record(s) for a variable. So this allows comparing the value of Variable for the current record with the previous to increment Var_2.
Note: if you have more than 1 change this will increment at each change, getting 2, 3, etc.
The lagged value is dropped from the data set as not needed.
@pablocasso wrote:
I have the following dataset. Year, city and variable. The variable changes infrequently. I would like to create another variable (variable_2) that tracks variable. Prior to a change in variable, the value of variable_2 will be 0 and after the change the value of variable_2 changes to 1. Here is the original data.
year city variable 2000 Allen 5000 2001 Allen 5000 2002 Allen 5000 2003 Allen 5100 2004 Allen 5100 2005 Allen 5100 2000 Frisco 6300 2001 Frisco 6300 2002 Frisco 6000 2003 Frisco 6000 2004 Frisco 6000 2005 Frisco 6000
After the transformation
year city variable Variable_2 2000 Allen 5000 0 2001 Allen 5000 0 2002 Allen 5000 0 2003 Allen 5100 1 2004 Allen 5100 1 2005 Allen 5100 1 2000 Frisco 6300 0 2001 Frisco 6300 0 2002 Frisco 6000 1 2003 Frisco 6000 1 2004 Frisco 6000 1 2005 Frisco 6000 1
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: