I'm a relatively new coder in SAS and I am trying to figure out what is the SAS equivalent of Stata's tsset command which allows you to declare a time series data set and specify which variables to sort on. I have a panel data set with US county observations for multiple categories over three years (1 observation per year, per category). I am trying to prevent crossover between counties when I calculate a growth variable so as to avoid contaminating the data.
This is the Stata code I need to translate to SAS:
tsset fipscode year
gen pop_growth= ((race_total_pop-l.race_total_pop)/l.race_total_pop)*100
gen under_18_perc = (pop_under_18_ / race_total_pop)*100
gen over_65_perc = (pop_over_65_ / race_total_pop)*100
Thanks in advance for your help!
It looks like somehow you are telling STATA that YEAR is a time dimension. It is not clear how it would know when you are changing from one FIPSCODE to another however, what happens if FIPSCODE=1 has data from 2016 to 2018 and FISPCODE=2 has data starting in 2019?
Anyway, in SAS you use the LAG() functions to remember previous values. Use BY group processing to prevent using lagged values from a different group.
data have ;
input fipscode year race_total_pop ;
cards;
1003 2015 195121
1003 2016 199510
1003 2017 203360
1009 2015 57710
1009 2016 57704
1009 2017 57667
;
data want;
set have;
by fipscode year;
lag_race_total_pop=lag(race_total_pop);
if not first.fipscode then do;
pop_growth=((race_total_pop-lag_race_total_pop)/lag_race_total_pop)*100;
end;
run;
proc print;
run;
race_ total_ lag_race_ pop_ Obs fipscode year pop total_pop growth 1 1003 2015 195121 . . 2 1003 2016 199510 195121 2.24937 3 1003 2017 203360 199510 1.92973 4 1009 2015 57710 203360 . 5 1009 2016 57704 57710 -0.01040 6 1009 2017 57667 57704 -0.06412
Show some example data and what result you want.
Sounds like you just need to understand how to use BY groups.
This is a small sample of my data.
fipscode | year | race_total_pop | Pop_growth |
1003 | 2015 | 195121 | |
1003 | 2016 | 199510 | |
1003 | 2017 | 203360 | |
1009 | 2015 | 57710 | |
1009 | 2016 | 57704 | |
1009 | 2017 | 57667 |
I want to be able to calculate the population growth for every county (represented by the fipscode) but I need SAS to realize when a new observation starts so that the growth variable is accurate.
It looks like somehow you are telling STATA that YEAR is a time dimension. It is not clear how it would know when you are changing from one FIPSCODE to another however, what happens if FIPSCODE=1 has data from 2016 to 2018 and FISPCODE=2 has data starting in 2019?
Anyway, in SAS you use the LAG() functions to remember previous values. Use BY group processing to prevent using lagged values from a different group.
data have ;
input fipscode year race_total_pop ;
cards;
1003 2015 195121
1003 2016 199510
1003 2017 203360
1009 2015 57710
1009 2016 57704
1009 2017 57667
;
data want;
set have;
by fipscode year;
lag_race_total_pop=lag(race_total_pop);
if not first.fipscode then do;
pop_growth=((race_total_pop-lag_race_total_pop)/lag_race_total_pop)*100;
end;
run;
proc print;
run;
race_ total_ lag_race_ pop_ Obs fipscode year pop total_pop growth 1 1003 2015 195121 . . 2 1003 2016 199510 195121 2.24937 3 1003 2017 203360 199510 1.92973 4 1009 2015 57710 203360 . 5 1009 2016 57704 57710 -0.01040 6 1009 2017 57667 57704 -0.06412
Explain what tsset does.
It declares that the data set is a time-series data and allows you to specify what variables the data should be sorted by (in my case the county identification code (fips code) and the year. This ensures that when manipulating variables it doesn't contaminate observations i.e doesn't calculate a population growth variable that meshes all the counties together instead of recognizing when there is a new observation.
Since we don't understand STATA, we are still not sure what your code is doing. Can you show us the desired output from this data? What is l.race_total_pop?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.