BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Beginner44
Calcite | Level 5

I'm a relatively new coder in SAS and I am trying to figure out what is the SAS equivalent of Stata's tsset command which allows you to declare a time series data set and specify which variables to sort on. I have a panel data set with US county observations for multiple categories over three years (1 observation per year, per category). I am trying to prevent crossover between counties when I calculate a growth variable so as to avoid contaminating the data.

 

This is the Stata code I need to translate to SAS:

 

tsset fipscode year 

gen pop_growth= ((race_total_pop-l.race_total_pop)/l.race_total_pop)*100 

gen under_18_perc = (pop_under_18_ / race_total_pop)*100 

gen over_65_perc = (pop_over_65_ / race_total_pop)*100 

 

 

Thanks in advance for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It looks like somehow you are telling STATA that YEAR is a time dimension.  It is not clear how it would know when you are changing from one FIPSCODE to another however, what happens if FIPSCODE=1 has data from 2016 to 2018 and FISPCODE=2 has data starting in 2019?

 

Anyway, in SAS you use the LAG() functions to remember previous values.  Use BY group processing to prevent using lagged values from a different group.

 

data have ;
  input fipscode	year	race_total_pop ;
cards;
1003	2015	195121	 
1003	2016	199510	 
1003	2017	203360	 
1009	2015	57710	 
1009	2016	57704	 
1009	2017	57667	 
;
data want;
  set have;
  by fipscode year;
  lag_race_total_pop=lag(race_total_pop);
  if not first.fipscode then do;
    pop_growth=((race_total_pop-lag_race_total_pop)/lag_race_total_pop)*100;
  end;
run;

proc print;
run;
                            race_
                           total_    lag_race_      pop_
Obs    fipscode    year      pop     total_pop     growth

 1       1003      2015    195121           .       .
 2       1003      2016    199510      195121      2.24937
 3       1003      2017    203360      199510      1.92973
 4       1009      2015     57710      203360       .
 5       1009      2016     57704       57710     -0.01040
 6       1009      2017     57667       57704     -0.06412

 

 

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

Show some example data and what result you want.

Sounds like you just need to understand how to use BY groups.

Beginner44
Calcite | Level 5

This is a small sample of my data.

 

fipscodeyearrace_total_popPop_growth
10032015195121 
10032016199510 
10032017203360 
1009201557710 
1009201657704 
1009201757667 

 

 

I want to be able to calculate the population growth for every county (represented by the fipscode) but I need SAS to realize when a new observation starts so that the growth variable is accurate.

Tom
Super User Tom
Super User

It looks like somehow you are telling STATA that YEAR is a time dimension.  It is not clear how it would know when you are changing from one FIPSCODE to another however, what happens if FIPSCODE=1 has data from 2016 to 2018 and FISPCODE=2 has data starting in 2019?

 

Anyway, in SAS you use the LAG() functions to remember previous values.  Use BY group processing to prevent using lagged values from a different group.

 

data have ;
  input fipscode	year	race_total_pop ;
cards;
1003	2015	195121	 
1003	2016	199510	 
1003	2017	203360	 
1009	2015	57710	 
1009	2016	57704	 
1009	2017	57667	 
;
data want;
  set have;
  by fipscode year;
  lag_race_total_pop=lag(race_total_pop);
  if not first.fipscode then do;
    pop_growth=((race_total_pop-lag_race_total_pop)/lag_race_total_pop)*100;
  end;
run;

proc print;
run;
                            race_
                           total_    lag_race_      pop_
Obs    fipscode    year      pop     total_pop     growth

 1       1003      2015    195121           .       .
 2       1003      2016    199510      195121      2.24937
 3       1003      2017    203360      199510      1.92973
 4       1009      2015     57710      203360       .
 5       1009      2016     57704       57710     -0.01040
 6       1009      2017     57667       57704     -0.06412

 

 

PaigeMiller
Diamond | Level 26

Explain what tsset does.

--
Paige Miller
Beginner44
Calcite | Level 5

It declares that the data set is a time-series data and allows you to specify what variables the data should be sorted by (in my case the county identification code (fips code) and the year. This ensures that when manipulating variables it doesn't contaminate observations i.e doesn't calculate a population growth variable that meshes all the counties together instead of recognizing when there is a new observation.

PaigeMiller
Diamond | Level 26

Since we don't understand STATA, we are still not sure what your code is doing. Can you show us the desired output from this data? What is l.race_total_pop?

 

 

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 906 views
  • 0 likes
  • 3 in conversation