Good morning!
I am trying to create panel data from a legal dataset that has rows of data by state with a start date and an end date reflecting when laws were enacted/changed. I've been able to panel the data by year, but I need it to be by month & year such that every month between start and end date is included as a row. Many rows will have identical data as the laws don't change frequently.
The problem I'm having is that a state might have the same law from 1/1/1980 until 2/1/2010 and when I try to panel it by month, I'm only getting Jan and Feb, rather than Jan-Dec for all years until 2010 (when it should just be Jan-Feb). Here's the code I used to panel by year, which worked:
*Extract Year, Month, Day from Begin Date & End Date;
data date;
set eitc;
Start_Year = year(Begin_Date);
End_Year = year(End_Date);
Start_month = month(begin_date);
end_month = month(end_date);
Start_Day = day(Begin_date);
End_day = day(end_date);
if start_day>14 then new_month=start_month+1;
else new_month=start_month;
run;
*Sort by State Year Month;
proc sort data=date;
by state start_year new_month;
run;
*Panel Data by year;
data year;
set date;
by state;
do year = start_year to end_year;
output;
end;
run;
Would really appreciate any suggestions! Please let me know if more information would be helpful.
Many thanks,
Sarah
Hi,
It would be useful to show some test data in the form of a datastep and what you want the output to look like. I am not familiar with "panel by"? It looks like you need to loop over an interval:
do i=1 to intck('month',month,start_date,end date);
date=intnx('month',start_date,i-1);
Thanks for the quick reply! I'm not sure what you mean by showing test data in a datastep. Does the attached help?
Not really, I am afraid I can't download Excel files as they are a secutiry risk. However I extrapolated the first record from it and sho an example here:
data have (drop=i); state="AZ"; begin_date="01JAN1980"d; end_date="02Jan2016"d; do i=1 to intck('month',begin_date,end_date); mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.)); output; end; run;
Great, that worked for AZ! I have a dataset with all 50 states. Do I need to repeat the step for each state?
The do loop will run for each observation in your dataset, so it will do the loop for _n_=1, then for _n_=2 to _n_=last, and expand each date.
I cannot tell without seeing the code you are running and the data you are running over. I am assuming that your replace the below in your code with a set for the data you have?
data have (drop=i); state="AZ"; begin_date="01JAN1980"d; end_date="02Jan2016"d; <- replace this with set <your_dataset>;!! do i=1 to intck('month',begin_date,end_date); mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1) ),z4.)); output; end; run;
Here's my code, with yours at the bottom:
libname eitc 'C:\Users\tub68397\Desktop\Projects\Legal Datasets\Panel Datasets for Nick';
*Import Excel Data;
PROC IMPORT OUT= work.eitc DATAFILE= "C:\Users\tub68397\Desktop\Projects\Legal Datasets\Panel Datasets for Nick\2_1_2016 EITC Data Stat.xlsx"
DBMS=xlsx REPLACE;
SHEET="2_1_2016 EITC Data";
GETNAMES=YES;
RUN;
proc print data=eitc;
run;
*Extract Year, Month, Day from Begin Date & End Date;
data date;
set eitc;
Start_Year = year(Begin_Date);
End_Year = year(End_Date);
Start_month = month(begin_date);
end_month = month(end_date);
Start_Day = day(Begin_date);
End_day = day(end_date);
if start_day>14 then new_month=start_month+1;
else new_month=start_month;
yearmon = start_year * 100 + new_month;
format yearmon mmyys.;
run;
*Sort by State Year Month;
proc sort data=date;
by state start_year new_month;
run;
*Panel Data by year;
data year;
set date;
by state;
do year = start_year to end_year;
output;
end;
run;
data have (drop=i);
set year;
state="AK"; begin_date="01JAN1980"d; end_date="02Jan2016"d;
do i=1 to intck('month',begin_date,end_date);
mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.));
output;
end;
run;
proc freq data=have;
tables state;
run;
Yes, remove this line, I put that in as example data, you don't want that to be used in your data!
data have (drop=i);
set year;
state="AK"; begin_date="01JAN1980"d; end_date="02Jan2016"d; <--- drop
do i=1 to intck('month',begin_date,end_date);
mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.));
output;
end;
run;
Oh, and whilst your looking at it, make sure your variables match the data you have. This is why we ask for test data - in the form of a datastep, so that we program on the same data you are programming on.
For point 1, Reeza has posted a good article on how to post data here:
For your second point, you can use if statements in the do loop, or anywhere in code. I would suggest you probably want it before the do loop - i.e. to apply to the whole loop
data have (drop=i);
set year;
if day(begin_date) > 14 then begin_date=intnx('month',begin_date,1);
do i=1 to intck('month',begin_date,end_date);
mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.));
output;
end;
run;
This will increment begin_date by 1 month if day > 14, and then loop uses begin_date...
Thanks! Last (hopefully!) wrinkle. I now have a missing month when it skips to following month for the 15th or later dates. So it will go from Jan to Mar (skip Feb). I would want the data to carry from Jan to Feb. Suggestions?
Create a new variable:
yearmon = start_year * 100 + new_month;
and sort by that.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.