BookmarkSubscribeRSS Feed
sklieger
Calcite | Level 5

Good morning!

 

I am trying to create panel data from a legal dataset that has rows of data by state with a start date and an end date reflecting when laws were enacted/changed. I've been able to panel the data by year, but I need it to be by month & year such that every month between start and end date is included as a row. Many rows will have identical data as the laws don't change frequently.

 

The problem I'm having is that a state might have the same law from 1/1/1980 until 2/1/2010 and when I try to panel it by month, I'm only getting Jan and Feb, rather than Jan-Dec for all years until 2010 (when it should just be Jan-Feb). Here's the code I used to panel by year, which worked:

 

*Extract Year, Month, Day from Begin Date & End Date;
data date;
set eitc;
Start_Year = year(Begin_Date);
End_Year = year(End_Date);
Start_month = month(begin_date);
end_month = month(end_date);
Start_Day = day(Begin_date);
End_day = day(end_date);
if start_day>14 then new_month=start_month+1;
else new_month=start_month;
run;
 
*Sort by State Year Month;
proc sort data=date;
by state start_year new_month;
run;
 
*Panel Data by year;
data year;
set date;
by state;
do year = start_year to end_year;
output;
end;
run;

 

Would really appreciate any suggestions! Please let me know if more information would be helpful.

 

Many thanks,

 

Sarah

 

21 REPLIES 21
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

It would be useful to show some test data in the form of a datastep and what you want the output to look like.  I am not familiar with "panel by"?  It looks like you need to loop over an interval:

do i=1 to intck('month',month,start_date,end date);

  date=intnx('month',start_date,i-1);

sklieger
Calcite | Level 5

Thanks for the quick reply! I'm not sure what you mean by showing test data in a datastep. Does the attached help?

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not really, I am afraid I can't download Excel files as they are a secutiry risk.  However I extrapolated the first record from it and sho an example here:

data have (drop=i);
  state="AZ"; begin_date="01JAN1980"d; end_date="02Jan2016"d;
  do i=1 to intck('month',begin_date,end_date);
    mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.));
    output; 
  end;
run;
sklieger
Calcite | Level 5

Great, that worked for AZ! I have a dataset with all 50 states. Do I need to repeat the step for each state?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

The do loop will run for each observation in your dataset, so it will do the loop for _n_=1, then for _n_=2 to _n_=last, and expand each date.

sklieger
Calcite | Level 5
For some reason, it's only doing it for AZ. Do I need to add code?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I cannot tell without seeing the code you are running and the data you are running over.  I am assuming that your replace the below in your code with a set for the data you have?

data have (drop=i);
  state="AZ"; begin_date="01JAN1980"d; end_date="02Jan2016"d;    <- replace this with set <your_dataset>;!!
  do i=1 to intck('month',begin_date,end_date);
    mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.));
    output; 
  end;
run;
sklieger
Calcite | Level 5

Here's my code, with yours at the bottom:

 

libname eitc 'C:\Users\tub68397\Desktop\Projects\Legal Datasets\Panel Datasets for Nick';

*Import Excel Data;
PROC IMPORT OUT= work.eitc DATAFILE= "C:\Users\tub68397\Desktop\Projects\Legal Datasets\Panel Datasets for Nick\2_1_2016 EITC Data Stat.xlsx" 
            DBMS=xlsx REPLACE;
     SHEET="2_1_2016 EITC Data"; 
     GETNAMES=YES;
RUN;

proc print data=eitc;
run;

*Extract Year, Month, Day from Begin Date & End Date;
data date;
set eitc;
Start_Year = year(Begin_Date);
End_Year = year(End_Date);
Start_month = month(begin_date);
end_month = month(end_date);
Start_Day = day(Begin_date);
End_day = day(end_date);
if start_day>14 then new_month=start_month+1;
else new_month=start_month;
yearmon = start_year * 100 + new_month;
format yearmon mmyys.;
run;

*Sort by State Year Month;
proc sort data=date;
by state start_year new_month;
run;

*Panel Data by year;
data year;
set date;
by state;
do year = start_year to end_year;
output;
end;
run;


data have (drop=i);
set year;
  state="AK"; begin_date="01JAN1980"d; end_date="02Jan2016"d;
  do i=1 to intck('month',begin_date,end_date);
    mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.));
    output; 
  end;
run;

proc freq data=have;
tables state;
run;

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, remove this line, I put that in as example data, you don't want that to be used in your data!

data have (drop=i);
set year;
  state="AK"; begin_date="01JAN1980"d; end_date="02Jan2016"d;   <--- drop
  do i=1 to intck('month',begin_date,end_date);
    mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.));
    output; 
  end;
run;

 

Oh, and whilst your looking at it, make sure your variables match the data you have.  This is why we ask for test data - in the form of a datastep, so that we program on the same data you are programming on.

sklieger
Calcite | Level 5
Okay, data look good now. I generally import data form Excel. How would I show data in the form of a data step?

One other, related question, if I want to use a rule to assign month based on day, how would I incorporate that into the code I have? For example, if a law changed during the first 14 days of a month, I want to use the month already there in the Begin_Date. If the law changed on Day 15 or after, I want to change the month to the following month. So April 14, stay April. April 15th, change to May. Can that be done in the do loop?

Thanks for all your help!!
RW9
Diamond | Level 26 RW9
Diamond | Level 26

For point 1, Reeza has posted a good article on how to post data here:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

For your second point, you can use if statements in the do loop, or anywhere in code.  I would suggest you probably want it before the do loop - i.e. to apply to the whole loop

data have (drop=i);
  set year;
if day(begin_date) > 14 then begin_date=intnx('month',begin_date,1);
do i=1 to intck('month',begin_date,end_date); mon_yr=cats(put(month(intnx('month',begin_date,i-1)),z2.),"-",put(year(intnx('month',begin_date,i-1)),z4.)); output; end; run;

 This will increment begin_date by 1 month if day > 14, and then loop uses begin_date...

sklieger
Calcite | Level 5

Thanks! Last (hopefully!) wrinkle. I now have a missing month when it skips to following month for the 15th or later dates. So it will go from Jan to Mar (skip Feb). I would want the data to carry from Jan to Feb. Suggestions?

sklieger
Calcite | Level 5
Hi Kurt,

Thanks for your response. I tried that approach and also just ran your code and am getting the same result: it's showing me the months for which changes to data were reflected in changes to the start_date, but it's not creating rows for every month and carrying the data down until the next change.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 1717 views
  • 0 likes
  • 6 in conversation