Dynamically Creating Time Series Dataset

Reply
Frequent Contributor
Posts: 87

Dynamically Creating Time Series Dataset

I am note sure how to post this, but here goes. I am using airline navigation data and I need to organize the data into a time series format for analysis. That is, for each monthly date I have a set of variables and associated data. The data is organized into flight zones. They are Vancouver, Edmonton, Winnipeg, Toronto, Montreal, Moncton, Gander, and Gander_OCA. For each region there are two different types of flights; IFR and VFR. Now making time series based on region and type is easy...the problem I have is adding markets. We have up to 13 markets, but not all regions will experience flights from the 13 markets, but I need to future proof in case such a flight comes. Each region will have hours in the region and number of flights. I can write macro loop for the region and type, but how do I make variables for each of the market types, and if no data exists, still create the market but populate it with zeros. Here is the code I use to make the time series without any markets:

 

ods listing close;
ods html close;
goptions device=actximg;


%macro fir_all (var_list=);
%local n fir_var;
%do n=1 %to %sysfunc(countw(&var_list));
	%let fir_var=%scan(&var_list,&n);

	/* Get IFR Flights */
	PROC SQL;
   		CREATE TABLE EGTASK.&fir_var._ARRANGE_IFR AS 
   		SELECT t1.Date, 
          t1.FIR_NUF LABEL="Non-Unique &fir_var IFR Flights" AS &fir_var._FIR_IFR_NUF, 
          t1.FIR_HOURS LABEL="&fir_var FIR IFR Hours" AS &fir_var._FIR_IFR_HOURS
     	FROM EGTASK.TOTALS_RULE_FIR t1
      	WHERE t1.FIR_NAME = "&fir_var" AND t1.Flight_Rule_Code = 'IFR'; run;

	/* Get VFR Flights */
	PROC SQL;
   		CREATE TABLE EGTASK.&fir_var._ARRANGE_VFR AS 
   		SELECT t1.Date, 
          t1.FIR_NUF LABEL="Non-Unique &fir_var VFR Flights" AS &fir_var._FIR_VFR_NUF, 
          t1.FIR_HOURS LABEL="&fir_var FIR VFR Hours" AS &fir_var._FIR_VFR_HOURS
     	FROM EGTASK.TOTALS_RULE_FIR t1
      	WHERE t1.FIR_NAME = "&fir_var" AND t1.Flight_Rule_Code = 'VFR'; run;

	/* Join IFR and VFR Flights into one dataset */
	PROC SQL;
 	   CREATE TABLE EGTASK.&fir_var._FIR AS 
   	   SELECT t1.DATE, 
       	  t1.&fir_var._FIR_IFR_NUF, 
          t1.&fir_var._FIR_IFR_HOURS, 
          t2.&fir_var._FIR_VFR_NUF, 
          t2.&fir_var._FIR_VFR_HOURS
      	FROM EGTASK.&fir_var._ARRANGE_IFR t1
        LEFT JOIN EGTASK.&fir_var._ARRANGE_VFR t2 ON (t1.DATE = t2.DATE);

	proc datasets nodetails nolist nowarn lib=egtask; delete &fir_var._ARRANGE_IFR &fir_var._ARRANGE_VFR &fir_var._ARRANGE_UNK; run;
	
%end;
%mend;
%fir_all (var_list=VANCOUVER EDMONTON WINNIPEG TORONTO MONTREAL MONCTON GANDER GANDER_OCA);

 

I am now stuck with integrating the market data. There are 13 markets called "Market 1", "Market 2", etc...but again, not all regions will get all flights, but they MAY get them all in the future. So I need to be able to merge in this data, even if a region has none. If there is none, then create the variable (e.g., Edmonton-IFR-Market1) and set it to zero. In the end I need a dataset like:

 

Jan2000 Edmonton-IFR-Flights-Market1 to Market13 Edmonton-IFR-Hours-Market1 to Market13 Edmonton-VFR-Flights-Market1 to Market13 Edmonton-VFR-Hours-Market1 to Market13

 

 Each region will have 13 IFR data variables and 13 VFR data variables for both flights and hours in the time series. All regions need to have all variables populated. If they do not have data, then create the variable and set observations to zero (or missing, as I can set missing to 0 later).

 

Not sure how difficult this is...but I cannot figure it out (checking if variable exists or not).  Thanks and Happy New Year.

Super User
Posts: 22,844

Re: Dynamically Creating Time Series Dataset

This is probably pretty straightforward, but a basic option is to first use PROC FREQ with the SPARSE option for region, type and market. It will generate all possible combinations and then you can 'fill' the values in from your actual data and set everything not in the dataset to 0. 

 

Or in a similar manner, have a master table that has all possible values and then use a cross join to make the empty table structure and again, fill it in via a join and set missing to zero. 

 

Here's how I'd generate all age/sex combinations from SASHELP.CLASS:

 

*generate list with all possible combinations;
proc freq data=sashelp.class noprint;
table age*sex / sparse out=empty;
run;

It sounds like your data is in a wide structure so you may need a transpose after to get your structure, but it's also a good example of how a long structure is easier to work with in the long run Smiley Happy

 

Reviewing your code, you could probably use a long structure and remove your macro overall, with a TRANSPOSE at the end to get your desired structure. 

 

It's probably easier if you show sample data and what you need as final output if you need further help. 

 


BCNAV wrote:

I am note sure how to post this, but here goes. I am using airline navigation data and I need to organize the data into a time series format for analysis. That is, for each monthly date I have a set of variables and associated data. The data is organized into flight zones. They are Vancouver, Edmonton, Winnipeg, Toronto, Montreal, Moncton, Gander, and Gander_OCA. For each region there are two different types of flights; IFR and VFR. Now making time series based on region and type is easy...the problem I have is adding markets. We have up to 13 markets, but not all regions will experience flights from the 13 markets, but I need to future proof in case such a flight comes. Each region will have hours in the region and number of flights. I can write macro loop for the region and type, but how do I make variables for each of the market types, and if no data exists, still create the market but populate it with zeros. Here is the code I use to make the time series without any markets:

 

ods listing close;
ods html close;
goptions device=actximg;


%macro fir_all (var_list=);
%local n fir_var;
%do n=1 %to %sysfunc(countw(&var_list));
	%let fir_var=%scan(&var_list,&n);

	/* Get IFR Flights */
	PROC SQL;
   		CREATE TABLE EGTASK.&fir_var._ARRANGE_IFR AS 
   		SELECT t1.Date, 
          t1.FIR_NUF LABEL="Non-Unique &fir_var IFR Flights" AS &fir_var._FIR_IFR_NUF, 
          t1.FIR_HOURS LABEL="&fir_var FIR IFR Hours" AS &fir_var._FIR_IFR_HOURS
     	FROM EGTASK.TOTALS_RULE_FIR t1
      	WHERE t1.FIR_NAME = "&fir_var" AND t1.Flight_Rule_Code = 'IFR'; run;

	/* Get VFR Flights */
	PROC SQL;
   		CREATE TABLE EGTASK.&fir_var._ARRANGE_VFR AS 
   		SELECT t1.Date, 
          t1.FIR_NUF LABEL="Non-Unique &fir_var VFR Flights" AS &fir_var._FIR_VFR_NUF, 
          t1.FIR_HOURS LABEL="&fir_var FIR VFR Hours" AS &fir_var._FIR_VFR_HOURS
     	FROM EGTASK.TOTALS_RULE_FIR t1
      	WHERE t1.FIR_NAME = "&fir_var" AND t1.Flight_Rule_Code = 'VFR'; run;

	/* Join IFR and VFR Flights into one dataset */
	PROC SQL;
 	   CREATE TABLE EGTASK.&fir_var._FIR AS 
   	   SELECT t1.DATE, 
       	  t1.&fir_var._FIR_IFR_NUF, 
          t1.&fir_var._FIR_IFR_HOURS, 
          t2.&fir_var._FIR_VFR_NUF, 
          t2.&fir_var._FIR_VFR_HOURS
      	FROM EGTASK.&fir_var._ARRANGE_IFR t1
        LEFT JOIN EGTASK.&fir_var._ARRANGE_VFR t2 ON (t1.DATE = t2.DATE);

	proc datasets nodetails nolist nowarn lib=egtask; delete &fir_var._ARRANGE_IFR &fir_var._ARRANGE_VFR &fir_var._ARRANGE_UNK; run;
	
%end;
%mend;
%fir_all (var_list=VANCOUVER EDMONTON WINNIPEG TORONTO MONTREAL MONCTON GANDER GANDER_OCA);

 

I am now stuck with integrating the market data. There are 13 markets called "Market 1", "Market 2", etc...but again, not all regions will get all flights, but they MAY get them all in the future. So I need to be able to merge in this data, even if a region has none. If there is none, then create the variable (e.g., Edmonton-IFR-Market1) and set it to zero. In the end I need a dataset like:

 

Jan2000 Edmonton-IFR-Flights-Market1 to Market13 Edmonton-IFR-Hours-Market1 to Market13 Edmonton-VFR-Flights-Market1 to Market13 Edmonton-VFR-Hours-Market1 to Market13

 

 Each region will have 13 IFR data variables and 13 VFR data variables for both flights and hours in the time series. All regions need to have all variables populated. If they do not have data, then create the variable and set observations to zero (or missing, as I can set missing to 0 later).

 

Not sure how difficult this is...but I cannot figure it out (checking if variable exists or not).  Thanks and Happy New Year.


 

Super User
Posts: 13,023

Re: Dynamically Creating Time Series Dataset

It might help to provide some example data and what the final result for that data would look like. Dummy data that behaves as your actual data is sufficient.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

It may be that different summarization, such as Proc Means/Summary with market variables as class variables would work. Or possibly a restructuring your data to use a different approach than summarize each and merge back.

 

I have to say that

var_list=VANCOUVER EDMONTON WINNIPEG TORONTO MONTREAL MONCTON GANDER GANDER_OCA

indicates that a data structure choice with each market as a variable may have been suboptimal. Many analysis tasks get much easier if a variable such as Market contains the value of the market name. Then By group or Class variable of Market can simplify the code and completely remove the worry about adding markets.

 

 

If you need for people to see the result in an across table then a report procedure such as Proc Report or Tabulate may work better.

 

Ask a Question
Discussion stats
  • 2 replies
  • 198 views
  • 1 like
  • 3 in conversation