SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Filtering Data

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Filtering Data

Hi,

I'm a SAS newbie, and I have a pretty simple question:

I have about 20 data sets representing specific sections of a road network, and each have 2 columns that I am interested in: Vehicle_ID and Time_Step. At each time step (1 second), the data set lists the the vehicles which are at a particular position on that section of the road. I would like to match the Vehicle_ID in all tables and obtain the vehicles that are travelling along the entire route in the

network. For each of these vehicles, I would like to find the total travel from from the time steps. I was wondering if there is a simple way in which to filter the data in each data set to give me the

Vehicle_IDs and a count of the corresponding Time_Steps in which the vehicle appeared on the section. For example, if Vehicle 2242 was found to be in the data set at Time_Steps 43, 44, 45, 46, 47, 48 then it was travelling for 6 seconds on that section. Its a relatively simple calculation but I am having trouble performing the filtering on 20+ large data sets.

I would really appreciate it anyone could give me some advice on how I might be able to do this in SAS or forward my question to someone who could help me.

Thanks in advance!

Anya


Accepted Solutions
Solution
‎10-19-2011 06:13 PM
SAS Employee
Posts: 51

Filtering Data

Anja,

If I understand what you've asked  correctly, I think you can do this with some simple SQL summarization functions.  For example:

select vehicle_id,count(Time_Step)

from <table>

group by vehicle_id;

This can be accomplished with the "Extract" or "SQL Join" transformations. 

If your ultimate goal is to create one data set that has the data from all 20 datasets consolidated into a single dataset, rolled up by vehicle and segment, this would be a good use of the Loop transform.  You would create one inner job that did the basic summarization by vehicle_id, as described above, derive the "segment name" from a parameter passed into the job (maybe the input dataset name), parameterize the name of the input table and set the options on the Table Loader for the target table to append.  You would then create an outer job that calls the Inner Job using the Loop transformation, running each step sequentially instead of in parallel.  If processing time is a problem, there are strategies to execute the iterations in parallel and then gather all the separate pieces later.  Please see this post for a detailed explanation of how to use the Loop transform and Inner and Outer jobs in this manner.  There are other ways to accomplish grouping all the datasets together, but this one will allow you to extend/reduce the number of segments without changing your job at all.

Thanks,

Tim Stearn

View solution in original post


All Replies
Solution
‎10-19-2011 06:13 PM
SAS Employee
Posts: 51

Filtering Data

Anja,

If I understand what you've asked  correctly, I think you can do this with some simple SQL summarization functions.  For example:

select vehicle_id,count(Time_Step)

from <table>

group by vehicle_id;

This can be accomplished with the "Extract" or "SQL Join" transformations. 

If your ultimate goal is to create one data set that has the data from all 20 datasets consolidated into a single dataset, rolled up by vehicle and segment, this would be a good use of the Loop transform.  You would create one inner job that did the basic summarization by vehicle_id, as described above, derive the "segment name" from a parameter passed into the job (maybe the input dataset name), parameterize the name of the input table and set the options on the Table Loader for the target table to append.  You would then create an outer job that calls the Inner Job using the Loop transformation, running each step sequentially instead of in parallel.  If processing time is a problem, there are strategies to execute the iterations in parallel and then gather all the separate pieces later.  Please see this post for a detailed explanation of how to use the Loop transform and Inner and Outer jobs in this manner.  There are other ways to accomplish grouping all the datasets together, but this one will allow you to extend/reduce the number of segments without changing your job at all.

Thanks,

Tim Stearn

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 1 reply
  • 195 views
  • 0 likes
  • 2 in conversation