I have the following variables VehicleID, TransactDate, TransactTime, OdometerReading, TransactCity, TransactState.
VehicleID is the unique vehicle ID, OdometerReading is the Odometer Reading, and the others are information related to the transaction time and location of the fuel card (similar to a credit card).
The records will be first grouped and sorted by VehicleID, TransactDate, TransactTime and OdometerReading. Then all records where the Vehicle ID and TransactDate is same for consecutive rows, AND TransactCity or TransactState are different for consecutive rows should be printed.
I also would like to add two derived variables.
1. Miles will be a derived variable that is the difference between consecutive odometer readings for the same Vehicle ID.
2. TimeDiff will be the second derived variable that will categorize the time difference for a particular vehicle on the same day.
My report should look like:
VehID TrDt TrTime TimeDiff Odometer Miles TrCity TrState
1296 1/30/2008 08:22:42 0:00:00 18301 000 Omaha NE
1296 1/30/2008 15:22:46 7:00:04 18560 259 KEARNEY NE
This is not directly an ODS or Report Procedure( PRINT, REPORT or TABULATE) question.
There are several different ways that you could solve this problem -- but all the ways that I can think off involve the DATA step -- either using ARRAYs to hold the values when you find the "set" of consecutive variables that meet your criteria or using the LAG function. (There are probably other solutions, but those are the places where I would start.) Inside your DATA step program, you'd have to calculate your new variables and THEN, generate your report with PRINT or REPORT.
Thanks Cynthia, I have tried the Lag function to compare values in consecutive rows but the problem is that it selects or extracts only the row below whenever there is a match. In the example below, only obs 2, 3, and 5 will be selected. But it should be obs 1, 2, 4 and 5.
obs ori value lag value
2 100 100
3 300 100
4 200 300
5 200 200
Using arrays seems a better option but I don't how much time I can spend with the help manuals/documents 'coz this project's already under a time crunch! I'm still a newbie when it comes to reports/data manipulation as I use SAS Stat procedures most of the times on cleaned data :-)
I thought your criteria was that the vehicle ids and dates had to be the SAME on the consecutive obs and the the city/state had to be different on the consecutive obs. Without seeing the cities and states that were in your data and seeing your code with LAG, it's hard to understand why the LAG function didn't work for you. That may be a question for Tech Support. They could look at your data and your code with LAG and help you figure out how to make LAG work.
The thing is that when SAS is reading on obs #1, it can't make the decision about whether to output #1 because it hasn't read obs #2 yet -- the decision, when dealing with consecutive pairs can't be made until the second obs in the pair has been encountered (since SAS reads through the dataset sequentially).
This may mean that you have to code LAG to "save" more values from the LAGged observation. This might also mean that the structure of your final dataset could/will be different than the structure of the input dataset. I envision something more like this coming out of LAG and then you could code your conditional logic:
Basically, you'd use LAG to build 1 OBS out of every pair of consecutive observations. Sure, you'll throw away some observations because they don't make sense (like when Veh ID changes or the date is wrong), but with the above information all on 1 obs, it would be easy to see whether the 2 vehicle IDs were the same and the cities, etc and then just subtract to get the miles. One thing that is mentioned in most of the papers and notes about LAG is that you cannot use LAG conditionally (inside an IF statement) -- and I have found this to be true.
There is a Tech Support example http://support.sas.com/kb/24/665.html that explains:
/* This example shows the difference in output when you use conditional */
/* and unconditional logic in your program. Because the LAG function stores */
/* values on the queue only when it is called, you must call LAG unconditionally */
/* (outside the IF condition) to get the correct answers. (SAS Language Reference, */
/* Dictionary --> Functions and CALL Routines --> LAG function, Example 2) */
This means that you really need to issue the LAG function on every obs and THEN test the conditions and calculate your MILES and/or TIME. With this scenario -- using LAG for every OBS, you will probably have fewer observations coming out of the data step program than you had going into the data step program -- because there will be some combinations of the LAGged observations that do not make sense to test or output.