BookmarkSubscribeRSS Feed
datalligence
Fluorite | Level 6
I have the following variables VehicleID, TransactDate, TransactTime, OdometerReading, TransactCity, TransactState.

VehicleID is the unique vehicle ID, OdometerReading is the Odometer Reading, and the others are information related to the transaction time and location of the fuel card (similar to a credit card).

The records will be first grouped and sorted by VehicleID, TransactDate, TransactTime and OdometerReading. Then all records where the Vehicle ID and TransactDate is same for consecutive rows, AND TransactCity or TransactState are different for consecutive rows should be printed.

I also would like to add two derived variables.

1. Miles will be a derived variable that is the difference between consecutive odometer readings for the same Vehicle ID.

2. TimeDiff will be the second derived variable that will categorize the time difference for a particular vehicle on the same day.

My report should look like:

VehID TrDt TrTime TimeDiff Odometer Miles TrCity TrState
1296 1/30/2008 08:22:42 0:00:00 18301 000 Omaha NE
1296 1/30/2008 15:22:46 7:00:04 18560 259 KEARNEY NE

Can someone please help me here?

Thanks,
Romakanta
6 REPLIES 6
deleted_user
Not applicable
I don't wish to be rude, but, most of us have learned this stuff by reading the SAS documentation and then creatively experimenting through our own coding efforts.

That used to be one of the hallmarks of American education, learning to think for ourselves.
Cynthia_sas
SAS Super FREQ
Hi:
This is not directly an ODS or Report Procedure( PRINT, REPORT or TABULATE) question.

There are several different ways that you could solve this problem -- but all the ways that I can think off involve the DATA step -- either using ARRAYs to hold the values when you find the "set" of consecutive variables that meet your criteria or using the LAG function. (There are probably other solutions, but those are the places where I would start.) Inside your DATA step program, you'd have to calculate your new variables and THEN, generate your report with PRINT or REPORT.

Basically, the data -- in the original form you describe is not ready for a report procedure yet. If you don't understand how the DATA step works or how ARRAYS or the LAG function works, then reading the documentation would be the place to start. You could also search on the web for SUGI or SGF papers on the LAG function or on ARRAYS. A quick Google search found these 3 papers on the LAG function.
http://nesug.org/proceedings/nesug06/cc/cc32.pdf
http://www2.sas.com/proceedings/forum2007/029-2007.pdf
http://www.howles.com/saspapers/CC33.pdf

A similar search would reveal papers on the use of the DATA step and/or the use of ARRAYs.

cynthia
datalligence
Fluorite | Level 6
Thanks Cynthia, I have tried the Lag function to compare values in consecutive rows but the problem is that it selects or extracts only the row below whenever there is a match. In the example below, only obs 2, 3, and 5 will be selected. But it should be obs 1, 2, 4 and 5.

obs ori value lag value
1 100
2 100 100
3 300 100
4 200 300
5 200 200

Using arrays seems a better option but I don't how much time I can spend with the help manuals/documents 'coz this project's already under a time crunch! I'm still a newbie when it comes to reports/data manipulation as I use SAS Stat procedures most of the times on cleaned data 🙂

romakanta
deleted_user
Not applicable
This could be done a number of different ways.

1) use of transpose to put all the odometer readings in the same observation and then use proc report and computed fields.

2)
[pre]
proc sort data=indata;
by vid td tt odo;
run;

data outdata;
set indata;
by vid td tt odo;
hh = hour(tt);
mm = minute(tt):
ss = second(tt);
datetime = dhms(td,hh,mm,ss);

if first.vid then do;
lag(odo);
lag(datetime);
end;

miles = odo - lag(odo);
timediff = datetime - lag(datetime);

drop hh mm ss datetime;
run;

proc print data=outdata;
var vid td tt timediff odo miles city state;
format td mmddyy10. timediff time. ;
run;
quit;
[/pre]
Or something there about
Cynthia_sas
SAS Super FREQ
Hi:
I thought your criteria was that the vehicle ids and dates had to be the SAME on the consecutive obs and the the city/state had to be different on the consecutive obs. Without seeing the cities and states that were in your data and seeing your code with LAG, it's hard to understand why the LAG function didn't work for you. That may be a question for Tech Support. They could look at your data and your code with LAG and help you figure out how to make LAG work.

The thing is that when SAS is reading on obs #1, it can't make the decision about whether to output #1 because it hasn't read obs #2 yet -- the decision, when dealing with consecutive pairs can't be made until the second obs in the pair has been encountered (since SAS reads through the dataset sequentially).

This may mean that you have to code LAG to "save" more values from the LAGged observation. This might also mean that the structure of your final dataset could/will be different than the structure of the input dataset. I envision something more like this coming out of LAG and then you could code your conditional logic:
[pre]

V_ID EndV_ID Start_Dt End_Dt Start_TM End_TM StartODO EndODO StartCity StartSt EndCity EndSt StartObs EndObs
1296 1296 1/30/2008 1/30/2008 0:00:00 08:22:42 18301 18560 Omaha NE Kearney NE 1 2
[/pre]


Basically, you'd use LAG to build 1 OBS out of every pair of consecutive observations. Sure, you'll throw away some observations because they don't make sense (like when Veh ID changes or the date is wrong), but with the above information all on 1 obs, it would be easy to see whether the 2 vehicle IDs were the same and the cities, etc and then just subtract to get the miles. One thing that is mentioned in most of the papers and notes about LAG is that you cannot use LAG conditionally (inside an IF statement) -- and I have found this to be true.

There is a Tech Support example http://support.sas.com/kb/24/665.html that explains:

/* This example shows the difference in output when you use conditional */
/* and unconditional logic in your program. Because the LAG function stores */
/* values on the queue only when it is called, you must call LAG unconditionally */
/* (outside the IF condition) to get the correct answers. (SAS Language Reference, */
/* Dictionary --> Functions and CALL Routines --> LAG function, Example 2) */


This means that you really need to issue the LAG function on every obs and THEN test the conditions and calculate your MILES and/or TIME. With this scenario -- using LAG for every OBS, you will probably have fewer observations coming out of the data step program than you had going into the data step program -- because there will be some combinations of the LAGged observations that do not make sense to test or output.

Here are some more examples of using LAG:
http://support.sas.com/kb/25/938.html
http://support.sas.com/kb/24/694.html

cynthia
datalligence
Fluorite | Level 6
chuck and cynthia, thanks a lot!!!

now, i think i know how to attack this:-) i will work on this over the weekend.

thanks again.
romakanta

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 9252 views
  • 0 likes
  • 3 in conversation