I have date like the one below
Machine | Date | Miles |
HP0001 | 10DEC07 | 1885462.00000 |
HP0001 | 12DEC07 | 1885469.00000 |
HP0001 | 30APR09 | 1885473.00000 |
HP0001 | 15JUL09 | 1886577.00000 |
HP0001 | 16JUL09 | 1887667.00000 |
HP0001 | 17JUL09 | 1889181.00000 |
HP0001 | 17JUL09 | 1888825.00000 |
HP0001 | 18JUL09 | 1889621.00000 |
HP0001 | 20JUL09 | 1891363.00000 |
HP0001 | 21JUL09 | 1892398.00000 |
HP0001 | 22JUL09 | 1893513.00000 |
HP0002 | 15DEC07 | 1985462.00000 |
HP0002 | 17DEC07 | 1985462.00000 |
HP0002 | 28APR09 | 1985462.00000 |
HP0002 | 10JUL09 | 1986577.00000 |
HP0002 | 17JUL09 | 1987667.00000 |
HP0002 | 20JUL09 | 1989181.00000 |
HP0002 | 22JUL09 | 1988825.00000 |
HP0002 | 28JUL09 | 1989621.00000 |
HP0002 | 31JUL09 | 1991363.00000 |
HP0002 | 21AUG09 | 1992398.00000 |
HP0002 | 24AUG09 | 1993513.00000 |
HP0003 | 10NOV07 | 2085462.00000 |
HP0003 | 18NOV07 | 2085462.00000 |
HP0003 | 30FEB09 | 2085462.00000 |
HP0003 | 15JUN09 | 2086577.00000 |
HP0003 | 16JUL09 | 2087667.00000 |
HP0003 | 17AUG09 | 2089181.00000 |
HP0003 | 10SEP09 | 2088825.00000 |
HP0003 | 18OCT09 | 2089621.00000 |
HP0003 | 30OCT09 | 2091363.00000 |
HP0003 | 21NOV09 | 2092398.00000 |
HP0003 | 22DEC09 | 2093513.00000 |
What I want to do is to calculate difference between dates and miles something like the sown in the table below:
1. first row will have DaysDiff values as zero because that is first record of data for each machine
2. similarly MilesDiff is zero in the first row, but that it is a difference between first row and second row data.
Machine | Date | Miles | DaysDiff | MilesDiff |
HP0001 | 10DEC07 | 1885462.00000 | 0 | 0 |
HP0001 | 12DEC07 | 1885469.00000 | 2 | 9 |
HP0001 | 30APR09 | 1885473.00000 | 4 | |
HP0001 | 15JUL09 | 1886577.00000 | ||
HP0001 | 16JUL09 | 1887667.00000 | ||
HP0001 | 17JUL09 | 1889181.00000 | ||
HP0001 | 17JUL09 | 1888825.00000 | ||
HP0001 | 18JUL09 | 1889621.00000 | ||
HP0001 | 20JUL09 | 1891363.00000 | ||
HP0001 | 21JUL09 | 1892398.00000 | ||
HP0001 | 22JUL09 | 1893513.00000 | ||
HP0002 | 15DEC07 | 1985462.00000 | 0 | 0 |
HP0002 | 17DEC07 | 1985465.00000 | 2 | 3 |
HP0002 | 28APR09 | 1985468.00000 | 3 | |
HP0002 | 10JUL09 | 1986577.00000 | ||
HP0002 | 17JUL09 | 1987667.00000 | ||
HP0002 | 20JUL09 | 1989181.00000 | ||
HP0002 | 22JUL09 | 1988825.00000 | ||
HP0002 | 28JUL09 | 1989621.00000 | ||
HP0002 | 31JUL09 | 1991363.00000 | ||
HP0002 | 21AUG09 | 1992398.00000 | ||
HP0002 | 24AUG09 | 1993513.00000 | ||
HP0003 | 10NOV07 | 2085462.00000 | 0 | 0 |
HP0003 | 18NOV07 | 2085465.00000 | 8 | 3 |
HP0003 | 30FEB09 | 2085469.00000 | 6 | |
HP0003 | 15JUN09 | 2086577.00000 | ||
HP0003 | 16JUL09 | 2087667.00000 | ||
HP0003 | 17AUG09 | 2089181.00000 | ||
HP0003 | 10SEP09 | 2088825.00000 | ||
HP0003 | 18OCT09 | 2089621.00000 | ||
HP0003 | 30OCT09 | 2091363.00000 | ||
HP0003 | 21NOV09 | 2092398.00000 | ||
HP0003 | 22DEC09 | 2093513.00000 |
I applied lag but it did not work and also for each row when Machine is changing, I am not able to put zero. I was trying something like this. I dont know how to check each machine type and then run the rest calculations.
data want;
set have;
by MachCode;
DateDiff = lag( Date );
MilesDiff = dif( MileageValue );
run;
Please help I am new to SAS and learning it.
For HP0001, 2rd line: (03OCT14) - (02OCT14) = 1 = one day difference is right.
2900334 - 2899148 = 1186 - is correct
why is in RED ?
For HP0002 - It seems that LAG function act in a different way than I toght,
maybe it is because I don't use LAG on firt.MachCode row.
Therefore I will use RETAIN instead LAG, to save data for the next observation:
data want;
set have;
by machCode; /* assumed data is sorted by machine date */
RETAIN lag_date lag_miles;
DROP lag_date lag_miles;
if first.machCode
then do; /* 1st row of a machine */
DaysDiff=0;
MilesDiff=0;
end;
else do;
daysdiff = date - lag_date; /* calculating the difference */
milesdiff = miles - lag_miles;
end;
lag_date = date;
lag_miles = miles;
run;
Use LAG function to get a variable value from previous row.
According to your post you want to do it per machine:
data want;
set have;
by machine; /* assumed data is sorted by machine date */
if first.machine then do; DaysDiff=0; MilesDiff=0; end;
else do;
daysdiff = date - lag(date);
milesdiff = miles - lag(miles);
end;
run;
Hello Shmuel,
Thanks for your efforts. Unfortunately, there is an error. In the second row, it is putting a dot (.) value, i think for NULL. and from third row it is calculating. I tried with _N_ = 1 as well, but whenever Machine is changing, both methods fail. :(''
This is what your code is doing, when Machine changes form one to another.. It is not taking last recort for lag but second last row of the HP001 (i.e. instead t - (t-1) it is calculating t - (t-2) 😞
Machine | Date | Miles | DateDiff | MilesDiff |
HP0001 | 01OCT14 | 2898108.00000 | 1 | 1059 |
HP0001 | 02OCT14 | 2899148.00000 | 1 | 1040 |
HP0001 | 03OCT14 | 2900334.00000 | 1 | 1186 |
HP0002 | 17JAN08 | 926384.00000 | 0 | 0 |
HP0002 | 18JAN08 | 926384.00000 | -2450 | -1973950 |
HP0002 | 28APR09 | 1237332.00000 | 466 | 310948 |
HP0002 | 29APR09 | 1238599.00000 | 1 | 1267 |
Regards,
Manu
Are dates converted to SAS dates using informat date7. ?
Is MilesDiff on second row is missing value too ?
I changed datepart(DateTimeStamp) to Date using date. formate.
MilesDiff is 0 in first row but in second row it is a dot . for HP0001 but for next machine HP0002 it is something t - (t-2), i explained that earlier.
If next lines is your code ?
data want;
set have;
by MachCode;
DateDiff = lag( Date ); /* this does not calculate the difference */
MilesDiff = dif( MileageValue ); /* even not this */
run;
than you have not used code as I post:
data want;
set have;
by machCode; /* assumed data is sorted by machine date */
if first.machCode then do; DaysDiff=0; MilesDiff=0; end; /* 1st row of a machine */
else do;
daysdiff = date - lag(date); /* calculating the difference */
milesdiff = miles - lag(miles);
end;
run;
I am sorry confusing you.
I used your code, excatly
data want;
set have;
by machCode; /* assumed data is sorted by machine date */
if first.machCode then do; DaysDiff=0; MilesDiff=0; end; /* 1st row of a machine */
else do;
daysdiff = date - lag(date); /* calculating the difference */
milesdiff = miles - lag(miles);
end;
run;
And what I got result is as:
Machine | Date | Miles | DateDiff | MilesDiff |
HPUS0001 | 10DEC07 | 1885462.00000 | 0 | 0 |
HPUS0001 | 12DEC07 | 1885462.00000 | . | . |
HPUS0001 | 30APR09 | 1885462.00000 | 505 | 0 |
HPUS0001 | 15JUL09 | 1886577.00000 | 76 | 1115 |
HPUS0001 | 16JUL09 | 1887667.00000 | 1 | 1090 |
HPUS0001 | 17JUL09 | 1889181.00000 | 1 | 1514 |
HPUS0001 | 17JUL09 | 1888825.00000 | 0 | -356 |
.
.
.
.
.
Machine | Date | Miles | DateDiff | MilesDiff |
HP0001 | 01OCT14 | 2898108.00000 | 1 | 1059 |
HP0001 | 02OCT14 | 2899148.00000 | 1 | 1040 |
HP0001 | 03OCT14 | 2900334.00000 | 1 | 1186 |
HP0002 | 17JAN08 | 926384.00000 | 0 | 0 |
HP0002 | 18JAN08 | 926384.00000 | -2450 | -1973950 |
HP0002 | 28APR09 | 1237332.00000 | 466 | 310948 |
HP0002 | 29APR09 | 1238599.00000 | 1 | 1267 |
when machine changed, lag is taken as t - (t-2) and not as t - (t-1)
NOTE: Please ignore the data in the first three columns as they are different from the data posted in the original quesiton. I am concentrating only on results.
Thanks very much.
For the lag() function to always have the value of the preceding observation, it must not be called conditionally, as it only puts data into its FIFO chain when it is called!
data want;
set have;
by machCode; /* assumed data is sorted by machine date */
daysdiff = date - lag(date); /* calculating the difference */
milesdiff = miles - lag(miles);
if first.machCode /* 1st row of a machine */
then do;
DaysDiff=0;
MilesDiff=0;
end;
run;
If you don't want the NOTE about missing values created in the first iteration, use 2 additional variables to hold the pure value of the lag() calls temporarily and drop them.
For HP0001, 2rd line: (03OCT14) - (02OCT14) = 1 = one day difference is right.
2900334 - 2899148 = 1186 - is correct
why is in RED ?
For HP0002 - It seems that LAG function act in a different way than I toght,
maybe it is because I don't use LAG on firt.MachCode row.
Therefore I will use RETAIN instead LAG, to save data for the next observation:
data want;
set have;
by machCode; /* assumed data is sorted by machine date */
RETAIN lag_date lag_miles;
DROP lag_date lag_miles;
if first.machCode
then do; /* 1st row of a machine */
DaysDiff=0;
MilesDiff=0;
end;
else do;
daysdiff = date - lag_date; /* calculating the difference */
milesdiff = miles - lag_miles;
end;
lag_date = date;
lag_miles = miles;
run;
About LAG meaning see next blog in this forum:
https://communities.sas.com/t5/Base-SAS-Programming/Lag-Logic/m-p/309971
Just a quick query... in the last column are the absolute values. I want to sum them up group by date. e.g. on 01Aug09 total will be 1707+733
similarly on 03AUG09 the total = 2261+487+974
and so forth. how to compute TOTAL?
For absolute values, we can tweak the code like this
DaysDiff = abs(Date - lag_date); /* calculating the difference */
MilesDiff = abs(miles - lag_miles);
HPUS0001 01AUG09 1903209.00000 1 1707 1707
HPUS0001 01AUG09 1902476.00000 0 -733 733
HPUS0001 03AUG09 1904737.00000 2 2261 2261
HPUS0001 03AUG09 1905224.00000 0 487 487
HPUS0001 03AUG09 1904250.00000 0 -974 974
HPUS0001 06AUG09 1906078.00000 3 1828 1828
HPUS0001 08AUG09 1907846.00000 2 1768 1768
HPUS0001 08AUG09 1906766.00000 0 -1080 1080
HPUS0001 10AUG09 1909027.00000 2 2261 2261
HPUS0001 10AUG09 1908540.00000 0 -487 487
Adiing TOTALs colomns, using same code with some changes and addon:
data tmp1 (drop=tot_days tot_miles)
tmp2 (keep= machCode tot_days tot_miles)
;
set have;
by machCode; /* assumed data is sorted by machine date */
RETAIN lag_date lag_miles tot tot_days tot_miles;
DROP lag_date lag_miles;
if first.machCode
then do; /* 1st row of a machine */
DaysDiff=0;
MilesDiff=0;
tot_days=0;
tot_miles=0;
end;
else do;
daysdiff = date - lag_date; /* calculating the difference */
milesdiff = miles - lag_miles;
tot_days+daysdiff;
tot_miles+abs(milesdiff);
end;
lag_date = date;
lag_miles = miles;
output tmp1;
if last machCode then output tmp2;
run;
data want;
merge tmp1 tmp2;
by machCode;
run;
Thank you so much. It worked. 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.