BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
imanojkumar1
Quartz | Level 8

I have date like the one below

 

MachineDateMiles
HP000110DEC071885462.00000
HP000112DEC071885469.00000
HP000130APR091885473.00000
HP000115JUL091886577.00000
HP000116JUL091887667.00000
HP000117JUL091889181.00000
HP000117JUL091888825.00000
HP000118JUL091889621.00000
HP000120JUL091891363.00000
HP000121JUL091892398.00000
HP000122JUL091893513.00000
HP000215DEC071985462.00000
HP000217DEC071985462.00000
HP000228APR091985462.00000
HP000210JUL091986577.00000
HP000217JUL091987667.00000
HP000220JUL091989181.00000
HP000222JUL091988825.00000
HP000228JUL091989621.00000
HP000231JUL091991363.00000
HP000221AUG091992398.00000
HP000224AUG091993513.00000
HP000310NOV072085462.00000
HP000318NOV072085462.00000
HP000330FEB092085462.00000
HP000315JUN092086577.00000
HP000316JUL092087667.00000
HP000317AUG092089181.00000
HP000310SEP092088825.00000
HP000318OCT092089621.00000
HP000330OCT092091363.00000
HP000321NOV092092398.00000
HP000322DEC092093513.00000

 

 

What I want to do is to calculate difference between dates and miles something like the sown in the table below:

 

 1. first row will have DaysDiff values as zero because that is first record of data for each machine

 2. similarly MilesDiff is zero in the first row, but that it is a difference between first row and second row data.

 

MachineDateMilesDaysDiffMilesDiff
HP000110DEC071885462.0000000
HP000112DEC071885469.0000029
HP000130APR091885473.00000 4
HP000115JUL091886577.00000  
HP000116JUL091887667.00000  
HP000117JUL091889181.00000  
HP000117JUL091888825.00000  
HP000118JUL091889621.00000  
HP000120JUL091891363.00000  
HP000121JUL091892398.00000  
HP000122JUL091893513.00000  
HP000215DEC071985462.0000000
HP000217DEC071985465.000002 3
HP000228APR091985468.00000  3
HP000210JUL091986577.00000  
HP000217JUL091987667.00000  
HP000220JUL091989181.00000  
HP000222JUL091988825.00000  
HP000228JUL091989621.00000  
HP000231JUL091991363.00000  
HP000221AUG091992398.00000  
HP000224AUG091993513.00000  
HP000310NOV072085462.0000000
HP000318NOV072085465.000008 3
HP000330FEB092085469.00000  6
HP000315JUN092086577.00000  
HP000316JUL092087667.00000  
HP000317AUG092089181.00000  
HP000310SEP092088825.00000  
HP000318OCT092089621.00000  
HP000330OCT092091363.00000  
HP000321NOV092092398.00000  
HP000322DEC092093513.00000  

 

I applied lag but it did not work and also for each row when Machine is changing, I am not able to put zero. I was trying something like this. I dont know how to check each machine type and then run the rest calculations.

 

data want;
set have;
by MachCode;
   DateDiff = lag( Date );
   MilesDiff = dif( MileageValue );
run;

 

Please help I am new to SAS and learning it.

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

For HP0001, 2rd line: (03OCT14) - (02OCT14) = 1 = one day difference is right.

                                   2900334 - 2899148 = 1186   - is correct

    why is in RED

 

For HP0002 - It seems that LAG function act in a different way than I toght,

maybe it is because I don't use LAG on firt.MachCode row.

 

Therefore I will use RETAIN instead LAG, to save data for the next observation:

 

data want;

 set have;

    by machCode;   /* assumed data is sorted by machine date */

          RETAIN lag_date lag_miles;

          DROP   lag_date lag_miles; 

           if first.machCode

              then do;                                  /* 1st row of a machine */

                      DaysDiff=0;

                      MilesDiff=0;

 

              end; 

              else do;

                  daysdiff = date - lag_date;            /* calculating the difference */

                  milesdiff = miles - lag_miles;

          end;

          lag_date = date;

          lag_miles = miles;

run;

View solution in original post

12 REPLIES 12
Shmuel
Garnet | Level 18

Use LAG function to get a variable value from previous row.

According to your post you want to do it per machine:

 

data want;

 set have;

    by machine;   /* assumed data is sorted by machine date */

           if first.machine then do; DaysDiff=0; MilesDiff=0; end;

           else do;

                  daysdiff = date - lag(date);

                  milesdiff = miles - lag(miles);

          end;

run;

 

imanojkumar1
Quartz | Level 8

Hello Shmuel,

 

Thanks for your efforts. Unfortunately, there is an error. In the second row, it is putting a dot (.) value, i think for NULL. and from third row it is calculating. I tried with _N_ = 1 as well, but whenever Machine is changing, both methods fail. :(''

 

This is what your code is doing, when Machine changes form one to another.. It is not taking last recort for lag but second last row of the HP001 (i.e.  instead t - (t-1) it is calculating t - (t-2) 😞

 

MachineDateMilesDateDiffMilesDiff
HP000101OCT142898108.0000011059
HP000102OCT142899148.0000011040
HP000103OCT142900334.0000011186
HP000217JAN08926384.0000000
HP000218JAN08926384.00000-2450-1973950
HP000228APR091237332.00000466310948
HP000229APR091238599.0000011267

 

Regards,

Manu

Shmuel
Garnet | Level 18

Are dates converted to SAS dates using informat date7. ?

Is MilesDiff on second row is missing value too ?

imanojkumar1
Quartz | Level 8

I changed datepart(DateTimeStamp) to Date using date. formate.

MilesDiff is 0 in first row but in second row it is a dot . for HP0001 but for next machine HP0002 it is something t - (t-2), i explained that earlier.

 

 

Shmuel
Garnet | Level 18

If next lines is your code ?

    data want;
      set have;
      by MachCode;
           DateDiff = lag( Date );                    /* this does not calculate the difference */
          MilesDiff = dif( MileageValue );                   /* even not this */
run;

 

than you have not used code as I post:

  

data want;

 set have;

    by machCode;   /* assumed data is sorted by machine date */

           if first.machCode then do; DaysDiff=0; MilesDiff=0; end;  /* 1st row of a machine */

           else do;

                  daysdiff = date - lag(date);            /* calculating the difference */

                  milesdiff = miles - lag(miles);

          end;

run;

imanojkumar1
Quartz | Level 8

I am sorry confusing you.

 

I used your code, excatly

 

data want;

 set have;

    by machCode;   /* assumed data is sorted by machine date */

           if first.machCode then do; DaysDiff=0; MilesDiff=0; end;  /* 1st row of a machine */

           else do;

                  daysdiff = date - lag(date);            /* calculating the difference */

                  milesdiff = miles - lag(miles);

          end;

run;

 

 

And what I got result is as:

 

 

MachineDateMilesDateDiffMilesDiff
HPUS000110DEC071885462.0000000
HPUS000112DEC071885462.00000..
HPUS000130APR091885462.000005050
HPUS000115JUL091886577.00000761115
HPUS000116JUL091887667.0000011090
HPUS000117JUL091889181.0000011514
HPUS000117JUL091888825.000000-356

.

.

.

.

.

MachineDateMilesDateDiffMilesDiff
HP000101OCT142898108.0000011059
HP000102OCT142899148.0000011040
HP000103OCT142900334.0000011186
HP000217JAN08926384.0000000
HP000218JAN08926384.00000-2450-1973950
HP000228APR091237332.00000466310948
HP000229APR091238599.0000011267

 

when machine changed, lag is taken as t - (t-2) and not as t - (t-1)

 

NOTE: Please ignore the data in the first three columns as they are different from the data posted in the original quesiton. I am concentrating only on results.

 

Thanks very much.

Kurt_Bremser
Super User

For the lag() function to always have the value of the preceding observation, it must not be called conditionally, as it only puts data into its FIFO chain when it is called!

data want;
set have;
by machCode;   /* assumed data is sorted by machine date */
daysdiff = date - lag(date);            /* calculating the difference */
milesdiff = miles - lag(miles);
if first.machCode   /* 1st row of a machine */
then do;
  DaysDiff=0;
  MilesDiff=0;
end;
run;

If you don't want the NOTE about missing values created in the first iteration, use 2 additional variables to hold the pure value of the lag() calls temporarily and drop them.

Shmuel
Garnet | Level 18

For HP0001, 2rd line: (03OCT14) - (02OCT14) = 1 = one day difference is right.

                                   2900334 - 2899148 = 1186   - is correct

    why is in RED

 

For HP0002 - It seems that LAG function act in a different way than I toght,

maybe it is because I don't use LAG on firt.MachCode row.

 

Therefore I will use RETAIN instead LAG, to save data for the next observation:

 

data want;

 set have;

    by machCode;   /* assumed data is sorted by machine date */

          RETAIN lag_date lag_miles;

          DROP   lag_date lag_miles; 

           if first.machCode

              then do;                                  /* 1st row of a machine */

                      DaysDiff=0;

                      MilesDiff=0;

 

              end; 

              else do;

                  daysdiff = date - lag_date;            /* calculating the difference */

                  milesdiff = miles - lag_miles;

          end;

          lag_date = date;

          lag_miles = miles;

run;

imanojkumar1
Quartz | Level 8

Just a quick query... in the last column are the absolute values. I want to sum them up group by date. e.g. on 01Aug09 total will be 1707+733
similarly on 03AUG09 the total = 2261+487+974
and so forth. how to compute TOTAL?

For absolute values, we can tweak the code like this
DaysDiff = abs(Date - lag_date); /* calculating the difference */
MilesDiff = abs(miles - lag_miles);


HPUS0001 01AUG09 1903209.00000 1 1707 1707
HPUS0001 01AUG09 1902476.00000 0 -733 733
HPUS0001 03AUG09 1904737.00000 2 2261 2261
HPUS0001 03AUG09 1905224.00000 0 487 487
HPUS0001 03AUG09 1904250.00000 0 -974 974
HPUS0001 06AUG09 1906078.00000 3 1828 1828
HPUS0001 08AUG09 1907846.00000 2 1768 1768
HPUS0001 08AUG09 1906766.00000 0 -1080 1080
HPUS0001 10AUG09 1909027.00000 2 2261 2261
HPUS0001 10AUG09 1908540.00000 0 -487 487

Shmuel
Garnet | Level 18

Adiing  TOTALs colomns, using same code with some changes and addon:

 

data tmp1 (drop=tot_days tot_miles)

        tmp2 (keep= machCode tot_days tot_miles)

    ;

 set have;

    by machCode;   /* assumed data is sorted by machine date */

          RETAIN lag_date lag_miles tot  tot_days tot_miles;

          DROP   lag_date lag_miles; 

           if first.machCode

              then do;                                  /* 1st row of a machine */

                      DaysDiff=0;

                      MilesDiff=0;

                      tot_days=0;

                      tot_miles=0;

              end; 

              else do;

                  daysdiff = date - lag_date;            /* calculating the difference */

                  milesdiff = miles - lag_miles;

                  tot_days+daysdiff;

                  tot_miles+abs(milesdiff);

          end;

          lag_date = date;

          lag_miles = miles;

          output tmp1;

         if last machCode then output tmp2;

run;

 

data want;

  merge tmp1 tmp2;

       by machCode;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1409 views
  • 3 likes
  • 3 in conversation