BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
YoLohse
Fluorite | Level 6

Hi

 

I have a panel data set where I want to make 2 new variables. The data set is of people who run 5k runs. I want a variable that specifies how many days it has been since their last run, and one that specifies how much they have improved. I have made an overview in Excel of what I would like 

Capture.PNG

Days_Between_Run is runDate_t - runDate_t-1 and the same goes for improvement which is just time_t - time_t-1

 

Hope you guys can help me and please let me know if i need to elaborate

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
data DataIM;
    set DataAW;
    prevrunnerid=lag(runnerid);
    prevruntime=lag(runtime);
    prevrundate=lag(rundate);
    if prevRunnerID = RunnerID then Improvement = prevRunTime-RunTime;
    if prevRunnerID = RunnerID then DBR = RunDate-prevRunDate;
    drop prev:;
run;
--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

Use the data step LAG function to determine the days between the previous run; and also to determine the previous time.

 

Examples here:

https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=lefunctionsref&docsetTarget=n...

--
Paige Miller
ballardw
Super User

@YoLohse wrote:

Hi

 

I have a panel data set where I want to make 2 new variables. The data set is of people who run 5k runs. I want a variable that specifies how many days it has been since their last run, and one that specifies how much they have improved. I have made an overview in Excel of what I would like 

Capture.PNG

Days_Between_Run is runDate_t - runDate_t-1 and the same goes for improvement which is just time_t - time_t-1

 

Hope you guys can help me and please let me know if i need to elaborate


Are your SAS variables actually date values, type is numeric and a SAS format such as ddmmyy10.? or character? Similar with the times. I strongly suspect the time ariables may not. What UNITS is your "Improvement" in? Seconds?

 

Again from your first post: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

YoLohse
Fluorite | Level 6
data WORK.DATAWAW;
  infile datalines dsd truncover;
  input RunDate:MMDDYY10. RunnerID:BEST. Gender:$1. Age:BEST. RunTime:BEST. Placement:BEST. After_Woman:32.;
  format RunDate MMDDYY10. RunnerID BEST. Age BEST. RunTime BEST. Placement BEST.;
  label RunDate="RunDate" RunnerID="RunnerID" Gender="Gender" Age="Age" RunTime="RunTime" Placement="Placement";
datalines;
10/22/2011 198111 1 39 1102 1 0
10/22/2011 33415 1 44 1134 2 0
10/22/2011 196982 1 44 1164 3 0
10/22/2011 92330 1 24 1182 4 0
10/22/2011 33809 1 59 1188 5 0
;;;;
PaigeMiller
Diamond | Level 26

That's great, thank you for the data in this format.

 

However, no runner has a previous time. And in any event, as I said, if they did have a previous time, you would use the data step LAG function, after things are properly sorted by runner and date.

--
Paige Miller
YoLohse
Fluorite | Level 6

I see what you mean. The sample I sent you have no reoccurring runners. but in the sample below, you can see that runner '1674' has participated in 2 runs for instance. I guess i should sort it like I have done here by RunnerID and RunDate right?

 

data WORK.ALLDATA;
  infile datalines dsd truncover;
  input RunDate:MMDDYY10. RunnerID:BEST. Gender:$1. Age:BEST. RunTime:BEST. Placement:BEST.;
  format RunDate MMDDYY10. RunnerID BEST. Age BEST. RunTime BEST. Placement BEST.;
  label RunDate="RunDate" RunnerID="RunnerID" Gender="Gender" Age="Age" RunTime="RunTime" Placement="Placement";
datalines;
09/16/2017 398 1 59 1440 38
10/22/2016 1623 0 64 1897 86
05/25/2013 1674 1 54 1322 15
05/26/2018 1674 1 54 1195 16
08/11/2018 1867 1 39 1666 76
08/18/2018 1867 1 39 1529 65
10/31/2015 1876 1 29 1279 14
01/02/2016 1876 1 29 1093 1
;;;;
PaigeMiller
Diamond | Level 26

So now you can modify our code to use the LAG function and determine the deltas, but only if the runnerID is the same as the previous runnerID.

--
Paige Miller
YoLohse
Fluorite | Level 6

Hi Paige

 

I almost got it working. My problem is that my code seems to lag the variable 2 periods and not 1. When i get two observations with the same RunnerID it seems to take the current run time and subtract the runtime two obsercations above and not one. furthermore it seems that it does not catch all of the cases where the RunnerID are the same as the previous. for instance observation 3 and 4 have the same RunnerID but observation 4 has no Improvement or DBR variable.

 

My code is: 

proc sort 
data=DataAW; 
by runnerid rundate; 
run;
data DataIM;
set DataAW;
by RunnerID;
retain prevRunnerID;
if prevRunnerID = RunnerID then Improvement = lag(RunTime)-RunTime;
if prevRunnerID = RunnerID then DBR = RunDate-lag(RunDate);
prevRunnerID = RunnerID;
drop prevRunnerID;
run;

my input data is 

data WORK.DATAAW;
  infile datalines dsd truncover;
  input RunDate:MMDDYY10. RunnerID:BEST. Gender:$1. Age:BEST. RunTime:BEST. Placement:BEST. After_Woman:32.;
  format RunDate MMDDYY10. RunnerID BEST. Age BEST. RunTime BEST. Placement BEST.;
  label RunDate="RunDate" RunnerID="RunnerID" Gender="Gender" Age="Age" RunTime="RunTime" Placement="Placement";
datalines;
09/16/2017 398 1 59 1440 38 0
10/22/2016 1623 0 64 1897 86 .
05/25/2013 1674 1 54 1322 15 0
05/26/2018 1674 1 54 1195 16 0
08/11/2018 1867 1 39 1666 76 1
08/18/2018 1867 1 39 1529 65 1
10/31/2015 1876 1 29 1279 14 0
01/02/2016 1876 1 29 1093 1 0
06/10/2017 2288 1 59 1199 8 0
11/25/2017 2577 1 29 1004 1 0
;;;;

and my output data atm looks like:

data WORK.DATAIM;
  infile datalines dsd truncover;
  input RunDate:MMDDYY10. RunnerID:BEST. Gender:$1. Age:BEST. RunTime:BEST. Placement:BEST. After_Woman:32. Improvement:32. DBR:32.;
  format RunDate MMDDYY10. RunnerID BEST. Age BEST. RunTime BEST. Placement BEST.;
  label RunDate="RunDate" RunnerID="RunnerID" Gender="Gender" Age="Age" RunTime="RunTime" Placement="Placement";
datalines;
09/16/2017 398 1 59 1440 38 0 . .
10/22/2016 1623 0 64 1897 86 . . .
05/25/2013 1674 1 54 1322 15 0 . .
05/26/2018 1674 1 54 1195 16 0 . .
08/11/2018 1867 1 39 1666 76 1 . .
08/18/2018 1867 1 39 1529 65 1 -334 84
10/31/2015 1876 1 29 1279 14 0 . .
01/02/2016 1876 1 29 1093 1 0 436 -959
06/10/2017 2288 1 59 1199 8 0 . .
11/25/2017 2577 1 29 1004 1 0 . .
;;;;
PaigeMiller
Diamond | Level 26
data DataIM;
    set DataAW;
    prevrunnerid=lag(runnerid);
    prevruntime=lag(runtime);
    prevrundate=lag(rundate);
    if prevRunnerID = RunnerID then Improvement = prevRunTime-RunTime;
    if prevRunnerID = RunnerID then DBR = RunDate-prevRunDate;
    drop prev:;
run;
--
Paige Miller
YoLohse
Fluorite | Level 6
Yes both the improvement and time are in seconds

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1634 views
  • 0 likes
  • 3 in conversation