BookmarkSubscribeRSS Feed
ulricius
Obsidian | Level 7

Hi,

 

I'm working with disaggregate trip data (boardings, in time and space) from public transport (PT) fare cards and aim to infer alighting stops by utilising auxiliary data regarding the actual movements of PT vehicles. The raw card data are in the form [boardingID];[card ID];[boarding datetime];[route no];[stop no]. The idea is to use the boarding of the subsequent trip as candidate ("target stop") for alighting location. Thus, I have successfully introduced a variable [stop no(ID+1)] and merged this with up to 20 candidate stops for possible alightings (see variable [ToStopCandidate] in sheet "StopCandidatesTrip" in enclosed attachment, which is based on a stop-stop matrix based on network distances, generated outside of SAS).

 

My issue now is; how do I search and reference the stop itineraries of each PT line route (see sheet "RouteItineraryExcpt" in attachment) in order to identify the closest (in terms of walk distance or duration) stop to the "target stop" of that route? The script would have to search through the line itinerary of  "RouteItineraryExcpt" to find the route no stated in the StopCandidatesTrip file, and then flag the stop no that has the shortest walk distance or duration to the "target stop" among the stops serviced by this route.

 

I suspect, the most straightforward advice would be to merge the line itineraries to the trip file, but this would generate an enormous amount of redundant rows. The entire stop candidates file is more than 26 million rows already, and the route itinerary file is more than one million rows. Is there a more efficient way in SAS to reference between this kind of huge datasets?

 

I'll be happy to clarify if there are ambiguities in my description above! I use SAS Enterprise Guide 7.15 HF8 (7.100.5.6214) (32 bit).

29 REPLIES 29
tomrvincent
Rhodochrosite | Level 12
Could you post an example of what you want the outcome to look like and what data would be used for that example?
ulricius
Obsidian | Level 7

Sure! Something like this is how I would like the the output (column-wise):

[PT route no];[Boarding_stp];[Date];[ToStopCandidate];[Total_Mete];[Total_seco];[Flag for closest stop]

 

Where the last variable (in bold)  is added and indicates with a "1" when 1. the [ToStopCandidate] is in the itinerary of the route of [PT route no] and 2. it has the shortest distance according to [Total_Mete] or [Total_Seco].

 

/Ulrik

mkeintz
PROC Star

Could you show, in the form of a SAS data step with actual data values, both a sample of the input data, and a sample of the desired resulting data?  This will make it possible to suggest programs that can be tested before submitting them to this discussion.   Help us help you.

 

You can see how to do this by looking at the "example of how to create a data step of your data" link in the How to get fast helpful answers page.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ulricius
Obsidian | Level 7

Well, I can try...

 

Trip data (relevant variables are PTrouteNo, Date, ToStopCandidate, Total_Mete and TotalSeco):

 

 

label dev_id="dev_id" dev_log_num="dev_log_num" dev_trs_num="dev_trs_num" PTrouteNo="linje" dep_num="dep_num" zon="zon" stp="stp" dst_pnt="dst_pnt" trs_dt="trs_dt" log_rec_id="log_rec_id" crd_num="crd_num" crd_trs_num="crd_trs_num" zonnum="zonnum" From="From" 
ToStopCandidate="To" Total_Mete="Total_Mete" Total_Seco="Total_Seco" InvDist="InvDist";
datalines;
46226 1000533 332347 803 954 7001 7400096 0 14NOV2016 72 3 2 2016 8 20772.313345 7001 7400250 7400250 7400250 0 0 .
46252 4000359 214519 803 954 7001 7400235 0 17NOV2016 72 3 2 2016 8 20775.322535 7001 7400250 7400250 7400250 0 0 .
46937 834 807857 803 814 8060 7400511 0 25NOV2016 72 3 18 2016 8 20783.673241 8060 . . . . . .
46210 2000693 396089 804 75 25020 7400161 0 14NOV2016 72 27 18 2016 8 20772.680243 25020 7400315 7400315 7400315 0 0 .
46926 709 96746 804 93 25020 7400161 0 17NOV2016 72 27 18 2016 8 20775.353495 25020 7400315 7400315 7400315 0 0 .
46976 1000054 43501 804 8 25020 7400161 0 18NOV2016 72 27 19 2016 8 20776.681493 25020 7400315 7400315 7400315 0 0 .
46459 2000082 80846 804 93 25010 7400315 0 21NOV2016 72 27 19 2016 8 20779.355914 25010 7400002 7400002 7400002 0 0 .
46999 2000482 290220 804 93 25020 7400161 0 22NOV2016 72 27 19 2016 8 20780.352755 25020 7400315 7400315 7400315 0 0 .
46193 1000028 29242 804 775 1030 7400110 0 25NOV2016 72 27 31 2016 8 20783.354167 1030 . . . . . .
46232 3000268 169361 817 2 2050 7401547 0 14NOV2016 72 59 31 2016 8 20772.699931 2050 7400003 7400003 7415745 928.47124344 668.4992945 0.0010770393
46232 3000268 169361 817 2 2050 7401547 0 14NOV2016 72 59 31 2016 8 20772.699931 2050 7400003 7400003 7413039 1330.3410244 957.84552638 0.000751687
46232 3000268 169361 817 2 2050 7401547 0 14NOV2016 72 59 31 2016 8 20772.699931 2050 7400003 7400003 7415816 1257.7971129 905.61392546 0.0007950408

 

Line itinerary for (two services of) line 802 (refer to attached excel sheet in original post for context - relevant variables are Date, PTrouteNo and StopCode😞

 

 

label Date="Datum" PTrouteNo="Linjenummer" Turnummer="Turnummer" StopSequenceNo="Hpl_sekv_nr" StopName="Hallplats" HplGTFS="HplGTFS" StopCode="HplKodGTFS" Planerad_avgangstid="Planerad_avgangstid" ActualArrTime="Verklig_ankomsttid" Verklig_avgangst
id="Verklig_avgangstid" Tidshallning_avgang="Tidshallning_avgang" Tidshallning_avgang_kopia="Tidshallning_avgang_kopia" Ank_avvikelse="Ank_avvikelse" Planerad_ankomsttid="Planerad_ankomsttid" Riktn_Norrgaende="Riktn_Norrgaende" ID="ID";
datalines;
14NOV2016 802 1007 1 Lund C / 3 Lund Centralstation 7400120 4:19:00 . 4:19:00 0:00:00 0:00:00 . 4:19:00 0 0
14NOV2016 802 1007 2 Malmö C / 1b Malmö Centralstation 7400003 4:33:00 4:28:00 4:33:00 0:00:00 0:00:00 . 4:28:00 0 0
14NOV2016 802 1007 3 Malmö Triangeln / 1 Malmö Triangeln station 7401587 4:36:00 4:34:00 4:36:00 0:00:00 0:00:00 . 4:34:00 0 0
14NOV2016 802 1007 4 Malmö Hyllie / 1b Malmö Hyllie station 7401586 4:41:00 4:38:00 4:41:00 0:00:00 0:00:00 . 4:38:00 0 0
14NOV2016 802 1007 5 CPH Airport / 2 Copenhagen Airport 8600858 4:54:00 . . 0:00:00 . . 4:54:00 0 0
14NOV2016 802 1007 6 Tårnby / 1 Tårnby 8600857 4:58:00 . . 0:00:00 . . 4:58:00 0 0
14NOV2016 802 1007 7 Ørestad / 1 Örestad 8600856 5:00:00 . . 0:00:00 . . 5:00:00 0 0
14NOV2016 802 1007 8 København H / 1 Köbenhavn H 8600626 5:12:00 . . 0:00:00 . . 5:12:00 0 0
14NOV2016 802 1007 9 København Nørreport / 1 Nörreport 8600646 5:15:00 . . 0:00:00 . . 5:15:00 0 0
14NOV2016 802 1007 10 København Østerport / 1 Österport 8600650 5:19:00 . . 0:00:00 . . 5:19:00 0 0
14NOV2016 802 1007 11 Hellerup / 1 Hellerup 8600655 5:24:00 . . 0:00:00 . . 5:24:00 0 0
14NOV2016 802 1007 12 Kokkedal / 1 Kokkedal 8600664 5:38:00 . . 0:00:00 . . 5:38:00 0 0
14NOV2016 802 1007 13 Nivå / 1 Nivå 8600665 5:42:00 . . 0:00:00 . . 5:42:00 0 0
14NOV2016 802 1007 14 Humlebæk / 1 Humlebaek 8600666 5:46:00 . . 0:00:00 . . 5:46:00 0 0
14NOV2016 802 1007 15 Espergærde / 1 Espergaerde 8600667 5:50:00 . . 0:00:00 . . 5:50:00 0 0
14NOV2016 802 1007 16 Snekkersten / 1 Snekkersten 8600668 5:54:00 . . 0:00:00 . . 5:54:00 0 0
14NOV2016 802 1007 17 Helsingør station / 1 Helsingör 8600669 5:54:00 . . 0:00:00 . . 5:54:00 0 0
14NOV2016 802 1010 1 CPH Airport / 1 Copenhagen Airport 8600858 5:46:00 . . 0:00:00 . . 5:46:00 1 0
14NOV2016 802 1010 2 Malmö Hyllie / 4b Malmö Hyllie station 7401586 6:08:00 5:59:00 6:08:00 0:00:00 0:00:00 . 5:59:00 1 0
14NOV2016 802 1010 3 Malmö Triangeln / 2 Malmö Triangeln station 7401587 6:11:00 6:10:00 6:12:00 0:01:00 0:01:00 . 6:10:00 1 0
14NOV2016 802 1010 4 Malmö C / 4a Malmö Centralstation 7400003 6:29:00 6:14:00 6:30:00 0:01:00 0:01:00 . 6:14:00 1 0
14NOV2016 802 1010 5 Lund C / 4 Lund Centralstation 7400120 6:41:00 6:38:00 6:41:00 0:00:00 0:00:00 . 6:38:00 1 0

(i'm sorry I have to include such large chunks of data, but otherwise it is hard to get the context!)

 

I hope this helps!

 

 

Patrick
Opal | Level 21

@ulricius 

The reason "we" are asking for sample data provided via a working SAS data step: It allows "us" to spend the time solving your problem instead of spending time preparing the data. Soo... could you please post the sample data in the form of a fully working SAS data step? There are still bits missing in what you've posted.

ulricius
Obsidian | Level 7

Dear Patrick,

I have no doubt that you are sincerely trying to help me out, but your reply gives me no hint to what is actually missing in what I have provided to you. As indicated in my profile, I'm a novice at SAS, and I am neither a native English speaker, so please forgive any flaws in what I have provided to you so far. I have a script in MatLab code, though, which does something similar to what I would like SAS to do for me (searching through line itineraries). Do you think it would simplify your efforts to assist me if I provided you with this code?

 

/Ulrik

tomrvincent
Rhodochrosite | Level 12
What I was looking for was a single example of the relevant source records and what the desired results are.
ulricius
Obsidian | Level 7

OK, let me provide you with an example (with example data - I reduce the number of columns to the ones relevant to the query): I want to get the exact, or at least the closest stop (or the stop candidate "To" to target stop ID - "stp_t" - 7400003 with the shortest Total_Mete or Total_seco) for the trip made by card id no 666, along line route ("linje") 817, according to the trip table below:

 

data PossibleTargetStops7400003;
infile datalines truncover;
input linje 4. dep_num 4. stp trs_dt DATE9. crd_num:32. time:TIME. stp_t:32. To:32. Total_Mete:32. Total_Seco:32.;
format trs_dt DDMMYYC10. time TIME.;
label linje="linje" dep_num="dep_num" stp="stp" trs_dt="trs_dt" crd_num="crd_num" time="time" To="To" Total_Mete="Total_Mete" Total_Seco="Total_Seco";
datalines;
817 223 7400003 23NOV2016 666 20:10:20 7423209 7423209 0 0
817 223 7400003 23NOV2016 666 20:10:20 7423209 7400120 177.94374725 128.11949792
817 223 7400003 23NOV2016 666 20:10:20 7423209 7416996 182.68941177 131.53637897
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417039 279.36951404 201.14604856
817 223 7400003 23NOV2016 666 20:10:20 7423209 7416995 371.59214966 267.54635143
817 223 7400003 23NOV2016 666 20:10:20 7423209 7416547 418.3915942 301.24195019
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417038 432.86920559 311.66582988
817 223 7400003 23NOV2016 666 20:10:20 7423209 7416595 474.30527436 341.49979002
817 223 7400003 23NOV2016 666 20:10:20 7423209 7416594 492.87472427 354.8697986
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417297 518.44331708 373.27919083
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417004 541.37789108 389.79206988
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417294 566.7279597 408.04413585
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417104 611.07258177 439.97225975
817 223 7400003 23NOV2016 666 20:10:20 7423209 7416546 644.50533806 464.04384592
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417293 670.00833066 482.40599831
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417044 684.10199126 492.55343059
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417298 706.62704179 508.77146768
817 223 7400003 23NOV2016 666 20:10:20 7423209 7416981 872.20038294 627.98426415
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417109 885.47051119 637.53876562
817 223 7400003 23NOV2016 666 20:10:20 7423209 7417300 912.68055691 657.12999389
;;;;;;;;;
run;

 

The possible candidate stops for line route 817 are provided in the column "HplKodGTFS" in the line route itinerary below:

 

Data Punkt817;
infile datalines truncover;
input Datum:DATE9. Linjenummer:32. Hpl_sekv_nr:32. HplKodGTFS:32. Verklig_avgangstid:TIME8. Verklig_ankomsttid:TIME8. Riktn_Norrgaende:32. ID:32.;
format Datum DDMMYYC10. Verklig_avgangstid TIME. Verklig_ankomsttid TIME.;
datalines;
14NOV2016 817 1 7400088 5:09:00 . 1 0
14NOV2016 817 2 7401614 5:19:00 5:17:00 1 0
14NOV2016 817 3 7401615 . . 1 0
14NOV2016 817 4 7401546 5:32:00 5:31:00 1 0
14NOV2016 817 5 7401586 5:36:00 5:34:00 1 0
14NOV2016 817 6 7401587 5:40:00 5:38:00 1 0
14NOV2016 817 7 7400003 5:44:00 5:41:00 1 0
14NOV2016 817 8 7400120 5:56:00 5:55:00 1 0
14NOV2016 817 9 7400941 6:00:00 5:58:00 1 0
14NOV2016 817 10 7400945 6:05:00 6:03:00 1 0
14NOV2016 817 11 7400939 6:09:00 6:08:00 1 0
14NOV2016 817 12 7401547 6:14:00 6:13:00 1 0
14NOV2016 817 13 7401554 6:20:00 6:17:00 1 0
14NOV2016 817 14 7401558 6:24:00 6:23:00 1 0
14NOV2016 817 15 7401557 6:28:00 6:26:00 1 0
14NOV2016 817 16 7401270 6:33:00 6:31:00 1 0
14NOV2016 817 17 7400044 . 6:37:00 1 0
;;;;;;;;;
run;

So, I would like the code to flag the stop ID 7400120 (identified in Table #2 for line route no 817, see second line in Table #3 below) in the "To" column because it is the "HplKodGTFS" stop of Table #2 with the shortest distance (Total_mete) (by foot) to "stp_t" stop #7423209 according to the Table #1 above (whereas "stp_t" stop 7423209 itself is not found in Table #2).

 

The result table should include the closest (i e minimum of Total_mete or Total_seco) matching "To" stop ID along with the column values Total_mete, Total_seco from the trip table and ActualArrivalTime ("Verklig_ankomsttid") from the line route itinerary table for this stop ID. The latter in order to be able to calculate travel times.

 

Something like this is what I'm after:

 

data want;
infile datalines delimiter='09'x truncover;
input linje:32. stp:32. trs_dt:DATE9. time:TIME. crd_num:3. stp_t:32. To:32. Total_Mete:32. Total_Seco:32. FlagForClosestStopCandidate:2.;
format trs_dt DDMMYYC10. time TIME.;
datalines;
817 7400003 23nov16 20:10:20 666 7423209 7423209 0 0 0
817 7400003 23nov16 20:10:20 666 7423209 7400120 177.94374725 128.11949792 1
817 7400003 23nov16 20:10:20 666 7423209 7416996 182.68941177 131.53637897 0
817 7400003 23nov16 20:10:20 666 7423209 7417039 279.36951404 201.14604856 0
817 7400003 23nov16 20:10:20 666 7423209 7416995 371.59214966 267.54635143 0
817 7400003 23nov16 20:10:20 666 7423209 7416547 418.3915942 301.24195019 0
817 7400003 23nov16 20:10:20 666 7423209 7417038 432.86920559 311.66582988 0
817 7400003 23nov16 20:10:20 666 7423209 7416595 474.30527436 341.49979002 0
817 7400003 23nov16 20:10:20 666 7423209 7416594 492.87472427 354.8697986 0
817 7400003 23nov16 20:10:20 666 7423209 7417297 518.44331708 373.27919083 0
817 7400003 23nov16 20:10:20 666 7423209 7417004 541.37789108 389.79206988 0
817 7400003 23nov16 20:10:20 666 7423209 7417294 566.7279597 408.04413585 0
817 7400003 23nov16 20:10:20 666 7423209 7417104 611.07258177 439.97225975 0
817 7400003 23nov16 20:10:20 666 7423209 7416546 644.50533806 464.04384592 0
817 7400003 23nov16 20:10:20 666 7423209 7417293 670.00833066 482.40599831 0
817 7400003 23nov16 20:10:20 666 7423209 7417044 684.10199126 492.55343059 0
817 7400003 23nov16 20:10:20 666 7423209 7417298 706.62704179 508.77146768 0
817 7400003 23nov16 20:10:20 666 7423209 7416981 872.20038294 627.98426415 0
817 7400003 23nov16 20:10:20 666 7423209 7417109 885.47051119 637.53876562 0
817 7400003 23nov16 20:10:20 666 7423209 7417300 912.68055691 657.12999389 0
;;;;;;;;;
run;

Is there enough material now for you to help me with the coding?

 

 

 

ulricius
Obsidian | Level 7

Now that I have updated the latest post with records that should enable you to follow my line of thought, but still no reply from you, @tomrvincent, @Patrick - is there still something missing or is the description still simply too messy? I would appreciate any feedback to my latest post!

andreas_lds
Jade | Level 19

@ulricius wrote:

Now that I have updated the latest post with records that should enable you to follow my line of thought, but still no reply from you, @tomrvincent, @Patrick - is there still something missing or is the description still simply too messy? I would appreciate any feedback to my latest post!


The reason for absence of replies could be that you still don't post working data steps. Hint: a data step begins with

data datasetname;

 

 

ulricius
Obsidian | Level 7

Hi,

 

OK, so now I have added dataset code to the datalines, by editing message #9 above. Is this enough or is there still missing pieces of information you need? Please forgive me for coding errors - as I mentioned, I am a novice to SAS (and I am not a particularly experienced programmer either).

 

/Ulrik

andreas_lds
Jade | Level 19

Have you actually executed the data steps you have posted? The don't seem to be error-free.

ulricius
Obsidian | Level 7

I'm sorry, but I seem to be lost in translation here.

1. I'm not sure what the purpose of the datastep code is. What is the desired result? I have run code in order to produce the sample datasets, whose datalines I have copied from the log file and pasted into this conversation (as I was tipped off in the thread suggesting code that produce datalines from data files, https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...) - both for input and desired output data. How should I have done it differently to enable you to help me? Please, be specific and explain in detail (coding examples, please)!

2. I am not sure how this code should be formulated. I have looked at similar conversations in this forum, and they present the issues in a similar fashion to how I have done it. Is my issue different in any way, motivating a different data/problem description? In that case, please be specific in how I should formulate the problem, since I find few similar questions/conversations in this forum to learn from!

 

Thank you for your patience!

 

Patrick
Opal | Level 21

@ulricius wrote:

I'm sorry, but I seem to be lost in translation here.

1. I'm not sure what the purpose of the datastep code is. What is the desired result? I have run code in order to produce the sample datasets, whose datalines I have copied from the log file and pasted into this conversation (as I was tipped off in the thread suggesting code that produce datalines from data files, https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...) - both for input and desired output data. How should I have done it differently to enable you to help me? Please, be specific and explain in detail (coding examples, please)!

2. I am not sure how this code should be formulated. I have looked at similar conversations in this forum, and they present the issues in a similar fashion to how I have done it. Is my issue different in any way, motivating a different data/problem description? In that case, please be specific in how I should formulate the problem, since I find few similar questions/conversations in this forum to learn from!

 

Thank you for your patience!

 


"1. I'm not sure what the purpose of the datastep code is."

To provide us with sample source data so we have something to work with and can actually test the code we then post as a solution to your problem.

 

"What is the desired result?"

Instead of only describing in words what you want to do with the data you also provide us with the actual data or report which you want as a result of the logic to be implemented. And it should be the data/report which uses the sample source data.

Posting such data/reports helps a lot in clarifying what you describe and it also allows us to test the code we propose (=it needs to create the data/report you post as desired result).

 

"2. I am not sure how this code should be formulated."

The way you've posted it now is perfect BUT you should also test the code you post. What you've posted doesn't fully work so we would now first have to spend time fixing your stuff instead of dealing with the actual question.  ...and that's where I often loose motivation to spend any further time.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 29 replies
  • 1802 views
  • 5 likes
  • 5 in conversation