BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
camryndastrup
Fluorite | Level 6
0

I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. The objective is to merge the files in some way that will input the station name from the second csv file into the "from" and "to" station sections in the first, based on station id. I know that this would be extremely easy in excel with an xlookup, but I am wondering the correct way to approach this in SAS.

I am using the SAS university edition (the free online one) if that makes any difference. Our code so far is as follows:

'''data DivvyTrips; infile '/home/u59304398/sasuser.v94/DivvyTrips.csv' dsd; input trip_id bikeid tripduration from_station_id to_station_id capacity usertype $; title "Trips"; run;

data DivvyStations; infile '/home/u59304398/sasuser.v94/Divvy_Stations.csv' dsd; input station_id station_name $; title "Stations"; run;'''

All this does is import the data. I do not think a merge with a sort will work because we need both from and to station names.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Don't truncate you station names to 8 characters.

Use the station table to define a format and then use the format to display the station ids as their names.

data DivvyStations;
     infile 'c:\downloads\Divvy_Stations.csv' dsd firstobs=2;
     input station_id station_name :$40.;
run;

data format;
   fmtname='STATION';
   set divvystations;
   rename station_id=start station_name=label;
run;
proc format cntlin=format;
run;

proc print data=divvytrips(obs=10) width=min;
  format from_station_id to_station_id station.;
run;

image.png

View solution in original post

8 REPLIES 8
Reeza
Super User
Merge and sort is exactly the right answer, exactly how depends on your data. Please make a small useable example with fake data if necessary if you need help with code.

FYI - it really helps if you format your posts to be legible.
camryndastrup
Fluorite | Level 6

I have attached the data we are using. 

 

data DivvyTrips;
     infile '/home/u59304398/sasuser.v94/DivvyTrips.csv' dsd;
     input trip_id
     bikeid
     tripduration
     from_station_id
     to_station_id
     capacity
     usertype $;
title "Trips";
run;

 

data DivvyStations;
     infile '/home/u59304398/sasuser.v94/Divvy_Stations.csv' dsd;
     input station_id
     station_name $;
title "Stations";
run;

 

This is what we have so far, simply to import the data into the online SAS. The files I imported did not have labels in the first row, I only added them to the data files I am uploading for clarity. 

 

Let me know if that is enough information, and I appreciate your help!

Tom
Super User Tom
Super User

Don't truncate you station names to 8 characters.

Use the station table to define a format and then use the format to display the station ids as their names.

data DivvyStations;
     infile 'c:\downloads\Divvy_Stations.csv' dsd firstobs=2;
     input station_id station_name :$40.;
run;

data format;
   fmtname='STATION';
   set divvystations;
   rename station_id=start station_name=label;
run;
proc format cntlin=format;
run;

proc print data=divvytrips(obs=10) width=min;
  format from_station_id to_station_id station.;
run;

image.png

zoromakoni
Calcite | Level 5

@Tom Any idea on how to get these results using the same data sets?

 

1. Determine the number of trips that start and end at the same station. 
2. Find the average trip duration for trips that start and end at the same station. 

Reeza
Super User


proc means data=divvytrips N MEAN;
where from_station_id = to_station_id;
var tripduration;
run;

zoromakoni
Calcite | Level 5

Thanks that helps

sbxkoenk
SAS Super FREQ

Hello,

 

There are several ways of doing this using a merge with data step or PROC SQL or with a hash table lookup (also in data step).

 

But you do not even have to join the data sets if you use the powerful concept of a SAS format.

The second csv file has station_id and station_name

Create a format that "translates" station_id into station_name like is done here:

SAS 9.4 / Viya 3.5
Base SAS Procedures Guide
FORMAT Procedure
Example 13: Creating a Format from a CNTLIN= Data Set
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1e19y6lrektafn1kj6nbvhus59w.htm

 

Just apply the newly created format to 

from_station_id and to_station_id

in the first dataset.
While the internal value(s) stays the same (station_id) it is now displayed as station_name.

 

Cheers,

Koen

camryndastrup
Fluorite | Level 6
I will try that!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1039 views
  • 2 likes
  • 5 in conversation