- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. The objective is to merge the files in some way that will input the station name from the second csv file into the "from" and "to" station sections in the first, based on station id. I know that this would be extremely easy in excel with an xlookup, but I am wondering the correct way to approach this in SAS.
I am using the SAS university edition (the free online one) if that makes any difference. Our code so far is as follows:
'''data DivvyTrips; infile '/home/u59304398/sasuser.v94/DivvyTrips.csv' dsd; input trip_id bikeid tripduration from_station_id to_station_id capacity usertype $; title "Trips"; run;
data DivvyStations; infile '/home/u59304398/sasuser.v94/Divvy_Stations.csv' dsd; input station_id station_name $; title "Stations"; run;'''
All this does is import the data. I do not think a merge with a sort will work because we need both from and to station names.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Don't truncate you station names to 8 characters.
Use the station table to define a format and then use the format to display the station ids as their names.
data DivvyStations;
infile 'c:\downloads\Divvy_Stations.csv' dsd firstobs=2;
input station_id station_name :$40.;
run;
data format;
fmtname='STATION';
set divvystations;
rename station_id=start station_name=label;
run;
proc format cntlin=format;
run;
proc print data=divvytrips(obs=10) width=min;
format from_station_id to_station_id station.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
FYI - it really helps if you format your posts to be legible.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have attached the data we are using.
data DivvyTrips;
infile '/home/u59304398/sasuser.v94/DivvyTrips.csv' dsd;
input trip_id
bikeid
tripduration
from_station_id
to_station_id
capacity
usertype $;
title "Trips";
run;
data DivvyStations;
infile '/home/u59304398/sasuser.v94/Divvy_Stations.csv' dsd;
input station_id
station_name $;
title "Stations";
run;
This is what we have so far, simply to import the data into the online SAS. The files I imported did not have labels in the first row, I only added them to the data files I am uploading for clarity.
Let me know if that is enough information, and I appreciate your help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Don't truncate you station names to 8 characters.
Use the station table to define a format and then use the format to display the station ids as their names.
data DivvyStations;
infile 'c:\downloads\Divvy_Stations.csv' dsd firstobs=2;
input station_id station_name :$40.;
run;
data format;
fmtname='STATION';
set divvystations;
rename station_id=start station_name=label;
run;
proc format cntlin=format;
run;
proc print data=divvytrips(obs=10) width=min;
format from_station_id to_station_id station.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom Any idea on how to get these results using the same data sets?
1. Determine the number of trips that start and end at the same station.
2. Find the average trip duration for trips that start and end at the same station.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc means data=divvytrips N MEAN;
where from_station_id = to_station_id;
var tripduration;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks that helps
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
There are several ways of doing this using a merge with data step or PROC SQL or with a hash table lookup (also in data step).
But you do not even have to join the data sets if you use the powerful concept of a SAS format.
The second csv file has station_id and station_name
Create a format that "translates" station_id into station_name like is done here:
SAS 9.4 / Viya 3.5
Base SAS Procedures Guide
FORMAT Procedure
Example 13: Creating a Format from a CNTLIN= Data Set
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1e19y6lrektafn1kj6nbvhus59w.htm
Just apply the newly created format to
from_station_id and to_station_id
in the first dataset.
While the internal value(s) stays the same (station_id) it is now displayed as station_name.
Cheers,
Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content