I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. I am trying to identify the number of observations that have the same from_station_id and to_station_id (frequency of repeated trips). I have tried to use proc freq, proc sort, and data DUPS, but have not been able to produce a table that shows the frequency of repeated trips. Below is the code that I have tried and I have attached the csv files with my data for reference. (DIvvyTrips is the file with from_station_id and to_station_id). title Repeat Trips;
proc freq data = divvytrips;
tables from_station_id to_station_id;
run;
/*proc sort data = DivvyTrips nouniquekeys out=duplicates;
by from_station_id to_station_id;
run;
title "Repeated Trips";
proc print data = DivvyTrips;
run;
/*data DUPS;
set DivvyTrips;
by from_station_id to_station_id;
if from_station_id and to_station_id then list;
else output;
run;
title "Repeated Trips";
proc print data=DivvyTrips;
run;*/
... View more