I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. I am trying to identify the number of observations that have the same from_station_id and to_station_id (frequency of repeated trips).
I have tried to use proc freq, proc sort, and data DUPS, but have not been able to produce a table that shows the frequency of repeated trips. Below is the code that I have tried and I have attached the csv files with my data for reference. (DIvvyTrips is the file with from_station_id and to_station_id).
title Repeat Trips;
proc freq data = divvytrips;
tables from_station_id to_station_id;
run;
/*proc sort data = DivvyTrips nouniquekeys out=duplicates;
by from_station_id to_station_id;
run;
title "Repeated Trips";
proc print data = DivvyTrips;
run;
/*data DUPS;
set DivvyTrips;
by from_station_id to_station_id;
if from_station_id and to_station_id then list;
else output;
run;
title "Repeated Trips";
proc print data=DivvyTrips;
run;*/
Add a filter to the output data set?
title Repeat Trips;
proc freq data = divvytrips noprint;
tables from_station_id*to_station_id / list out=want (where=(count>1));
run;
proc print data=want;
run;
@camryndastrup wrote:
Thank you! How can I get that table to show only the recurring observations, and eliminate the observations that only occur once?
title Repeat Trips;
proc freq data = divvytrips;
tables from_station_id*to_station_id / list out=want;
run;
@camryndastrup wrote:
I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. I am trying to identify the number of observations that have the same from_station_id and to_station_id (frequency of repeated trips).
I have tried to use proc freq, proc sort, and data DUPS, but have not been able to produce a table that shows the frequency of repeated trips. Below is the code that I have tried and I have attached the csv files with my data for reference. (DIvvyTrips is the file with from_station_id and to_station_id).
title Repeat Trips; proc freq data = divvytrips; tables from_station_id to_station_id; run; /*proc sort data = DivvyTrips nouniquekeys out=duplicates; by from_station_id to_station_id; run; title "Repeated Trips"; proc print data = DivvyTrips; run; /*data DUPS; set DivvyTrips; by from_station_id to_station_id; if from_station_id and to_station_id then list; else output; run; title "Repeated Trips"; proc print data=DivvyTrips; run;*/
Thank you! How can I get that table to show only the recurring observations, and eliminate the observations that only occur once?
Add a filter to the output data set?
title Repeat Trips;
proc freq data = divvytrips noprint;
tables from_station_id*to_station_id / list out=want (where=(count>1));
run;
proc print data=want;
run;
@camryndastrup wrote:
Thank you! How can I get that table to show only the recurring observations, and eliminate the observations that only occur once?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.