I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. I am trying to identify the number of observations that have the same from_station_id and to_station_id (frequency of repeated trips).
I have tried to use proc freq, proc sort, and data DUPS, but have not been able to produce a table that shows the frequency of repeated trips. Below is the code that I have tried and I have attached the csv files with my data for reference. (DIvvyTrips is the file with from_station_id and to_station_id).
title Repeat Trips;
proc freq data = divvytrips;
tables from_station_id to_station_id;
run;
/*proc sort data = DivvyTrips nouniquekeys out=duplicates;
by from_station_id to_station_id;
run;
title "Repeated Trips";
proc print data = DivvyTrips;
run;
/*data DUPS;
set DivvyTrips;
by from_station_id to_station_id;
if from_station_id and to_station_id then list;
else output;
run;
title "Repeated Trips";
proc print data=DivvyTrips;
run;*/
Add a filter to the output data set?
title Repeat Trips;
proc freq data = divvytrips noprint;
tables from_station_id*to_station_id / list out=want (where=(count>1));
run;
proc print data=want;
run;
@camryndastrup wrote:
Thank you! How can I get that table to show only the recurring observations, and eliminate the observations that only occur once?
title Repeat Trips;
proc freq data = divvytrips;
tables from_station_id*to_station_id / list out=want;
run;
@camryndastrup wrote:
I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. I am trying to identify the number of observations that have the same from_station_id and to_station_id (frequency of repeated trips).
I have tried to use proc freq, proc sort, and data DUPS, but have not been able to produce a table that shows the frequency of repeated trips. Below is the code that I have tried and I have attached the csv files with my data for reference. (DIvvyTrips is the file with from_station_id and to_station_id).
title Repeat Trips; proc freq data = divvytrips; tables from_station_id to_station_id; run; /*proc sort data = DivvyTrips nouniquekeys out=duplicates; by from_station_id to_station_id; run; title "Repeated Trips"; proc print data = DivvyTrips; run; /*data DUPS; set DivvyTrips; by from_station_id to_station_id; if from_station_id and to_station_id then list; else output; run; title "Repeated Trips"; proc print data=DivvyTrips; run;*/
Thank you! How can I get that table to show only the recurring observations, and eliminate the observations that only occur once?
Add a filter to the output data set?
title Repeat Trips;
proc freq data = divvytrips noprint;
tables from_station_id*to_station_id / list out=want (where=(count>1));
run;
proc print data=want;
run;
@camryndastrup wrote:
Thank you! How can I get that table to show only the recurring observations, and eliminate the observations that only occur once?
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.