I have a dataset in which I am trying to bring rows into columns from multiple rows eg:-
date | Time | AVD_ID | device_NO | device__No_id | ID | Check_In | Check_out | Route | Stop_name | Amount |
01/03/2019 | 5:06:00 PM | B1910_EVD_04 | 166450 | 23442 | 602803314916 : | 1 | 0 | 22 | Canada Water bus station | 7.5 |
01/03/2019 | 5:38:00 PM | B1910_EVD_05 | 166451 | 23443 | 602803314916 : | 0 | 1 | 22 | Tottenham Court Road station | -4.5 |
now I want to create In_time, out_time on the basis of time and bring Amount, check_in and out in one row for eg:-
date | In time | Out time | ID | Check_In | Check_out | Route | In Stop_name | Out Stop_name | In Amount | Out Amount |
01/03/2019 | 5:06:00 PM | 5:38:00 PM | 1130602803314816 : | 1 | 1 | 22 | Canada Water bus station | Tottenham Court Road station | 7.5 | -4.5 |
I have tried using partition by, lag, row_no function but it not giving me the result.
Please help me on this.
Try
data want (drop=AVD_ID device_NO device__No_id Route);
merge
have (where=(Check_In=1) drop=Check_Out rename=(Time=In_Time Stop_Name=In_Stop_Name Amount=In_Amount))
have (where=(Check_Out=1) drop=Check_In rename=(Time=Out_Time Stop_Name=Out_Stop_Name Amount=Out_Amount))
;
by Date ID;
run;
What is identified by ID? A single trip, or something like a person or travel card?
I will make some assumptions:
are these assumptions correct?
For working with sequences of data, the data step is better suited than SQL.
Then I suggest this data step solution:
data have;
infile datalines dlm='09'x;
input
date :mmddyy10.
Time :time11.
AVD_ID :$12.
device_NO
device__No_id
ID :$12.
Check_In
Check_out
Route
Stop_name :$30.
Amount
;
dt = dhms(date,0,0,time);
format dt e8601dt19.;
datalines;
01/03/2019 5:06:00 PM B1910_EVD_04 166450 23442 602803314916: 1 0 22 Canada Water bus station 7.5
01/03/2019 5:38:00 PM B1910_EVD_05 166451 23443 602803314916: 0 1 22 Tottenham Court Road station -4.5
;
proc sort data=have;
by id dt;
run;
data trips;
retain /* for column order also */
id
in_time
out_time
check_in
check_out
in_stop_name
out_stop_name
in_amount
out_amount
;
format
in_time
out_time e8601dt19.
;
set have (rename=(
check_in=_check_in
amount=out_amount
dt=out_time
stop_name=out_stop_name
));
by id route notsorted;
if first.route
then do;
in_time = out_time;
check_in = _check_in;
in_stop_name = out_stop_name;
in_amount = out_amount;
end;
if last.route;
keep
id
in_time
out_time
check_in
check_out
in_stop_name
out_stop_name
in_amount
out_amount
;
run;
By using datetimes instead of separate dates and times, this code will also deal correctly with trips crossing the day boundary.
Try
data want (drop=AVD_ID device_NO device__No_id Route);
merge
have (where=(Check_In=1) drop=Check_Out rename=(Time=In_Time Stop_Name=In_Stop_Name Amount=In_Amount))
have (where=(Check_Out=1) drop=Check_In rename=(Time=Out_Time Stop_Name=Out_Stop_Name Amount=Out_Amount))
;
by Date ID;
run;
Yes, but if your data are already sorted by DATE/ID, the DATA step solution provided by @Kurt_Bremser will be MUCH faster, since unlike SQL, it won't be obligated to do a Cartesian comparison of all checkin ID's against all checkout ID's while attempting to find all matches.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.