Solved: Re: How to bring all data in one row

umeshgiri48 · Posted 06-01-2020 12:46 AM

I have a dataset in which I am trying to bring rows into columns from multiple rows eg:-

date	Time	AVD_ID	device_NO	device__No_id	ID	Check_In	Check_out	Route	Stop_name	Amount
01/03/2019	5:06:00 PM	B1910_EVD_04	166450	23442	602803314916 :	1	0	22	Canada Water bus station	7.5
01/03/2019	5:38:00 PM	B1910_EVD_05	166451	23443	602803314916 :	0	1	22	Tottenham Court Road station	-4.5

now I want to create In_time, out_time on the basis of time and bring Amount, check_in and out in one row for eg:-

date	In time	Out time	ID	Check_In	Check_out	Route	In Stop_name	Out Stop_name	In Amount	Out Amount
01/03/2019	5:06:00 PM	5:38:00 PM	1130602803314816 :	1	1	22	Canada Water bus station	Tottenham Court Road station	7.5	-4.5

I have tried using partition by, lag, row_no function but it not giving me the result.

Please help me on this.

sustagens · Posted 06-01-2020 02:34 AM

Try

data want (drop=AVD_ID device_NO device__No_id Route);
merge 
have (where=(Check_In=1) drop=Check_Out rename=(Time=In_Time Stop_Name=In_Stop_Name Amount=In_Amount)) 
have (where=(Check_Out=1) drop=Check_In rename=(Time=Out_Time Stop_Name=Out_Stop_Name Amount=Out_Amount))
;
by Date ID;
run;

View solution in original post

Kurt_Bremser · Posted 06-01-2020 02:22 AM

What is identified by ID? A single trip, or something like a person or travel card?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

umeshgiri48 · Posted 06-01-2020 02:23 AM

A person travel card

Kurt_Bremser · Posted 06-01-2020 06:59 AM

I will make some assumptions:

a "trip" can be identified by id and route
it begins with a check_in and ends with a check_out
there are always two observations (1 check_in and 1 check_out) for a trip
check_in happens before check_out
there cannot be overlapping time spans for two trips for a single id on the same route

are these assumptions correct?

For working with sequences of data, the data step is better suited than SQL.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

umeshgiri48 · Posted 06-01-2020 10:29 AM

yes, all assumptions are correct

Kurt_Bremser · Posted 06-03-2020 08:16 AM

Then I suggest this data step solution:

data have;
infile datalines dlm='09'x;
input
  date :mmddyy10.
  Time :time11.
  AVD_ID :$12.
  device_NO
  device__No_id
  ID :$12.
  Check_In
  Check_out
  Route
  Stop_name :$30.
  Amount
;
dt = dhms(date,0,0,time);
format dt e8601dt19.;
datalines;
01/03/2019	5:06:00 PM	B1910_EVD_04	166450	23442	602803314916:	1	0	22	Canada Water bus station	7.5
01/03/2019	5:38:00 PM	B1910_EVD_05	166451	23443	602803314916:	0	1	22	Tottenham Court Road station	-4.5
;

proc sort data=have;
by id dt;
run;

data trips;
retain /* for column order also */
  id
  in_time
  out_time
  check_in
  check_out
  in_stop_name
  out_stop_name
  in_amount
  out_amount
;
format
  in_time
  out_time e8601dt19.
;
set have (rename=(
  check_in=_check_in
  amount=out_amount
  dt=out_time
  stop_name=out_stop_name
));
by id route notsorted;
if first.route
then do;
  in_time = out_time;
  check_in = _check_in;
  in_stop_name = out_stop_name;
  in_amount = out_amount;
end;
if last.route;
keep
  id
  in_time
  out_time
  check_in
  check_out
  in_stop_name
  out_stop_name
  in_amount
  out_amount
;
run;

By using datetimes instead of separate dates and times, this code will also deal correctly with trips crossing the day boundary.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

sustagens · Posted 06-01-2020 02:34 AM

Try

data want (drop=AVD_ID device_NO device__No_id Route);
merge 
have (where=(Check_In=1) drop=Check_Out rename=(Time=In_Time Stop_Name=In_Stop_Name Amount=In_Amount)) 
have (where=(Check_Out=1) drop=Check_In rename=(Time=Out_Time Stop_Name=Out_Stop_Name Amount=Out_Amount))
;
by Date ID;
run;

umeshgiri48 · Posted 06-01-2020 03:22 AM

can we also do it by using proc sql

mkeintz · Posted 06-01-2020 11:06 PM

Yes, but if your data are already sorted by DATE/ID, the DATA step solution provided by @Kurt_Bremser will be MUCH faster, since unlike SQL, it won't be obligated to do a Cartesian comparison of all checkin ID's against all checkout ID's while attempting to find all matches.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Registration is open

SAS Training: Just a Click Away