I'm looking for a way to create a dataset from observations/records that are time-based. The example below shows records of Data and Seconds.
data arbitrary;
input Data $ Seconds;
datalines;
AAA 111.1
BBB 112.0
CCC 115.0
DDD 117.2
EEE 118.0
FFF 120.3
GGG 121.0
HHH 123.0
JJJ 130.0
;
The goal is the create an new dataset consisting only of pairs of records separated by less than one second of time that are then concatenated into a single record. So the resulting dataset should look like:
Data1 Seconds1 Data2 Seconds2
AAA 111.1 BBB 112.0
DDD 117.2 EEE 118.0
FFF 120.3 GGG 121.0
I've been scratching my head over this for a few days without success. Thanks in advance for any advice, help, pointers anyone can provide.
Regards,
Gene
What if there are more than two records that match the requirement exactly for the time difference?
Is one supposed to be larger than the other?
This duplicates your example but with more records/values is likely not to be a complete answer.
data have; input Data $ Seconds; datalines; AAA 111.1 BBB 112.0 CCC 115.0 DDD 117.2 EEE 118.0 FFF 120.3 GGG 121.0 HHH 123.0 JJJ 130.0 ; proc sql; create table want as select a.data as data1, a.seconds as seconds1 ,b.data as data2, b.seconds as seconds2 from have as a, have as b where 0 le b.seconds - a.seconds lt 1 and a.data ne b.data ; quit;
Proc SQL implements ANSI standard SQL actions such as using a reflexive Cartesian join, the From have as a, have as b which joins a data set on itself for each record and then allows use of where to filter the results. More rules would be needed and may add significant code.
What if there are more than two records that match the requirement exactly for the time difference?
Is one supposed to be larger than the other?
This duplicates your example but with more records/values is likely not to be a complete answer.
data have; input Data $ Seconds; datalines; AAA 111.1 BBB 112.0 CCC 115.0 DDD 117.2 EEE 118.0 FFF 120.3 GGG 121.0 HHH 123.0 JJJ 130.0 ; proc sql; create table want as select a.data as data1, a.seconds as seconds1 ,b.data as data2, b.seconds as seconds2 from have as a, have as b where 0 le b.seconds - a.seconds lt 1 and a.data ne b.data ; quit;
Proc SQL implements ANSI standard SQL actions such as using a reflexive Cartesian join, the From have as a, have as b which joins a data set on itself for each record and then allows use of where to filter the results. More rules would be needed and may add significant code.
Thank for this. I tried using various approaches with RETAIN and LAG() but got tripped up by the nuances of using them in conditional IF Statements. I never would have gotten to this approach. I'll test it on the target dataset and let you know how it goes.
Yes, there are instances were three (or more) consecutive records might meet the time difference criteria. I will probably want/have to deal with that eventually but thought I'd start with this simpler case. I'm interested any ideas you might have on how to deal with that more complex problem.
Thanks again for the prompt response.
Regards,
Gene
If the data2 and seconds2 of an observation can be the data1 and seconds1 of the next observation when there are more than 2 consecutive obs within a second (such as the sequence JJJ-MMM below), then the logic for creating the pairs becomes a lot simpler:
data want;
retain;
set arbitrary;
if seconds - 1 <= seconds1 then output;
data1 = data;
seconds1 = seconds;
rename data=data2 seconds=seconds2;
run;
data1 seconds1 data2 seconds2 AAA 111.1 BBB 112.0 DDD 117.2 EEE 118.0 FFF 120.3 GGG 121.0 JJJ 130.0 KKK 130.2 KKK 130.2 LLL 130.5 LLL 130.5 MMM 130.9
My sincere thanks to all who responded to my query. This is such an amazing community, so quick to help. I chose this solution among the many because it opened my eyes to the power of SQL.
Thanks to all,
Gene
Read the data only once with RETAIN + some logic:
data want;
retain;
set arbitrary;
if missing(seconds1) then do;
data1 = data;
seconds1 = seconds;
end;
else do;
if seconds - seconds1 <= 1 then do;
output;
call missing(data1, seconds1);
end;
else do;
data1 = data;
seconds1 = seconds;
end;
end;
rename data=data2 seconds=seconds2;
run;
Here is another one :
data a2;
set arbitrary;
length old_data $3;
result= catx(" ",old_data, data);
diff=seconds - lag(seconds);
if diff le 1 and _n_ gt 1 then output;
old_data = data;
retain old_data ;
drop old_data data seconds;
run;
Greetings,
Mathias.
data have; input Data $ Seconds; datalines; AAA 111.1 BBB 112.0 CCC 115.0 DDD 117.2 EEE 118.0 FFF 120.3 GGG 121.0 HHH 123.0 JJJ 130.0 ; run; proc transpose data=have; id data; run; proc print; run;
_NAME_ AAA BBB CCC DDD EEE FFF GGG HHH JJJ
Seconds 111.1 112 115 117.2 118 120.3 121 123 130
This would solve multiple occurrences of less than one second. However, if there are multiple sequences of less than one second, you will get one line with all the data that are less than one second (maybe you should set a reference point?). I enriched a little your example, just to make sure you can get all of the values.
data arbitrary;
input Data $ Seconds;
datalines;
AAa 108.0
AAA 111.1
AAB 111.2
BBB 112.0
CBC 112.1
CBD 112.2
CBE 112.7
CCC 115.0
DDD 117.2
EEE 118.0
EEF 118.1
FFF 120.3
GGG 121.0
HHH 123.0
JJJ 130.0
;;;
run;
proc sort data = work.arbitrary;
by seconds data;
run;
data work.new_arbitrary;
set work.arbitrary;
retain grp 0;
lag_seconds = lag(seconds);
if ~missing(lag_seconds) then diff_seconds = seconds - lag_seconds;
else diff_seconds = .;
if diff_seconds < 1 then grp = grp;
else grp = grp + 1;
run;
proc sort data = work.new_arbitrary;
by grp seconds data;
run;
proc transpose data = work.new_arbitrary out = work.new_data prefix = data;
by grp;
var data;
run;
proc transpose data = work.new_arbitrary out = work.new_seconds prefix = seconds;
by grp;
var seconds;
run;
data work.final (where = (~missing(data2)) drop = grp);
merge work.new_data (in = a drop = _name_) work.new_seconds (in = b drop = _name_);
by grp;
if a and b;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.