BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
genemroz
Quartz | Level 8

I'm looking for a way to create a dataset from observations/records that are time-based. The example below shows records of Data and Seconds.
data arbitrary;
input Data $ Seconds;
datalines;
AAA 111.1
BBB 112.0
CCC 115.0
DDD 117.2
EEE 118.0
FFF 120.3
GGG 121.0
HHH 123.0
JJJ 130.0
;
The goal is the create an new dataset consisting only of pairs of records separated by less than one second of time that are then concatenated into a single record. So the resulting dataset should look like:

Data1 Seconds1 Data2 Seconds2
AAA 111.1 BBB 112.0
DDD 117.2 EEE 118.0
FFF 120.3 GGG 121.0

 

I've been scratching my head over this for a few days without success.  Thanks in advance for any advice, help, pointers anyone can provide.

 

Regards,

 

Gene

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

What if there are more than two records that match the requirement exactly for the time difference?

Is one supposed to be larger than the other?

 

This duplicates your example but with more records/values is likely not to be a complete answer.

data have;
input Data $ Seconds;
datalines;
AAA 111.1
BBB 112.0
CCC 115.0
DDD 117.2
EEE 118.0
FFF 120.3
GGG 121.0
HHH 123.0
JJJ 130.0
;

proc sql;
   create table want as
   select a.data as data1, a.seconds as seconds1 
         ,b.data as data2, b.seconds as seconds2
   from have as a,
        have as b
   where 0 le b.seconds - a.seconds lt 1
        and a.data ne b.data
   ;
quit;

Proc SQL implements ANSI standard SQL actions such as using a reflexive Cartesian join, the From have as a, have as b which joins a data set on itself for each record and then allows use of where to filter the results. More rules would be needed and may add significant code.

View solution in original post

8 REPLIES 8
ballardw
Super User

What if there are more than two records that match the requirement exactly for the time difference?

Is one supposed to be larger than the other?

 

This duplicates your example but with more records/values is likely not to be a complete answer.

data have;
input Data $ Seconds;
datalines;
AAA 111.1
BBB 112.0
CCC 115.0
DDD 117.2
EEE 118.0
FFF 120.3
GGG 121.0
HHH 123.0
JJJ 130.0
;

proc sql;
   create table want as
   select a.data as data1, a.seconds as seconds1 
         ,b.data as data2, b.seconds as seconds2
   from have as a,
        have as b
   where 0 le b.seconds - a.seconds lt 1
        and a.data ne b.data
   ;
quit;

Proc SQL implements ANSI standard SQL actions such as using a reflexive Cartesian join, the From have as a, have as b which joins a data set on itself for each record and then allows use of where to filter the results. More rules would be needed and may add significant code.

genemroz
Quartz | Level 8

Thank for this.  I tried using various approaches with RETAIN and LAG() but got tripped up by the nuances of using them in conditional IF Statements.  I never would have gotten to this approach.  I'll test it on the target dataset and let you know how it goes. 

 

Yes, there are instances were three (or more) consecutive records might meet the time difference criteria.  I will probably want/have to deal with that eventually but thought I'd start with this simpler case.  I'm interested any ideas you might have on how to deal with that more complex problem.

 

Thanks again for the prompt response.

Regards,

Gene

PGStats
Opal | Level 21

If the data2 and seconds2 of an observation can be the data1 and seconds1 of the next observation when there are more than 2 consecutive obs within a second (such as the sequence JJJ-MMM below), then the logic for creating the pairs becomes a lot simpler:

 

data want;
retain;
set arbitrary;
if seconds - 1 <= seconds1 then output;
data1 = data;
seconds1 = seconds;
rename data=data2 seconds=seconds2;
run;
data1 	seconds1 data2 	seconds2
AAA 	111.1 	BBB 	112.0
DDD 	117.2 	EEE 	118.0
FFF 	120.3 	GGG 	121.0
JJJ 	130.0 	KKK 	130.2
KKK 	130.2 	LLL 	130.5
LLL 	130.5 	MMM 	130.9
PG
genemroz
Quartz | Level 8

My sincere thanks to all who responded to my query.  This is such an amazing community, so quick to help.  I chose this solution among the many because it opened my eyes to the power of SQL.

 

Thanks to all,

 

Gene

PGStats
Opal | Level 21

Read the data only once with RETAIN + some logic:

 

data want;
retain;
set arbitrary;
if missing(seconds1) then do;
    data1 = data;
    seconds1 = seconds;
    end;
else do;
    if seconds - seconds1 <= 1 then do;
        output;
        call missing(data1, seconds1);
        end;
    else do;
        data1 = data;
        seconds1 = seconds;
        end;
    end;
rename data=data2 seconds=seconds2;
run;

 

PG
MCoopmans
SAS Employee

Here is another one :


data a2;
  set arbitrary;
  length old_data $3;
  result= catx(" ",old_data, data);
  diff=seconds - lag(seconds);
  if diff le 1 and _n_ gt 1 then output;
  old_data = data;
  retain old_data ;
 drop old_data data seconds;
run;

Greetings,

Mathias.

 

JMS
Obsidian | Level 7 JMS
Obsidian | Level 7

 

data have;                    
  input Data $ Seconds;        
  datalines;                    
AAA 111.1                    
BBB 112.0                    
CCC 115.0                    
DDD 117.2                    
EEE 118.0                    
FFF 120.3                    
GGG 121.0                    
HHH 123.0                    
JJJ 130.0                    
;                            
run;                          
proc transpose data=have;    
 id data;                    
run;                          
                             
proc print;                  
run;                          

 

 

_NAME_      AAA     BBB    CCC     DDD     EEE     FFF     GGG    HHH    JJJ  
Seconds    111.1    112    115    117.2    118    120.3    121    123    130  

alex_a
Fluorite | Level 6

This would solve multiple occurrences of less than one second. However, if there are multiple sequences of less than one second, you will get one line with all the data that are less than one second (maybe you should set a reference point?). I enriched a little your example, just to make sure you can get all of the values.

 

data arbitrary;
input Data $ Seconds;
datalines;
AAa 108.0
AAA 111.1
AAB 111.2
BBB 112.0
CBC 112.1
CBD 112.2
CBE 112.7
CCC 115.0
DDD 117.2
EEE 118.0
EEF 118.1
FFF 120.3
GGG 121.0
HHH 123.0
JJJ 130.0
;;;
run;

proc sort data = work.arbitrary;
by seconds data;
run;

data work.new_arbitrary;
set work.arbitrary;
retain grp 0;
lag_seconds = lag(seconds);
if ~missing(lag_seconds) then diff_seconds = seconds - lag_seconds;
else diff_seconds = .;
if diff_seconds < 1 then grp = grp;
else grp = grp + 1;
run;

proc sort data = work.new_arbitrary;
by grp seconds data;
run;

proc transpose data = work.new_arbitrary out = work.new_data prefix = data;
by grp;
var data;
run;

proc transpose data = work.new_arbitrary out = work.new_seconds prefix = seconds;
by grp;
var seconds;
run;

data work.final (where = (~missing(data2)) drop = grp);
merge work.new_data (in = a drop = _name_) work.new_seconds (in = b drop = _name_);
by grp;
if a and b;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 914 views
  • 0 likes
  • 6 in conversation