Contributor
Posts: 51

# Aligning data columns

I have a rather complex situation and being new to SAS, I am beating my head against the wall trying to figure out a solution. I have two datasets (controller, daq) and in each dataset there is a measurement of power. I need to align the controller data with the daq data. There is a time stamp in each dataset BUT, they didn't bother time syncing the daq with the controller so there is an indeterminate amount of time delta between the two. To further complicate matters, both systems sample the data at different rates.... and while the controller is only recording data during the test, the daq records for longer periods of time.  So for a typical test run the controller has about 1000 rows of data and the daq has 30,000 rows at different sample rates (which means that the absolute measurements will not likely match exactly).

I am trying to figure out a way to automatically align the data - i.e. figure out where the curve of the controller data most closely matches the curve of the daq data - giving us the time delta.

Keep in mind that the times do not match up (we have "heard" that the timestamps may be off sync from each other by approximately 10 minutes+). The point is that the the curve from the controller is a much shorter interval than the daq and we are trying to determine the time difference by aligning where the curve of the controller most closely matches the curve of the controller. I say curve, because initially thought of just matching the max value, but while there is only one max value from the controller data, the daq data goes on for a much longer period and the power curve crosses that value many times, so it would difficult to align the data based on just that.

My current thought is to iterate through two arrays, subtracting daqrow from controllerow ,adding up the delta's  and then shifting the index for the daq and calculating the new delta's for the curve and finding the minimum delta

set work.daqPower work.controlPower

array pwr_daq{*} daqPwr;  /* daqPwr is name of power variable in work.daqPower */

array pwr_control{*} controlPwr;  /* controllPwr is name of power variable in work.controlPower */

do idaq=1 to (30000 - 1000);

x = idaq;

tmp = 0;

do jcontrol=1 to 1000;

tmp = tmp + ABS(pwr_daq - pwr_control[jcontrol]);

x = x + 1;

end;

output;

end;

I am apparently not understanding the array documentation. Apparently I can't loop through two arrays?  I would appreciate any links to similar examples or if you have any ideas for a better approach.

Thanks,

Fred

Posts: 3,852

## Re: Aligning data columns

I don't think you will need ARRAYs at all for this problem.  Sounds more like a fuzzy join.  Can you post a sample of the power data DAT and CONTROL, and how it should look when joined.

Contributor
Posts: 22

## Re: Aligning data columns

FredGIII,

I have a somewhat simplistic/bruteforce solution. However we must choose to keep only some measurement combinations. By that I mean, we can:

1) keep all measurements from DAQ, and only those measurements from CONTROLLER that match closest

2) keep all measurements from CONTROLLER, and only those measurements from DAQ that match closest

3) keep matches that fall within a certain time difference criteria.

The solution below chose (1):

CREATE TEST DATA

data daq;
format smaple_time datetime24.3;
do times=1 to 30000;
sample+1;
sample_time=round(intnx('seconds', datetime(), 1000*RAND('uniform')) + RAND('uniform'),.01);
output;
end;
drop times;
run;

data controller;
format smaple_time datetime24.3;
do times=1 to 1000;
sample+1;
sample_time=round(intnx('seconds', datetime(), 1000*RAND('uniform')) + RAND('uniform'),.01);
output;
end;
drop times;
run;

SOLUTION

proc sql;
create table alignment as
select daq.sample as daq_sample,
daq.sample_time as daq_sample_time format = datetime24.3,
controller.sample as controller_sample,
controller.sample_time as controller_sample_time format = datetime24.3,
MIN(ABS(daq.sample_time-controller.sample_time)) as min_sample_time_diff
from daq, controller
group by daq.sample
having ABS(daq.sample_time-controller.sample_time) = min(ABS(daq.sample_time-controller.sample_time));
quit;

/* Duplicates will arrise when two or more time differences are exactly the same for a given daq_sample ID. */

proc sort data=alignment nodupkey dupout=looky out=alignment;
by daq_sample;
run;

This code will take 10-60 seconds depending on your machine. By the way, it is possible that the same CONTROLLER measurement will be matched to several DAQ measurements depending on when the measurements took place. Let me know what you think.

Hope this helps,

Huey

Discussion stats
• 2 replies
• 230 views
• 3 likes
• 3 in conversation