Read the file sequentially.

- A 0->1 switch is a potential starting point for a run, so we store it in a hash object.
- For each record (0 or 1), we update the data of every stored potential run:
  - the number of 1s;
  - if the current record is 1, we check the conditions of a valid run (length >= 10, at least 80% purity);
  - if the check passes, we also store the end position of the (valid) run. If this run already has an end position, we can safely overwrite it (only the longest run is needed).
- When there are 5 consecutive 0s (or at end of file), we output the good runs (start pos, end pos).
- Postprocessing: merging overlapping runs.

The code below was originally posted untested; it has since been tested.

    data have;
      input bit @@;
      datalines;
    0 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1 1 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
    ;
    run;

    data periods(keep=startPos endPos);
      if _n_=1 then do;
        /* one hash entry per potential run, keyed by its start position */
        dcl hash runs(ordered:'y');
        dcl hiter riter('runs');
        runs.defineKey('startPos');
        runs.defineData('startPos');
        runs.defineData('num1s');
        runs.defineData('endPos');
        runs.defineDone();
      end;
      set have end=eof;
      pos+1; /* running record number */
      by bit notsorted;
      if bit=1 and first.bit then do; /* 0->1 switch */
        runs.add(key: pos, data: pos, data: 0, data: 0); /* potential run, storing it */
      end;
      if bit=1 then do;
        rc=riter.first();
        do while(rc=0); /* looping through all potential runs */
          num1s=num1s+bit; /* maintaining count of 1s */
          if pos-startPos+1>=10 and num1s/(pos-startPos+1)>=0.8 then do; /* good run */
            endPos=pos; /* setting end position (overwrites a shorter valid run) */
          end;
          runs.replace(); /* storing */
          rc=riter.next();
        end;
        numCons0=0;
      end;
      else do;
        numCons0+1; /* counting consecutive 0s */
      end;
      if numCons0=5 or eof then do; /* cut the runs: output good runs, delete everything */
        rc=riter.first();
        do while(rc=0); /* looping through all potential runs */
          if endPos>0 then output; /* endPos=0 means the run never became valid */
          rc=riter.next();
        end;
        runs.clear();
      end;
    run;

    /* Post processing */
    proc sort data=periods out=periods_sorted;
      by endPos startPos;
    run;

    data periodsNoOverlap;
      set periods_sorted;
      by endPos;
      if first.endPos; /* keeping the lowest startPos for each endPos */
    run;

Sorry, but right now I'm not sure; maybe not all the overlapping intervals are removed by this procedure.

There are still overlapping areas after this procedure. Do you really want to remove them? Experiment with this application; maybe you want to refine the definition of "run". According to the current definition, the "maximal" runs are: 35-56, 49-80.

Message was edited by: Gergely Bathó
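
If the overlapping intervals should in fact be merged, a minimal sketch of one possible follow-up step is shown here. It is not part of the tested code above and has not been run; it assumes the periodsNoOverlap data set produced by the post-processing step, and the names periods_by_start, periodsMerged, mergedStart and mergedEnd are hypothetical, introduced only for illustration.

```sas
/* Hypothetical follow-up step: merge overlapping intervals. */
/* Assumes periodsNoOverlap (startPos, endPos) from the code above. */
proc sort data=periodsNoOverlap out=periods_by_start;
  by startPos endPos;
run;

data periodsMerged(keep=mergedStart mergedEnd);
  set periods_by_start end=eof;
  retain mergedStart mergedEnd;
  if _n_=1 then do;
    /* first interval opens the first merged block */
    mergedStart=startPos;
    mergedEnd=endPos;
  end;
  else if startPos<=mergedEnd then do;
    /* overlaps the block being built: extend it */
    mergedEnd=max(mergedEnd, endPos);
  end;
  else do;
    /* gap found: write the finished block, then open a new one */
    output;
    mergedStart=startPos;
    mergedEnd=endPos;
  end;
  if eof then output; /* write the last block */
run;
```

Note the design consequence: with the sample data, 35-56 and 49-80 would be merged into 35-80, which contains 36 ones in 46 positions (about 78%), so the merged interval no longer meets the 80% purity condition. Whether merging is wanted therefore depends on how "run" is defined.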