Here is the exact problem:
I have a Data Set containing some observations (say 10).
I have to merge two observation if they are similar.
Now limitations are:
We can merge only consecutive observation and
In one step only one merge should happen on one dataset (i.e if obs# 1 and 2 are similar, and obs# 4 and 5 are similar, we can't merge them in single step), so basically as soon as one merging is done I should get out of the merging step, and again run the merging procedure on the new data set obtained by first merging.
The technique i tried to follow is: I look for rows which are more similar. Say 1 and 2 are more similar (have less "p-value") than 4 and 5. Then I merge 1 and 2 first, otherwise I merge 4 and 5 first. Problem is that some times "p-values" are similar for more than 1 pair of observations. Then I have to merge any 2 observations (1 pair) and again run the merging procedure on new dataset.
Dataset Contains following fields:
GoodCount, TotalCount, pValue
Here is the Code which I tried: (If you have a solution for my problem you can ignore the code and just tell me the solution, otherwise please tell me where i m wrong)
%macro generatePValues(InputDS);
/* some code here to regenerate p values every time we call this macro, works fine*/
%mend;
%macro merge(InputDs, minPVal);
proc sql noprint; select count(*) into:flag from &InputDS;
data tempDS; set &InputDS;
if &flag = _N_-1 then stop;
dummy1 = lag(goodCount) + goodCount;
dummy2 = lag(totalCount) + totalCount;
if (pVal-&minPVal)<.0001 then do;
goodCount = dummy1;
totalCount = dummy2;
call symput ( 'deleteRow', _N_-1);
call symput ( 'Flag', _N_);
end;
run;
Data &OutputDS; set tempDS;
if _N_ NE &deleteRow;
drop dummy1 dummy2;
run;
generatePValues(&OuputDS, AlphaLevel); /*refresh p-values*/
%mend merge;
%macro showResults(InputDS,OutputDS);
generatePValues(&InputDS);
proc sql noprint; select min(pVal) into:MinPVal from &InputDS;
proc sql noprint; select count(*) into:TotalObsThisTime from &InputDS;
proc sql noprint; select count(*)+1 into:TotalObsLastTime from &InputDS; /* merge only if the number of observation changed in last iteration of while loop*/
%do %while (&minPVal < &AlphaLevel and & TotalObs ~= &TotalObsLastTime);
%let TotalObsLastTime = TotalObsThisTime;
%merge(InputDS, &MinPVal, &OutputDS);
proc sql noprint; select count(*) into:temp1 from &OutputDS;
%let TotalObsThisTime = &temp1;
proc sql noprint; select min(pVal) into:temp2 from &OutputDS;
%let minPVal = &temp2;
%end;
%mend showResults;
Now when I call showresults()... the problem is it should stop in the next iteration after it reaches the merging logic abs(pValue -MinPVal)<.0001 first time.
but it doesn't.