BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Here is the exact problem:

I have a Data Set containing some observations (say 10).
I have to merge two observation if they are similar.

Now limitations are:
We can merge only consecutive observation and
In one step only one merge should happen on one dataset (i.e if obs# 1 and 2 are similar, and obs# 4 and 5 are similar, we can't merge them in single step), so basically as soon as one merging is done I should get out of the merging step, and again run the merging procedure on the new data set obtained by first merging.


The technique i tried to follow is: I look for rows which are more similar. Say 1 and 2 are more similar (have less "p-value") than 4 and 5. Then I merge 1 and 2 first, otherwise I merge 4 and 5 first. Problem is that some times "p-values" are similar for more than 1 pair of observations. Then I have to merge any 2 observations (1 pair) and again run the merging procedure on new dataset.

Dataset Contains following fields:
GoodCount, TotalCount, pValue

Here is the Code which I tried: (If you have a solution for my problem you can ignore the code and just tell me the solution, otherwise please tell me where i m wrong)

%macro generatePValues(InputDS);
/* some code here to regenerate p values every time we call this macro, works fine*/
%mend;

%macro merge(InputDs, minPVal);
proc sql noprint; select count(*) into:flag from &InputDS;

data tempDS; set &InputDS;
if &flag = _N_-1 then stop;
dummy1 = lag(goodCount) + goodCount;
dummy2 = lag(totalCount) + totalCount;
if (pVal-&minPVal)<.0001 then do;
goodCount = dummy1;
totalCount = dummy2;
call symput ( 'deleteRow', _N_-1);
call symput ( 'Flag', _N_);
end;
run;

Data &OutputDS; set tempDS;
if _N_ NE &deleteRow;
drop dummy1 dummy2;
run;

generatePValues(&OuputDS, AlphaLevel); /*refresh p-values*/
%mend merge;

%macro showResults(InputDS,OutputDS);
generatePValues(&InputDS);
proc sql noprint; select min(pVal) into:MinPVal from &InputDS;
proc sql noprint; select count(*) into:TotalObsThisTime from &InputDS;
proc sql noprint; select count(*)+1 into:TotalObsLastTime from &InputDS; /* merge only if the number of observation changed in last iteration of while loop*/
%do %while (&minPVal < &AlphaLevel and & TotalObs ~= &TotalObsLastTime);
%let TotalObsLastTime = TotalObsThisTime;
%merge(InputDS, &MinPVal, &OutputDS);
proc sql noprint; select count(*) into:temp1 from &OutputDS;
%let TotalObsThisTime = &temp1;
proc sql noprint; select min(pVal) into:temp2 from &OutputDS;
%let minPVal = &temp2;
%end;
%mend showResults;

Now when I call showresults()... the problem is it should stop in the next iteration after it reaches the merging logic abs(pValue -MinPVal)<.0001 first time.
but it doesn't.
2 REPLIES 2
deleted_user
Not applicable
remaining code.. after generatePvalues
generatePValues(&OuputDS, AlphaLevel); /*refresh p-values*/
%mend merge;

%macro showResults(InputDS,OutputDS);
generatePValues(&InputDS);
proc sql noprint; select min(pVal) into:MinPVal from &InputDS;
proc sql noprint; select count(*) into:TotalObsThisTime from &InputDS;
proc sql noprint; select count(*)+1 into:TotalObsLastTime from &InputDS; /* merge only if the number of observation changed in last iteration of while loop*/
%do %while (&minPVal < &AlphaLevel and & TotalObs ~= &TotalObsLastTime);
%let TotalObsLastTime = TotalObsThisTime;
%merge(InputDS, &MinPVal, &OutputDS);
proc sql noprint; select count(*) into:temp1 from &OutputDS;
%let TotalObsThisTime = &temp1;
proc sql noprint; select min(pVal) into:temp2 from &OutputDS;
%let minPVal = &temp2;
%end;
%mend showResults;

Now when I call showresults()... the problem is it should stop in the next iteration after it reaches the merging logic abs(pValue -MinPVal)<.0001 first time.
but it doesn't.
GertNissen
Barite | Level 11
I'm not sure I understand your challenge - can you perhaps provide some 'before' and 'after' sample data.

Is it something like this your are trying ?[pre]
data input;
input value1 value2;
datalines;
1.1 1
1.2 2
2.0 3
3.1 4
3.2 5
1.3 6
4.0 7
;run;



data dist;
set input;
retain key 0;
prev_val = lag1(value1);
if _n_ = 1 then key = 1;
else if prev_val*0.9 <= value1 <= prev_val*1.1 then;
else key+1;
run;
[/pre]In this sample code each record gets a (new) surrogate key based on if the value (current record) is within a range (based on the previous record). After this you can use the new key, to merge, summarize etc. your data.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1026 views
  • 0 likes
  • 2 in conversation