turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- How to merge two observations in a single Dataset ...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-19-2008 07:45 AM

Here is the exact problem:

I have a Data Set containing some observations (say 10).

I have to merge two observation if they are similar.

Now limitations are:

We can merge only consecutive observation and

In one step only one merge should happen on one dataset (i.e if obs# 1 and 2 are similar, and obs# 4 and 5 are similar, we can't merge them in single step), so basically as soon as one merging is done I should get out of the merging step, and again run the merging procedure on the new data set obtained by first merging.

The technique i tried to follow is: I look for rows which are more similar. Say 1 and 2 are more similar (have less "p-value") than 4 and 5. Then I merge 1 and 2 first, otherwise I merge 4 and 5 first. Problem is that some times "p-values" are similar for more than 1 pair of observations. Then I have to merge any 2 observations (1 pair) and again run the merging procedure on new dataset.

Dataset Contains following fields:

GoodCount, TotalCount, pValue

Here is the Code which I tried: (If you have a solution for my problem you can ignore the code and just tell me the solution, otherwise please tell me where i m wrong)

%macro generatePValues(InputDS);

/* some code here to regenerate p values every time we call this macro, works fine*/

%mend;

%macro merge(InputDs, minPVal);

proc sql noprint; select count(*) into:flag from &InputDS;

data tempDS; set &InputDS;

if &flag = _N_-1 then stop;

dummy1 = lag(goodCount) + goodCount;

dummy2 = lag(totalCount) + totalCount;

if (pVal-&minPVal)<.0001 then do;

goodCount = dummy1;

totalCount = dummy2;

call symput ( 'deleteRow', _N_-1);

call symput ( 'Flag', _N_);

end;

run;

Data &OutputDS; set tempDS;

if _N_ NE &deleteRow;

drop dummy1 dummy2;

run;

generatePValues(&OuputDS, AlphaLevel); /*refresh p-values*/

%mend merge;

%macro showResults(InputDS,OutputDS);

generatePValues(&InputDS);

proc sql noprint; select min(pVal) into:MinPVal from &InputDS;

proc sql noprint; select count(*) into:TotalObsThisTime from &InputDS;

proc sql noprint; select count(*)+1 into:TotalObsLastTime from &InputDS; /* merge only if the number of observation changed in last iteration of while loop*/

%do %while (&minPVal < &AlphaLevel and & TotalObs ~= &TotalObsLastTime);

%let TotalObsLastTime = TotalObsThisTime;

%merge(InputDS, &MinPVal, &OutputDS);

proc sql noprint; select count(*) into:temp1 from &OutputDS;

%let TotalObsThisTime = &temp1;

proc sql noprint; select min(pVal) into:temp2 from &OutputDS;

%let minPVal = &temp2;

%end;

%mend showResults;

Now when I call showresults()... the problem is it should stop in the next iteration after it reaches the merging logic abs(pValue -MinPVal)<.0001 first time.

but it doesn't.

I have a Data Set containing some observations (say 10).

I have to merge two observation if they are similar.

Now limitations are:

We can merge only consecutive observation and

In one step only one merge should happen on one dataset (i.e if obs# 1 and 2 are similar, and obs# 4 and 5 are similar, we can't merge them in single step), so basically as soon as one merging is done I should get out of the merging step, and again run the merging procedure on the new data set obtained by first merging.

The technique i tried to follow is: I look for rows which are more similar. Say 1 and 2 are more similar (have less "p-value") than 4 and 5. Then I merge 1 and 2 first, otherwise I merge 4 and 5 first. Problem is that some times "p-values" are similar for more than 1 pair of observations. Then I have to merge any 2 observations (1 pair) and again run the merging procedure on new dataset.

Dataset Contains following fields:

GoodCount, TotalCount, pValue

Here is the Code which I tried: (If you have a solution for my problem you can ignore the code and just tell me the solution, otherwise please tell me where i m wrong)

%macro generatePValues(InputDS);

/* some code here to regenerate p values every time we call this macro, works fine*/

%mend;

%macro merge(InputDs, minPVal);

proc sql noprint; select count(*) into:flag from &InputDS;

data tempDS; set &InputDS;

if &flag = _N_-1 then stop;

dummy1 = lag(goodCount) + goodCount;

dummy2 = lag(totalCount) + totalCount;

if (pVal-&minPVal)<.0001 then do;

goodCount = dummy1;

totalCount = dummy2;

call symput ( 'deleteRow', _N_-1);

call symput ( 'Flag', _N_);

end;

run;

Data &OutputDS; set tempDS;

if _N_ NE &deleteRow;

drop dummy1 dummy2;

run;

generatePValues(&OuputDS, AlphaLevel); /*refresh p-values*/

%mend merge;

%macro showResults(InputDS,OutputDS);

generatePValues(&InputDS);

proc sql noprint; select min(pVal) into:MinPVal from &InputDS;

proc sql noprint; select count(*) into:TotalObsThisTime from &InputDS;

proc sql noprint; select count(*)+1 into:TotalObsLastTime from &InputDS; /* merge only if the number of observation changed in last iteration of while loop*/

%do %while (&minPVal < &AlphaLevel and & TotalObs ~= &TotalObsLastTime);

%let TotalObsLastTime = TotalObsThisTime;

%merge(InputDS, &MinPVal, &OutputDS);

proc sql noprint; select count(*) into:temp1 from &OutputDS;

%let TotalObsThisTime = &temp1;

proc sql noprint; select min(pVal) into:temp2 from &OutputDS;

%let minPVal = &temp2;

%end;

%mend showResults;

Now when I call showresults()... the problem is it should stop in the next iteration after it reaches the merging logic abs(pValue -MinPVal)<.0001 first time.

but it doesn't.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

11-19-2008 07:47 AM

remaining code.. after generatePvalues

generatePValues(&OuputDS, AlphaLevel); /*refresh p-values*/

%mend merge;

%macro showResults(InputDS,OutputDS);

generatePValues(&InputDS);

proc sql noprint; select min(pVal) into:MinPVal from &InputDS;

proc sql noprint; select count(*) into:TotalObsThisTime from &InputDS;

proc sql noprint; select count(*)+1 into:TotalObsLastTime from &InputDS; /* merge only if the number of observation changed in last iteration of while loop*/

%do %while (&minPVal < &AlphaLevel and & TotalObs ~= &TotalObsLastTime);

%let TotalObsLastTime = TotalObsThisTime;

%merge(InputDS, &MinPVal, &OutputDS);

proc sql noprint; select count(*) into:temp1 from &OutputDS;

%let TotalObsThisTime = &temp1;

proc sql noprint; select min(pVal) into:temp2 from &OutputDS;

%let minPVal = &temp2;

%end;

%mend showResults;

Now when I call showresults()... the problem is it should stop in the next iteration after it reaches the merging logic abs(pValue -MinPVal)<.0001 first time.

but it doesn't.

generatePValues(&OuputDS, AlphaLevel); /*refresh p-values*/

%mend merge;

%macro showResults(InputDS,OutputDS);

generatePValues(&InputDS);

proc sql noprint; select min(pVal) into:MinPVal from &InputDS;

proc sql noprint; select count(*) into:TotalObsThisTime from &InputDS;

proc sql noprint; select count(*)+1 into:TotalObsLastTime from &InputDS; /* merge only if the number of observation changed in last iteration of while loop*/

%do %while (&minPVal < &AlphaLevel and & TotalObs ~= &TotalObsLastTime);

%let TotalObsLastTime = TotalObsThisTime;

%merge(InputDS, &MinPVal, &OutputDS);

proc sql noprint; select count(*) into:temp1 from &OutputDS;

%let TotalObsThisTime = &temp1;

proc sql noprint; select min(pVal) into:temp2 from &OutputDS;

%let minPVal = &temp2;

%end;

%mend showResults;

Now when I call showresults()... the problem is it should stop in the next iteration after it reaches the merging logic abs(pValue -MinPVal)<.0001 first time.

but it doesn't.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

11-19-2008 09:27 AM

I'm not sure I understand your challenge - can you perhaps provide some 'before' and 'after' sample data.

Is it something like this your are trying ?[pre]

data input;

input value1 value2;

datalines;

1.1 1

1.2 2

2.0 3

3.1 4

3.2 5

1.3 6

4.0 7

;run;

data dist;

set input;

retain key 0;

prev_val = lag1(value1);

if _n_ = 1 then key = 1;

else if prev_val*0.9 <= value1 <= prev_val*1.1 then;

else key+1;

run;

[/pre]In this sample code each record gets a (new) surrogate key based on if the value (current record) is within a range (based on the previous record). After this you can use the new key, to merge, summarize etc. your data.

Is it something like this your are trying ?[pre]

data input;

input value1 value2;

datalines;

1.1 1

1.2 2

2.0 3

3.1 4

3.2 5

1.3 6

4.0 7

;run;

data dist;

set input;

retain key 0;

prev_val = lag1(value1);

if _n_ = 1 then key = 1;

else if prev_val*0.9 <= value1 <= prev_val*1.1 then;

else key+1;

run;

[/pre]In this sample code each record gets a (new) surrogate key based on if the value (current record) is within a range (based on the previous record). After this you can use the new key, to merge, summarize etc. your data.