I'm re-uploading this topic because the previous one was considered as solved i don't know why...
I have this dataset :
Data Table;
Input ID Y VAR $;
datalines;
1 -1762.05 A
2 -776.07 A
3 -670 A
4 -535 A
6 -531 A
7 -528 B
8 -294 B
9 -281 B
10 -227 B
;
run;
I'm now creating variables to get the difference of values between previous and next values, for each observation :
%MACRO COMPARE(table= , var= );
Data &table.;
retain id;
set &table.;
id=_N_;
run;
data &table._COMPARE;
set &table.;
by ID;
set &table. ( firstobs = 2 keep = &var. rename = (&var. = Next_Measure) )
&table. ( obs = 1 drop = _all_ );
Label Next_Measure=Next_Measure;
Prev_Measure = lag(&var.) ;
Diff_prev= (&var.-lag(&var.))/lag(&var.) ;
Diff_Next= (Next_Measure-&var.)/&var. ;
run;
%MEND;
%COMPARE(table=Table, var=Y);
data TABLE_COMPARE ;
Set TABLE_COMPARE;
If abs(Diff_prev) < 0.01 or abs(Diff_Next) < 0.01 then Group= 1;
run;
What I'm looking to get is :
Data TABLE_COMPARE_WANT;
Set TABLE_COMPARE;
Input Group_WANT;
datalines;
1
.
.
2
2
1
.
.
2
;
run;
So i'm looking to have for each variable, several groups that must contain very close observations (a difference of 0.01 with the previous OR the next observation).
So in my example i would have two group 1 and group 2 for VAR A and Group1 and group 2 for VAR B.
Thank You for your answer
I still don't think your description is clear enough, as the very first record seems to be assigned to group 1, yet it does not meet the condition. This needs further explanation, as do several other points in your explanation.
Nevertheless, the way to get any macro code to work is to first create code that works without macros and without macro variables. It does not seem that you have done this, and I would strongly recommend you do that. Here is my attempt to get this to work without macros and without macro variables, although as I said, I don't see why you say the first record is in group 1; my code assigns it to the missing group.
Also, please state why macros are needed here at all. Thanks.
proc expand data=table out=table1;
by var;
convert y=lag_y/transformout=(lag 1);
convert y=lead_y/transformout=(lead 1);
run;
data want;
set table1;
by var;
retain prev_group;
prev_group=lag(group);
diff_prev=abs((y-lag_y)/lag_y);
diff_next=abs((lead_y-y)/y);
condition=(diff_prev<0.01 and not missing(diff_prev)) or (diff_next<0.01 and not missing(diff_next));
if first.var then counter=0;
counter+1;
if counter=1 and condition then group=1; else group=.;
if counter=2 and condition then group=1; else group=.;
if counter>2 then do;
if condition and (prev_group=1 or prev_group=.) then group=1;
if condition and missing(prev_group) then group=2;
else group=.;
end;
prev_group=lag(group);
run;
You haven't stated what criterion is used for assigning observations to group 2.
The criterion is :
The first time the variable is encountered, if the condition "abs(Diff_prev) < 0.01 or abs(Diff_Next) < 0.01" is met, then Group=1.
For the second observation of this variable :
- if the condition is met, then group is still equal to 1 : indeed, it is in the same group as the previous observation, as the difference with first observation's value of Y is small (<0.01)).
- If the condition is not met, then group = "." (don't do anything)
For the third observation
- If the condition is met :
- if previous value of Group variable was 1 then, Group = 1 for thirs observation
- if previous value of Group variable was "." then Group = 2 : indeed, a new group is created as this thirs observation has close value with the next observation ("Next diff"<0.01).
- If the condition is not met : then group ="."
@Mathis1 wrote:
The criterion is :
The first time the variable is encountered
Which variable? What do you mean by "encountered"? Variables are either present or not present, they are not "encountered" as I understand the word.
The variable is the "VAR" column in my example, and takes the value"A" or "B" in the table Table.
I still don't think your description is clear enough, as the very first record seems to be assigned to group 1, yet it does not meet the condition. This needs further explanation, as do several other points in your explanation.
Nevertheless, the way to get any macro code to work is to first create code that works without macros and without macro variables. It does not seem that you have done this, and I would strongly recommend you do that. Here is my attempt to get this to work without macros and without macro variables, although as I said, I don't see why you say the first record is in group 1; my code assigns it to the missing group.
Also, please state why macros are needed here at all. Thanks.
proc expand data=table out=table1;
by var;
convert y=lag_y/transformout=(lag 1);
convert y=lead_y/transformout=(lead 1);
run;
data want;
set table1;
by var;
retain prev_group;
prev_group=lag(group);
diff_prev=abs((y-lag_y)/lag_y);
diff_next=abs((lead_y-y)/y);
condition=(diff_prev<0.01 and not missing(diff_prev)) or (diff_next<0.01 and not missing(diff_next));
if first.var then counter=0;
counter+1;
if counter=1 and condition then group=1; else group=.;
if counter=2 and condition then group=1; else group=.;
if counter>2 then do;
if condition and (prev_group=1 or prev_group=.) then group=1;
if condition and missing(prev_group) then group=2;
else group=.;
end;
prev_group=lag(group);
run;
You should seriously reconsider use of this programming construct:
Data &table.; retain id; set &table.;
When you use the same name for the output data set as the source data you completely replace the source data. So if there is any logic issue you may not have the data you need at all. This is further compounded by burying it in a macro. You may have already so altered the source data with testing that you can't get what you want.
Did you have something that worked as needed before attempting the macro? If not, get a version working without any macro calls or variables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.