Hi,
Suppose I have the following data:
| var1 | var2 | var3 | var4 | var5 | var6 | 
| a | a | b | a | b | b | 
| a | b | b | a | b | a | 
| a | b | a | a | b | a | 
var1 - var3 are considered one group, and var4 - var6 are considered a second group.
What I would like to do is select observations where identical entries are adjacent to each other within the same group.
So the first row will be selected because the value "a" is present for var1 and var2 (and they are adjacent within the first group), and also the value "b" is present in var5 and var6 (and they are adjacent within the second group).
Likewise the second row will also be selected because "b" is in var2 and var3. The third row will NOT be selected.
I guess that the tricky part here is to code for "adjacentness within group".
Thanks!
If this really represents your problem, it's easy enough:
data want;
set have;
if var1=var2 or var2=var3 or var4=var5 or var5=var6;
run;
If you actually have hundreds of variables instead, the solution would use arrays. But let's not go there unless it's needed.
Another option: define two separate arrays and check for adjacent matches within each array.
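A minimal sketch of that idea, assuming the sample data from the question is read into a data set named have. The data set name, the array names grp1 and grp2, and the flag variable are illustrative choices, not something the original post specified:

```sas
/* Sample data from the question; the data set name "have" is an assumption */
data have;
  infile cards dlm=',';
  input (var1-var6) (:$1.);
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
;
run;

data want (drop=i flag);
  set have;
  array grp1 {*} var1-var3;   /* first group  */
  array grp2 {*} var4-var6;   /* second group */
  flag = 0;
  /* compare each variable with its right-hand neighbour, within one group only */
  do i = 1 to dim(grp1) - 1;
    if grp1{i} = grp1{i+1} then flag = 1;
  end;
  do i = 1 to dim(grp2) - 1;
    if grp2{i} = grp2{i+1} then flag = 1;
  end;
  if flag;   /* keep the row if either group has an adjacent match */
run;
```

Because each array covers only one group, var3 is never compared with var4, so a match that straddles the two groups does not count. With the three sample rows, the first two are kept and the third is dropped.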
I'm not sure if I understand the requirements but here is the solution for what I think is being asked:
data have;
  infile cards dlm=',';
  informat var1-var6 $1.;
  length test $1;
  input var1-var6;
  array b $ var4-var6;
  call missing(test);
  if var1 eq var2 then test=var1;
  else if var2 eq var3 then test=var2;
  if not missing(test) and
     var1 in b and
     var3 in b;
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
;
Art, CEO, AnalystFinder.com
Hi art297,
Thanks for the code. I ran it and it gave me the first 2 rows. But then, to test it further, I made a small change to the second observation:
a,b,a,a,b,b --> I made the b's adjacent in the second group. When I ran the code again I got only the first row and not the second, so it's as if the second group isn't included in the selection process.
Then try it with this modification:
data have (drop=test:);
  infile cards dlm=',';
  informat var1-var6 $1.;
  length test1 test2 $1;
  input var1-var6;
  array a $ var1-var3;
  array b $ var4-var6;
  call missing(test1);
  call missing(test2);
  if var1 eq var2 then test1=var1;
  else if var2 eq var3 then test1=var2;
  else if var4 eq var5 then test2=var4;
  else if var5 eq var6 then test2=var5;
  if not missing(test1) then do;
    if var1 in b and var3 in b then output;
  end;
  else if not missing(test2) then do;
    if var4 in a and var6 in a then output;
  end;
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
a,b,a,a,b,b
;
Art, CEO, AnalystFinder.com
Hi,
Define two separate arrays based on the two groups of variables. Please try this.
data want;
set have;
array v13(*) var1-var3;
array v46(*) var4-var6;
flag=0;
do i=1 to dim(v13)-1;
   if v13{i}=v13{i+1} then flag+1;
   if v46{i}=v46{i+1} then flag+1;
end;
if flag;
run;
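This is getting very close to the solution I pictured for hundreds of variables. There are two changes to consider: the groups may not contain the same number of variables, so each group gets its own loop, and each loop can stop as soon as a match is found.
Here would be the result: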
data want;
set have;
array group1 {*} var1-var3;   /* replace with the first list of many variable names */
array group2 {*} var4-var6;   /* replace with the variables that belong in the second group */
flag=0;
do i=1 to dim(group1)-1 until (flag=1);
   if group1{i}=group1{i+1} then flag=1;
end;
if flag=0 then do i=1 to dim(group2)-1 until (flag=1);
   if group2{i}=group2{i+1} then flag=1;
end;
if flag;
run;
If there are multiple groups of varying sizes, there is still a way to avoid multiple arrays and DO groups. Say you have 100 variables in 6 groups of sizes 10, 20, 30, 20, 10, and 10. Placed in a single array of 100 variables, the groups would have "upper bounds" at elements 10, 30, 60, 80, 90, and 100 respectively:
data want;
  set have;
  array var{*}   var1-var100;
  array upbnds{6} _temporary_ (10 30 60 80 90 100);
  U=1;
  do V=1 to dim(var)-1 until (flag=1);
    if V=upbnds{U} then U=U+1;
    else if var{v}=var{v+1} then flag=1;
  end;
  if flag;
run;
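To see the upper-bound idea on the data from this thread, here is a scaled-down sketch with six variables in two groups of three, so the bounds are 3 and 6. The data set name have, the array name vars, and the explicit flag=0 are assumptions made for illustration:

```sas
/* The four test rows used earlier in the thread */
data have;
  infile cards dlm=',';
  input (var1-var6) (:$1.);
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
a,b,a,a,b,b
;
run;

data want (drop=u v flag);
  set have;
  array vars {*} var1-var6;
  array upbnds {2} _temporary_ (3 6);   /* last position of each group */
  u = 1;
  flag = 0;
  do v = 1 to dim(vars) - 1 until (flag = 1);
    if v = upbnds{u} then u = u + 1;              /* group boundary: skip var3 vs var4 */
    else if vars{v} = vars{v+1} then flag = 1;    /* adjacent match inside a group */
  end;
  if flag;
run;
```

The third row shows why the boundary check matters: var3 and var4 are both "a", but they sit in different groups, so that comparison is skipped and the row is dropped. Rows 1, 2, and 4 are kept.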
