select observations if same entries are adjacent within the same variable group

Super Contributor
Posts: 413

Hi,

Suppose I have the following data:

var1 var2 var3 var4 var5 var6
a a b a b b
a b b a b a
a b a a b a

var1-var3 are considered one group, and var4-var6 are considered a second group.

What I would like to do is select the observations that have identical entries adjacent to each other within the same group.

So the first row will be selected because the value "a" is in var1 and var2 (adjacent within the first group), and the value "b" is in var5 and var6 (adjacent within the second group).

Likewise, the second row will also be selected because "b" is in var2 and var3. The third row will NOT be selected: var3 and var4 are both "a", but they belong to different groups, so that pair doesn't count.

I guess the tricky part here is coding for "adjacency within a group".

Thanks!


Super User
Posts: 5,084

Re: select observations if same entries are adjacent within the same variable group

If this really represents your problem, it's easy enough:

data want;
set have;
/* keep a row if any two adjacent variables within the same group match */
if var1=var2 or var2=var3 or var4=var5 or var5=var6;
run;

If you actually have hundreds of variables instead, the solution would use arrays.  But let's not go there unless it's needed.
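A minimal way to try the condition above on the sample rows from the question is to build `have` with DATALINES and run the same filter. This is an illustrative sketch, not code posted in the thread; `have` and `want` are simply the dataset names the replies assume. It should keep only the first two rows: the third row's only equal neighbors, var3 and var4, sit in different groups, and that pair is deliberately not tested.

```sas
/* Sample rows from the question (illustration only) */
data have;
  infile datalines dlm=',';
  length var1-var6 $ 1;
  input var1-var6;
  datalines;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
;

/* Keep a row if two adjacent variables in the same group match.  */
/* var3=var4 is deliberately not tested: that pair crosses groups. */
data want;
  set have;
  if var1=var2 or var2=var3 or var4=var5 or var5=var6;
run;

proc print data=want;
run;
```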

Super User
Posts: 17,840

Re: select observations if same entries are adjacent within the same variable group

Another option: define two different arrays and check for adjacent matches within each array.

PROC Star
Posts: 7,363

Re: select observations if same entries are adjacent within the same variable group

I'm not sure I understand the requirements, but here is a solution for what I think is being asked:

data have;
  infile cards dlm=',';
  informat var1-var6 $1.;
  length test $1;
  input var1-var6;
  array b $ var4-var6;
  call missing(test);
  if var1 eq var2 then test=var1;
  else if var2 eq var3 then test=var2;
  if not missing(test) and
     var1 in b and
     var3 in b;
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
;

Art, CEO, AnalystFinder.com

Super Contributor
Posts: 413

Re: select observations if same entries are adjacent within the same variable group

Hi art297,

Thanks for the code. I ran it and it gave me the first two rows. But then, to test it further, I made a small change to the second observation:

a,b,a,a,b,b --> I made the b's adjacent in the second group. When I ran the code again I got only the first row and not the second, so it's as if the second group isn't included in the selection process.

PROC Star
Posts: 7,363

Re: select observations if same entries are adjacent within the same variable group

Then try it with this modification:

data have (drop=test:);
  infile cards dlm=',';
  informat var1-var6 $1.;
  length test1 test2 $1;
  input var1-var6;
  array a $ var1-var3;
  array b $ var4-var6;
  call missing(test1);
  call missing(test2);
  if var1 eq var2 then test1=var1;
  else if var2 eq var3 then test1=var2;
  else if var4 eq var5 then test2=var4;
  else if var5 eq var6 then test2=var5;
  if not missing(test1) then do;
    if var1 in b and var3 in b then output;
  end;
  else if not missing(test2) then do;
    if var4 in a and var6 in a then output;
  end;
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
a,b,a,a,b,b
;

Art, CEO, AnalystFinder.com

Trusted Advisor
Posts: 1,204

Re: select observations if same entries are adjacent within the same variable group

Hi,

Define two separate arrays, one for each group of variables. Please try this:

data want;
set have;
array v13(*) var1-var3;
array v46(*) var4-var6;
flag=0;
do i=1 to dim(v13)-1;
   if v13{i}=v13{i+1} then flag+1;
   if v46{i}=v46{i+1} then flag+1;
end;
if flag;
run;

Solution
01-21-2017 09:24 AM
Super User
Posts: 5,084

Re: select observations if same entries are adjacent within the same variable group

This is getting very close to the solution I pictured for hundreds of variables. There are two changes to consider:

  • Must both groups contain the same number of variables?
  • Should processing be cut short once a match is found?

Here would be the result:

data want;
set have;
/* list each group's variables here; var1-var3 and var4-var6 match the sample data */
array group1 {*} var1-var3;
array group2 {*} var4-var6;
flag=0;
/* each loop checks only its own group and stops at the first match */
do i=1 to dim(group1)-1 until (flag=1);
   if group1{i}=group1{i+1} then flag=1;
end;
/* the second group is checked only if the first had no match */
if flag=0 then do i=1 to dim(group2)-1 until (flag=1);
   if group2{i}=group2{i+1} then flag=1;
end;
drop i;
if flag;
run;

Valued Guide
Posts: 797

Re: select observations if same entries are adjacent within the same variable group

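If there are multiple groups of varying sizes, there is still a way to avoid multiple arrays and DO groups. Say you have 100 variables in 6 groups of sizes 10, 20, 30, 20, 10, and 10. Placed in a single array of 100 variables, the groups would have "upper bounds" at elements 10, 30, 60, 80, 90, and 100, respectively. When the loop reaches an upper bound, it skips the comparison with the next element, because that element starts a new group:

data want;
  set have;
  array var{*}   var1-var100;
  /* last element of each group within the array */
  array upbnds{6} _temporary_ (10 30 60 80 90 100);

  U=1;                          /* index of the current group */
  do V=1 to dim(var)-1 until (flag=1);
    if V=upbnds{U} then U=U+1;  /* group boundary: skip this comparison */
    else if var{v}=var{v+1} then flag=1;
  end;
  drop U V;
  if flag;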
run;
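Scaled down to the six variables in the question, the same single-array idea needs only two upper bounds, 3 and 6. The sketch below is an illustration rather than code from the thread: the array is named `vals` (a hypothetical name) so it does not share a name with the SAS VAR function, `flag` is initialized explicitly, and the sample data adds the test row a,b,a,a,b,b posted earlier in the thread.

```sas
/* Six-variable version of the single-array idea (illustration only) */
data have;
  infile datalines dlm=',';
  length var1-var6 $ 1;
  input var1-var6;
  datalines;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
a,b,a,a,b,b
;

data want(drop=U V);
  set have;
  array vals{*} var1-var6;
  /* last element of each group: var1-var3 ends at 3, var4-var6 at 6 */
  array upbnds{2} _temporary_ (3 6);
  flag=0;
  U=1;                                /* index of the current group */
  do V=1 to dim(vals)-1 until (flag=1);
    /* at a group's last element, don't compare with the next group */
    if V=upbnds{U} then U=U+1;
    else if vals{V}=vals{V+1} then flag=1;
  end;
  if flag;
run;
```

With these rows, `want` should contain rows 1, 2, and 4; row 3 is dropped because its only equal neighbors (var3 and var4) straddle the boundary at element 3.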

 

 

 

 
