Solved: Re: select observations if same entries are adjacent within the same variable group

ilikesas · Posted 01-20-2017 10:04 PM

Hi,

Suppose I have the following data:

var1	var2	var3	var4	var5	var6
a	a	b	a	b	b
a	b	b	a	b	a
a	b	a	a	b	a

var1 - var3 are considered one group, and var4 - var6 are considered a second group.

What I would like to do is select observations where identical entries are adjacent to each other within the same group.

So the first row will be selected because the value "a" is present for var1 and var2 (and they are adjacent within the first group), and also the value "b" is present in var5 and var6 (and they are adjacent within the second group).

Likewise the second row will also be selected because "b" is in var2 and var3. The third row will NOT be selected.

I guess that the tricky part here is to code for "adjacentness within group".

Thanks!

Astounding · Posted 01-20-2017 10:42 PM

If this really represents your problem, it's easy enough:

data want;
  set have;
  if var1=var2 or var2=var3 or var4=var5 or var5=var6;
run;

If you actually have hundreds of variables instead, the solution would use arrays. But let's not go there unless it's needed.

Reeza · Posted 01-20-2017 10:46 PM

Another option: define two different arrays and check for adjacent matches within each array.

art297 · Posted 01-20-2017 10:50 PM

I'm not sure I understand the requirements, but here is a solution for what I think is being asked:

data have;
  infile cards dlm=',';
  informat var1-var6 $1.;
  length test $1;
  input var1-var6;
  array b $ var4-var6;
  call missing(test);
  if var1 eq var2 then test=var1;
  else if var2 eq var3 then test=var2;
  if not missing(test) and
     var1 in b and
     var3 in b;
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
;

Art, CEO, AnalystFinder.com

ilikesas · Posted 01-20-2017 11:53 PM

Hi art297,

Thanks for the code. I ran it and it gave me the first 2 rows. But then, to test it further, I made a small change to the second observation:

a,b,a,a,b,b --> I made the b's adjacent in the second group. When I ran the code again I got only the first row and not the second, so it's as if the second group isn't included in the selection process.

art297 · Posted 01-21-2017 12:37 AM

Then try it with this modification:

data have (drop=test:);
  infile cards dlm=',';
  informat var1-var6 $1.;
  length test1 test2 $1;
  input var1-var6;
  array a $ var1-var3;
  array b $ var4-var6;
  call missing(test1);
  call missing(test2);
  if var1 eq var2 then test1=var1;
  else if var2 eq var3 then test1=var2;
  else if var4 eq var5 then test2=var4;
  else if var5 eq var6 then test2=var5;
  if not missing(test1) then do;
    if var1 in b and var3 in b then output;
  end;
  else if not missing(test2) then do;
    if var4 in a and var6 in a then output;
  end;
  cards;
a,a,b,a,b,b
a,b,b,a,b,a
a,b,a,a,b,a
a,b,a,a,b,b
;


stat_sas · Posted 01-20-2017 11:48 PM

Hi,

Define two separate arrays based on the two groups of variables. Please try this.

data want;
  set have;
  array v13(*) var1-var3;
  array v46(*) var4-var6;
  flag=0;
  do i=1 to dim(v13)-1;
    if v13{i}=v13{i+1} then flag+1;
    if v46{i}=v46{i+1} then flag+1;
  end;
  if flag;
run;

Astounding · Posted 01-21-2017 07:40 AM

This is getting very close to the solution I pictured for hundreds of variables. There are two changes to consider:

Must both groups contain the same number of variables?
Should processing be cut short once a match is found?

Here would be the result:

data want;
  set have;
  /* Replace the lists below with your own variables; var1-var3 and
     var4-var6 are the groups from the original question. */
  array group1 {*} var1-var3;
  array group2 {*} var4-var6;
  flag=0;
  /* Scan group 1 and stop at the first adjacent match. */
  do i=1 to dim(group1)-1 until (flag=1);
    if group1{i}=group1{i+1} then flag=1;
  end;
  /* Scan group 2 only if group 1 had no match. */
  if flag=0 then do i=1 to dim(group2)-1 until (flag=1);
    if group2{i}=group2{i+1} then flag=1;
  end;
  if flag;
run;
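
For anyone who wants to check this against the sample rows, here is a minimal self-contained sketch. It assumes the two groups are var1-var3 and var4-var6, as in the question, and it adds the modified row (a,b,a,a,b,b) that ilikesas tried earlier. The dataset names have and want are the ones used throughout the thread; dropping i and flag at the end is my own addition.

```sas
/* Sample rows from the question plus the modified row a,b,a,a,b,b */
data have;
  length var1-var6 $ 1;
  input var1-var6;
  datalines;
a a b a b b
a b b a b a
a b a a b a
a b a a b b
;
run;

data want;
  set have;
  array group1 {*} var1-var3;   /* first group (assumed) */
  array group2 {*} var4-var6;   /* second group (assumed) */
  flag=0;
  /* Stop scanning a group as soon as one adjacent pair matches. */
  do i=1 to dim(group1)-1 until (flag=1);
    if group1{i}=group1{i+1} then flag=1;
  end;
  /* Only look at the second group if the first had no match. */
  if flag=0 then do i=1 to dim(group2)-1 until (flag=1);
    if group2{i}=group2{i+1} then flag=1;
  end;
  if flag;
  drop i flag;
run;

proc print data=want;
run;
```

Working through the rows by hand, rows 1, 2 and 4 should be kept and row 3 dropped, because row 3 has no adjacent match inside either group.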

mkeintz · Posted 01-21-2017 09:54 AM

If there are multiple groups of varying sizes, there is still a way to avoid multiple arrays and DO groups. Say you have 100 vars in 6 groups of size 10, 20, 30, 20, 10, 10. Placed in an array of 100 vars, they would have "upper bounds" at elements 10, 30, 60, 80, 90, and 100 respectively:

data want;
  set have;
  array var{*}   var1-var100;
  array upbnds{6} _temporary_ (10 30 60 80 90 100);

  U=1;
  do V=1 to dim(var)-1 until (flag=1);
    if V=upbnds{U} then U=U+1;
    else if var{v}=var{v+1} then flag=1;
  end;
  if flag;
run;
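
To see why the upper-bound check matters, here is a scaled-down sketch of the same idea on the six-variable sample from the question. The assumptions are mine: two groups of three, so the upper bounds are 3 and 6, and the names v, u and i are only illustrative. Row 3 (a,b,a,a,b,a) is the interesting case, since var3 and var4 are equal but sit in different groups.

```sas
data have;
  length var1-var6 $ 1;
  input var1-var6;
  datalines;
a a b a b b
a b b a b a
a b a a b a
a b a a b b
;
run;

data want;
  set have;
  array v{*} var1-var6;
  array upbnds{2} _temporary_ (3 6);  /* last element of each group (assumed sizes 3 and 3) */
  u=1;
  flag=0;
  do i=1 to dim(v)-1 until (flag=1);
    /* When i is the last element of a group, the pair (i, i+1)
       straddles two groups, so skip it and move to the next bound. */
    if i=upbnds{u} then u=u+1;
    else if v{i}=v{i+1} then flag=1;
  end;
  if flag;
  drop u i flag;
run;
```

Tracing the loop by hand, rows 1, 2 and 4 are kept. Row 3 is dropped: its only adjacent match is var3=var4, and that comparison is skipped because i=3 is an upper bound.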
