Solved: Re: Extracting all records that share variable 2 values, where at leas...

jibra · Posted 07-21-2025 02:57 PM

Here's a sample of what I have:

var1	var2
AA	Paul
BB	Paul
XX	Paul
dd	Mike
pp	Mike
cc	Dave
ee	Dave
ff	Dave
gg	Dave
XX	Dave
hh	Ken
ii	Ken
jj	Ken
XX	Ken
kk	Rick
ll	Rick
mm	Rick

Here's a sample of what I want:

var1	var2
AA	Paul
BB	Paul
XX	Paul
cc	Dave
ee	Dave
ff	Dave
gg	Dave
XX	Dave
hh	Ken
ii	Ken
jj	Ken
XX	Ken

Patrick · Posted 07-21-2025 07:37 PM

Or a data step hash table lookup

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(var1=:"XX"))');
      h1.defineKey('var2');
      h1.defineDone();
    end;
  set have;
  if h1.check()=0;
run;

View solution in original post

jibra · Posted 07-21-2025 02:57 PM

Any suggestion would be welcome.

Quentin · Posted 07-21-2025 03:39 PM

Please post your sample data as code (DATA step with CARDS statement). Also please post the code you have tried. This will help others help you.

Tom · Posted 07-21-2025 04:10 PM

Trying to guess what you mean it sounds like you want all observations for any person that has VAR1='XX'. (Note question are easier to read if the details are in the body of the question and not just the subject line.)

SQL should make that simple.

proc sql;
create table want as 
  select * from have
  where var2 in
    (select var2 from have where var1='XX')
;
quit;

Kurt_Bremser · Posted 07-21-2025 05:18 PM

Double DO method:

data want;
do until (last.var2);
  set have;
  by var2 notsorted;
  if var1 = "XX" then flag = 1;
end;
do until (last.var2);
  set have;
  by var2 notsorted;
  if flag then output;
end;
drop flag;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Patrick · Posted 07-21-2025 07:37 PM

Or a data step hash table lookup

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(var1=:"XX"))');
      h1.defineKey('var2');
      h1.defineDone();
    end;
  set have;
  if h1.check()=0;
run;

jibra · Posted 07-22-2025 11:30 AM

All suggestions in this thread worked, but the hash object technique seems to be the most efficient. Thank you!

jibra · Posted 07-22-2025 11:41 AM

Most efficient of all the solutions offered.

mkeintz · Posted 07-22-2025 12:05 AM

If the dataset is not particularly big, then I think your best option is the hash object technique suggested by @Patrick. The code is simple.

That solution requires reading the data in two unsynchronized data streams, which could create a performance hit in the case of large datasets, due to disk activity.

Now if the data were sorted by VAR2, you could do:

data want;
  merge have (where=(var1='XX')  in=wanted) 
        have;
  by var2;
  if wanted;
run;

which synchronizes the data streams and reduces disk activity.

Your data are not sorted by VAR2, although it is grouped by VAR2. So you can generate data stream synchronization with code such as this:

data want (drop=_:);
  set have (where=(var1='XX') rename=(var2=_var2));

  do until (last.var2=1 and var2=_var2);
    set have;
    by var2 notsorted;
    if var2=_var2 then output; 
  end;
run;

The above assumes no more than one var1='XX' for any given VAR2. If you can have two or more var1='XX' cases, then:

data want (drop=_:);
  set have (where=(var1='XX') rename=(var2=_var2));
  by _var2 notsorted;

  if last._var2 then do until (last.var2=1 and var2=_var2);
    set have;
    by var2 notsorted;
    if var2=_var2 then output; 
  end;
run;

And yes, the degree of synchronization achieved would depend on the distribution of the "XX" cases.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

yabwon · Posted 07-22-2025 03:35 AM

Just to have (more) complete picture.

The power of IN operator over a macro variable with constant list of values:

data have;
input var1 $ var2 $;
cards;
AA	Paul
BB	Paul
XX	Paul
dd	Mike
pp	Mike
cc	Dave
ee	Dave
ff	Dave
gg	Dave
XX	Dave
hh	Ken
ii	Ken
jj	Ken
XX	Ken
kk	Rick
ll	Rick
mm	Rick
;
run;
proc print data=have;
run;

proc SQL noprint;
  select 
  distinct quote(var2)
  into :myList separated by " "
  from have
  where var1 = "XX"
  ;
quit;

data want;
  set have;
  where var2 in (&myList.);
run;
proc print data=want;
run;

BTW. Just for fun, here is an article "Route Sixty-Six to SAS Programming, or 63 (+3) Syntax Snippets for a Table Look-up Task, or How to L..." that provides 66 examples of solving the table-lookup task 🙂

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Ksharp · Posted 07-22-2025 04:31 AM

Or you would like to get it automatically ? and not type XX by hand ?

data have;
input var1 $ var2 $;
cards;
AA	Paul
BB	Paul
XX	Paul
dd	Mike
pp	Mike
cc	Dave
ee	Dave
ff	Dave
gg	Dave
XX	Dave
hh	Ken
ii	Ken
jj	Ken
XX	Ken
kk	Rick
ll	Rick
mm	Rick
;

proc sql;
create table want as
select * from have where var2 in 
(
select distinct var2 from have group by var1 having count(distinct var2)>1
)
;
quit;

yabwon · Posted 07-22-2025 04:54 AM

I'm not sure if this is possible to have such data, but what about "Joe" (last record) ?

data have;
input var1 $ var2 $;
cards;
AA	Paul
BB	Paul
XX	Paul
dd	Mike
pp	Mike
cc	Dave
ee	Dave
ff	Dave
gg	Dave
XX	Dave
hh	Ken
ii	Ken
jj	Ken
XX	Ken
kk	Rick
ll	Rick
mm	Rick
AA	Joe
;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Ksharp · Posted 07-22-2025 05:21 AM

yabwom,
I don't know either .
It is all depend on what the OP is looking for.
The code I posted is just guessing the intention of OP .

Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Re: Extracting all records that share variable 2 values, where at least one variable 1 value is = XX

Registration is open