Help using Base SAS procedures

PROC SQL or SAS Distinct Clause

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

PROC SQL or SAS Distinct Clause

Hello,

Is there a way in SAS or Proc SQL, where we can not only select Distinct Rows, but also, the individual column values are not equal to each other? Below, an example-


City 1           City 2

A1               B1            

A1               B2           

A2               B1

A2               B2

Select "Distinct City 1, City 2"- would give me all the 4 rows. But not only the rows to be distinct, but also the values individually should be different from their column values. There may be multiple such combinations. Here, acceptable combinations would be rows 1&3 OR rows 2&4. So two such sets possible here. Is it possible to extract out such sets? Where, not only entire rows are distinct, but individual values are also distinct? SAS or SQL, whichever way.


Accepted Solutions
Solution
‎06-30-2015 10:07 AM
Super User
Posts: 10,045

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

Interesting Question.

Code: Program

data pairs;
input City1 $ City2 $;
datalines;
A1 B1 
A1 B2 
A2 B1
A2 B2
;
run;
data want;
if _n_ eq 1 then do;
  declare hash h1();
  h1.definekey('city1');
  h1.definedone();
  declare hash h2();
  h2.definekey('city2');
  h2.definedone();
end;
set pairs;
if h1.check() ne 0 and h2.check() ne 0 then do;
h1.replace();h2.replace();output;
end;
run;

Xia Keshan

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

What will be your wanted outcome and by what rules?

Contributor
Posts: 22

Re: PROC SQL or SAS Distinct Clause

Sure, as I said- wanted outcome would be subset of rows, where individual column values are not equal to each other.

City 1         City 2

A1               B1

A2               B2

Or

City 1          City 2

A1               B2

A2               B1

Both these outputs are acceptable, because values in individual columns (City 1, City2) are not repeated. To extract both of them would be great.

Proc SQL "Select Distinct City1, City2"- doesn't necessarily guarantee this, because although it may get us to distinct rows, the individual values will be different. But that's the subset I need.

Super User
Posts: 11,343

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

Are any of the values for City1 also in City2? How to handle if so.

Super User
Super User
Posts: 7,076

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

Not sure exactly what you want.  Sounds like you want to find all possible values of CITY1 and all possible values of CITY2.  But I don't think that in general it would be possible to generate such a set only by selecting rows from the original file.  Why not just generate two separate tables?

select distinct city1 from have;

select distinct city2 from have;

Perhaps you could generate an iterative process to delete the rows where both CITY1 and CITY2 are already included in the previous rows selected.  That would probably work best by using hash tables in a data step.  Not sure how you could find the minimum such set of rows.

Respected Advisor
Posts: 4,934

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

Looks like an assignment (matching) problem to me. Not trivial. If you have SAS/OR licenced, you could use something like :

data pairs;

input City1 $ City2 $;

datalines;

A1               B1          

A1               B2         

A2               B1

A2               B2

;

proc optnet

   graph_direction = directed

   data_links      = pairs;

   data_links_var

      from      = City1

      to           = City2;

   linear_assignment

      out         = uniquePairs(drop=cost);

run;

proc print data=uniquePairs; run;

PG

PG
Contributor
Posts: 22

Re: PROC SQL or SAS Distinct Clause

Hello PGStats- this is great. Yes, I think it worked. I have never used OPTNET, great application. I think, I'll do a bit more research on this procedure to use it more effectively.

A quick question- is there a way to apply Linear Assignment, if there were 3 or 4 columns? It's like solving, min wgt matching problem on tripartite or qudra-partite graphs? Not sure, if this is even practical. But evetually, I need to extend this technique to 4 columns-- where individual values in those 4 columns should be different. Does that make sense? Basically- the same output needed, with same condition, when applied over 3 or 4 columns.

-Mike

Respected Advisor
Posts: 4,934

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

Linear assignment is only defined between two sets. The clever hash method proposed by can be extended to more sets, but even for two sets it will not always give optimal results. Try for example :

data pairs;

input City1 $ City2 $;

datalines;

A1 B1

A1 B2

A1 B3

A2 B1

;

proc optnet

   graph_direction = directed

   data_links      = pairs;

   data_links_var

      from         = City1

      to           = City2;

   linear_assignment

      out          = uniquePairs(drop=cost);

run;

title "Linear assignment pairs";

proc print data=uniquePairs; run;

data HashPairs;

if _n_ eq 1 then do;

  declare hash h1();

  h1.definekey('city1');

  h1.definedone();

  declare hash h2();

  h2.definekey('city2');

  h2.definedone();

end;

set pairs;

if h1.check() ne 0 and h2.check() ne 0 then do;

h1.replace();h2.replace();output;

end;

run;

title "Hash pairs";

proc print data=HashPairs; run;

PG

PG
Contributor
Posts: 22

Re: PROC SQL or SAS Distinct Clause

@ PGStats, Okay, I see what you mean now. Got it. This makes sense. Important to note, Hash method doesn't always guarantee optimal (max) number of rows, although it can be feasible.

Thanks to you & xia keshan again for your both your ideas, will think on these directions further.

-Mike

Super User
Posts: 10,045

Re: PROC SQL or SAS Distinct Clause

PG,

Hash Table also can get such max(optimal) sub-set .

Code: Program

data have;
input a $ b $;
cards;
A1 B1
A1 B2
A1 B3
A2 B1
;
run;
data _null_;
if 0 then set have;
declare hash h1();
h1.definekey('a');
h1.definedone();

declare hash h2();
h2.definekey('b');
h2.definedone();
max=0;
do i=1 to nobs;
set have nobs=nobs point=i;
n=0;
if h1.check() ne 0 and h2.check() ne 0 then do;
  h1.add();h2.add();n+1;
end;
do j=1 to nobs;
   set have point=j;
   if h1.check() ne 0 and h2.check() ne 0 then do;
   h1.add();h2.add();n+1;
   end;
end;
if n gt max then do;max=n;call symputx('point',i);end;
h1.clear();h2.clear();
end;
stop;
run;



data want;
if 0 then set have;
declare hash h1();
h1.definekey('a');
h1.definedone();

declare hash h2();
h2.definekey('b');
h2.definedone();

point=&point;
set have point=point;
if h1.check() ne 0 and h2.check() ne 0 then do;
  h1.add();h2.add();output;
end;
do j=1 to nobs;
   set have nobs=nobs point=j;
   if h1.check() ne 0 and h2.check() ne 0 then do;
   h1.add();h2.add();output;
   end;
end;
stop;
run;

Xia Keshan

Contributor
Posts: 22

Re: PROC SQL or SAS Distinct Clause

Thanks Xia. A little complicated to understand in first-go, but will study this further. Ability to extract Max subset, for upto 3 or 4 columns, is what's needed. Thanks again for your time,

-Mike

Super User
Posts: 10,045

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

Yeah. That is what you want - optimal(max) number of rows . Code could be shorter.

Code: Program

data have;
input a $ b $;
cards;
A1 B1
A1 B2
A1 B3
A2 B1
;
run;
data _null_;
if 0 then set have;
declare hash h1();
h1.definekey('a');
h1.definedone();

declare hash h2();
h2.definekey('b');
h2.definedone();
max=0;
do i=1 to nobs;
set have nobs=nobs point=i;
n=0;

  h1.add();h2.add();n+1;

do j=1 to nobs;
   set have point=j;
   if h1.check() ne 0 and h2.check() ne 0 then do;
   h1.add();h2.add();n+1;
   end;
end;
if n gt max then do;max=n;call symputx('point',i);end;
h1.clear();h2.clear();
end;
stop;
run;



data want;
if 0 then set have;
declare hash h1();
h1.definekey('a');
h1.definedone();

declare hash h2();
h2.definekey('b');
h2.definedone();

point=&point;
set have point=point;

  h1.add();h2.add();output;

do j=1 to nobs;
   set have nobs=nobs point=j;
   if h1.check() ne 0 and h2.check() ne 0 then do;
   h1.add();h2.add();output;
   end;
end;
stop;
run;
Contributor
Posts: 22

Re: PROC SQL or SAS Distinct Clause

Sure, thanks again. I shall extend this to 3 or 4 columns further. This is really helpful.

-Mike

Solution
‎06-30-2015 10:07 AM
Super User
Posts: 10,045

Re: PROC SQL or SAS Distinct Clause

Posted in reply to MikeCarter

Interesting Question.

Code: Program

data pairs;
input City1 $ City2 $;
datalines;
A1 B1 
A1 B2 
A2 B1
A2 B2
;
run;
data want;
if _n_ eq 1 then do;
  declare hash h1();
  h1.definekey('city1');
  h1.definedone();
  declare hash h2();
  h2.definekey('city2');
  h2.definedone();
end;
set pairs;
if h1.check() ne 0 and h2.check() ne 0 then do;
h1.replace();h2.replace();output;
end;
run;

Xia Keshan

Contributor
Posts: 22

Re: PROC SQL or SAS Distinct Clause

Hi Xia- this worked. Thank you for neat code, never used hash in SAS, will do a bit more reading on this technique. But thank you so much for your help. This worked for me.

Apprecite it,

-Mike

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 922 views
  • 6 likes
  • 6 in conversation