DATA Step, Macro, Functions and more

How to not get duplicate results.

Reply
Contributor
Posts: 37

How to not get duplicate results.

[ Edited ]

I have a spreadsheet attached below of 2 tables that I am trying to join. but I"m having trouble with getting duplicate results.

 

I'm trying to join left join T2 to T1 (join by "ID"), and return a table that looks like T3....but without the duplicate "costs".  I only want "Cost" to be returned once per ID.

 

I tried the statement below, but  it actually wont work for me because during the query process, there's other tables involved and the final result might not include 'Phone 1'.  I really just need the cost to be returned once to any row that is available

(Case when phone > 1 then 0 else Cost end) 

another option is to use the row_over and partition function, but I'm wondering if there is an easier/simpler logic that I can use instead.

 

I'd like to stay in proc sql if possible

Esteemed Advisor
Posts: 5,399

Re: How to not get duplicate results.

So T3 is not what you want. What do you want that't not in T2?

PG
Contributor
Posts: 37

Re: How to not get duplicate results.

My columns in T3 should have Phone, ID, and cost....but I only want cost to appear once for each ID.  

Esteemed Advisor
Posts: 5,399

Re: How to not get duplicate results.

I would try this:

 

proc sql;
create table T3 as
select
    a.ID,
    a.Phones,
    t2.cost
from 
    (select ID, max(Phones) as Phones from T1 group by ID) as a left join 
    T2 on a.ID = T2.ID;
quit;

(untested)

 

PG
PROC Star
Posts: 8,114

Re: How to not get duplicate results.

[ Edited ]

Couldn't you use:

 

           case
             when phones gt min(phones) 0
             else cost
end
group by id

Art, CEO, AnalystFinder.com

 

PROC Star
Posts: 8,114

Re: How to not get duplicate results.

I think you are looking for:

proc sql;
  create table t3 as
    select a.ID,a.Phones,
           case
             when phones gt min(phones) then 0
             else b.cost
           end as Cost
      from t1 a left join t2 b
        on a.ID=b.ID
          group by a.ID
            order by a.ID,a.Phones
  ;
quit;

Art, CEO, AnalystFinder.com

 

 

Contributor
Posts: 37

Re: How to not get duplicate results.

[ Edited ]

Thanks.  This works for me.....but now that I apply that to my code, I'm wondering if I can use data step just for that section.  My query is long and if I use min/max in my proc sql;, I need to do a huge group by. 

 

that's fine, but I think a separate data step just to remove the duplicates might be a bit cleaner.  Is it possible?

 

Here's what I had, but I have, but it's not working because it's evaluating it row by row, rather than each group.  I,e I'm getting the actual cost on each row

 

DATA WORK.T3;
SET WORK.T3;
IF Phones > MIN(phones) THEN cost 0; ELSE cost = cost;
RUN;

 I even tried adding a "by ID" but it didn't work. 

PROC Star
Posts: 8,114

Re: How to not get duplicate results.

If your current table3 has the first record for each ID having the lowest phone value, then I'd use a data step, using ID as the by variable, and use:

 

IF not first.ID THEN cost=0;

Art, CEO, AnalystFinder.com

 

Ask a Question
Discussion stats
  • 7 replies
  • 167 views
  • 0 likes
  • 3 in conversation