Re: How to not get duplicate results.

mrdlau · Posted 03-24-2018 12:24 AM

I have a spreadsheet attached below of 2 tables that I am trying to join. but I"m having trouble with getting duplicate results.

I'm trying to join left join T2 to T1 (join by "ID"), and return a table that looks like T3....but without the duplicate "costs". I only want "Cost" to be returned once per ID.

I tried the statement below, but it actually wont work for me because during the query process, there's other tables involved and the final result might not include 'Phone 1'. I really just need the cost to be returned once to any row that is available

(Case when phone > 1 then 0 else Cost end)

another option is to use the row_over and partition function, but I'm wondering if there is an easier/simpler logic that I can use instead.

I'd like to stay in proc sql if possible

PGStats · Posted 03-24-2018 12:32 AM

So T3 is not what you want. What do you want that't not in T2?

PG

mrdlau · Posted 03-24-2018 01:03 AM

My columns in T3 should have Phone, ID, and cost....but I only want cost to appear once for each ID.

PGStats · Posted 03-24-2018 01:12 AM

I would try this:

proc sql;
create table T3 as
select
    a.ID,
    a.Phones,
    t2.cost
from 
    (select ID, max(Phones) as Phones from T1 group by ID) as a left join 
    T2 on a.ID = T2.ID;
quit;

(untested)

PG

art297 · Posted 03-24-2018 12:54 AM

Couldn't you use:

           case
             when phones gt min(phones) 0
             else cost
           end
         group by id

Art, CEO, AnalystFinder.com

art297 · Posted 03-24-2018 10:07 AM

I think you are looking for:

proc sql;
  create table t3 as
    select a.ID,a.Phones,
           case
             when phones gt min(phones) then 0
             else b.cost
           end as Cost
      from t1 a left join t2 b
        on a.ID=b.ID
          group by a.ID
            order by a.ID,a.Phones
  ;
quit;

Art, CEO, AnalystFinder.com

mrdlau · Posted 03-24-2018 03:35 PM

Thanks. This works for me.....but now that I apply that to my code, I'm wondering if I can use data step just for that section. My query is long and if I use min/max in my proc sql;, I need to do a huge group by.

that's fine, but I think a separate data step just to remove the duplicates might be a bit cleaner. Is it possible?

Here's what I had, but I have, but it's not working because it's evaluating it row by row, rather than each group. I,e I'm getting the actual cost on each row

DATA WORK.T3;
SET WORK.T3;
IF Phones > MIN(phones) THEN cost 0; ELSE cost = cost;
RUN;

I even tried adding a "by ID" but it didn't work.

art297 · Posted 03-24-2018 04:31 PM

If your current table3 has the first record for each ID having the lowest phone value, then I'd use a data step, using ID as the by variable, and use:

IF not first.ID THEN cost=0;

Art, CEO, AnalystFinder.com

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away