BookmarkSubscribeRSS Feed
mrdlau
Obsidian | Level 7

I have a spreadsheet attached below of 2 tables that I am trying to join. but I"m having trouble with getting duplicate results.

 

I'm trying to join left join T2 to T1 (join by "ID"), and return a table that looks like T3....but without the duplicate "costs".  I only want "Cost" to be returned once per ID.

 

I tried the statement below, but  it actually wont work for me because during the query process, there's other tables involved and the final result might not include 'Phone 1'.  I really just need the cost to be returned once to any row that is available

(Case when phone > 1 then 0 else Cost end) 

another option is to use the row_over and partition function, but I'm wondering if there is an easier/simpler logic that I can use instead.

 

I'd like to stay in proc sql if possible

7 REPLIES 7
PGStats
Opal | Level 21

So T3 is not what you want. What do you want that't not in T2?

PG
mrdlau
Obsidian | Level 7

My columns in T3 should have Phone, ID, and cost....but I only want cost to appear once for each ID.  

PGStats
Opal | Level 21

I would try this:

 

proc sql;
create table T3 as
select
    a.ID,
    a.Phones,
    t2.cost
from 
    (select ID, max(Phones) as Phones from T1 group by ID) as a left join 
    T2 on a.ID = T2.ID;
quit;

(untested)

 

PG
art297
Opal | Level 21

Couldn't you use:

 

           case
             when phones gt min(phones) 0
             else cost
end
group by id

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

I think you are looking for:

proc sql;
  create table t3 as
    select a.ID,a.Phones,
           case
             when phones gt min(phones) then 0
             else b.cost
           end as Cost
      from t1 a left join t2 b
        on a.ID=b.ID
          group by a.ID
            order by a.ID,a.Phones
  ;
quit;

Art, CEO, AnalystFinder.com

 

 

mrdlau
Obsidian | Level 7

Thanks.  This works for me.....but now that I apply that to my code, I'm wondering if I can use data step just for that section.  My query is long and if I use min/max in my proc sql;, I need to do a huge group by. 

 

that's fine, but I think a separate data step just to remove the duplicates might be a bit cleaner.  Is it possible?

 

Here's what I had, but I have, but it's not working because it's evaluating it row by row, rather than each group.  I,e I'm getting the actual cost on each row

 

DATA WORK.T3;
SET WORK.T3;
IF Phones > MIN(phones) THEN cost 0; ELSE cost = cost;
RUN;

 I even tried adding a "by ID" but it didn't work. 

art297
Opal | Level 21

If your current table3 has the first record for each ID having the lowest phone value, then I'd use a data step, using ID as the by variable, and use:

 

IF not first.ID THEN cost=0;

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 885 views
  • 0 likes
  • 3 in conversation