I have a spreadsheet attached below of 2 tables that I am trying to join. but I"m having trouble with getting duplicate results.
I'm trying to join left join T2 to T1 (join by "ID"), and return a table that looks like T3....but without the duplicate "costs". I only want "Cost" to be returned once per ID.
I tried the statement below, but it actually wont work for me because during the query process, there's other tables involved and the final result might not include 'Phone 1'. I really just need the cost to be returned once to any row that is available
(Case when phone > 1 then 0 else Cost end)
another option is to use the row_over and partition function, but I'm wondering if there is an easier/simpler logic that I can use instead.
I'd like to stay in proc sql if possible
So T3 is not what you want. What do you want that't not in T2?
My columns in T3 should have Phone, ID, and cost....but I only want cost to appear once for each ID.
I would try this:
proc sql;
create table T3 as
select
a.ID,
a.Phones,
t2.cost
from
(select ID, max(Phones) as Phones from T1 group by ID) as a left join
T2 on a.ID = T2.ID;
quit;
(untested)
Couldn't you use:
case when phones gt min(phones) 0 else cost
end
group by id
Art, CEO, AnalystFinder.com
I think you are looking for:
proc sql; create table t3 as select a.ID,a.Phones, case when phones gt min(phones) then 0 else b.cost end as Cost from t1 a left join t2 b on a.ID=b.ID group by a.ID order by a.ID,a.Phones ; quit;
Art, CEO, AnalystFinder.com
Thanks. This works for me.....but now that I apply that to my code, I'm wondering if I can use data step just for that section. My query is long and if I use min/max in my proc sql;, I need to do a huge group by.
that's fine, but I think a separate data step just to remove the duplicates might be a bit cleaner. Is it possible?
Here's what I had, but I have, but it's not working because it's evaluating it row by row, rather than each group. I,e I'm getting the actual cost on each row
DATA WORK.T3;
SET WORK.T3;
IF Phones > MIN(phones) THEN cost 0; ELSE cost = cost;
RUN;
I even tried adding a "by ID" but it didn't work.
If your current table3 has the first record for each ID having the lowest phone value, then I'd use a data step, using ID as the by variable, and use:
IF not first.ID THEN cost=0;
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.