- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Simple question probably, but I keep arranging and rearranging distinct and group by in the code and I'm not getting anything different.
I have two tables that share an ID between the two of them. Let's say table a has revenue data associated with each ID and table two has an instance where two titles are associated with the same ID. I use a left join of table an and ask for a distinct ID, and even group by, but I keep getting two row back for that ID, one for each name from table
proc sql;
create table tableau.metadata_match as
select
distinct a.product_id,
a.region,
b.product_name,
a.transactions,
a.revenue
from table a
left join
table b
on a.product_id = b.product_id;group by 1, 2
quit;
It returns these rows repeating the revenue. What I really want is only row for this.
product_id | region | product_name | transactions | revenue |
54321 | SW | product_a | 12 | 56.11 |
54321 | SW | product_b | 12 | 56.11 |
sd
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I see few problem with the sql query.
You should remove group by statement becuase I do not find any aggregate function being used in the select clause.
Then try again.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Which match in the second table should be selected?
Post some example input-datasets as data-steps, so that we see what you have.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Andreas,
I posted fake data to show an example of the scenario and what output I'm receiving. However, in the real scenario, If I need to specify just one of these, how do I do it by choosing either randomly or based off of occurrence using a date field (first occurrence, most recent occurrence)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The problem is that you still have product in your query and there are 2 different values which gives you 2 rows of data.
If you remove it you will get 1 row.
But if you still need product in your result you need to descide which one to keep, as Andreas says 🙂
//Fredrik
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If I need to specify just one of these, how do I do it by choosing either randomly or based off of occurrence using a date field (first occurrence, most recent occurrence)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think that you may be misunderstanding what the DISTINCT actually attempts to do. It gets distinct combinations of ALL variables on the select statement not just the first variable.