hello
I have two samples tables below : Production_Before and Production_After and data in the table should look the same base on the test i am conducting. I am trying to verify that the data entry is correct.
Product_id | Product _name | Category | Country | City | Market(Million) |
111 | Hewlett Parkard | Computer | USA | New York | 50-60 |
111 | Hewlett Parkard | Computer | USA | San Francisco | >60 |
111 | Hewlett Parkard | Computer | USA | Miami | 20-30 |
111 | Hewlett Parkard | Computer | USA | New Jersey | 40-50 |
111 | Hewlett Parkard | Computer | USA | Houston | < 10% of State Market |
111 | Hewlett Parkard | Computer | GB | Birmingham | < 5% of GB Market |
111 | Hewlett Parkard | Computer | GB | Bradford | >6 |
111 | Hewlett Parkard | Computer | GB | London | ❤️ |
111 | Hewlett Parkard | Computer | GB | Cambridge | <2 |
111 | Hewlett Parkard | Computer | GB | Manchester | >50 |
111 | Hewlett Parkard | Computer | Nigeria | Lagos | <2X |
111 | Hewlett Parkard | Computer | Germany | Berlin | 50-60 |
111 | Hewlett Parkard | Computer | Germany | Hambourg | 40-50 |
111 | Hewlett Parkard | Computer | Germany | Munich | Very concentrated |
111 | Hewlett Parkard | Computer | Germany | Cologne | <20 |
111 | Hewlett Parkard | Computer | Ghana | Accra | Not Meaningful |
111 | Hewlett Parkard | Computer | Tunisia | Tunis | NA |
and i use the code below to compare the entry
proc sql;
create table mywork.comparison as
select a.product_id, a.product_name,
case when a.market=b.market then 'match' else 'no match' end as Market_test
From mywork.Production_after a
left join mywork.Production_before b
where a.Product_id=b.Product_id;
The result of my code is multiplying the value of the fields (City and Market) in Look up like this:
Product_id | Product _name | City | Market(Million) |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
what am i doing wrong? Any help?
thank you
You can try popping a distinct in:
select distinct a.product_id
thanks for replying, here is what im getting by adding distinct :
Product_id | Product _name | City | Market(Million) | Market_test |
111 | Hewlett Parkard | New York | 50-60 | match |
111 | Hewlett Parkard | New York | 50-60 | no match |
111 | Hewlett Parkard | San Francisco | >60 | match |
111 | Hewlett Parkard | San Francisco | >60 | no match |
111 | Hewlett Parkard | Miami | 20-30 | match |
111 | Hewlett Parkard | Miami | 20-30 | no match |
111 | Hewlett Parkard | New Jersey | 40-50 | match |
111 | Hewlett Parkard | New Jersey | 40-50 | no match |
this is what i am expecting :
Product_id | Product _name | City | Market(Million) | Market_test |
111 | Hewlett Parkard | New York | 50-60 | match |
111 | Hewlett Parkard | San Francisco | >60 | match |
111 | Hewlett Parkard | Miami | 20-30 | match |
111 | Hewlett Parkard | New Jersey | 40-50 | match |
111 | Hewlett Parkard | Houston | < 10% of State Market | match |
. | . | . | . | . |
. | . | . | . | . |
. | . | . | . | . |
. | . | . | . | . |
Any idea?
Add at least the city to the join
and a.city=b.city
Though if I'm checking to see if two files should be the same I'd start with Proc Compare, possibly using dataset options to keep the variables of interest though sorting would likely be needed first.
thank you much RW9 and Ballardw. Both suggestions get me the right result ... distinct and adding city to the join
Try this instead:
proc sql;
create table mywork.comparison as
select
a.*,
case when b.product_id is missing
then "No Match"
else "Match" end as Market_test
from
mywork.Production_Before as a natural left join
mywork.Production_After as b;
quit;
PG
PGStats
with your suggestion, i am getting different result meaning i am getting more rows than expected. The table Production_before has 36200 rows and Product_after has 36440.
When i used your code i got 65745 rows in the result data. I will stick with my response from the post 4.
Thanks for the feedback. NATURAL JOIN implies equality between same-named columns. You should be getting NO MATCH flags only for same-named columns that have non identical values. If the value of some same-named columns shouldn't be checked for equality between your two tables, then NATURAL JOIN isn't the proper join to request in your query.
PG
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.