hello
I have two samples tables below : Production_Before and Production_After and data in the table should look the same base on the test i am conducting. I am trying to verify that the data entry is correct.
Product_id | Product _name | Category | Country | City | Market(Million) |
111 | Hewlett Parkard | Computer | USA | New York | 50-60 |
111 | Hewlett Parkard | Computer | USA | San Francisco | >60 |
111 | Hewlett Parkard | Computer | USA | Miami | 20-30 |
111 | Hewlett Parkard | Computer | USA | New Jersey | 40-50 |
111 | Hewlett Parkard | Computer | USA | Houston | < 10% of State Market |
111 | Hewlett Parkard | Computer | GB | Birmingham | < 5% of GB Market |
111 | Hewlett Parkard | Computer | GB | Bradford | >6 |
111 | Hewlett Parkard | Computer | GB | London | ❤️ |
111 | Hewlett Parkard | Computer | GB | Cambridge | <2 |
111 | Hewlett Parkard | Computer | GB | Manchester | >50 |
111 | Hewlett Parkard | Computer | Nigeria | Lagos | <2X |
111 | Hewlett Parkard | Computer | Germany | Berlin | 50-60 |
111 | Hewlett Parkard | Computer | Germany | Hambourg | 40-50 |
111 | Hewlett Parkard | Computer | Germany | Munich | Very concentrated |
111 | Hewlett Parkard | Computer | Germany | Cologne | <20 |
111 | Hewlett Parkard | Computer | Ghana | Accra | Not Meaningful |
111 | Hewlett Parkard | Computer | Tunisia | Tunis | NA |
and i use the code below to compare the entry
proc sql;
create table mywork.comparison as
select a.product_id, a.product_name,
case when a.market=b.market then 'match' else 'no match' end as Market_test
From mywork.Production_after a
left join mywork.Production_before b
where a.Product_id=b.Product_id;
The result of my code is multiplying the value of the fields (City and Market) in Look up like this:
Product_id | Product _name | City | Market(Million) |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | New York | 50-60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
111 | Hewlett Parkard | San Francisco | >60 |
what am i doing wrong? Any help?
thank you
You can try popping a distinct in:
select distinct a.product_id
thanks for replying, here is what im getting by adding distinct :
Product_id | Product _name | City | Market(Million) | Market_test |
111 | Hewlett Parkard | New York | 50-60 | match |
111 | Hewlett Parkard | New York | 50-60 | no match |
111 | Hewlett Parkard | San Francisco | >60 | match |
111 | Hewlett Parkard | San Francisco | >60 | no match |
111 | Hewlett Parkard | Miami | 20-30 | match |
111 | Hewlett Parkard | Miami | 20-30 | no match |
111 | Hewlett Parkard | New Jersey | 40-50 | match |
111 | Hewlett Parkard | New Jersey | 40-50 | no match |
this is what i am expecting :
Product_id | Product _name | City | Market(Million) | Market_test |
111 | Hewlett Parkard | New York | 50-60 | match |
111 | Hewlett Parkard | San Francisco | >60 | match |
111 | Hewlett Parkard | Miami | 20-30 | match |
111 | Hewlett Parkard | New Jersey | 40-50 | match |
111 | Hewlett Parkard | Houston | < 10% of State Market | match |
. | . | . | . | . |
. | . | . | . | . |
. | . | . | . | . |
. | . | . | . | . |
Any idea?
Add at least the city to the join
and a.city=b.city
Though if I'm checking to see if two files should be the same I'd start with Proc Compare, possibly using dataset options to keep the variables of interest though sorting would likely be needed first.
thank you much RW9 and Ballardw. Both suggestions get me the right result ... distinct and adding city to the join
Try this instead:
proc sql;
create table mywork.comparison as
select
a.*,
case when b.product_id is missing
then "No Match"
else "Match" end as Market_test
from
mywork.Production_Before as a natural left join
mywork.Production_After as b;
quit;
PG
PGStats
with your suggestion, i am getting different result meaning i am getting more rows than expected. The table Production_before has 36200 rows and Product_after has 36440.
When i used your code i got 65745 rows in the result data. I will stick with my response from the post 4.
Thanks for the feedback. NATURAL JOIN implies equality between same-named columns. You should be getting NO MATCH flags only for same-named columns that have non identical values. If the value of some same-named columns shouldn't be checked for equality between your two tables, then NATURAL JOIN isn't the proper join to request in your query.
PG
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.