BookmarkSubscribeRSS Feed
Armand
Calcite | Level 5

hello

I have two samples tables below : Production_Before and Production_After and data in the table should look the same base on the test i am conducting. I am trying to verify that the data entry is correct.

Product_idProduct _nameCategoryCountryCityMarket(Million)
111Hewlett ParkardComputerUSANew York50-60
111Hewlett ParkardComputerUSASan Francisco>60
111Hewlett ParkardComputerUSAMiami20-30
111Hewlett ParkardComputerUSANew Jersey40-50
111Hewlett ParkardComputerUSAHouston< 10% of State Market
111Hewlett ParkardComputerGBBirmingham< 5% of GB Market
111Hewlett ParkardComputerGBBradford>6
111Hewlett ParkardComputerGBLondon❤️
111Hewlett ParkardComputerGBCambridge<2
111Hewlett ParkardComputerGBManchester>50
111Hewlett ParkardComputerNigeriaLagos<2X
111Hewlett ParkardComputerGermanyBerlin50-60
111Hewlett ParkardComputerGermanyHambourg40-50
111Hewlett ParkardComputerGermanyMunichVery concentrated
111Hewlett ParkardComputerGermanyCologne<20
111Hewlett ParkardComputerGhanaAccraNot Meaningful
111Hewlett ParkardComputerTunisiaTunisNA

and i use the code below to compare the entry

proc sql;

create table mywork.comparison as

select a.product_id, a.product_name,

           case when a.market=b.market then 'match' else 'no match' end as Market_test

From mywork.Production_after a

    left join mywork.Production_before b

where a.Product_id=b.Product_id;

The result of my code is multiplying the value of the fields (City and Market) in Look up like this:

Product_idProduct _nameCityMarket(Million)
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardNew York50-60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60
111Hewlett ParkardSan Francisco>60

what am i doing wrong?  Any help?

thank you

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You can try popping a distinct in:

select distinct a.product_id

Armand
Calcite | Level 5

thanks for replying, here is what im getting by adding distinct :

Product_idProduct _nameCityMarket(Million)Market_test
111Hewlett ParkardNew York50-60match
111Hewlett ParkardNew York50-60no match
111Hewlett ParkardSan Francisco>60match
111Hewlett ParkardSan Francisco>60no match
111Hewlett ParkardMiami20-30match
111Hewlett ParkardMiami20-30no match
111Hewlett ParkardNew Jersey40-50match
111Hewlett ParkardNew Jersey40-50no match


this is what i am expecting :

Product_idProduct _nameCityMarket(Million)Market_test
111Hewlett ParkardNew York50-60match
111Hewlett ParkardSan Francisco>60match
111Hewlett ParkardMiami20-30match
111Hewlett ParkardNew Jersey40-50match
111Hewlett ParkardHouston< 10% of State Marketmatch
.....
.....
.....
.....

Any idea?

ballardw
Super User

Add at least the city to the join

and a.city=b.city

Though if I'm checking to see if two files should be the same I'd start with Proc Compare, possibly using dataset options to keep the variables of interest though sorting would likely be needed first.

Armand
Calcite | Level 5


thank you much RW9 and Ballardw. Both suggestions get me the right result ... distinct and adding city to the join

PGStats
Opal | Level 21

Try this instead:

proc sql;

create table mywork.comparison as

select

     a.*,

     case when b.product_id is missing

          then "No Match"

          else "Match" end as Market_test

from

     mywork.Production_Before as a natural left join

     mywork.Production_After as b;

quit;

PG

PG
Armand
Calcite | Level 5

PGStats

with your suggestion, i am getting different result meaning i am getting more rows than expected. The table Production_before has 36200 rows and Product_after has 36440.

When i used your code i got 65745 rows in the result data. I will stick with my response from the post 4.

PGStats
Opal | Level 21

Thanks for the feedback. NATURAL JOIN implies equality between same-named columns. You should be getting NO MATCH flags only for same-named columns that have non identical values. If the value of some same-named columns shouldn't be checked for equality between your two tables, then NATURAL JOIN isn't the proper join to request in your query.

PG

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 612 views
  • 6 likes
  • 4 in conversation