BookmarkSubscribeRSS Feed
Srigyan
Quartz | Level 8

Note: I am working on a very huge dataset, records are in million so it is helpful if you can help with optimised query. May be not proc sql.

 

I need to compare two tables and create a table with the column where there is difference or new record inserted.

i.e. we have table 1

ProductDivisionDepartmentCategory
p1d1dep1cat1
p2d2dep2cat2
p3d3dep3cat3
p4d4dep4cat4
p5d5dep5cat5
data Table1;

input product $ Division $ Department $ Category $;
datalines;
p1 d1 dep1 cat1
p2 d2 dep2 cat2
p3 d3 dep3 cat3
p4 d4 dep4 cat4
p5 d5 dep5 cat5
;
run;

 

and another table 2, highlighted is the difference from table 1

ProductDivisionDepartmentCategory
p1d1dep1cat1
p2d22dep2cat2
p3d3dep3cat3
p4d4dep44cat4
p5d5dep5cat55
p6d6dep6cat56

 

data Table2;

input product $ Division $ Department $ Category $;
datalines;
p1 d1 dep1 cat1
p2 d22 dep2 cat2
p3 d3 dep3 cat3
p4 d4 dep44 cat4
p5 d5 dep5 cat55
p6 d6 dep6 cat56
;
run;

we can see there is change (highlighted in red)

 

output should be the only changed record and new records

 

ProductDivisionDepartmentCategoryStatus
p2d22dep2cat2U
p4d4dep44cat4U
p5d5dep5cat55U
p6d6dep6cat56I

 

data output;

input product $ Division $ Department $ Category $ Status $;
datalines;
p2 d22 dep2 cat2 U
p4 d4 dep44 cat4 U
p5 d5 dep5 cat55 U
p6 d6 dep6 cat56 I
;
run;

Please help...

6 REPLIES 6
ghosh
Barite | Level 11

Please show us what you have tried.  

Proc compare has many options to show differences, that could be a starting point.

 

Srigyan
Quartz | Level 8
proc compare base=Table2 compare=table1 out=output outbase outnoequal   ;
by product;
run;

This gives output fine but this gives the entire line, I am interested only in specific value according to the ID value. Here ID is the product.

ghosh
Barite | Level 11

You were  close but missing one of the records from the base table.

Just use IS instead of BY which gives you exactly what you have in your required output table. 

 

Not sure what you mean by "I am interested only in specific value according to the ID value."

 

Please take a look at this paper on Proc compare for more info on the ID statement

http://support.sas.com/resources/papers/proceedings10/149-2010.pdf

 

proc compare base=Table2 compare=table1 out=output   outbase outnoequal   ;
id product;
run;

Capture.JPG

 

 

novinosrin
Tourmaline | Level 20

Hi @Srigyan 

data Table1;

input product $ Division $ Department $ Category $;
datalines;
p1 d1 dep1 cat1
p2 d2 dep2 cat2
p3 d3 dep3 cat3
p4 d4 dep4 cat4
p5 d5 dep5 cat5
;
run;

data Table2;

input product $ Division $ Department $ Category $;
datalines;
p1 d1 dep1 cat1
p2 d22 dep2 cat2
p3 d3 dep3 cat3
p4 d4 dep44 cat4
p5 d5 dep5 cat55
p6 d6 dep6 cat56
;
run;

proc sql;
create table want as
(select *,'U' as status
from 
(select *
from table2 
where product in (select product from table1)
except 
select * from table1))
union 
(select *,'I' as status
from 
(select *
from table2
where product not in (select product from table1)));
quit;
Srigyan
Quartz | Level 8

Working fine now but I have two question

 

1) what this code is exactly doing, I see it giving me my answer but I could not understand the logic.

select * from 
(select * from table2 
where product in 
(select product from table1)
except 
select * from table1);

2) My data is million in numbers and so is it better not to use proc sql or use any different method.

 

novinosrin
Tourmaline | Level 20

Sure I agree that Proc sql may not be so robust if you dataset is humongous running into 100 million records or beyond. However , a few millions, I am pretty certain proc sql can comfortably handle. 

 

The logic is all using SET operators. All you need to do is type sql set operators, you will find plethora of info.

 

An advantage of sql sometimes is

1. You can push the query in-database level. So once you are connected to DB through sas access pass through, this sql is very advantageous

2. Proc compare is good but sql gives better control.

3. A datastep will likely require renaming and I was rather lazy in attempting.

 

Having said that, Feel free to wait for geniuses like Tom, PGstats, John King Datanull, Ballardw , Xia Keshan Ksharp , Reinhard, Patrick etc to chime in with the solutions you prefer. If you can wait , you should be fine. I am awfully lazy rather for some weird reason they are always full of energy.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 676 views
  • 2 likes
  • 3 in conversation