Hi,
Kindly help me to solve below issue. Please provide SAS code as well.
We have a master table in ABC library with all table structures that are available in work libarary.
Master Table Structure
Table_Name | Column_Name | Key |
Customer | Cust_ID | Y |
Customer | Cust_Type | Y |
Product | Product_ID | Y |
Product | Cust_ID | Y |
Offer | Offer_ID | Y |
Tables in work library are
Table1: Customer
Cust_ID | Customer_Name | Cust_Type | Cust_Area |
101 | Amit | Retail | Munbai |
102 | Rahul | corporate | Hyd |
10300 | Sachin | Retail | Dheli |
104 | Dhoni | Retail | Pune |
105 | Virat | corporate | Hyd |
106 | Rohit | Retail | Mumbai |
107 | James | corporate | Chennai |
108 | Kartik | Retail | kerala |
109000 | Priyanka | Retail | Mumbai |
110 | Maneesha | Retail | Mumbai |
Table2: Customer_BKP
Cust_ID | Customer_Name | Cust_Type | Cust_Area |
101 | Amit | Retail | Munbai |
102 | Rahul | corporate | Hyd |
103 | Sachin | Retail | Dheli |
104 | Dhoni | Retail | Pune |
105 | Virat | Corporate | Hyd |
106 | Rohit | Retail | Mumbai |
107 | James | corporate_YY | Chennai |
108 | Kartik | Retail | kerala |
109 | Priyanka | Retail | Mumbai |
110 | Maneesha | Retail | Mumbai |
Table3: Product
Product_ID | Cust_ID | Product_Name |
AB1 | 101 | Gold |
AB2 | 102 | Platinum |
AB3 | 103 | Gold |
AB4 | 104 | Platinum |
AB5 | 105 | Platinum |
AB6 | 106 | Gold |
AB7 | 107 | Platinum |
AB8 | 108 | Platinum |
AB9 | 109 | Gold |
AB10 | 110 | Platinum |
AB11 | 111 | Silver |
Table4: Product_Bkp
Product_ID | Cust_ID | Product_Name |
AB1 | 101 | Gold |
AB2 | 102 | Platinum |
AB3 | 103 | Gold |
AB4 | 104 | Platinum |
AB5 | 105 | Platinum |
AB6 | 106 | Gold |
AB7 | 107 | Platinum |
AB800 | 108 | Platinum |
AB9 | 109 | Gold |
AB10 | 110 | Platinum |
Table5: Offer
Offer_ID | Offer_Status |
501 | Y |
502 | Y |
503 | Y |
504 | N |
505 | Y |
Table6: Offer_Bkp
Offer_ID | Offer_Status |
501 | Y |
502 | Y |
503 | Y |
Requirement:
Compare Customer with Customer_bkp, Product with Product_Bkp & Offer with Offer_Bkp tables with key columns that are available in Master table in ABC library.
Example: for Customer table we have total 4 columns, but key columns are two in Master table: those are Cust_ID & Cust_Type. Now you have to compare Customer. Cust_ID with
Customer_Bkp.Cust_ID and Customer. Cust_Type with Customer_Bkp.Cust_Type. Finally we need to calculate the deviation.
Similarly it should look for all the tables and calculate the deviation at each key level for a particular table.
Today we have total 3 tables to compare. Next week tables may increase or decrease. Irrespective table count it should automatically loop and calculate the deviation and produce final output. Table names always match with master table.
Final output should be
Table_Name | Column_Name | Deviation |
Customer | Cust_ID | 20% |
Customer | Cust_Type | 10% |
Product | Product_ID | 20% |
Product | Cust_ID | 10% |
Offer | Offer_ID | 20% |
Once you compare and calculate the deviation. Now we need collect unmatched records.
Output
Table_Name | Column_Name | Values |
Customer | Cust_ID | 10300 |
Customer | Cust_ID | 109000 |
Customer | Cust_Type | CIC_YY |
Product | Product_ID | AB11 |
Product | Product_ID | AB800 |
Product | Cust_ID | 111 |
Offer | Offer_ID | 504 |
Offer | Offer_ID | 505 |
Thanks,
RJ
Please show the code you already have, so we can point out any mistakes.
That looks like a more complicated problem. I’m assuming your definition of deviation isn’t standard deviation since there’s no numbers here so not sure exactly how that would be calculated.
This sounds like a work, interview or homeworkassignment that does require a bit of work.
We're happy to help, not happy to do your work. If you’d like help please post what you’ve attempted so far and we’d be happy to help.
@RJY wrote:
Hi,
Kindly help me to solve below issue. Please provide SAS code as well.
We have a master table in ABC library with all table structures that are available in work libarary.
Master Table Structure
Table_Name
Column_Name
Key
Customer
Cust_ID
Y
Customer
Cust_Type
Y
Product
Product_ID
Y
Product
Cust_ID
Y
Offer
Offer_ID
Y
Tables in work library are
Table1: Customer
Cust_ID
Customer_Name
Cust_Type
Cust_Area
101
Amit
Retail
Munbai
102
Rahul
corporate
Hyd
10300
Sachin
Retail
Dheli
104
Dhoni
Retail
Pune
105
Virat
corporate
Hyd
106
Rohit
Retail
Mumbai
107
James
corporate
Chennai
108
Kartik
Retail
kerala
109000
Priyanka
Retail
Mumbai
110
Maneesha
Retail
Mumbai
Table2: Customer_BKP
Cust_ID
Customer_Name
Cust_Type
Cust_Area
101
Amit
Retail
Munbai
102
Rahul
corporate
Hyd
103
Sachin
Retail
Dheli
104
Dhoni
Retail
Pune
105
Virat
Corporate
Hyd
106
Rohit
Retail
Mumbai
107
James
corporate_YY
Chennai
108
Kartik
Retail
kerala
109
Priyanka
Retail
Mumbai
110
Maneesha
Retail
Mumbai
Table3: Product
Product_ID
Cust_ID
Product_Name
AB1
101
Gold
AB2
102
Platinum
AB3
103
Gold
AB4
104
Platinum
AB5
105
Platinum
AB6
106
Gold
AB7
107
Platinum
AB8
108
Platinum
AB9
109
Gold
AB10
110
Platinum
AB11
111
Silver
Table4: Product_Bkp
Product_ID
Cust_ID
Product_Name
AB1
101
Gold
AB2
102
Platinum
AB3
103
Gold
AB4
104
Platinum
AB5
105
Platinum
AB6
106
Gold
AB7
107
Platinum
AB800
108
Platinum
AB9
109
Gold
AB10
110
Platinum
Table5: Offer
Offer_ID
Offer_Status
501
Y
502
Y
503
Y
504
N
505
Y
Table6: Offer_Bkp
Offer_ID
Offer_Status
501
Y
502
Y
503
Y
Requirement:
Compare Customer with Customer_bkp, Product with Product_Bkp & Offer with Offer_Bkp tables with key columns that are available in Master table in ABC library.
Example: for Customer table we have total 4 columns, but key columns are two in Master table: those are Cust_ID & Cust_Type. Now you have to compare Customer. Cust_ID with
Customer_Bkp.Cust_ID and Customer. Cust_Type with Customer_Bkp.Cust_Type. Finally we need to calculate the deviation.
Similarly it should look for all the tables and calculate the deviation at each key level for a particular table.
Today we have total 3 tables to compare. Next week tables may increase or decrease. Irrespective table count it should automatically loop and calculate the deviation and produce final output. Table names always match with master table.
Final output should be
Table_Name
Column_Name
Deviation
Customer
Cust_ID
20%
Customer
Cust_Type
10%
Product
Product_ID
20%
Product
Cust_ID
10%
Offer
Offer_ID
20%
Once you compare and calculate the deviation. Now we need collect unmatched records.
Output
Table_Name
Column_Name
Values
Customer
Cust_ID
10300
Customer
Cust_ID
109000
Customer
Cust_Type
CIC_YY
Product
Product_ID
AB11
Product
Product_ID
AB800
Product
Cust_ID
111
Offer
Offer_ID
504
Offer
Offer_ID
505
Thanks,
RJ
"That looks like a more complicated problem. I’m assuming your definition of deviation isn’t standard deviation since there’s no numbers here so not sure exactly how that would be calculated. "
Good evening @Reeza My understanding of the final output is
Final output should be
Table_Name | Column_Name | Deviation |
Customer | Cust_ID | 20% |
Customer | Cust_Type | 10% |
OP apparently wants those 2 unique ids of Customer that doesn't match Customer_bkup divided by the 10 records. Deviation=2/10 in pct =20% for Custid and likewise 1 custtype in bkup doesn't match the custtype in customer. So deviation for custtype =1/10 in pct=10%.
The above process is to repeat for various comparisons by 1st looking to extract matching keys from master, so i guess OP has titled SAS macro loop. If my understanding is right, it is extremely simple as it is merely a look up sql, temp array, hash or whatever. But i am too lazy and not keen on this question and after having had a Burger, and fries makes me all the more sleepy to type.
I deem this as some kind of interview/home work too. lol
This is real time problem. We are importing manual files every month comparing with last month's dataset. There we are facing a problem. It is bit urgent.
@novinosrin: you are right, need to caluclate the deviation "no. of unmatched records/total records"
-- Initially we are focusing only to calculate the deviation. Based on deviation % we can take a step whether to proceed or not. Ex: If deviation more than 20% we stop the processing other wise we can proceed further.
-- since these are manual files records are not coming properly.
Thanks,
RJ
Below is the sample code. My calculations are going wrong. I am still trying from my side.
DATA Master;
length Table_Name Column_Name $10;
INPUT Table_Name $ Column_Name $ Key $;
cards;
Customer Cust_ID Y
Customer Cust_Type Y
Product Product_ID Y
Product Cust_ID Y
Offer Offer_ID Y
;
run;
DATA Customer;
length Cust_ID Customer_Name Cust_Type Cust_Area $10;
INPUT Cust_ID $ Customer_Name $ Cust_Type $ Cust_Area $;
cards;
101 Amit Retail Munbai
102 Rahul CIC Hyd
10300 Sachin Retail Dheli
104 Dhoni Retail Pune
105 Virat CIC Hyd
106 Rohit Retail Mumbai
107 James CIC Chennai
108 Kartik Retail kerala
109000 Priyanka Retail Mumbai
110 Maneesha Retail Mumbai
;
run;
DATA Customer_BKP;
length Cust_ID Customer_Name Cust_Type Cust_Area $10;
INPUT Cust_ID $ Customer_Name $ Cust_Type $ Cust_Area $;
cards;
101 Amit Retail Munbai
102 Rahul CIC Hyd
103 Sachin Retail Dheli
104 Dhoni Retail Pune
105 Virat CIC Hyd
106 Rohit Retail Mumbai
107 James CIC_YY Chennai
108 Kartik Retail kerala
109 Priyanka Retail Mumbai
110 Maneesha Retail Mumbai
;
run;
options symbolgen;
proc sql;
select count(table_name) into: cnt from master;
select distinct table_name into: tab1-:tab%eval(&cnt.) from master ;
quit;
proc sql;
select table_name, column_name into: table_name, :column_name from master;
quit;
%put &table_name &column_name;
proc sql;
select table_name, column_name into: table_name_bkp, :column_name_bkp from master;
quit;
%put &table_name_bkp &column_name_bk;
*** still working on it**;
%macro loop();
data total;delete;run;
%do i = 1 %to &cnt.;
proc sql noprint;
select distinct column_name into: cols separated by "," from master where key="Y" and Table_Name="&&tab&i.." ;
create table Mds as select *, count(*) as key1 from ( select distinct &cols from &&tab&i..);
create table bkp as select*, count(*) as key2 from ( select distinct &cols1 from &&tab&i.._bkp);
create table per as select &&&cols&i.. as column_name, "&&tab&i.." as table_name, a.key1,b.key2, a.key1/b.key2 as percentage from mds as a , bkp as b;
/* drop table mds;*/
/* drop table bkp;*/
quit;
data total;
set total per;
run;
%put &cols;
%end;
%mend;
%loop();
**Since above code is not working properly started working with below**';
proc sql;
create table xyz as
select &column_name as name, count(*) as count from
(select cust_id from &table_name
except
select cust_id from &table_name_bkp);
quit;
%put &cont;
Before you can write code you need finish defining the problem. It particular what it is definition of "deviation".
All you have specified is what the key variables are, but not how to compare the two datasets. Are you looking for records that match on key variables but differ on one or more non-key variables? Insertions? Deletions? If you want to calculate a percent deviation what it your numerator? What is your denominator?
First figure out how to do it for one table before attempting to convert it to a macro.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.