What is the best approach for regression testing in SAS?

Contributor
Posts: 61

Hi,

 

I have more than 40 tables (datasets), and each table contains many columns (variables). Every table contains a primary key (let's call it "CR_ID").

 

Some variables in each dataset hold known (i.e. constant) values, but the table's primary key value changes every time the source system has a run. Not all variables in a dataset have known values. What the datasets have in common is that they share the same primary key, and its value changes every time the scripts are run.

 

What is the best way to write SAS regression test code to verify that these constant values are still unchanged every time the source system has a new deployment?

 

Summary of my problem:

 

int CR_ID : x  

constant: a1_1, a1_2,..., a1_N;

constant: a2_1, a2_2, ..., a2_N;

...

Constant: aM_1, aM_2, ..., aM_N;

 

 

<DataSet1>: CR_ID= x; variable1_1 = <a1_1> ; variable1_2=<a1_2>; .... variable1_N=<a1_N> ; variable1_X, variable1_Y are unknown;

 

<DataSet2>: CR_ID=x; variable2_1= <a2_1>; variable2_2=<a2_2>; ...variable2_N=<a2_N>; variable2_X; variable2_Y are unknown;

 

...

 

<DataSetM>: CR_ID=x; variableM_1=<aM_1>; variableM_2=<aM_2> .... variableM_N=<aM_N>, variableM_X; variableM_Y are unknown;

 

I would like to write SAS code to verify that all constants remain unchanged. On each run, only the primary key CR_ID value changes.

Please note: the names <DataSet1>, <DataSet2>... and <DataSetM> do not change.

 

What is the best way to tackle this problem?

 

Thanks!

Nancy

 


Accepted Solutions
Solution
‎07-11-2016 12:01 AM
Super User
Posts: 5,380

Re: What is the best approach for regression testing in SAS?

Proc compare will tell you that there are changes. But I can't see how you could use that output in a structured and automated way without a consistent key.
Data never sleeps



All Replies
Respected Advisor
Posts: 4,804

Re: What is the best approach for regression testing in SAS?

Not sure I get it... If the old version of your datasets is in library oldLib and the new version in newLib, you could run a series of proc compare steps and then assemble the stats:

 

proc compare 
	base=oldLib.DataSet1(drop=CR_ID) 
	compare=newLib.DataSet1(drop=CR_ID)
	outstats=compareDataSet1
	noprint;
run;

proc compare 
	base=oldLib.DataSet2(drop=CR_ID) 
	compare=newLib.DataSet2(drop=CR_ID)
	outstats=compareDataSet2
	noprint;
run;

/* ... and so on ... */

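/* Stack all the outstats datasets; indsname= records which one each row came from */
data compareAll;
length source $41;
set compareDataSet: indsname=dsname;
source = dsname;
run;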
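
With 40 or so datasets, the repeated steps above could be generated by a small macro. The following is only a sketch under assumptions: the macro name compare_all and its parameters dsList, baseLib and compLib are made up for illustration, and it assumes the rows of the old and new versions are in the same order, because without an ID statement proc compare matches observations by position. It relies on the return code that proc compare leaves in the automatic macro variable SYSINFO, where 0 means no differences were found.

```sas
/* Sketch only: compare_all, dsList, baseLib and compLib are hypothetical names */
%macro compare_all(dsList=, baseLib=oldLib, compLib=newLib);
    %local i ds rc;
    %let i = 1;
    %let ds = %scan(&dsList, &i, %str( ));
    %do %while (%length(&ds) > 0);
        /* Compare old and new versions, ignoring the changing key */
        proc compare
            base=&baseLib..&ds(drop=CR_ID)
            compare=&compLib..&ds(drop=CR_ID)
            noprint;
        run;
        /* SYSINFO is reset by the next step, so save it immediately */
        %let rc = &sysinfo;
        %if &rc = 0 %then %put NOTE: &ds is unchanged.;
        %else %put WARNING: &ds differs (PROC COMPARE return code &rc).;
        %let i = %eval(&i + 1);
        %let ds = %scan(&dsList, &i, %str( ));
    %end;
%mend compare_all;

/* Space-separated list of the datasets to check */
%compare_all(dsList=DataSet1 DataSet2)
```

A non-zero return code can also reflect attribute differences (labels, formats, lengths), not only changed values, so a warning here is a prompt to look at the detailed output rather than proof that a constant changed.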
PG
Contributor
Posts: 61

Re: What is the best approach for regression testing in SAS?

Hi PGStats

 

I have made up some data to illustrate the problem.

 

Base - Employee_Name
CR_ID  Gender  First_Name  Last_Name  Salary
12305  M       Tom         Zhang      10000

Base - Employee_Hire
CR_ID  Role        Birth_Date  Hire_Date
12305  Sales Rep.  12JUL1972   12/04/2010

After the run - Employee_Name
CR_ID  Gender  First_Name  Last_Name  Salary
12367  M       Tom         Zhang      10000

After the run - Employee_Hire
CR_ID  Role        Birth_Date  Hire_Date
12367  Sales Rep.  12JUL1972   12/04/2010

 

I would like to check whether the values in Employee_Name and Employee_Hire have changed or not. I hope this makes the problem clearer.

 

Thanks!

Super User
Posts: 5,380

Re: What is the best approach for regression testing in SAS?

Assuming you have more than one person in your real data, you have a challenge. I can't see any candidate for an alternative key. Many people can have the same name, and names can even change.
I would try to require a change in the source system so that it provides a mapping table (old to new cr_id), or adds a key that remains unchanged between deployments.
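
If such a mapping table were available, the comparison could be keyed properly. The sketch below assumes a hypothetical table work.id_map with columns old_cr_id and new_cr_id, plus the oldLib and newLib librefs used elsewhere in this thread; none of these names come from the source system.

```sas
/* Hypothetical mapping table: work.id_map(old_cr_id, new_cr_id) */

/* Re-key the new data with the old CR_ID so both versions share a key */
proc sql;
    create table work.new_mapped as
    select m.old_cr_id as CR_ID, n.*
    from newLib.DataSet1(rename=(CR_ID=new_cr_id)) as n
         inner join work.id_map as m
         on n.new_cr_id = m.new_cr_id
    order by CR_ID;
quit;

/* The ID statement expects both inputs sorted by the key */
proc sort data=oldLib.DataSet1 out=work.old_sorted;
    by CR_ID;
run;

/* Match rows on CR_ID instead of on observation number */
proc compare base=work.old_sorted
             compare=work.new_mapped(drop=new_cr_id)
             listall;
    id CR_ID;
run;
```

With an ID statement, proc compare reports differences per key value and lists records that exist on only one side, which is what makes the output usable in an automated check.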
Data never sleeps
Contributor
Posts: 61

Re: What is the best approach for regression testing in SAS?

The example above is completely fabricated data. The real data is much more complex, with more datasets and many more variables in each (more than 400 variables in around 30 datasets!). The values do not change (except for the primary key) because they are fed in from the source UI system by an automated script (automated testing). The primary key, however, is generated by the source system, so it changes on every run and cannot be kept the same.

 

The objective of this test is to check that data populated from multiple source systems, through a common Staging area and into SAS, remains unchanged after every deployment of the source system(s). The aims are:

  • to test that the source system(s) are not broken
  • to test that the ETL process is working correctly
  • to test that data has been populated into SAS correctly

Since this is not real data but injected sets of test data, in this case the data does sleep (we can keep it fixed, except for the primary key value, which is generated by the source system). We have 8 sets of data, that is, 8 controlled sets.
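
Because the test data is controlled, one possible starting point is to align rows on columns whose values are known to be constant and leave CR_ID out of the comparison. This is a sketch under assumptions: it uses the made-up Employee_Name table from the earlier post, assumes First_Name and Last_Name together identify each controlled test record, and assumes the baseline and the new run sit in hypothetical libraries oldLib and newLib.

```sas
/* Assumption: Last_Name + First_Name uniquely identify each test record */
proc sort data=oldLib.Employee_Name(drop=CR_ID) out=work.base_name;
    by Last_Name First_Name;
run;

proc sort data=newLib.Employee_Name(drop=CR_ID) out=work.new_name;
    by Last_Name First_Name;
run;

/* OUTNOEQUAL keeps only rows with differences, so an empty
   work.diff_name means nothing changed for the matched records */
proc compare base=work.base_name compare=work.new_name
             out=work.diff_name outnoequal outbase outcomp noprint;
    id Last_Name First_Name;
run;

/* Report the result in the log without reading any data */
data _null_;
    if 0 then set work.diff_name nobs=n;
    if n = 0 then put 'NOTE: Employee_Name is unchanged.';
    else put 'WARNING: Employee_Name differs, see work.diff_name.';
    stop;
run;
```

If no combination of constant columns identifies a record uniquely, this approach does not apply and a stable key or mapping table from the source system would still be needed.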

 

Actually, PGStats's suggestion is very good, and I would like to try "proc compare", but I am not sure how to start...

 

Nancy
