DATA Step, Macro, Functions and more

Methods to validate a data step merge

Reply
Occasional Contributor
Posts: 12

Methods to validate a data step merge

[ Edited ]

I have joined 3 tables (table1, table 2, table 3) with Empid which is common in all three tables.

The final merged table is Four.

 

 

Data Four;
    merge table1(in=a)
               table2(in=b)
               table3(in=c);

    by Empid;
    if a and b and c;
run;


proc print data = Four;
Run;

 

I got the final table Four. 

 

My question is :

What steps would you take to ensure your dataset is correct apart from reading the log and apart from reading the first few lines of the code as  formal testing?Smiley Happy

Respected Advisor
Posts: 4,931

Re: SAS query?

First step: read your code again Smiley Happy

 

It is missing the statement

 

by Empid;

 

then look at the first few lines of all datasets involved. Formal testing would require specific business rules.

PG
Occasional Contributor
Posts: 12

Re: SAS query?

[ Edited ]

Formal testing is not the answer that I am looking for.

 

I know how does this code works. Because that code has been written by me. Apart from that do you have any other info. available for my question? 

 

Thanks for your advice.

Super User
Posts: 5,516

Re: SAS query?

[ Edited ]

We tend to take this for granted, so:  READ THE LOG.

 

That's how SAS communicates what happened.  Besides errors / notes / warnings, look at how many observations came in from each source and how many you ended up with.

 

*********************************

 

EDITED:  Given what you describe, then I would add one more item to your checklist.  Check the last 10 observations to see that they come out as expected.  MERGE without a BY statement is not so common, and the anomalies would tend to come at the end of the resulting data set.

Occasional Contributor
Posts: 12

Re: SAS query?

Posted in reply to Astounding

Reading the log, checking the errors and warning messages are the basic rules in SAS when creating a dataset.

 

Anyways thanks for the reply.

Respected Advisor
Posts: 4,173

Re: SAS query?

[ Edited ]

@Saawan

For formal testing have test cases and define the criteria for "met".

 

For unit testing create test data for your test cases and define the expected outcome. Then run your code and compare the actual outcome with the expected outcome.

 

I would also have some generic test cases like: No errors or warnings and eventually also performance metrics to be met if this is relevant.

Occasional Contributor
Posts: 12

Re: SAS query?

That's Cool. Compare would work in my case.
Super User
Posts: 19,860

Re: SAS query?


Saawan wrote:

I have joined 3 tables (table1, table 2, table 3) with Empid which is common in all three tables.

The final merged table is Four.

Data Four;

merge table1(in=a)

           table2(in=b)

           table3(in=c);

by Empid;

if a and b and c;

run;

proc print data = Four;

Run;

 

I got the final table Four. 

 

My question is :

What steps would you take to ensure your dataset is correct apart from reading the log and apart from reading the first few lines of the code as  formal testing?Smiley Happy


 

Not sure this is formal testing but:

1. Pick a few cases and check the merge is what you expect

2. Check any edge cases possible, are missings correctly excluded, if you have records in one dataset but not others are they dropped. Check the number of records, N of records in the output should be equal to or less than the number of records in any of the contributing data sets.

3. If you have a character variable check if the results differ if you upcase/lowcase everything.

 

 


Saawan wrote:

apart from reading the first few lines of the code 


 

You should read every line of your code. I'm assuming you mean the first few records of your output data set. 

 

In the future, please included modifications to your question in the thread, not as an edit. 

Super User
Posts: 19,860

Re: Methods to validate a data step merge

Note, I've updated the subject line on your main post. SAS query is too generic and doesn't help to explain your question. Please use more descriptive subject lines in the future and include your code in a code editor rather than as text in the forum.
Ask a Question
Discussion stats
  • 8 replies
  • 225 views
  • 2 likes
  • 5 in conversation