I have joined 3 tables (table1, table 2, table 3) with Empid which is common in all three tables.
The final merged table is Four.
Data Four;
merge table1(in=a)
table2(in=b)
table3(in=c);
by Empid;
if a and b and c;
run;
proc print data = Four;
Run;
I got the final table Four.
My question is :
What steps would you take to ensure your dataset is correct apart from reading the log and apart from reading the first few lines of the code as formal testing?
First step: read your code again
It is missing the statement
by Empid;
then look at the first few lines of all datasets involved. Formal testing would require specific business rules.
Formal testing is not the answer that I am looking for.
I know how does this code works. Because that code has been written by me. Apart from that do you have any other info. available for my question?
Thanks for your advice.
We tend to take this for granted, so: READ THE LOG.
That's how SAS communicates what happened. Besides errors / notes / warnings, look at how many observations came in from each source and how many you ended up with.
*********************************
EDITED: Given what you describe, then I would add one more item to your checklist. Check the last 10 observations to see that they come out as expected. MERGE without a BY statement is not so common, and the anomalies would tend to come at the end of the resulting data set.
Reading the log, checking the errors and warning messages are the basic rules in SAS when creating a dataset.
Anyways thanks for the reply.
For formal testing have test cases and define the criteria for "met".
For unit testing create test data for your test cases and define the expected outcome. Then run your code and compare the actual outcome with the expected outcome.
I would also have some generic test cases like: No errors or warnings and eventually also performance metrics to be met if this is relevant.
@Saawan wrote:
I have joined 3 tables (table1, table 2, table 3) with Empid which is common in all three tables.
The final merged table is Four.
Data Four;
merge table1(in=a)
table2(in=b)
table3(in=c);
by Empid;
if a and b and c;
run;
proc print data = Four;
Run;
I got the final table Four.
My question is :
What steps would you take to ensure your dataset is correct apart from reading the log and apart from reading the first few lines of the code as formal testing?
Not sure this is formal testing but:
1. Pick a few cases and check the merge is what you expect
2. Check any edge cases possible, are missings correctly excluded, if you have records in one dataset but not others are they dropped. Check the number of records, N of records in the output should be equal to or less than the number of records in any of the contributing data sets.
3. If you have a character variable check if the results differ if you upcase/lowcase everything.
@Saawan wrote:
apart from reading the first few lines of the code
You should read every line of your code. I'm assuming you mean the first few records of your output data set.
In the future, please included modifications to your question in the thread, not as an edit.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.