07-17-2017 11:31 PM
How would you code to merge 3 datasets together (not all 3 share a common column i.e. table_1 and table_2 can merge through a column; table_1 and table_3 can merge through another column, but table_2 and table_3 have no common column). The code I am attempting must be able to run regardless of the name of the datasets i.e. any user can run the code for whatever tables they input into SAS and choose to merge.
This is the code I have used so far which works, but as you can see, it requires the dataset name (table_1 etc.) which I don't want as part of the code as different people using the code will have different dataset names and the process should be automated (i.e. the user shouldn't need to know how to code in SAS).
proc sort data = table_1; by column_A; run; proc sort data = table_2; by column_A; run; data table_together; merge table_1 table_2; by column_A; run; proc sort data = table_3; by column_B; run; data table_all; merge table_3 table_together; by column B; run;
Also, table_1 and table_2 may have columns sharing the same name. Is there a way in the code to merge these tables without the columns overwriting eachother if the columns with the same name have different data?
07-17-2017 11:54 PM
Anything possible though I question the wisdom of this type of projects.
Some things to consider:
1. You say you don't want Table name identified but I assume a macro variables for the three datasets is fine?
2. How do you know which fields to join on?
3. How do you differ between fields to join on and data that should be selected but are in both data sets
4. Wouldn't SQL be more effective here?
5. Will it always be a one to one join situation?
Im not going to code this for you, someone else may, but do you have any experience in macros? That's probably the solution you'll need.
07-18-2017 12:02 AM
What is the actual problem that this is trying to solve?
Why do the datasets have non-key variables with the same name? What is the meaning of variable X in dataset A versus variable X in dataset B?
Can you post some sample data and the results that you want from them? Perhaps simple merges is not what you are actually need to do.
That said if you have working code and want to provide a way for users to specify the dataset names then you might just need to define some macro variables. Or perhaps even wrap the code in a macro with parameters.
%let input1=TABLE_1; %let input2=TABLE_2; %let input3=TABLE_3; proc sql noprint ; create table table_all as select * from (select * from &input1 A full join &input2 B on A.column_A = B.column_A ) X full join &input3 C on X.column_B = C.column_B ; quit;
07-18-2017 02:08 AM - edited 07-18-2017 02:09 AM
> any user can run the code for whatever tables they input into SAS and choose to merge
What could possibly go wrong? ;-)
SQL is the answer to non-shared key joins as shown above, but with SQL, cartesian products are happy -keen even- to bring your server to a halt.
SQL plus your stated goal make disaster a matter of time unless stern controls are in place.
07-18-2017 02:35 AM
This is why you have the point-and-click tools like EG or Studio.
As soon as users have to work with code, at least a modicum of intelligence and computer-saviness is mandatory.
And if you want to write code that deals with all eventualities and can withstand the onslaught of even the heftiest DAU, you will end up with several KLOC that nobody wants to maintain. Think "Codebase of Windows".
07-18-2017 04:09 AM
So this is code to be shared out as part of a library correct? If so what does your Functional Design Specification say - i.e. the document which defines what the code should do, inputs/outputs, validation etc.? That should clearly state what the input tables/keys, sorts etc. (yes, its not just a merge, you need to consider what sorting the output data should have, not to mention validating inputs etc.). I have seen this type of things from basic: i.e. a simple few lines of code which just merges the data and makes no assumptions - this only works in very limited scenarios and really saves zero effort as its faster to write the code, and complex: hundreds of lines of code doing all kinds of checking, merging, retaining sorts etc - this however fails 99% of the time and most would rather just code the merge themselves.
So to sum up, these tools (in fact quite a lot of what is out there in macro language) mostly make coding harder and less flexible than just writing Base SAS code to do a task.