Hello all,
i've been struggling with the following...
I have 15 tables (all stored in a SAS Folder). Each table name starts with TRNSCTN followed by a transaction number (i.e. TRNSCTN300, TRNSCTN354, TRNSCTN780, etc.).
each transaction table has a different amount of column numbers, with different column names. however, some of these columns represent the same values. For example:
TRNSCTN300 - Column names: Customer_ID, Order, Time
TRNSCTN354 - Column names: Customer ID, Order number, time of day
This is the case for all 15 tables. Sometimes the column names do match in some tables and not in others.
My question is, how can I go about appending the columns that require combining into 1 column in a master dataset, from those 15 tables? I will need to do this for about 15-20 columns as well...
I tried using a macro to help loop through the tables and concatenate the column names "Customer_ID" and "Customer ID" that are found across all 15 tables, however it ran so many errors...
I have attached my code below.
%macro concatenate_columns; LIBNAME mylib "XXXXXXX"; /* Step 2: Create an empty master dataset with the concatenated column */ data mylib.master_dataset; length concatenated_column $200; /* Define the length of the new column as needed */ format concatenated_column $CHAR3.; /* Apply CHAR3. format to allow for three letters */ run; %do i = 1 %to 15; /* Loop through the 15 tables */ /* Generate code for concatenating columns from different tables */ data temp_table; set mylib.TRNSCTN&i(keep=Customer_ID, Customer ID); concatenated_column = catx(' ', coalesce(trim(Customer_ID), trim(Customer_ID))); /* Concatenate values handling variations in column names */ drop Customer_ID Customer ID; run; /* Append to the master dataset */ data mylib.master_dataset; set mylib.master_dataset (in=a); if a then output; /* Copy existing data from master_dataset */ set temp_table (keep=concatenated_column); run; %end; %mend concatenate_columns; /* Step 4: Execute the macro */ %concatenate_columns;
I get many issues, including:
- File MYLIB.TRNSCTN.DATA does not exist.
- Undeclared array referenced: i.
- Variable concatenated_column has been defined as both character and numeric.
- The %END statement is not valid in open code.
Is there a more efficient or effective approach to this?
Thanks
"Appending the columns ... " do you mean taking a table that has (let's say) 4 columns and adding three more columns that don't match from the next data set, giving you a data set with 7 columns?
This is just a merge statement in a DATA step, here's an outline of code that will do this.
data want;
merge dataset1 dataset2 dataset3 ... ;
by customer_ID order;
run;
What does "combining" mean?
Is this a horizontal "combining" of data sets or a vertical "combining" of data sets or something else?
Can you give us an example of two (fake) input data sets, and the desired output from "combining" these two?
sure! So I made 2 mock tables below, and I would need the columns in these tables to be appended and placed into a new column in a master dataset (see final dataset in the last table).
ultimately, my issue is that there are 15 tables with inconsistent names and columns, that need to be combined and appended into a master database. Since the 15 tables use different names for the columns that hold the same values. I dont want to merge the tables and create a dataset with over 50 columns holding the same information across multiple columns. Instead, i'm looking to create a master dataset with the appended columns, so the dataset is clean 1 column holding information on 1 value (i.e. 1 column for Customer_ID, instead of customer_ID being held across 4 different columns due to the different names of the Customer ID columns across the 15 tables).
I hope this helps clarify my goal.
TRNSCTN354
Customer ID | Order | Time |
1241 | 1 | 10:57 |
12124 | 2 | 11:00 |
124125 | 3 | 12:00 |
132124 | 4 | 10:00 |
1231 | 5 | 09:59 |
TRNSCTN300
Customer_ID | Order number | Time of day |
25748 | 8 | 07:55 |
9985 | 52 | 12:11 |
54848 | 77 | 12:14 |
54578 | 41 | 12:45 |
54577 | 32 | 03:43 |
Master Data
Customer_ID | OrderNum | ToP |
25748 | 8 | 07:55 |
9985 | 52 | 12:11 |
54848 | 77 | 12:14 |
54578 | 41 | 12:45 |
54577 | 32 | 03:43 |
25748 | 8 | 07:55 |
9985 | 52 | 12:11 |
54848 | 77 | 12:14 |
54578 | 41 | 12:45 |
54577 | 32 | 03:43 |
The only thing I can think of to do about the inconsistent column names is to go in manually and change them to be consistent. I suppose you could also create a lookup table of inconsistent names that need to be combined, but that seems like more work. yes, it's a hassle and takes some time and manual effort, but with 15 tables I would just make the changes manually.
Just RENAME the variables to be consistently named (assuming they already have consistent other metadata the most important being the TYPE and the storage LENGTH).
So assuming your existing dataset is the one that has the variable names you want to use then you can make a new dataset that combines the OBSERVATIONS from all three dataset with this simple data step.
data NEW;
set
ORIGINAL
TRNSCTN354(rename=(Order=Ordernum Time=ToP))
TRNSCTN300(rename=('Order number'n=Ordernum 'Time of day'n=ToP))
;
run;
Hi, You could try code like the below. It creates a dataset MYVARS that contains the names of all the variables in all the datasets in one folder (here the folder specified in the library MYLIB). You could then use the SAS macro language to process this dataset in order to get variable names without typing them. This macro code would be custom to your variable names, though.
libname mylib "C:/Myfolder";
ods output variables=myvars;
proc contents data=mylib._all_;
run;
Get all of the names from a series of dataset in the same library.
proc sql;
create table names as
select distinct name
from dictionary.columns
where libname='WORK'
and memname in ('TRAN1', 'TRAN2', 'TRAN3')
;
quit;
Write them to the SAS log:
data _null_;
set names;
put name ;
run;
Now just copy the list from the log and post it back into an editor.
Sorry, I do not have any macro code handy. This would be a custom program for your situation. If you want to dig, you could try reading other posts like this these:
Solved: Renaming variable names with macro and loop - SAS Support Communities
What do you mean by "efficient"?
I doubt that there is a programmatic way to figure out how to rename the variables. Unless there is some pattern to how they have mangled the names.
You could pull out all of the unique names from the multiple datasets and put them into a file and manually create a series of OLDNAME =NEWNAME pairs that covers all of them.
Then If you modify the DKRICOND option setting you can just use the same rename list for all input datasets.
Example:
1 options validvarname=any;
2 %let renames=sex=gender Order=Ordernum Time=ToP 'Order number'n=Ordernum 'Time of day'n=ToP;
3 options dkricond=error;
4 data want;
5 set sashelp.class(rename=(&renames));
ERROR: Variable Order is not on file SASHELP.CLASS.
ERROR: Variable Time is not on file SASHELP.CLASS.
ERROR: Variable 'Order number'n is not on file SASHELP.CLASS.
ERROR: Variable 'Time of day'n is not on file SASHELP.CLASS.
ERROR: Invalid DROP, KEEP, or RENAME option on file SASHELP.CLASS.
6 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
7 options dkricond=nowarn;
8 data want;
9 set sashelp.class(rename=(&renames));
10 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.WANT has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Ready to level-up your skills? Choose your own adventure.