BookmarkSubscribeRSS Feed
JibJam221
Obsidian | Level 7

Hello all, 

i've been struggling with the following...

 

I have 15 tables (all stored in a SAS Folder). Each table name starts with TRNSCTN followed by a transaction number (i.e. TRNSCTN300, TRNSCTN354,  TRNSCTN780, etc.).

 

each transaction table has a different amount of column numbers, with different column names.  however, some of these columns represent the same values. For example: 

TRNSCTN300 - Column names: Customer_ID, Order,  Time

TRNSCTN354 - Column names: Customer ID, Order number, time of day

 

This is the case for all 15 tables. Sometimes the column names do match in some tables and not in others. 

 

My question is, how can I go about appending the columns that require combining into 1 column in a master dataset, from those 15 tables? I will need to do this for about 15-20 columns as well... 

 

I tried using a macro to help loop through the tables and concatenate the column names "Customer_ID" and "Customer ID" that are found across all 15 tables, however it ran so many errors...

 

I have attached my code below.

 

 

%macro concatenate_columns;



 LIBNAME mylib "XXXXXXX";



/* Step 2: Create an empty master dataset with the concatenated column */ data mylib.master_dataset;

  length concatenated_column $200; /* Define the length of the new column as needed */

  format concatenated_column $CHAR3.; /* Apply CHAR3. format to allow for three letters */ run;



%do i = 1 %to 15; /* Loop through the 15 tables */



  /* Generate code for concatenating columns from different tables */

  data temp_table;

    set mylib.TRNSCTN&i(keep=Customer_ID, Customer ID);

    concatenated_column = catx(' ', coalesce(trim(Customer_ID), trim(Customer_ID))); /* Concatenate values handling variations in column names */

    drop Customer_ID Customer ID;

  run;



  /* Append to the master dataset */

  data mylib.master_dataset;

    set mylib.master_dataset (in=a);

    if a then output; /* Copy existing data from master_dataset */

    set temp_table (keep=concatenated_column);

  run;



%end;



%mend concatenate_columns;



/* Step 4: Execute the macro */

%concatenate_columns;

 

 

I get many issues, including:

- File MYLIB.TRNSCTN.DATA does not exist.

- Undeclared array referenced: i.

- Variable concatenated_column has been defined as both character and numeric.

- The %END statement is not valid in open code.

 

Is there a more efficient or effective approach to this? 

 

Thanks

 

 

 

 

 

 

13 REPLIES 13
PaigeMiller
Diamond | Level 26

"Appending the columns ... " do you mean taking a table that has (let's say) 4 columns and adding three more columns that don't match from the next data set, giving you a data set with 7 columns?

 

This is just a merge statement in a DATA step, here's an outline of code that will do this.

 

data want;
    merge dataset1 dataset2 dataset3 ... ;
    by customer_ID order;
run;

 

--
Paige Miller
JibJam221
Obsidian | Level 7
sorry, i mean taking 1 column from each table (of the 15 tables) and combining them into 1 column to a master dataset. however the 1 column from each dataset has different column names (for the most part)
PaigeMiller
Diamond | Level 26

What does "combining" mean?

 

Is this a horizontal "combining" of data sets or a vertical "combining" of data sets or something else?


Can you give us an example of two (fake) input data sets, and the desired output from "combining" these two?

--
Paige Miller
JibJam221
Obsidian | Level 7

sure! So I made 2 mock tables below, and I would need the columns in these tables to be appended and placed into a new column in a master dataset (see final dataset in the last table). 

 

ultimately, my issue is that there are 15 tables with inconsistent names and columns, that need to be combined and appended into a master database. Since the 15 tables use different names for the columns that hold the same values. I dont want to merge the tables and create a dataset with over 50 columns holding the same information across multiple columns. Instead, i'm looking to create a master dataset with the appended columns, so the dataset  is clean 1 column holding information on 1 value (i.e. 1 column for Customer_ID, instead of customer_ID being held across 4 different columns due to the different names of the Customer ID columns across the 15 tables).

 

I hope this helps clarify my goal. 

 

TRNSCTN354

Customer ID Order Time
1241 1 10:57
12124 2 11:00
124125 3 12:00
132124 4 10:00
1231 5 09:59

 

TRNSCTN300

Customer_ID Order number Time of day
25748 8 07:55
9985 52 12:11
54848 77 12:14
54578 41 12:45
54577 32 03:43

 

Master Data

Customer_ID OrderNum ToP
25748 8 07:55
9985 52 12:11
54848 77 12:14
54578 41 12:45
54577 32 03:43
25748 8 07:55
9985 52 12:11
54848 77 12:14
54578 41 12:45
54577 32 03:43
PaigeMiller
Diamond | Level 26

The only thing I can think of to do about the inconsistent column names is to go in manually and change them to be consistent. I suppose you could also create a lookup table of inconsistent names that need to be combined, but that seems like more work. yes, it's a hassle and takes some time and manual effort, but with 15 tables I would just make the changes manually.

--
Paige Miller
JibJam221
Obsidian | Level 7
okay this is what I was trying to figure out. Wasnt sure if there was a more efficient way. Thank you!
Tom
Super User Tom
Super User

Just RENAME the variables to be consistently named (assuming they already have consistent other metadata the most important being the TYPE and the storage LENGTH).

So assuming your existing dataset is the one that has the variable names you want to use then you can make a new dataset that combines the OBSERVATIONS from all three dataset with this simple data step.

data NEW;
  set
   ORIGINAL
   TRNSCTN354(rename=(Order=Ordernum Time=ToP))
   TRNSCTN300(rename=('Order number'n=Ordernum 'Time of day'n=ToP))
 ;
run;
JibJam221
Obsidian | Level 7
none of the tables have all of the column names that I need. All of the tables have a different amount of column names, but i'd like to make a master dataset holding all of the information across all of the tables. im assuming there no way thats more efficient than this? Majority of tables have over 30 columns, so I was hoping for an efficient way to do this across the 15 tables.
JackieJ_SAS
SAS Employee

Hi, You could try code like the below. It creates a dataset MYVARS that contains the names of all the variables in all the datasets in one folder (here the folder specified in the library MYLIB). You could then use the SAS macro language to process this dataset in order to get variable names without typing them. This macro code would be custom to your variable names, though.

libname mylib "C:/Myfolder";

ods output variables=myvars;
proc contents data=mylib._all_;
run;
JibJam221
Obsidian | Level 7
thank you! Would you happen to have an example of a macro code that could get the variable names without having to type them? Also, do you know how I can have SAS append the columns from multiple tables into 1 column of a new table? Would the only way be to manually type out the name of each column to append, and from the specific table it comes from?
Tom
Super User Tom
Super User

Get all of the names from a series of dataset in the same library.

proc sql;
create table names as 
select distinct name
from dictionary.columns
where libname='WORK'
  and memname in ('TRAN1', 'TRAN2', 'TRAN3')
;
quit;

Write them to the SAS log:

data _null_;
 set names;
 put name ;
run;

Now just copy the list from the log and post it back into an editor.

JackieJ_SAS
SAS Employee

Sorry, I do not have any macro code handy. This would be a custom program for your situation. If you want to dig, you could try reading other posts like this these: 
Solved: Renaming variable names with macro and loop - SAS Support Communities

Renaming variable w/ macros - SAS Support Communities

Tom
Super User Tom
Super User

What do you mean by "efficient"?

I doubt that there is a programmatic way to figure out how to rename the variables.  Unless there is some pattern to how they have mangled the names.

 

You could pull out all of the unique names from the multiple datasets and put them into a file and manually create a series of OLDNAME =NEWNAME pairs that covers all of them.

 

Then If you modify the DKRICOND option setting you can just use the same rename list for all input datasets.

 

Example:

1    options validvarname=any;
2    %let renames=sex=gender Order=Ordernum Time=ToP 'Order number'n=Ordernum 'Time of day'n=ToP;
3    options dkricond=error;
4    data want;
5     set sashelp.class(rename=(&renames));
ERROR: Variable Order is not on file SASHELP.CLASS.
ERROR: Variable Time is not on file SASHELP.CLASS.
ERROR: Variable 'Order number'n is not on file SASHELP.CLASS.
ERROR: Variable 'Time of day'n is not on file SASHELP.CLASS.
ERROR: Invalid DROP, KEEP, or RENAME option on file SASHELP.CLASS.
6    run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


7    options dkricond=nowarn;
8    data want;
9     set sashelp.class(rename=(&renames));
10   run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.WANT has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3516 views
  • 0 likes
  • 4 in conversation