BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I am using proc append to append (union) multiple data sets using force option (since some data sets have columns that are not exiting in other data sets ).

What is the way to prevent warning message  

WARNING: Variable Ind_A was not found on BASE file. The variable will not be added to the BASE file.
 
 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

 

 

As I mentioned, KEEP  the needed variables or DROP the unwanted ones as a data set option. If the extra variables are not the same in all the other sets then KEEP is likely the way to go.

 

proc append base=somelib.dataset
        data=otherdataset (keep = <your list of variable names goes here>);
run;

 


@Ronein wrote:
Hello
I need to union 20 data sets and each data set contain 2 million rows.
Most of fields are in all data sets but It happened that there were new fields added to the data sets.
I have used proc append because as far as I know it is much quicker.
What do you think please?

 

View solution in original post

13 REPLIES 13
sbxkoenk
SAS Super FREQ

Hello @Ronein ,

 

The FORCE option is preventing the PROC APPEND to fail, but you will always get that warning message (that you cannot suppress) in case the variables in both datasets are not the same.

To avoid the warning you should use a data step to concatenate vertically.

Data step is also the only way to have all variables from both base data set and append data set.

 

Cheers,

Koen

ballardw
Super User

@Ronein wrote:

Hello

I am using proc append to append (union) multiple data sets using force option (since some data sets have columns that are not exiting in other data sets ).

What is the way to prevent warning message  

WARNING: Variable Ind_A was not found on BASE file. The variable will not be added to the BASE file.
 
 


The only way I know to PREVENT that warning is to make sure the added variables are not in the appending data set, such as Drop the additional variables or only Keep the variables in the base data as data set options.

data work.class; 
   set sashelp.class;
run;

data junk; 
   set sashelp.class;
   hw=height*weight;
run;

proc append base=work.class
            data=junk (drop=hw) force;
run;

The real question might be why Proc Append if you know there are variables not present in the base set? Force may also have issues with same named variables of different lengths resulting in truncated data.

Ronein
Meteorite | Level 14
Hello
I need to union 20 data sets and each data set contain 2 million rows.
Most of fields are in all data sets but It happened that there were new fields added to the data sets.
I have used proc append because as far as I know it is much quicker.
What do you think please?
ballardw
Super User

 

 

As I mentioned, KEEP  the needed variables or DROP the unwanted ones as a data set option. If the extra variables are not the same in all the other sets then KEEP is likely the way to go.

 

proc append base=somelib.dataset
        data=otherdataset (keep = <your list of variable names goes here>);
run;

 


@Ronein wrote:
Hello
I need to union 20 data sets and each data set contain 2 million rows.
Most of fields are in all data sets but It happened that there were new fields added to the data sets.
I have used proc append because as far as I know it is much quicker.
What do you think please?

 

DarrylLawrence
Obsidian | Level 7

Good day

 

I tried listing all the variables in the "DATA" dataset in the PROC Append but it still does not add the columns to the "BASE" dataset in the PROC Append code.

 

I tried a datastep but then it drops the indexes on the "BASE" table, and then I need to re-add the indexes and that step takes over 2-4 hours as the dataset has 100's of millions of rows.

 

data HIST;

set HIST

       CURR;

run;

 

How else can I achieve this objective?

 

I have 2 datasets:

1. HIST (852 variables)

2. CURR (723 variables)

 

Put all the variable names from "CURR" into a list (723 variables)

 

proc sql;
select name
into :SASMA_LIST separated by ' '
from dictionary.columns
where libname="WORK" and memtype="DATA" and memname = "CURR";
quit;

 

PROC APPEND BASE=HIST
DATA=CURR (keep=&SASMA_LIST) FORCE;
RUN;

 

Results:

NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 PROC APPEND BASE=HIST
29 DATA=CURR (keep=&SASMA_LIST) FORCE;
30 RUN;

NOTE: Appending WORK.CURR to WORK.HIST.
WARNING: Variable SME_DATE_KEY was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable TOT_PAYMENT_VALUE_AMT was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable ACCOUNT_BALANCE_AMT was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable TOT_INSTALMENT_AMT was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable TOT_DUE_AMT was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable Prev_TOT_DUE_AMT was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable TOT_PAYMENT_VALUE_AMT0 was not found on BASE file. The variable will not be added to the BASE file.
....


NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 10 observations read from the data set WORK.CURR.
NOTE: 10 observations added.
NOTE: The data set WORK.HIST has 20 observations and 852 variables.
NOTE: PROCEDURE APPEND used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds

 

DarrylLawrence
Obsidian | Level 7

Good day

 

I tried your method (keep statement) but the variables ( in the "DATA"  dataset but not on the "BASE" dataset) are still dropped in the PROC Append.

I have 2 SAS datasets and want to append the records in the dataset "CURR" to the dataset "HIST"

Too many variables to list in the proc append code and the list of variables might change from month to month, so I extracted all the variable names from the dictionary.columns list.

proc sql;
select name
into :SASMA_LIST separated by ' '
from dictionary.columns
where libname="WORK" and memtype="DATA" and memname = "CURR";
quit;


PROC APPEND BASE=HIST
DATA=CURR (keep=&SASMA_LIST) FORCE;
RUN;


Log

35
36 PROC APPEND BASE=HIST
37 DATA=CURR (keep=&SASMA_LIST) FORCE;
38 RUN;

NOTE: Appending WORK.CURR to WORK.HIST.
WARNING: Variable p_rcosmetics1 was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable p_raswiss1 was not found on BASE file. The variable will not be added to the BASE file.
WARNING: Variable p_rathome1 was not found on BASE file. The variable will not be added to the BASE file.

 

 

sbxkoenk
SAS Super FREQ

Hello @DarrylLawrence ,

 

These are WARNINGs.
It's better to put them in GREEN (instead of RED).

 

Your KEEP= on the DATA= is not doing anything (as all existing variables are in the KEEP=).
The drop= and keep= options suggested by @ballardw were only there to avoid possible warnings and to possibly drop the FORCE option from the syntax. In other words ... to "align" on the vars available in both datasets.

data classextra; set sashelp.class; a=9999; run;

proc datasets library=WORK NoList NoDetails memtype=data; delete classNEW; run; QUIT;
proc append base=work.classNEW data=sashelp.class; run;
proc append base=work.classNEW data=work.classextra FORCE nowarn; run;
QUIT;

Anyway, if you have variables in base= dataset that are non-existing in data= dataset AND vice versa, then the only way to append while keeping all variables from both datasets is by using the data step.

 

Koen

 
DarrylLawrence
Obsidian | Level 7

Thanks Koen

 

I was trying to avoid using a data step to append the latest month's data to my historical dataset (36 months worth of data) as that process drops the indexes on the BASE dataset, and it takes about 4 hours to re-create the indexes. 

Proc append does not drop the indexes but then I have the issue that any new variables, not on the BASE dataset, are not added.

 

So it looks like I have to revert to my original code :

/*********************************************************/

/* Add Current SASMA Snapshot to the History Dataset     */

/*********************************************************/

data HISTORY.SASMA_HISTORY;

set HISTORY.SASMA_HISTORY

     SASMA.CURRENT_SASMA;

run;

 

/*********************************************************/

/* Add Index's                                                       */

/*********************************************************/

 

proc datasets library=HISTORY;

modify SASMA_HISTORY;

index create ID;

index create OBS_DATE;

index create STATUS;

index create REGISTERED_BRAND;

index create ID_NUMBER / nomiss;

index create PASSPORT_NUMBER / nomiss;

run;

/*********************************************************/

/* Add Composite Index's                                 */

/*********************************************************/

proc datasets library=HISTORY;

modify SASMA_HISTORY;

index create numname=(ID OBS_DATE) / nomiss;

run;

Tom
Super User Tom
Super User

Wouldn't it be a lot faster to just make the indexes while adding the data?

Something like:

data HISTORY.SASMA_HISTORY
  (index=(id obs_date .... numname=(ID OBS_DATE))
;
  set HISTORY.SASMA_HISTORY
      SASMA.CURRENT_SASMA;
run;

NOTE If this is taking 4 hours perhaps you are at the point where you need store this data into an actual database instead of SAS datasets.

sbxkoenk
SAS Super FREQ

Hello @DarrylLawrence ,

 

PROC APPEND and PROC COPY are preserving the index indeed.

With the data step solution for your need ... you will have to recreate the index ... not necessarily in a subsequent step though --> you can use the INDEX= data set option in the DATA statement.

 

I have two macros to capture all the index definitions such that you can easily re-apply them after vertical concatenation (one macro for getting and one macro for setting them again), but I do not think you need these (as you perfectly know about the indexes that were there on the base= dataset and that you need to re-apply).

 

Good luck,

Koen

 
DarrylLawrence
Obsidian | Level 7

Thanks Koen

 

I was trying to avoid using a data step to append the latest month's data to my historical dataset (36 months worth of data) as that process drops the indexes on the BASE dataset, and it takes about 4 hours to re-create the indexes. 

Proc append does not drop the indexes but then I have the issue that any new variables, not on the BASE dataset, are not added.

 

So it looks like I have to revert to my original code :

/*********************************************************/

/* Add Current SASMA Snapshot to the History Dataset     */

/*********************************************************/

data HISTORY.SASMA_HISTORY;

set HISTORY.SASMA_HISTORY

     SASMA.CURRENT_SASMA;

run;

 

/*********************************************************/

/* Add Index's                                                       */

/*********************************************************/

 

proc datasets library=HISTORY;

modify SASMA_HISTORY;

index create ID;

index create OBS_DATE;

index create STATUS;

index create REGISTERED_BRAND;

index create ID_NUMBER / nomiss;

index create PASSPORT_NUMBER / nomiss;

run;

/*********************************************************/

/* Add Composite Index's                                 */

/*********************************************************/

proc datasets library=HISTORY;

modify SASMA_HISTORY;

index create numname=(ID OBS_DATE) / nomiss;

run;

Tom
Super User Tom
Super User

If you want the structure of the combined dataset to change over time then PROC APPEND is not the right tool.

 

Is this something you are doing all in one job? Do you have all of the inputs available at the same time?  Why not make the dataset in one data step instead of bunch of steps?

data want;
  set in1 in2 in3 in4 another_in1 another_in2 ;
run;

You could do your own testing to see which performs better.

 

Even if there is some reason to use PROC APPEND you will need to add some logic to make a NEW dataset whenever you need to add more variables.

Ksharp
Super User
proc append base=work.class
            data=junk force nowarn;
run;

Try NOWARN option.

sas-innovate-2024.png

📢

ANNOUNCEMENT

The early bird rate has been extended! Register by March 18 for just $695 - $100 off the standard rate.

 

Check out the agenda and get ready for a jam-packed event featuring workshops, super demos, breakout sessions, roundtables, inspiring keynotes and incredible networking events. 

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2007 views
  • 8 likes
  • 6 in conversation