Help using Base SAS procedures

proc append lots of datasets that have different names for variables

Reply
Contributor
Posts: 37

proc append lots of datasets that have different names for variables

Hi!  Here is the challenge.

There are many datadets with about 50 columns in each dataset.  Every dataset has over thousands of records.

e.g.

one of the datasets looks like:

produce    weight        bags

------------------------------------

carrots     10              500

potatos     5               200

garlic        2               100

...

another dataset looks like (note the variable names are not the same as the above dataset):

fruit         wt.              containers

-----------------------------------------

apples     15              100

pears        9                20

...

After appending, the result should look like this:

produce    weight        bags

------------------------------------

carrots     10              500

potatos     5               200

garlic        2               100

...

apples     15              100

pears        9                20

...

question: What is the best way to proc append lots of large files that have different variable names?

i read about appending, but none address the variable naming issue yet.

Thanks!

Super User
Posts: 19,869

Re: proc append lots of datasets that have different labels for variables

Do you have different labels or variable names?

If the labels are different it won't matter, the base data set will set the labels/variables for the final dataset.

Try it Smiley Happy

data class1;

  set sashelp.class;

  label age='AGE GROUP';

run;

data class2;

  set sashelp.class;

  label sex='Gender';

run;

proc append base=class1 data=class2;

run;

Super Contributor
Posts: 349

Re: proc append lots of datasets that have different labels for variables

Hi,

Try this...You need to rename the variables.

data one;

input produce $   weight        bags;

cards;

carrots     10              500

potatos     5               200

garlic        2               100

;

run;

data two;

input fruit  $   wt  containers;

cards;

apples     15              100

pears        9                20

;

run;

proc append base=one data=two(rename=(fruit=produce wt=weight containers=bags));

run;

data final;

set one two(rename=(fruit=produce wt=weight containers=bags));

run;

Thanks,

Shiva

Contributor
Posts: 37

Re: proc append lots of datasets that have different labels for variables

Shiva,

it looks like you provided two options.  one is "proc append", another one is using "data final; set..."

both methods require to rename over 50 fields one by one.  in my case, there are over a few hundred datasets to rename because all of them have different labels...different spelling, abbreviated differently, e.g. fruit, frt, frut, etc.  weight, wt, wgt, w. etc.

wonder if there are other ways to accomplish that.

Thanks you.  I will try it tomorrow.

Contributor
Posts: 37

Re: proc append lots of datasets that have different labels for variables


Reeza,

so the base dataset is "class1" in your example?  and "AGE GROUP" will be the label for the final dataset named class1?

i don't have access to try it now, but will try tomorrow when i have access.

thanks.Smiley Happy

Contributor
Posts: 37

Re: proc append lots of datasets that have different labels for variables

is there a way to remove all variable names from all datasets, so all the names would become VAR1, VAR2... for every dataset?  so PROC APPEND and "data final; set..." will not require renaming every field in every dataset?

we have hundreds of datasets with abbreviated names that don't even spell the same way.:smileyshocked:

i made a correction.  it's about variable names, not variable labels.  sorry.

Super User
Posts: 10,044

Re: proc append lots of datasets that have different labels for variables

Answer is Yes.Assuming all these tables are under WORK library.

data class1;
  set sashelp.class;
  label age='AGE GROUP';
run;

data class2;
  set sashelp.class;
  label sex='Gender';
run;


data _null_;
 set sashelp.vmember(keep=memname) end=last;
 if _n_ eq 1 then call execute('proc datasets library=work nolist;');
 call execute('modify '||trim(memname)||'; attrib _all_ label=" ";');
 if last then call execute('quit;');
run;


Ksharp

Contributor
Posts: 37

Re: proc append lots of datasets that have different labels for variables

thank you Ksharp.  I meant variable names, not variable labels.

let me try it.

Super User
Super User
Posts: 7,076

Re: proc append lots of datasets that have different labels for variables

That is not an easy thing to do RIGHT.

How confident are you that the second variable in one dataset has the same meaning as the second variable in the next dataset?

If so then you can use the VARNUM to generate the new variable names.

proc sql noprint ;

select catx('=',name,cats('VAR',varnum)) into :renames separated by ' '

  from dictionary.columns

  where libname='WORK' and memname='X'

;

quit;

proc append base=all data=x (rename=(&renames));

run;

You might be better off dumping the variable information from all the datasets into spreadsheet and manually creating a column with the new name to assign to each one.  Then you can use that to generate the rename commands.

Super User
Posts: 10,044

Re: proc append lots of datasets that have different names for variables

Easy.Assuming all these tables are under WORK library.

data class1;
  set sashelp.class;
  label age='AGE GROUP';
run;

data class2;
  set sashelp.class;
  label sex='Gender';
run;


data _null_;
 set sashelp.vcolumn(keep=libname memname name where=(libname='WORK')) end=last;
 if _n_ eq 1 then call execute('proc datasets library=work nolist;');
 if memname ne lag(memname) then do;n=0;call execute('modify '||trim(memname)||';'); end;
 n+1;
 call execute('rename '||trim(name)||'=var'||strip(n)||';');
 if last then call execute('quit;');
run;


Ksharp

Respected Advisor
Posts: 4,173

Re: proc append lots of datasets that have different names for variables

Hi Ksharp

I believe there is also a thread with the same question here: https://communities.sas.com/message/123403#123403

I personally like Peter.C's approach of simply using a SQL UNION ALL as it does exactly what the OP is asking for "out of the box".

Cheers

Patrick

Contributor
Posts: 37

Re: proc append lots of datasets that have different names for variables

UNION ALL seems to do the job well.  Thanks! Smiley Happy

Super User
Posts: 10,044

Re: proc append lots of datasets that have different names for variables

Hi. Patrick.

Yes, That could do the same job. But That will waste lots and lots of time, because SQL will run through all over the datasets. When you have lots of large tables, that would definitely be a nightmare.

My solution is very very fast, it would not read dataset a single observation, it just modify metadata of datasets directly, SO I recommend to use my way.

Ksharp

Respected Advisor
Posts: 4,173

Re: proc append lots of datasets that have different names for variables

Hi Ksharp

Totally agree with what you say as long as there is no need to actually physically concatenate the data sets.

Cheers

Patrick

Super User
Posts: 10,044

Re: proc append lots of datasets that have different names for variables

Hi,Patrick

If there is some need to combine these datasets, I believe using APPEND + the way I proposed will also be faster than SQL.

Cheers

Ksharp

Ask a Question
Discussion stats
  • 14 replies
  • 3251 views
  • 1 like
  • 6 in conversation