Re: proc append lots of datasets that have different names for variabl...

Cyndia · Posted 04-12-2012 09:11 PM

Hi! Here is the challenge.

There are many datadets with about 50 columns in each dataset. Every dataset has over thousands of records.

e.g.

one of the datasets looks like:

produce weight bags

------------------------------------

carrots 10 500

potatos 5 200

garlic 2 100

...

another dataset looks like (note the variable names are not the same as the above dataset):

fruit wt. containers

-----------------------------------------

apples 15 100

pears 9 20

...

After appending, the result should look like this:

produce weight bags

------------------------------------

carrots 10 500

potatos 5 200

garlic 2 100

...

apples 15 100

pears 9 20

...

question: What is the best way to proc append lots of large files that have different variable names?

i read about appending, but none address the variable naming issue yet.

Thanks!

Reeza · Posted 04-12-2012 10:56 PM

Do you have different labels or variable names?

If the labels are different it won't matter, the base data set will set the labels/variables for the final dataset.

Try it

data class1;

set sashelp.class;

label age='AGE GROUP';

run;

data class2;

set sashelp.class;

label sex='Gender';

run;

proc append base=class1 data=class2;

run;

shivas · Posted 04-12-2012 11:52 PM

Hi,

Try this...You need to rename the variables.

data one;

input produce $ weight bags;

cards;

carrots 10 500

potatos 5 200

garlic 2 100

;

run;

data two;

input fruit $ wt containers;

cards;

apples 15 100

pears 9 20

;

run;

proc append base=one data=two(rename=(fruit=produce wt=weight containers=bags));

run;

data final;

set one two(rename=(fruit=produce wt=weight containers=bags));

run;

Thanks,

Shiva

Cyndia · Posted 04-13-2012 12:28 AM

Shiva,

it looks like you provided two options. one is "proc append", another one is using "data final; set..."

both methods require to rename over 50 fields one by one. in my case, there are over a few hundred datasets to rename because all of them have different labels...different spelling, abbreviated differently, e.g. fruit, frt, frut, etc. weight, wt, wgt, w. etc.

wonder if there are other ways to accomplish that.

Thanks you. I will try it tomorrow.

Cyndia · Posted 04-13-2012 12:20 AM

Reeza,

so the base dataset is "class1" in your example? and "AGE GROUP" will be the label for the final dataset named class1?

i don't have access to try it now, but will try tomorrow when i have access.

thanks.

Cyndia · Posted 04-13-2012 12:33 AM

is there a way to remove all variable names from all datasets, so all the names would become VAR1, VAR2... for every dataset? so PROC APPEND and "data final; set..." will not require renaming every field in every dataset?

we have hundreds of datasets with abbreviated names that don't even spell the same way.:smileyshocked:

i made a correction. it's about variable names, not variable labels. sorry.

Ksharp · Posted 04-13-2012 01:50 AM

Answer is Yes.Assuming all these tables are under WORK library.

data class1;
  set sashelp.class;
  label age='AGE GROUP';
run;

data class2;
  set sashelp.class;
  label sex='Gender';
run;


data _null_;
 set sashelp.vmember(keep=memname) end=last;
 if _n_ eq 1 then call execute('proc datasets library=work nolist;');
 call execute('modify '||trim(memname)||'; attrib _all_ label=" ";');
 if last then call execute('quit;');
run;

Ksharp

Cyndia · Posted 04-13-2012 11:42 AM

thank you Ksharp. I meant variable names, not variable labels.

let me try it.

Tom · Posted 04-13-2012 12:49 PM

That is not an easy thing to do RIGHT.

How confident are you that the second variable in one dataset has the same meaning as the second variable in the next dataset?

If so then you can use the VARNUM to generate the new variable names.

proc sql noprint ;

select catx('=',name,cats('VAR',varnum)) into :renames separated by ' '

from dictionary.columns

where libname='WORK' and memname='X'

;

quit;

proc append base=all data=x (rename=(&renames));

run;

You might be better off dumping the variable information from all the datasets into spreadsheet and manually creating a column with the new name to assign to each one. Then you can use that to generate the rename commands.

Ksharp · Posted 04-16-2012 04:59 AM

Easy.Assuming all these tables are under WORK library.

data class1;
  set sashelp.class;
  label age='AGE GROUP';
run;

data class2;
  set sashelp.class;
  label sex='Gender';
run;


data _null_;
 set sashelp.vcolumn(keep=libname memname name where=(libname='WORK')) end=last;
 if _n_ eq 1 then call execute('proc datasets library=work nolist;');
 if memname ne lag(memname) then do;n=0;call execute('modify '||trim(memname)||';'); end;
 n+1;
 call execute('rename '||trim(name)||'=var'||strip(n)||';');
 if last then call execute('quit;');
run;

Ksharp

Patrick · Posted 04-16-2012 07:04 AM

Hi Ksharp

I believe there is also a thread with the same question here: https://communities.sas.com/message/123403#123403

I personally like Peter.C's approach of simply using a SQL UNION ALL as it does exactly what the OP is asking for "out of the box".

Cheers

Patrick

Cyndia · Posted 04-16-2012 02:48 PM

UNION ALL seems to do the job well. Thanks!

Ksharp · Posted 04-16-2012 10:01 PM

Hi. Patrick.

Yes, That could do the same job. But That will waste lots and lots of time, because SQL will run through all over the datasets. When you have lots of large tables, that would definitely be a nightmare.

My solution is very very fast, it would not read dataset a single observation, it just modify metadata of datasets directly, SO I recommend to use my way.

Ksharp

Patrick · Posted 04-17-2012 05:05 AM

Hi Ksharp

Totally agree with what you say as long as there is no need to actually physically concatenate the data sets.

Cheers

Patrick

Ksharp · Posted 04-17-2012 10:24 PM

Hi,Patrick

If there is some need to combine these datasets, I believe using APPEND + the way I proposed will also be faster than SQL.

Cheers

Ksharp

proc append lots of datasets that have different names for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different labels for variables

Re: proc append lots of datasets that have different names for variables

Re: proc append lots of datasets that have different names for variables

Re: proc append lots of datasets that have different names for variables

Re: proc append lots of datasets that have different names for variables

Re: proc append lots of datasets that have different names for variables

Re: proc append lots of datasets that have different names for variables

SAS Innovate 2025: Call for Content

Classroom Training Available!