BookmarkSubscribeRSS Feed
Cyndia
Calcite | Level 5

Hi!  Here is the challenge.

There are many datadets with about 50 columns in each dataset.  Every dataset has over thousands of records.

e.g.

one of the datasets looks like:

produce    weight        bags

------------------------------------

carrots     10              500

potatos     5               200

garlic        2               100

...

another dataset looks like (note the variable names are not the same as the above dataset):

fruit         wt.              containers

-----------------------------------------

apples     15              100

pears        9                20

...

After appending, the result should look like this:

produce    weight        bags

------------------------------------

carrots     10              500

potatos     5               200

garlic        2               100

...

apples     15              100

pears        9                20

...

question: What is the best way to proc append lots of large files that have different variable names?

i read about appending, but none address the variable naming issue yet.

Thanks!

14 REPLIES 14
Reeza
Super User

Do you have different labels or variable names?

If the labels are different it won't matter, the base data set will set the labels/variables for the final dataset.

Try it Smiley Happy

data class1;

  set sashelp.class;

  label age='AGE GROUP';

run;

data class2;

  set sashelp.class;

  label sex='Gender';

run;

proc append base=class1 data=class2;

run;

shivas
Pyrite | Level 9

Hi,

Try this...You need to rename the variables.

data one;

input produce $   weight        bags;

cards;

carrots     10              500

potatos     5               200

garlic        2               100

;

run;

data two;

input fruit  $   wt  containers;

cards;

apples     15              100

pears        9                20

;

run;

proc append base=one data=two(rename=(fruit=produce wt=weight containers=bags));

run;

data final;

set one two(rename=(fruit=produce wt=weight containers=bags));

run;

Thanks,

Shiva

Cyndia
Calcite | Level 5

Shiva,

it looks like you provided two options.  one is "proc append", another one is using "data final; set..."

both methods require to rename over 50 fields one by one.  in my case, there are over a few hundred datasets to rename because all of them have different labels...different spelling, abbreviated differently, e.g. fruit, frt, frut, etc.  weight, wt, wgt, w. etc.

wonder if there are other ways to accomplish that.

Thanks you.  I will try it tomorrow.

Cyndia
Calcite | Level 5


Reeza,

so the base dataset is "class1" in your example?  and "AGE GROUP" will be the label for the final dataset named class1?

i don't have access to try it now, but will try tomorrow when i have access.

thanks.Smiley Happy

Cyndia
Calcite | Level 5

is there a way to remove all variable names from all datasets, so all the names would become VAR1, VAR2... for every dataset?  so PROC APPEND and "data final; set..." will not require renaming every field in every dataset?

we have hundreds of datasets with abbreviated names that don't even spell the same way.:smileyshocked:

i made a correction.  it's about variable names, not variable labels.  sorry.

Ksharp
Super User

Answer is Yes.Assuming all these tables are under WORK library.

data class1;
  set sashelp.class;
  label age='AGE GROUP';
run;

data class2;
  set sashelp.class;
  label sex='Gender';
run;


data _null_;
 set sashelp.vmember(keep=memname) end=last;
 if _n_ eq 1 then call execute('proc datasets library=work nolist;');
 call execute('modify '||trim(memname)||'; attrib _all_ label=" ";');
 if last then call execute('quit;');
run;


Ksharp

Cyndia
Calcite | Level 5

thank you Ksharp.  I meant variable names, not variable labels.

let me try it.

Tom
Super User Tom
Super User

That is not an easy thing to do RIGHT.

How confident are you that the second variable in one dataset has the same meaning as the second variable in the next dataset?

If so then you can use the VARNUM to generate the new variable names.

proc sql noprint ;

select catx('=',name,cats('VAR',varnum)) into :renames separated by ' '

  from dictionary.columns

  where libname='WORK' and memname='X'

;

quit;

proc append base=all data=x (rename=(&renames));

run;

You might be better off dumping the variable information from all the datasets into spreadsheet and manually creating a column with the new name to assign to each one.  Then you can use that to generate the rename commands.

Ksharp
Super User

Easy.Assuming all these tables are under WORK library.

data class1;
  set sashelp.class;
  label age='AGE GROUP';
run;

data class2;
  set sashelp.class;
  label sex='Gender';
run;


data _null_;
 set sashelp.vcolumn(keep=libname memname name where=(libname='WORK')) end=last;
 if _n_ eq 1 then call execute('proc datasets library=work nolist;');
 if memname ne lag(memname) then do;n=0;call execute('modify '||trim(memname)||';'); end;
 n+1;
 call execute('rename '||trim(name)||'=var'||strip(n)||';');
 if last then call execute('quit;');
run;


Ksharp

Patrick
Opal | Level 21

Hi Ksharp

I believe there is also a thread with the same question here: https://communities.sas.com/message/123403#123403

I personally like Peter.C's approach of simply using a SQL UNION ALL as it does exactly what the OP is asking for "out of the box".

Cheers

Patrick

Cyndia
Calcite | Level 5

UNION ALL seems to do the job well.  Thanks! Smiley Happy

Ksharp
Super User

Hi. Patrick.

Yes, That could do the same job. But That will waste lots and lots of time, because SQL will run through all over the datasets. When you have lots of large tables, that would definitely be a nightmare.

My solution is very very fast, it would not read dataset a single observation, it just modify metadata of datasets directly, SO I recommend to use my way.

Ksharp

Patrick
Opal | Level 21

Hi Ksharp

Totally agree with what you say as long as there is no need to actually physically concatenate the data sets.

Cheers

Patrick

Ksharp
Super User

Hi,Patrick

If there is some need to combine these datasets, I believe using APPEND + the way I proposed will also be faster than SQL.

Cheers

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 9564 views
  • 1 like
  • 6 in conversation