- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi! Here is the challenge.
There are many datadets with about 50 columns in each dataset. Every dataset has over thousands of records.
e.g.
one of the datasets looks like:
produce weight bags
------------------------------------
carrots 10 500
potatos 5 200
garlic 2 100
...
another dataset looks like (note the variable names are not the same as the above dataset):
fruit wt. containers
-----------------------------------------
apples 15 100
pears 9 20
...
After appending, the result should look like this:
produce weight bags
------------------------------------
carrots 10 500
potatos 5 200
garlic 2 100
...
apples 15 100
pears 9 20
...
question: What is the best way to proc append lots of large files that have different variable names?
i read about appending, but none address the variable naming issue yet.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you have different labels or variable names?
If the labels are different it won't matter, the base data set will set the labels/variables for the final dataset.
Try it
data class1;
set sashelp.class;
label age='AGE GROUP';
run;
data class2;
set sashelp.class;
label sex='Gender';
run;
proc append base=class1 data=class2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Try this...You need to rename the variables.
data one;
input produce $ weight bags;
cards;
carrots 10 500
potatos 5 200
garlic 2 100
;
run;
data two;
input fruit $ wt containers;
cards;
apples 15 100
pears 9 20
;
run;
proc append base=one data=two(rename=(fruit=produce wt=weight containers=bags));
run;
data final;
set one two(rename=(fruit=produce wt=weight containers=bags));
run;
Thanks,
Shiva
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Shiva,
it looks like you provided two options. one is "proc append", another one is using "data final; set..."
both methods require to rename over 50 fields one by one. in my case, there are over a few hundred datasets to rename because all of them have different labels...different spelling, abbreviated differently, e.g. fruit, frt, frut, etc. weight, wt, wgt, w. etc.
wonder if there are other ways to accomplish that.
Thanks you. I will try it tomorrow.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Reeza,
so the base dataset is "class1" in your example? and "AGE GROUP" will be the label for the final dataset named class1?
i don't have access to try it now, but will try tomorrow when i have access.
thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
is there a way to remove all variable names from all datasets, so all the names would become VAR1, VAR2... for every dataset? so PROC APPEND and "data final; set..." will not require renaming every field in every dataset?
we have hundreds of datasets with abbreviated names that don't even spell the same way.:smileyshocked:
i made a correction. it's about variable names, not variable labels. sorry.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Answer is Yes.Assuming all these tables are under WORK library.
data class1; set sashelp.class; label age='AGE GROUP'; run; data class2; set sashelp.class; label sex='Gender'; run; data _null_; set sashelp.vmember(keep=memname) end=last; if _n_ eq 1 then call execute('proc datasets library=work nolist;'); call execute('modify '||trim(memname)||'; attrib _all_ label=" ";'); if last then call execute('quit;'); run;
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you Ksharp. I meant variable names, not variable labels.
let me try it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That is not an easy thing to do RIGHT.
How confident are you that the second variable in one dataset has the same meaning as the second variable in the next dataset?
If so then you can use the VARNUM to generate the new variable names.
proc sql noprint ;
select catx('=',name,cats('VAR',varnum)) into :renames separated by ' '
from dictionary.columns
where libname='WORK' and memname='X'
;
quit;
proc append base=all data=x (rename=(&renames));
run;
You might be better off dumping the variable information from all the datasets into spreadsheet and manually creating a column with the new name to assign to each one. Then you can use that to generate the rename commands.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Easy.Assuming all these tables are under WORK library.
data class1; set sashelp.class; label age='AGE GROUP'; run; data class2; set sashelp.class; label sex='Gender'; run; data _null_; set sashelp.vcolumn(keep=libname memname name where=(libname='WORK')) end=last; if _n_ eq 1 then call execute('proc datasets library=work nolist;'); if memname ne lag(memname) then do;n=0;call execute('modify '||trim(memname)||';'); end; n+1; call execute('rename '||trim(name)||'=var'||strip(n)||';'); if last then call execute('quit;'); run;
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ksharp
I believe there is also a thread with the same question here: https://communities.sas.com/message/123403#123403
I personally like Peter.C's approach of simply using a SQL UNION ALL as it does exactly what the OP is asking for "out of the box".
Cheers
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
UNION ALL seems to do the job well. Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi. Patrick.
Yes, That could do the same job. But That will waste lots and lots of time, because SQL will run through all over the datasets. When you have lots of large tables, that would definitely be a nightmare.
My solution is very very fast, it would not read dataset a single observation, it just modify metadata of datasets directly, SO I recommend to use my way.
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ksharp
Totally agree with what you say as long as there is no need to actually physically concatenate the data sets.
Cheers
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,Patrick
If there is some need to combine these datasets, I believe using APPEND + the way I proposed will also be faster than SQL.
Cheers
Ksharp