How to append 100 datasets having similar names

Reply
N/A
Posts: 0

How to append 100 datasets having similar names

I got 100 datasets, let's say data1, data2,...., & data100. I just want to append them together. They all have same fields and so on. I'd like something like this (of course it does not work).

data data_combined;
set data1 -- data100;
run;

Thanks a lot.

-Nilan
SAS Super FREQ
Posts: 8,646

Re: How to append 100 datasets having similar names

Hi:
Well, there's always cut and paste. Seriously though, there are a few different methods you could use. For example, you could write a SAS macro program to detect all the SAS datasets in a certain folder and then automate building a SET statement to concatenate all the files into one SAS data set. Or you could go for a simpler macro program to just type the names of the files for you, as long as the names fall into a pattern that you can automate.

The SAS Macro facility is like a typewriter -- the only difference is that instead of you typing on a keyboard, you write a program and tell the macro facility what kind of code to write for you, and send to the compiler when you invoke it. For example, let's make 4 files, named fil1, fil2, fil3, fil4:

[pre]
** make 4 data files;
data fil1 fil2 fil3 fil4;
x=1; y=2; z=3;
output fil1;
x+2; y+2; z+2;
output fil2;
x+3; y+3; z+3;
output fil3;
x+4; y+4; z+4;
output fil4;
run;
[/pre]

Now, let's say you want to combine the files together in the following ways:
[pre]
data newfile;
set fil1 fil2 fil3 fil4;
run;

** or;

data newfile;
set fil2 fil3 fil4;
run;

** or ;

data newfile;
set fil1 fil2 fil3;
run;
[/pre]

Now we have some working SAS code, but it's sort of a pain to keep typing the data set names over and over -- imagine if you had 100 data sets. The only piece of the code that really needs to change is the list of names of the file after the SET statement. So, how about a macro program that will just build the list of data set names for you:
[pre]

%macro makename(dsn=, start=, stop=);
%do i = &start %to &stop;
&dsn.&i
%end;
%mend makename;
[/pre]

This is just the definition -- it doesn't DO anything until it's invoked. So where would you invoke it?
[pre]
** Now use the macro program to get all 4 files;
options mprint symbolgen;
data newfile1;
set %makename(dsn=fil, start=1, stop=4);
run;

ods listing;
proc print data=newfile1;
title "newfile1";
run;

[/pre]
Or, another way to invoke it would be to use this SET statement:
[pre]
set %makename(dsn=fil, start=2, stop=4);

[/pre]

In this example, the macro variable &DSN will hold the text string FIL and the
%DO loop will iterate from the &START value to the &STOP value.

So the value of &I will increment for each iteration through the loop. The place where the data set name gets generate is between the %DO and the %END...the first time (in the first invocation) through the loop, &DSN will resolve to FIL and &I will resolve to 1 so the macro will type FIL1 for you. Then the next time through the loop, then DSN will still be FIL and &I will be 2 and so the macro will type FIL2, etc, etc.

With the debugging macro options MPRINT and SYMBOLGEN turned on, you can see how the macro variable references resolve by reviewing the generated statements in the SAS Log.

For more help, you can read the SAS Macro Facility documentation which is quite good. Or look for macro program examples on the Tech Support site. Or look for SUGI and SGF papers about SAS Macro programming, like this one:
http://www2.sas.com/proceedings/sugi28/056-28.pdf
For help with a particular macro coding question, Tech Support is always happy to help.

cynthia
N/A
Posts: 0

Re: How to append 100 datasets having similar names

Wow, it's so simple. I wonder why I could not come up with it Smiley Happy I was trying to do it using proc sql.

proc sql;
select day_num into:myfilelist seperated by ' data'
...

I was almost there...but the first file name has a problem with this method.

I'm going to use your method.

Thanks Cynthia.

-Nilan
Super Contributor
Posts: 625

Iterations

Hi Cynthia:
At the outset Happy New year!!

I need your help on this please. Say my data has columns id, category id date

id category_id date
10 100 01/01/2008
10 101 03/01/2008
10 100 02/01/2008
11 111 01/01/2008
11 111 02/01/2008
12 112 01/01/2008
12 13 02/01/2008



identify duplicated complaints that belongs to reoccurrence based on criteria
(1) Compare the n-1 id with the first id (start date is date_1 and cate_ID is cate_ID_1) date_1 is the date on which the first id occurred and cate_id_1 is the cate_id of the first id
a. Interval_1=date_1-date_i (i=2nd, 3rd ..so on id's in the group. say id 10 is repeated 3 times, so we will compare first id with 2nd and 3rd)
b. If interval_1 < time_frame and cat_Id_i=cat_ID_1, than reoccur_1=1 else reoccur_1=0;
c. Sum the value of reoccur_1 within each HIC, define it as reoccur_sum_1;

(2) Compare the n-2 id with the second id (start date is date_2 and cate_ID is cate_ID_2)
a. Interval_2=date_2-date_i
b. If interval_2 < time_frame and cat_Id_i=cat_ID_2, than reoccur_2=2 else reoccur_1=0;
c. Sum the value of reoccur_2 within each HIC, define it as reoccur_sum_2;

(3) ….general, Compare the n-J id with the Jth id (start date is date_J and cate_ID is cate_ID_J)
a. Interval_J=date_J-date_i
b. If interval_J < time_frame and cat_Id_i=cat_ID_J, than reoccur_J=1 else reoccur_J=0;
c. Sum the value of reoccur_J within each HIC, define it as reoccur_sum_J;
(4) Continue doing this until J=n-1. Compare the nth id with the n-1 th id (start date is date_n-1 and cate_ID is cate_ID_n-1)

The task is to know whether there are reoccurened complaints from the value of reoccur_sum_J.



/*** I plan to delete the first.variable everytime and compare with the subsequent observations, This did not help me as there is a case where there are 40 ids and i have to delete the first.variable 39 times.***/
SAS Super FREQ
Posts: 8,646

Re: Iterations

Hi:
I'd recommend reading about the LAG function as one possible alternative and using ARRAYs as the second alternative. There have been many forum postings on both topics and if you search on support.sas.com, you should also find some sample programs that may get you started.
Lag:
http://support.sas.com/forums/thread.jspa?messageID=14418㡒
http://support.sas.com/forums/thread.jspa?messageID=11673⶙
http://support.sas.com/kb/25/938.html
http://support.sas.com/forums/thread.jspa?messageID=11445ⲵ

Array:
http://support.sas.com/rnd/papers/sgf07/arrays1780.pdf
http://support.sas.com/forums/thread.jspa?messageID=13408㑠
http://www2.sas.com/proceedings/sugi30/242-30.pdf
http://www2.sas.com/proceedings/sugi27/p066-27.pdf

If it was my problem, I'd probably take your long skinny data and make a couple of arrays -- one array to hold the dates and another array to hold the category_id values. Then you could use do loops to go back and forth either using n-1 or n+1 arrary subscripting to do your comparisons.

But, you'll have to do some experimenting first to see which approach will work best for your needs.

cynthia
Super Contributor
Posts: 625

Re: How to append 100 datasets having similar names

Hi There,
I have a small problem that is giving me issues.
data rrt;
input id catid1-catid5 reoccur1-reoccur5;
datalines;
10 11 12 13 14 15 0 0 1 1 0
11 12 11 11 14 15 0 1 1 1 0
12 13 12 11 14 12 1 0 1 1 0
13 14 15 12 11 15 1 0 1 1 0
;
run;

options mlogic mprint symbolgen;
%macro F1(N= );
Data rrt2;
Set rrt;

Array catID(&N);
Array Reoccur(&N);
length compvar $10;
Compvar='';
%Do _J=1 %to &N;
/*retain compvar;*/
Cat_ID_N&_J.= '*' || LEFT(TRIM(catID&_J.));

if reoccur&_J. GT 1 and count(compvar,'*')=0 then compvar=Compvar || LEFT(TRIM(cat_ID_N&_J.));
else if reoccur&_J. GT 1 and count(compvar,'*')GT 0 and count(Compvar,Cat_ID_N&_J.)=0 then Compvar=Compvar || LEFT(TRIM(cat_ID_N&_J.));
%end;
count_reoccurence=count(compvar,'*');
run;
%mend;
%F1(N=5 );


My idea is to find the number of reoccurernces. In the above code compvar doesnt contain any values.
And so count_reoccurence dosent return any values.
Super Contributor
Super Contributor
Posts: 3,174

Re: How to append 100 datasets having similar names

May I recommend you create a new post when you have a new question/problem in the SAS Discussion Forums, please? Also, take a moment to consider the particular forum where you post your item, as well.

Regarding your most recent update to your post, your code shows the SAS variable COMPVAR being concatenated, however you will experience data truncation of COMPVAR unless you use LEFT and TRIM or some other technique to remove the blanks with each iterative test and assignment.

Add some SAS PUT _ALL_ statements within your DATA step processing to display "before" and "after" value contents in order to debug your program.

Scott Barry
SBBWorks, Inc.
Super Contributor
Posts: 625

Iterations

hi all,
I need your help on this please. Say my data has columns id, category id date

id category_id date
10 100 01/01/2008
10 101 03/01/2008
10 100 02/01/2008
11 111 01/01/2008
11 111 02/01/2008
12 112 01/01/2008
12 13 02/01/2008



identify duplicated complaints that belongs to reoccurrence based on criteria
(1) Compare the n-1 id with the first id (start date is date_1 and cate_ID is cate_ID_1) date_1 is the date on which the first id occurred and cate_id_1 is the cate_id of the first id
a. Interval_1=date_1-date_i (i=2nd, 3rd ..so on id's in the group. say id 10 is repeated 3 times, so we will compare first id with 2nd and 3rd)
b. If interval_1 < time_frame and cat_Id_i=cat_ID_1, than reoccur_1=1 else reoccur_1=0;
c. Sum the value of reoccur_1 within each HIC, define it as reoccur_sum_1;

(2) Compare the n-2 id with the second id (start date is date_2 and cate_ID is cate_ID_2)
a. Interval_2=date_2-date_i
b. If interval_2 < time_frame and cat_Id_i=cat_ID_2, than reoccur_2=2 else reoccur_1=0;
c. Sum the value of reoccur_2 within each HIC, define it as reoccur_sum_2;

(3) ….general, Compare the n-J id with the Jth id (start date is date_J and cate_ID is cate_ID_J)
a. Interval_J=date_J-date_i
b. If interval_J < time_frame and cat_Id_i=cat_ID_J, than reoccur_J=1 else reoccur_J=0;
c. Sum the value of reoccur_J within each HIC, define it as reoccur_sum_J;
(4) Continue doing this until J=n-1. Compare the nth id with the n-1 th id (start date is date_n-1 and cate_ID is cate_ID_n-1)

The task is to know whether there are reoccurened complaints from the value of reoccur_sum_J.



/*** I plan to delete the first.variable everytime and compare with the subsequent observations, This did not help me as there is a case where there are 40 ids and i have to delete the first.variable 39 times.***/
Post a Question
Discussion Stats
  • 7 replies
  • 617 views
  • 0 likes
  • 4 in conversation