DATA Step, Macro, Functions and more

Merging multiple CSV files to a dataset.

Reply
Contributor
Posts: 49

Merging multiple CSV files to a dataset.

Hi, I currently have 16 files with the similar names such as A1, A2, A3, etc, I also have a merge base file called B. What I want to do is merge 16 As files to B into one file. Suppose all the data is sorted and should be merged by date. I wrote the code for merging but doesn't work very well. The problem is some of the column variables in As are missing after merge. Does someone has any suggestion? Any help with this will be appreciated!

Here is the code I wrote.

DATA work.Merged; 
	MERGE work.A: work.B; 										
	BY Date; 
RUN; 
Super User
Posts: 19,878

Re: Merging multiple CSV files to a dataset.

You need to post more details. Especially around:

The problem is some of the column variables in As are missing after merge.

Super User
Super User
Posts: 7,997

Re: Merging multiple CSV files to a dataset.

Do you want to merge the datasets?   Sounds more like you want a set.  Anyways, if variables appear in multiple datasets then they will only appear once from the first dataset.  Also, your date might not match across all datasets, if so the variables which would come from that dataset will be blank in the instance where there is no match.  Provide test data - in the form of a datastep, and put in the {i} tool above post - for further help.

Super User
Posts: 11,343

Re: Merging multiple CSV files to a dataset.

Merge BY will take the values from the right most dataset on the merge statement with the named variable. So if work.B is missing a value for variable XYZZY, then the resulting data set will have a missing value for XYZZY.

 

Notice in this example that the values for the variable B all come from dataset work.two except for the value of A that is not in work.two.

data work.one;
   input a b;
datalines;
1 24
2 34
3 44
4 66
;
run;
data work.two;
   input a b;
datalines;
1  .
2  33
3  99
;
run;

data merged;
   merge work.one work.two;
   by a;
run;

Expand this to 17 files it can get very difficult to guess which data set may be causing a specific missing result.

 

Are the variables in the A datasets all supposed to be the same or do they have different variables in some?

If a variable only has values in one data set, then if it doesn't contribute to all records on the BY variable value then those records without a match will be missing as well.

Ask a Question
Discussion stats
  • 3 replies
  • 74 views
  • 0 likes
  • 4 in conversation