Merge data by date

Reply
Occasional Contributor
Posts: 9

Merge data by date

Hi,

 

I am a relatively new user of SAS.

 

I need to merge two MS excel files by date.

 

 However, from the second file (file 2), I need to merge only observations that are within 30 days of each date listed in the first excel file (file 1).

 

Pls do provide some insight on the SAS code I can use for this. My code merges all data in both files.

 


%LET AllData=
File1
File2
;
%_import(&AllData);


data TR;
set 
File1
File2
;




 

 

Ari.

Super User
Posts: 21,530

Re: Merge data by date

Can you provide some sample data that we can work with? Without knowing your data structure we'd be making wild guesses. 

If your data is confidential make fake data that reflects your data.

Occasional Contributor
Posts: 9

Re: Merge data by date

Sure, Reeza.

 

 

File 1:

Dateqqrrvv
1/1/1990366.12366.120.011158
1/3/1990353.94353.940.018884
1/6/1990359.53359.530.012618
1/7/1990354.88354.880.014233
1/9/1990365.24365.240.012643
1/14/1990369.79369.790.011378

 

 

 

File 2:

Dateaabbff
1/5/199020.1120.110.04426
1/8/199020.2620.260.0074
1/9/199022.222.20.08739
1/10/199022.4422.440.0107
1/11/199020.0520.05-0.1192
1/16/199024.1824.18-0.0893
1/17/199024.1624.16-0.0008
1/18/199024.3424.340.0074
1/19/199022.522.5-0.0818
1/22/199026.726.70.1573
1/20/199027.2527.250.02972
2/11/199025.3625.36-0.0745
2/17/199024.8724.87-0.0197
2/22/199024.3224.32-0.0226
2/25/199024.5424.540.00897
2/26/199024.6924.690.00608
3/27/199024.2924.29-0.0165

 

 

I want to merge only observations from 'file 2' that are within 30 days of each date listed in 'File 1'.

 

The merged file should look like:

Dateqqrrvvaabbff
1/1/1990366.12366.120.011158   
1/3/1990353.94353.940.018884   
1/5/1990   20.11000120.1100010.04425669
1/6/1990359.53359.530.012618   
1/7/1990354.88354.880.014233   
1/8/1990   20.2620.260.0074037
1/9/1990365.24365.240.012643   
1/9/1990   22.20000122.2000010.08738743
1/10/1990   22.44000122.4400010.01069519
1/11/1990   20.04999920.049999-0.1192021
1/16/1990   369.79369.790.011378
1/17/1990   24.1624.16-0.0008278
1/18/1990   24.3424.340.00739523
1/19/1990   22.522.5-0.0817778
1/22/1990   26.70000126.7000010.1573034
1/20/1990   27.2527.250.02972473
2/11/1990   25.36000125.360001-0.0745268
Trusted Advisor
Posts: 1,148

Re: Merge data by date

Why does file1 have  a date=01/14/1990, but the wanted file does not?

Super User
Posts: 12,148

Re: Merge data by date

1) show the code you used that doesn't do what you want so we don't recommend that.

2) provide some example data from each set NOT as spread sheets but as SAS data step code: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

3) Show what the results should be for your given example data (just enough records to show the different cases.

 

Details are needed because we can't tell if only want one record from file 2 matched to one record from file 1 or potentially multiples of each.

Trusted Advisor
Posts: 1,683

Re: Merge data by date

[ Edited ]

You may try next code:

data all_files;
    set file1 file2 indsname=_file_in;
    input_file = _file_in;
run;

proc sort data=all_files; by date input_file; run;

data want;
      retain lag_file lag_date phase;
      lag_file = lag(input_file);
      if lag_file = 'FILE1' then do;
         pase = 1; 
         lag_date = date;
     end;
     else phase =2;

      if phase=1 or 
_N_ > 1 and phase = 2 and date - lag_date le 30; run;

 

Occasional Contributor
Posts: 9

Re: Merge data by date

Thank you, Shumel.

 

Unfortunately, this code did not read all my data properly.

 

I used this:

 

proc sql noprint;
create table file3 as
select * from file1 as a, file2 as b
where a.date1-30 <= b.date2 <= a.date1+30;
quit;
 
Trusted Advisor
Posts: 1,683

Re: Merge data by date

1) According to input you posted, the date variable is named date.

    Where from did you get date1   date2  in your code ?

              

proc sql noprint;
create table file3 as
select * from file1 as a, file2 as b
where a.date1-30 <= b.date2 <= a.date1+30;
quit;
 
2)  File1 contains 6 obs. File2 contains 17 rows.
     At most expected output is 23 rows.
 
     Running your code, changing date1 and date2 into date results into 67 rows.
     That's because sql creates cartesian merge and there are more than one matches per File2 row.
     So it creates duplicates.
 
3) I have fixed my code. The result contains 17 rows - all rows from File1 and the matching rows from File2:
    
data file1;
 infile cards dlm='09'x truncover;
 input date mmddyy10. qq rr vv ;
 format date yymmdd10.;
cards;
01/01/1990	366.12	366.12	0.011158
01/03/1990	353.94	353.94	0.018884
01/06/1990	359.53	359.53	0.012618
01/07/1990	354.88	354.88	0.014233
01/09/1990	365.24	365.24	0.012643
01/14/1990	369.79	369.79	0.011378
; run;

data file2;
 infile cards dlm='09'x truncover;
 input date mmddyy10. aa bb ff;
 
cards;
01/05/1990	20.11	20.11	0.04426
01/08/1990	20.26	20.26	0.0074
01/09/1990	22.2	22.2	0.08739
01/10/1990	22.44	22.44	0.0107
01/11/1990	20.05	20.05	-0.1192
01/16/1990	24.18	24.18	-0.0893
01/17/1990	24.16	24.16	-0.0008
01/18/1990	24.34	24.34	0.0074
01/19/1990	22.5	22.5	-0.0818
01/22/1990	26.7	26.7	0.1573
01/20/1990	27.25	27.25	0.02972
02/11/1990	25.36	25.36	-0.0745
02/17/1990	24.87	24.87	-0.0197
02/22/1990	24.32	24.32	-0.0226
02/25/1990	24.54	24.54	0.00897
02/26/1990	24.69	24.69	0.00608
03/27/1990	24.29	24.29	-0.0165
; run;

/**** YOUR CODE ***** proc sql noprint; create table file3 as select * from file1 as a, file2 as b where a.date-30 <= b.date <= a.date+30; quit;
******************/ data all_files; set file1 file2 indsname=_file_in; input_file = _file_in; run; proc sort data=all_files; by date input_file; run; data want; set all_files; retain lag_file lag_date phase; lag_file = lag(input_file); if lag_file = 'WORK.FILE1' then do; /* this line was fixed */ pase = 1; lag_date = date; end; else phase =2; if phase=1 or _N_ > 1 and phase = 2 and date - lag_date le 30; run;
 
Occasional Contributor
Posts: 9

Re: Merge data by date

Looks good, Shmuel. Thank you.
PROC Star
Posts: 7,791

Re: Merge data by date

Sorry for being late to the tread but, unless I missed something, I don't think you have gotten suggested code that meets your specs.

 

If I correctly understand what you want, wouldn't something like the following suffice?:

proc sql noprint;
  select min(date) into :mindate
    from file1
  ;
quit;

data want;
  set file1 file2 (where=(&mindate. <= date <= (&mindate.+30))); 
run;

Art, CEO, AnalystFinder.com

 

Ask a Question
Discussion stats
  • 9 replies
  • 168 views
  • 0 likes
  • 6 in conversation