BookmarkSubscribeRSS Feed
arii
Obsidian | Level 7

Hi,

 

I am a relatively new user of SAS.

 

I need to merge two MS excel files by date.

 

 However, from the second file (file 2), I need to merge only observations that are within 30 days of each date listed in the first excel file (file 1).

 

Pls do provide some insight on the SAS code I can use for this. My code merges all data in both files.

 


%LET AllData=
File1
File2
;
%_import(&AllData);


data TR;
set 
File1
File2
;




 

 

Ari.

9 REPLIES 9
Reeza
Super User

Can you provide some sample data that we can work with? Without knowing your data structure we'd be making wild guesses. 

If your data is confidential make fake data that reflects your data.

arii
Obsidian | Level 7

Sure, Reeza.

 

 

File 1:

Dateqqrrvv
1/1/1990366.12366.120.011158
1/3/1990353.94353.940.018884
1/6/1990359.53359.530.012618
1/7/1990354.88354.880.014233
1/9/1990365.24365.240.012643
1/14/1990369.79369.790.011378

 

 

 

File 2:

Dateaabbff
1/5/199020.1120.110.04426
1/8/199020.2620.260.0074
1/9/199022.222.20.08739
1/10/199022.4422.440.0107
1/11/199020.0520.05-0.1192
1/16/199024.1824.18-0.0893
1/17/199024.1624.16-0.0008
1/18/199024.3424.340.0074
1/19/199022.522.5-0.0818
1/22/199026.726.70.1573
1/20/199027.2527.250.02972
2/11/199025.3625.36-0.0745
2/17/199024.8724.87-0.0197
2/22/199024.3224.32-0.0226
2/25/199024.5424.540.00897
2/26/199024.6924.690.00608
3/27/199024.2924.29-0.0165

 

 

I want to merge only observations from 'file 2' that are within 30 days of each date listed in 'File 1'.

 

The merged file should look like:

Dateqqrrvvaabbff
1/1/1990366.12366.120.011158   
1/3/1990353.94353.940.018884   
1/5/1990   20.11000120.1100010.04425669
1/6/1990359.53359.530.012618   
1/7/1990354.88354.880.014233   
1/8/1990   20.2620.260.0074037
1/9/1990365.24365.240.012643   
1/9/1990   22.20000122.2000010.08738743
1/10/1990   22.44000122.4400010.01069519
1/11/1990   20.04999920.049999-0.1192021
1/16/1990   369.79369.790.011378
1/17/1990   24.1624.16-0.0008278
1/18/1990   24.3424.340.00739523
1/19/1990   22.522.5-0.0817778
1/22/1990   26.70000126.7000010.1573034
1/20/1990   27.2527.250.02972473
2/11/1990   25.36000125.360001-0.0745268
mkeintz
PROC Star

Why does file1 have  a date=01/14/1990, but the wanted file does not?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

1) show the code you used that doesn't do what you want so we don't recommend that.

2) provide some example data from each set NOT as spread sheets but as SAS data step code: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

3) Show what the results should be for your given example data (just enough records to show the different cases.

 

Details are needed because we can't tell if only want one record from file 2 matched to one record from file 1 or potentially multiples of each.

Shmuel
Garnet | Level 18

You may try next code:

data all_files;
    set file1 file2 indsname=_file_in;
    input_file = _file_in;
run;

proc sort data=all_files; by date input_file; run;

data want;
      retain lag_file lag_date phase;
      lag_file = lag(input_file);
      if lag_file = 'FILE1' then do;
         pase = 1; 
         lag_date = date;
     end;
     else phase =2;

      if phase=1 or 
_N_ > 1 and phase = 2 and date - lag_date le 30; run;

 

arii
Obsidian | Level 7

Thank you, Shumel.

 

Unfortunately, this code did not read all my data properly.

 

I used this:

 

proc sql noprint;
create table file3 as
select * from file1 as a, file2 as b
where a.date1-30 <= b.date2 <= a.date1+30;
quit;
 
Shmuel
Garnet | Level 18

1) According to input you posted, the date variable is named date.

    Where from did you get date1   date2  in your code ?

              

proc sql noprint;
create table file3 as
select * from file1 as a, file2 as b
where a.date1-30 <= b.date2 <= a.date1+30;
quit;
 
2)  File1 contains 6 obs. File2 contains 17 rows.
     At most expected output is 23 rows.
 
     Running your code, changing date1 and date2 into date results into 67 rows.
     That's because sql creates cartesian merge and there are more than one matches per File2 row.
     So it creates duplicates.
 
3) I have fixed my code. The result contains 17 rows - all rows from File1 and the matching rows from File2:
    
data file1;
 infile cards dlm='09'x truncover;
 input date mmddyy10. qq rr vv ;
 format date yymmdd10.;
cards;
01/01/1990	366.12	366.12	0.011158
01/03/1990	353.94	353.94	0.018884
01/06/1990	359.53	359.53	0.012618
01/07/1990	354.88	354.88	0.014233
01/09/1990	365.24	365.24	0.012643
01/14/1990	369.79	369.79	0.011378
; run;

data file2;
 infile cards dlm='09'x truncover;
 input date mmddyy10. aa bb ff;
 
cards;
01/05/1990	20.11	20.11	0.04426
01/08/1990	20.26	20.26	0.0074
01/09/1990	22.2	22.2	0.08739
01/10/1990	22.44	22.44	0.0107
01/11/1990	20.05	20.05	-0.1192
01/16/1990	24.18	24.18	-0.0893
01/17/1990	24.16	24.16	-0.0008
01/18/1990	24.34	24.34	0.0074
01/19/1990	22.5	22.5	-0.0818
01/22/1990	26.7	26.7	0.1573
01/20/1990	27.25	27.25	0.02972
02/11/1990	25.36	25.36	-0.0745
02/17/1990	24.87	24.87	-0.0197
02/22/1990	24.32	24.32	-0.0226
02/25/1990	24.54	24.54	0.00897
02/26/1990	24.69	24.69	0.00608
03/27/1990	24.29	24.29	-0.0165
; run;

/**** YOUR CODE ***** proc sql noprint; create table file3 as select * from file1 as a, file2 as b where a.date-30 <= b.date <= a.date+30; quit;
******************/ data all_files; set file1 file2 indsname=_file_in; input_file = _file_in; run; proc sort data=all_files; by date input_file; run; data want; set all_files; retain lag_file lag_date phase; lag_file = lag(input_file); if lag_file = 'WORK.FILE1' then do; /* this line was fixed */ pase = 1; lag_date = date; end; else phase =2; if phase=1 or _N_ > 1 and phase = 2 and date - lag_date le 30; run;
 
arii
Obsidian | Level 7
Looks good, Shmuel. Thank you.
art297
Opal | Level 21

Sorry for being late to the tread but, unless I missed something, I don't think you have gotten suggested code that meets your specs.

 

If I correctly understand what you want, wouldn't something like the following suffice?:

proc sql noprint;
  select min(date) into :mindate
    from file1
  ;
quit;

data want;
  set file1 file2 (where=(&mindate. <= date <= (&mindate.+30))); 
run;

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 758 views
  • 0 likes
  • 6 in conversation