Re: Help regarding removing records from a dataset by matching

deleted_user · Posted 12-04-2009 05:46 AM

Hi,

My requirement is to match 2 files(of different record counts) using few key fields and if they match exactly then i need to remove those records from the 1st file. i.e. i need to write only the unmatched records from File-1 to output.

I tried using the MERGE operation which works fine, but the prob is the SORT Order of my original file gets impacted and i am not able to find the original SORT Order to resort the file to its original sort order as LRECL is very large(LRECL=1250). Morever the number of records in the file-1 is also very large. So, when the merged contents are written to a dataset it gets abended due to space issue.

i also tried using SELECT statement from PROC SQL. But still i am facing the same space issue when i write the unmatched contents to a table to be written to the output file.

I tried another option using INDEX function to match the key fields. Here also i encounter a problem. The number of records that can be matched between these 2 files is taken as file-2(which has least count). And so the unmatched records from File-1 that are written to output is exactly till the same count as in File-2. Is there any way to use the INDEX option between 2 files which do not have same number of record counts?

Could you please help me out with any other options for matching these 2 files, which in turn would not require writing the entire contents of File-1 to a dataset (so that i can avoid the space issue)?

Thanks,
San

FredrikE · Posted 12-04-2009 06:29 AM

proc sql;
create view match as select a.*,b.unique_b_var as check from
(select * from table1) as a
left join
(select * from table2) as b
on a.var1 = b.var1
and a.var2 = b.var2
and a.var(n) = b.var(n)
;

create new_table as
select *
from match
where check eq .
;
quit;

Flip · Posted 12-04-2009 08:04 AM

It can be done in one step.

proc sql;
create table new as select * from old o where not exists( select * from mark m
where m.var1 = o.var1 and m.var2 = o.var2);
quit;

or in place as
proc sql;
delete from old o where exists( select * from mark m
where m.var1 = o.var1 and m.var2 = o.var2);
quit;

deleted_user · Posted 12-04-2009 09:42 AM

Hi,

Thanks for ur inputs. But when i write the contents to a table using the PROC SQL Statements given in ur replies, i still get the space issue like below.

"Write to WORK.'@T000000'n.UTILITY failed. File is full and may be damaged"..

Is there any way to resolve this?

Thanks,
San

Help regarding removing records from a dataset by matching