Solved: Re: DATA DELETE Versus SQL DELETE?

Junyong · Posted 09-27-2020 09:33 PM

The following have has 50000 observations. The data step deletes the second half of have, so the last sql returns 25000.

data have;
do i=1 to 50000;
output;
end;
run;

data have;
set have;
if i>25000 then delete;
run;

proc sql;
select nobs from dictionary.tables where libname="WORK" and memname="HAVE";
quit;

The following have identically has 50000 observations. The second sql identically deletes the second half of have, but the third sql still returns 50000.

data have;
do i=1 to 50000;
output;
end;
run;

proc sql;
delete from have where i>25000;
quit;

proc sql;
select nobs from dictionary.tables where libname="WORK" and memname="HAVE";
quit;

I have never thought about the second possibility because I don't delete observations using sql. I found this issue from the advanced programming sample questions—the nobs and nlobs of dictionary.columns are different. How are the data delete and sql delete different? Why does SAS distinguish these two deletes?

ChrisNZ · Posted 09-27-2020 11:31 PM

If you delete one row in a 10-billion-row table, you might want to avoid rewriting the table.

It's a matter of balance: waste due to rewriting vs waste due to unused space.

I don't think the rows can be un-deleted.

High-Performance SAS Coding - Third Edition

View solution in original post

ChrisNZ · Posted 09-27-2020 11:12 PM

In the first case you are overwriting the data set.

In the last two cases you keep the data set and you flag some observations as deleted, but the physical number of observations is unchanged.

Look at the log messages to see that.

data T1 T2 T3;
  do I=1 to 50000;
    output;
  end;
run;
     
data T1;
  set T1;
  if I > 25000 then delete;
run;

data T2;
  modify T2;
  if I > 25000 then delete;
run;

proc sql;
  delete from T3 where I > 25000;
quit;

data _null_;
  set T1 nobs=N1;
  set T2 nobs=N2;
  set T3 nobs=N3;
  putlog N1= N2= N3=;
  stop;
run;

NOTE: The data set WORK.T1 has 25000 observations and 1 variables.

NOTE: The data set WORK.T2 has been updated. There were 25000 observations rewritten, 0 observations added and 0 observations deleted.

NOTE: 25000 rows were deleted from WORK.T3.

N1=25000 N2=50000 N3=50000

High-Performance SAS Coding - Third Edition

Junyong · Posted 09-27-2020 11:26 PM

Thanks for these details. Then, (1) what is the advantage of the modify delete and the sql delete over the set delete? It seems they are still occupying the space so there is no storage benefit. Why do we need these? (2) Can a user countermand the modify delete or the sql delete to rescue the observations seemingly deleted?

ChrisNZ · Posted 09-27-2020 11:31 PM

If you delete one row in a 10-billion-row table, you might want to avoid rewriting the table.

It's a matter of balance: waste due to rewriting vs waste due to unused space.

I don't think the rows can be un-deleted.

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 09-27-2020 11:41 PM

Actually my example is misleading. Apologies about this.

data T2;
  modify T2;
  if I > 20000 then delete;
run;

data T2;
  modify T2;
  if I > 20000 then remove;
run;

The first data step does nothing. The delete statement just stops the processing of the step for the flagged observations.

The second data step flags the rows as deleted.

NOTE: The data set WORK.T2 has been updated. There were 20000 observations rewritten, 0 observations added and 0 observations deleted.

NOTE: The data set WORK.T2 has been updated. There were 0 observations rewritten, 0 observations added and 30000 observations deleted.

High-Performance SAS Coding - Third Edition

DATA DELETE Versus SQL DELETE?

Re: DATA DELETE Versus SQL DELETE?

Re: DATA DELETE Versus SQL DELETE?

Re: DATA DELETE Versus SQL DELETE?

Re: DATA DELETE Versus SQL DELETE?

Re: DATA DELETE Versus SQL DELETE?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away