The following have
has 50000 observations. The data
step delete
s the second half of have
, so the last sql
returns 25000.
data have;
do i=1 to 50000;
output;
end;
run;
data have;
set have;
if i>25000 then delete;
run;
proc sql;
select nobs from dictionary.tables where libname="WORK" and memname="HAVE";
quit;
The following have
identically has 50000 observations. The second sql
identically delete
s the second half of have
, but the third sql
still returns 50000.
data have;
do i=1 to 50000;
output;
end;
run;
proc sql;
delete from have where i>25000;
quit;
proc sql;
select nobs from dictionary.tables where libname="WORK" and memname="HAVE";
quit;
I have never thought about the second possibility because I don't delete
observations using sql
. I found this issue from the advanced programming sample questions—the nobs
and nlobs
of dictionary.columns
are different. How are the data delete
and sql delete
different? Why does SAS distinguish these two delete
s?
If you delete one row in a 10-billion-row table, you might want to avoid rewriting the table.
It's a matter of balance: waste due to rewriting vs waste due to unused space.
I don't think the rows can be un-deleted.
In the first case you are overwriting the data set.
In the last two cases you keep the data set and you flag some observations as deleted, but the physical number of observations is unchanged.
Look at the log messages to see that.
data T1 T2 T3;
do I=1 to 50000;
output;
end;
run;
data T1;
set T1;
if I > 25000 then delete;
run;
data T2;
modify T2;
if I > 25000 then delete;
run;
proc sql;
delete from T3 where I > 25000;
quit;
data _null_;
set T1 nobs=N1;
set T2 nobs=N2;
set T3 nobs=N3;
putlog N1= N2= N3=;
stop;
run;
NOTE: The data set WORK.T1 has 25000 observations and 1 variables.
NOTE: The data set WORK.T2 has been updated. There were 25000 observations rewritten, 0 observations added and 0 observations deleted.
NOTE: 25000 rows were deleted from WORK.T3.
N1=25000 N2=50000 N3=50000
Thanks for these details. Then, (1) what is the advantage of the modify delete
and the sql delete
over the set delete
? It seems they are still occupying the space so there is no storage benefit. Why do we need these? (2) Can a user countermand the modify delete
or the sql delete
to rescue the observations seemingly deleted?
If you delete one row in a 10-billion-row table, you might want to avoid rewriting the table.
It's a matter of balance: waste due to rewriting vs waste due to unused space.
I don't think the rows can be un-deleted.
Actually my example is misleading. Apologies about this.
data T2;
modify T2;
if I > 20000 then delete;
run;
data T2;
modify T2;
if I > 20000 then remove;
run;
The first data step does nothing. The delete statement just stops the processing of the step for the flagged observations.
The second data step flags the rows as deleted.
NOTE: The data set WORK.T2 has been updated. There were 20000 observations rewritten, 0 observations added and 0 observations deleted.
NOTE: The data set WORK.T2 has been updated. There were 0 observations rewritten, 0 observations added and 30000 observations deleted.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.