The following have has 50000 observations. The data step deletes the second half of have, so the last sql returns 25000.
data have;
do i=1 to 50000;
output;
end;
run;
data have;
set have;
if i>25000 then delete;
run;
proc sql;
select nobs from dictionary.tables where libname="WORK" and memname="HAVE";
quit;The following have identically has 50000 observations. The second sql identically deletes the second half of have, but the third sql still returns 50000.
data have;
do i=1 to 50000;
output;
end;
run;
proc sql;
delete from have where i>25000;
quit;
proc sql;
select nobs from dictionary.tables where libname="WORK" and memname="HAVE";
quit;I have never thought about the second possibility because I don't delete observations using sql. I found this issue from the advanced programming sample questions—the nobs and nlobs of dictionary.columns are different. How are the data delete and sql delete different? Why does SAS distinguish these two deletes?
If you delete one row in a 10-billion-row table, you might want to avoid rewriting the table.
It's a matter of balance: waste due to rewriting vs waste due to unused space.
I don't think the rows can be un-deleted.
In the first case you are overwriting the data set.
In the last two cases you keep the data set and you flag some observations as deleted, but the physical number of observations is unchanged.
Look at the log messages to see that.
data T1 T2 T3;
do I=1 to 50000;
output;
end;
run;
data T1;
set T1;
if I > 25000 then delete;
run;
data T2;
modify T2;
if I > 25000 then delete;
run;
proc sql;
delete from T3 where I > 25000;
quit;
data _null_;
set T1 nobs=N1;
set T2 nobs=N2;
set T3 nobs=N3;
putlog N1= N2= N3=;
stop;
run;
NOTE: The data set WORK.T1 has 25000 observations and 1 variables.
NOTE: The data set WORK.T2 has been updated. There were 25000 observations rewritten, 0 observations added and 0 observations deleted.
NOTE: 25000 rows were deleted from WORK.T3.
N1=25000 N2=50000 N3=50000
Thanks for these details. Then, (1) what is the advantage of the modify delete and the sql delete over the set delete? It seems they are still occupying the space so there is no storage benefit. Why do we need these? (2) Can a user countermand the modify delete or the sql delete to rescue the observations seemingly deleted?
If you delete one row in a 10-billion-row table, you might want to avoid rewriting the table.
It's a matter of balance: waste due to rewriting vs waste due to unused space.
I don't think the rows can be un-deleted.
Actually my example is misleading. Apologies about this.
data T2;
modify T2;
if I > 20000 then delete;
run;
data T2;
modify T2;
if I > 20000 then remove;
run;
The first data step does nothing. The delete statement just stops the processing of the step for the flagged observations.
The second data step flags the rows as deleted.
NOTE: The data set WORK.T2 has been updated. There were 20000 observations rewritten, 0 observations added and 0 observations deleted.
NOTE: The data set WORK.T2 has been updated. There were 0 observations rewritten, 0 observations added and 30000 observations deleted.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.