Help using Base SAS procedures

Observation Pointers / Table Vector after a REMOVE statement

Reply
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Observation Pointers / Table Vector after a REMOVE statement

Hi,

I need help with the following:
In the SAS Help files, I got the code:

data accounts;
input AcctNumber Credit;
datalines;
1001 1500
1002 4900
1003 3000
;

data accounts;
modify accounts;
if AcctNumber=1002 then remove;
run;


NOW I ADDED THE CODE:

data accounts2;
input AcctNumber Credit;
datalines;
1004 3800
1005 9200
;
proc append base = accounts data = accounts2; run;


THE OBSERVATION NUMBERS IN THE TABLE accounts ARE 1,3,4,5. THIS IS NOT WHAT I DESIRE TO HAVE; I WANT TO HAVE: 1,2,3,4.

NOW, IF I USE THE POINT= FUNCTION AS FOLLOWS:

data test;
do n = 1 to 5;
set accounts point = n; output;
end;
stop;
run;

THERE IS STILL AN OBSERVATION 2, BUT ALSO AN OBSERVATION 5.

IS THERE ANY WAY TO CORRECT THE TABLE VECTOR AFTER A REMOVE STATEMENT?
Super Contributor
Super Contributor
Posts: 365

Re: Observation Pointers / Table Vector after a REMOVE statement

Hello PHK,

What do you want to achieve? If you want to simply renumber you accounts after deleting it is very easy to achieve using n+1 statement. Or you need something else?

Sincerely,
SPR
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

Hi SPR,

I don't really understand how the n+1 statement could be used to solve the problem, for I don't really want to add a variable n in the dataset. (Or how does one use the n+1 statement without creating a variable named "n"?) My problem is that, whereas the final dataset has only 4 observations, SAS somehow has a reference to 5 observations (as illustrated by the last piece of code).

The example provided is only an illustration of the problem. Actually I am working with a large dataset, from which I remove the last 5% or so of the data. I then add new data using PROC APPEND, because I don't want to rewrite all the data in the original dataset. Afterwards, I am using a custom binary search using the POINT = statement. But this is exactly where the problem comes in that is illustrated by the example.

The problem, as in the example, is that when the second observation is removed from the first dataset, if you open the table in SAS Viewer, the row numbers of the remaining to records are 1 and 3, and not 1 and 2 as one would have expected. And when using the POINT = statement, it corresponds to what you see in the SAS Viewer: there still are row number references to 3 observations.

Kind regards,
PHK
Grand Advisor
Posts: 9,447

Re: Observation Pointers / Table Vector after a REMOVE statement

You delte 2 ,so it will not appear any more.
if AcctNumber=1002 then remove;

So,change

data accounts2;
modify accounts2;
if AcctNumber=1005 then remove;
run;
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

Sorry, I don't understand how this will solve the problem: I don't want to remove the observation in which AcctNumber=1005. I need the row number references (or table vector or whatever it is called) of the final accounts-table to be 1, 2, 3, 4 rather than: 1,2,4,5.

Kind regards,
PHK
Grand Advisor
Posts: 9,447

Re: Observation Pointers / Table Vector after a REMOVE statement

Oh.I understand what you mean.It is very interesting which i have never imaged.
Then you need this:
[pre]
data accounts;
input AcctNumber Credit;
datalines;
1001 1500
1002 4900
1003 3000
;

data accounts;
set accounts;
if AcctNumber=1002 then delete;
run;
[/pre]


Ksharp
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

Hi Ksharp,

Thank you. But as far as I know, the code that you are suggesting:

data accounts;
set accounts;
if AcctNumber=1002 then delete;
run;

... rewrites the accounts table (due to the SET statement). I know that the code you are suggesting will work, but my problem is that I have a fairly large dataset and I only want to remove the last part (about 5%) of the dataset. If I use the delete statement with the set statement, the other 95% of the data still gets rewritten (is that not so?) - and that is what I want to avoid, but without getting the observation references mixed up.

By the way:
My problem is also illustrated (but not even mentioned or addressed) on:
http://v8doc.sas.com/sashtml/lgref/z0202648.htm
See the output of the PROC PRINT: there is an observation 1 and 3, and not 1 and 2.
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

((... I was wondering whether this in any way relates to what is called a "rollback segment" in IT jargon?))
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

SQL ALTERNATIVES SEEM TO FAIL AS WELL:

data accounts;
input AcctNumber Credit;
datalines;
1001 1500
1002 4900
1003 3000
;
run;

proc sql;
delete from accounts
where AcctNumber=1002;
quit;

... SAME PROBLEM, AS ALSO INDICATED BY:

data test;
do n = 1 to 3;
set accounts point = n; output;
end;
stop;
run;

----
proc sql;
delete from accounts
where MONOTONIC()=2;
quit;
... ALSO DOES NOT WORK
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

Closest I could get to resolving the problem thus far:

There is something like a PURGE statement, which seems to renumber records.
The only problem is: the statement is used in PROC IML
(http://support.sas.com/documentation/cdl/en/imlug/59656/HTML/default/langref_sect207.htm) and we do not have PROC IML.

The function can also be used elsewhere:
(http://support.sas.com/documentation/cdl/en/imlug/59656/HTML/default/worksasdatasets_sect16.htm).

However, I can see no where that it can be used in a datastep without the SET statement or in PROC SQL. Message was edited by: PHK
Respected Advisor
Posts: 3,773

Re: Observation Pointers / Table Vector after a REMOVE statement

When you MODIFY a data set and REMOVE an obs the record is not physically removed but somehow internally flagged as being deleted.

When reading a SAS data set using direct access with SET and the POINT option when you point to a REMOVED record the SET fails and sets _ERROR_ = 1. The NOBS option gives the “physical” record count which includes any REMOVED records. You can detect the removed records by examination of _ERROR_ and take appropriate action. Note also that when SET fails for a REMOVED record the variables being read by the SET statement will retain the values from the last successful SET.

I don’t know if this will help you or not. If you explain in detail what you need to do with the data with removed records perhaps someone can help you.

[pre]
data accounts;
input AcctNumber Credit @@;
datalines;
1001 1500 1002 4900 1003 3000
;;;;
run;
data accounts;
modify accounts;
if AcctNumber=1002 then remove;
run;
data accounts2;
input AcctNumber Credit @@;
datalines;
1004 3800 1005 9200
;;;;
run;
proc append base = accounts data = accounts2;
run;

data _null_;
do point = 1 to nobs;
_error_ = 0;
set accounts point=point nobs=nobs;
if _error_ eq 1 then do;
put 'NOTE: OBS ' point 'is not Accessible';
call missing(of _all_);
end;
put _all_;
end;
stop;
run;
[/pre]
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

Thank you, this is useful information, even though I won't be able to make use of it in what I need to do.
Respected Advisor
Posts: 3,773

Re: Observation Pointers / Table Vector after a REMOVE statement

You haven't full explained what you "need to do".
Grand Advisor
Posts: 9,447

Re: Observation Pointers / Table Vector after a REMOVE statement

Hey.I found very useful options,SAS is genius.
[pre]
options compress=yes reuse=yes;
data accounts;
input AcctNumber Credit;
datalines;
1001 1500
1002 4900
1003 3000
;

data accounts;
modify accounts;
if AcctNumber=1002 then remove;
run;



data accounts2;
input AcctNumber Credit;
datalines;
1004 3800
1005 9200
;
proc append base = accounts data = accounts2; run;
[/pre]


Or You can code like this:

[pre]

data accounts(compress=yes reuse=yes);
input AcctNumber Credit;
datalines;
1001 1500
1002 4900
1003 3000
;

data accounts;
modify accounts;
if AcctNumber=1002 then remove;
run;



data accounts2;
input AcctNumber Credit;
datalines;
1004 3800
1005 9200
;
proc append base = accounts data = accounts2; run;
[/pre]


Ksharp
Occasional Contributor PHK
Occasional Contributor
Posts: 11

Re: Observation Pointers / Table Vector after a REMOVE statement

Brilliant! Thanks! This seems to solve the problem in a very simple way!
Ask a Question
Discussion stats
  • 18 replies
  • 148 views
  • 0 likes
  • 5 in conversation