Re: remove last 0 rows (All trailing zeroes)

GeorgeSAS · Posted 10-27-2017 12:06 PM

data have;
input value;
cards;
  371
  0
  145
   75
   40
   41
   19
    0
   10
    2
    0
    1
    3
    999
   0
   0
   0
   0
   0
   0
   0
   0
   0
;
run;

What SAS code logic can remove the last several zero-value rows (in this example, all rows after the value 999)?

Thanks!

Here is my method; please advise:

data need1;
set have;
n=_n_;
run;
proc sort data=need1 out=need2;
by descending n;
run;
data need3;
retain flag 0;
set need2 nobs=obs;
do i=1 to obs;
if value=0 then do;
  if flag =0 then delete;
end;
else do;
  flag=1;
end;
end;
run;
proc sort data=need3 out=need(keep=value);
by n;
run;

Reeza · Posted 10-27-2017 12:17 PM

What's the logic/rule?

mkeintz · Posted 10-27-2017 12:18 PM

What is your criterion? All trailing zeroes? All zeroes after 999? After the last 999?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

GeorgeSAS · Posted 10-27-2017 12:29 PM

All trailing zeroes.

Reeza · Posted 10-27-2017 12:33 PM

Reverse your data, delete all the first zeros and then reverse it back.
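
The reverse, trim, and reverse-back idea can be sketched roughly as follows. This is only an illustration, assuming the HAVE dataset from the question already exists; the dataset names NUMBERED, REVERSED, and TRIMMED and the variables N and SEEN_NONZERO are made up for the example. Note that a missing value counts as nonzero here.

```sas
/* Step 1: record the original row order (N is a hypothetical helper variable). */
data numbered;
  set have;
  n = _n_;
run;

/* Step 2: reverse the data. */
proc sort data=numbered out=reversed;
  by descending n;
run;

/* Step 3: skip the leading zeros of the reversed data, i.e. the
   trailing zeros of the original, until a nonzero value appears. */
data trimmed;
  set reversed;
  retain seen_nonzero 0;
  if value ne 0 then seen_nonzero = 1;
  if seen_nonzero;          /* subsetting IF: keep rows from the first nonzero on */
run;

/* Step 4: restore the original order and keep only VALUE. */
proc sort data=trimmed out=want(keep=value);
  by n;
run;
```

Zeros that sit between nonzero values are kept, because the flag stays at 1 once the first nonzero row has been read.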

Astounding · Posted 10-27-2017 12:19 PM

There's really no way to do this in one step, since you have to keep reading in all the data to see if there is another nonzero.

Here's one approach:

data have;
input value;
if value ne 0 then call symputx('good_obs', _n_);
cards;
  ...
;
data want;
set have (obs=&good_obs);
run;

GeorgeSAS · Posted 10-27-2017 12:46 PM

This is a nice solution.

Also, using SYMPUTX instead of SYMPUT takes the additional step of removing any leading and trailing blanks.
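
A small sketch of the difference, for anyone curious. The macro variable names WITH_SYMPUT and WITH_SYMPUTX are invented for this illustration; the brackets in the %PUT line are only there to make any blanks visible in the log.

```sas
data _null_;
  /* SYMPUT converts the number with the BEST12. format, so the stored
     value is right-aligned and padded with leading blanks (the log also
     shows a numeric-to-character conversion note). */
  call symput('with_symput', 23);

  /* SYMPUTX strips leading and trailing blanks before storing the value. */
  call symputx('with_symputx', 23);
run;

/* Compare the two values in the log. */
%put [&with_symput] [&with_symputx];
```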

Thanks!

Rwon · Posted 10-27-2017 12:57 PM

You could use this technique. It seems to work for this case, and it's one data step.

/* Merge the dataset with itself, offset by one and two rows, to look ahead at
   the next two values. Output a row only if it and the next two values have a
   positive sum. Caveat: a zero followed by two more zeros is dropped even in
   the middle of the data, and negative values would break the test. */
data want (drop = post_value1 post_value2);
merge have
      have (firstobs = 2 rename = (value = post_value1))
      have (firstobs = 3 rename = (value = post_value2));

if sum (value, post_value1, post_value2) > 0 then output;
run;

GeorgeSAS · Posted 10-27-2017 03:53 PM

I don't understand your code, and it produces an error when I run it.

data_null__ · Posted 10-27-2017 01:24 PM

MODIFY.

data have;
   input value @@;
   cards;
  371
  0
  145
   75    40
   41   19
    0   10
    2    0
    1    3
    999
   0   0
   0   0
   0   0
   0   0
   0
;
run;
proc print;
   run;
data have;
   do i=j by -1 to 1;
      modify have point=i nobs=j;
      if value eq 0 then remove;
      else stop;
      end;
   stop;
   run;
proc print;
   run;

GeorgeSAS · Posted 10-27-2017 03:28 PM

Great solution! Very fancy code!

I have never used 'point=' in a data step. I want to learn it from this example.

May I ask what "point=i" does here?

Thanks!

By the way, there will be a problem if the 'have' dataset was created in a different environment from the one running the update program:

(That is, if I created 'have' on UNIX and then run this code on a PC to update the dataset, the error below occurs.

But that is fine; I can run the update code on UNIX too.)

ERROR: File have cannot be updated because
its encoding does not match the session encoding or the
file is in a format native to another host, such as
HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64

data_null__ · Posted 10-27-2017 03:38 PM

POINT= is a MODIFY statement option to name the variable that points to the observation being modified.

See the documentation for complete details.
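
As a rough illustration of direct access, here is a sketch that uses POINT= on a SET statement (it works the same way on MODIFY) to read a dataset from the last observation to the first. It assumes the HAVE dataset from this thread exists; the output name REVERSED and the variables I and TOTAL are made up for the example.

```sas
data reversed;
  /* TOTAL is filled in by NOBS= at compile time, so it can be used
     as the loop start before the SET statement executes. */
  do i = total to 1 by -1;
    set have point=i nobs=total;   /* read observation number I directly */
    output;
  end;
  stop;   /* needed: with POINT= the step never reaches end-of-file by itself */
run;
```

The POINT= and NOBS= variables are temporary, so neither I nor TOTAL is written to REVERSED.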

mkeintz · Posted 10-27-2017 05:19 PM

I like the use of the MODIFY statement for removing observations in place (i.e. don't copy the original data set, just update in place).

However, be aware that the data set attribute NOBS (number of observations) is unchanged by REMOVE statements. I think this is because SAS needs to know how much physical space is used by the dataset. And since it would be inefficient for SAS to delete internal records by overwriting all the subsequent records to new locations within the data set, they are just marked as removed, and total physical space is not reduced by REMOVE. However, NLOBS (number of logical observations) is adjusted.
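
One way to see the difference is to query the dataset attributes with the ATTRN function after running the MODIFY step. This is just a sketch, assuming HAVE has already been updated in place; the macro variable names DSID, NOBS, NLOBS, NDEL, and RC are arbitrary.

```sas
/* Open the dataset and read its observation counts. */
%let dsid  = %sysfunc(open(have));
%let nobs  = %sysfunc(attrn(&dsid,nobs));   /* physical observations, incl. removed */
%let nlobs = %sysfunc(attrn(&dsid,nlobs));  /* logical observations, excl. removed  */
%let ndel  = %sysfunc(attrn(&dsid,ndel));   /* observations marked as deleted       */
%let rc    = %sysfunc(close(&dsid));

%put nobs=&nobs nlobs=&nlobs ndel=&ndel;
```

After the REMOVE statements, NLOBS should be smaller than NOBS by the number of removed rows reported in NDEL.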

If you need to make a new dataset, which will have NOBS=NLOBS, you can use this program, which uses the same "point=" logic as @data_null__:

data want;
  if _n_=1 then do p=nrecs to 1 by -1 until(value^=0);
    set have point=p nobs=nrecs;
  end;
  set have;
  if _n_>p then stop;
run;

Ksharp · Posted 10-29-2017 08:03 AM

data have;
input value;
cards;
  371
  0
  145
   75
   40
   41
   19
    0
   10
    2
    0
    1
    3
    999
   0
   0
   0
   0
   0
   0
   0
   0
   0
;
run;
data have; 
 set have;
 if value=0 then group=0;
  else group=1;
run;
data have;
 set have end=last;
 by group notsorted;
 n+first.group;
 if last then call symputx('n',n);
run;
data want;
 set have;
 if n=&n and value=0 then delete;
 drop group n;
run;
