Solved: Re: How to delete first five and last five observations from a dataset

mmhxc5 · Posted 01-03-2019 11:34 PM

Hi,

I have a dataset of over 3000 observations for bridges inspected since 1992 -2017. Each bridge is inspected once each year. I would like to delete the first five years and last five years of inspection result for each bridge. Chosing STRUCTURE_NUMBER_008 or Inspection_Year might work as the variable to delete, because the beginning and last year of inspection is not the same for all bridges. The attached image shows two bridge IDs sorted by STRUCTURE_NUMBER_008 (First column) and Inspection_Year(Last column). STRUCTURE_NUMBER_008 is random name given for each bridge. I would be glad if anyone could help me to do this task. I use SAS 9.4 in Windows 10 and I have the dataset as permanent SAS files.

Thanks,

novinosrin · Posted 01-04-2019 11:25 AM

Please follow the comments and logic and see if this helps

/*Creating a sample data with 120 bridges with each bridge having 25 observations*/
data sample;
do bridge=1 to 120;
	do var=1 to 25;
	output;
	end;
end;
run;

/*Output */
data want;
/*1st loop count the number of records in each bridge*/
do n=1 by 1 until(last.bridge);
set sample;
by bridge;
end;
/*2nd loop remove first five and last five observation for each bridge 
and output the rest*/
do _n_=1 by 1 until(last.bridge);
set sample;
by bridge;
if 6<=_n_<= n-5 then output;
end;
run;

View solution in original post

novinosrin · Posted 01-04-2019 12:25 AM

data want;
do p=6 to nobs-5;
   set have nobs=nobs point=p;
    output;
end;
stop;
run;

mmhxc5 · Posted 01-04-2019 10:36 AM

Thank you for you response. Your code only removes the first five and last five observation of the whole dataset which is over 3000 observations.

What I want is to remove first five and last five observation for each bridge which has 25 observation each. The dataset of 120 bridges makes the 3000 observations. I only want to retain the middle 15 observation for each bridge. The STRUCTURE _NUMBER_008 is a mix of character and numeric. The dataset is ordered based on the STRUCTURE_NUMBER_008 and Inspection_Year.

Thanks,

novinosrin · Posted 01-04-2019 11:25 AM

Please follow the comments and logic and see if this helps

/*Creating a sample data with 120 bridges with each bridge having 25 observations*/
data sample;
do bridge=1 to 120;
	do var=1 to 25;
	output;
	end;
end;
run;

/*Output */
data want;
/*1st loop count the number of records in each bridge*/
do n=1 by 1 until(last.bridge);
set sample;
by bridge;
end;
/*2nd loop remove first five and last five observation for each bridge 
and output the rest*/
do _n_=1 by 1 until(last.bridge);
set sample;
by bridge;
if 6<=_n_<= n-5 then output;
end;
run;

Rick_SAS · Posted 01-05-2019 08:02 AM

If you are not familiar with @novinosrin's BY-group trick, see "How to use FIRST.variable and LAST.variable in a BY-group analysis in SAS."

mmhxc5 · Posted 01-07-2019 03:53 PM

Thank you for your timely reply. I really appreciate it. Based on your code I created a "Sample" data, then I defined the "want" and also the set "sample" to the end of your code. How can I link this code to the Bridge dataset I have? I mean, which part of your code will call the bridge dataset which has all the data that I want to process?

Ksharp · Posted 01-05-2019 06:09 AM

It is forwards and backwards tricks .

data sample;
do bridge=1 to 10;
	do var=1 to 25;
	output;
	end;
end;
run;

data want;
 merge sample sample(firstobs=6 keep=bridge rename=(bridge=_bridge));
 if bridge ne lag5(bridge) or bridge ne _bridge then delete;
 drop _bridge;
 run;

proc print;run;

mmhxc5 · Posted 01-07-2019 03:58 PM

Thank you for your timely reply. I really appreciate it. There is only two data calling in your code, "sample" and "want". How can I link this code to the Bridge dataset I have? I mean, which part of your code will call the bridge dataset which has all the data that I want to process?

Ksharp · Posted 01-15-2019 02:40 AM

‘Sample' dataset is 'Bridge' dataset ,according to Art.T 's code.

art297 · Posted 01-07-2019 08:36 PM

Both @novinosrin and @Ksharp created a file called sample to represent your bridge dataset. They did not expect you to create a sample dataset but, rather, replace it with your actual bridge dataset.

However, you have to decide whether you want to drop the first and last 5 records for each structure_no_008 OR, for each structure_no_008, you want to drop any inspection_year that was less than the earliest+4 and/or greater than the last inspection_year -4.

My suggested code does the latter:

data bridge (drop=structure_no var);
  do structure_no=1 to 120;
    structure_no_008=catt('000000000000H',put(structure_no,z3.));
    inspection_year=1996;
    do var=1 to 25;
      inspection_year+1;
      if mod(structure_no,40) eq 0 then do;
        if inspection_year ne 2000 then output;
      end;
      else output;
    end;
  end;
run;

/*Output */
data want (drop=start last);
/*1st loop count the number of records in each bridge*/
  do until(last.structure_no_008);
    set bridge;
    by structure_no_008;
    if first.structure_no_008 then start=inspection_year;
    else if last.structure_no_008 then last=inspection_year;
  end;
/*2nd loop only keep inspection_years if they are greater than the
  earliest inspection_year+4 and lower than latest inspection-4*/
  do until(last.structure_no_008);
    set bridge;
    by structure_no_008;
    if start+5 <=inspection_year<=last-5 then output;
  end;
run;

Art, CEO, AnalystFinder.com

mmhxc5 · Posted 01-07-2019 11:00 PM

Thank you @novinosrin, @Ksharp, @Rick_SAS, and @art297 for your help. Appreciated. @art297 your clarification was very helpful.

How to delete first five and last five observations from a dataset

Re: How to delete first five and last five observations from each variable in a dataset

Re: How to delete first five and last five observations from a dataset

Re: How to delete first five and last five observations from each variable in a dataset

Re: How to delete first five and last five observations from each variable in a dataset

Re: How to delete first five and last five observations from each variable in a dataset

Re: How to delete first five and last five observations from each variable in a dataset

Re: How to delete first five and last five observations from a dataset

Re: How to delete first five and last five observations from a dataset

Re: How to delete first five and last five observations from a dataset

Re: How to delete first five and last five observations from a dataset

Re: How to delete first five and last five observations from a dataset

Registration is open