Hi,
I have a dataset of records for bridges (attached), each bridge has its own structure number given as STRUCTURE_NUMBER_008, and the dataset is ordered based on STRUCTURE_NUMBER_008 and Inspection_year. The frequency each bridge is recorded is different as seen in the last column variable Freq.
The variable SUPERSTRUCTURE_COND_059 can take rating number of 4, 5, 6, 7, and 8. I would like to trim the dataset for each bridge based on the following condition.
If the first 5 rating or last five rating for each bridge for variable SUPERSTRUCTURE_COND_059 is identical (e.g., 4 4 4 4 4 or 6 6 6 6 6, ...) I want to retain all the observations with the middle observations. But if the first five rating for each bridge for variable SUPERSTRUCTURE_COND_059 is different (e.g., 4 4 4 4 5 or 7 7 6 6 6 or 8 7 7 7 7) I would like to delete the observations encountered first. For the example above I want to delete 4 4 4 4 or 7 7 or 8 and retain the remaining observation with the middle observations for the bridge. if the the last five observations for a bridge is (e.g., 4 4 4 4 5 or 7 7 6 6 6 or 8 7 7 7 7), I want to delete observations that is changed, which in this case I would delete 5, 6 6 6, or 7 7 7 7.
I would be grateful I anyone could help me on this. The data set is attached.
Thanks,
Here is a commented version of my code.
data trimmed;
/* Read the COND sequence for a given structure into array cond */
array cond{99} $23 _temporary_;
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
cond{count} = SUPERSTRUCTURE_COND_059;
end;
/* Check how many COND values need to be trimmed at the
beginning and end of the sequence */
if count ge 5 then do;
/* Scan backwards from 5 to trim the beginning */
do cb = 4 to 1 by -1;
if cond{cb} ne cond{5} then leave;
end;
begin = cb + 1;
/* Scan the last 5 entries to trim the end */
do ce = count-3 to count;
if cond{ce} ne cond{count-4} then leave;
end;
end = ce - 1;
end;
else do; /* Don't trim if sequence isn't long enough */
begin = 1;
end = count;
end;
/* Read the data from the same structure again, copy the untrimmed
entries to the output dataset */
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
if count >= begin and count <= end then output;
end;
drop cb ce count begin end;
run;
hth
I understand you want to remove the first batch of identical-condition records if that string of records is less than 5.
Do you also want to remove the LAST batch of identical-condition records if the string is also less than 5?
Thank you for your reply.
I don't want to delete the first five and last five identical observations for a bridge. But, I want to delete observations that is changed within first five and last five observations. For example if the first five or last five observation for a bridge is ( 7 7 7 7 7) I will retain all of them. But if the first five observation for a bridge is (8 8 8 7 7), I want to delete the first three observations which is (8 8 8). Similarly, if the last five observation is (6 6 6 6 5), I want to delete the last observation which in this case is 5.
Thank you for your help!
Can be done most simply with an array :
data trimmed;
array cond{99} $23 _temporary_;
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
cond{count} = SUPERSTRUCTURE_COND_059;
end;
if count ge 10 then do;
do cb = 4 to 1 by -1;
if cond{cb} ne cond{5} then leave;
end;
begin = cb + 1;
do ce = count-3 to count;
if cond{ce} ne cond{count-4} then leave;
end;
end = ce - 1;
end;
else do; /* Don't touch if begin and end sequence overlap */
begin = 1;
end = count;
end;
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
if count >= begin and count <= end then output;
end;
drop cb ce count begin end;
run;
I didn't do any trimming for sequences shorter than 10 because your rules don't cover such cases.
Thank you for your help. I run your code and the result was different from what I described in my first post. I would be happy if you could read my first post to revise your code.
Thanks,
Can you point a structure number where the trimmed sequence is not as expected, and describe in what what way it differs from your expectation.
PS. Your rules are ambiguous for sequences of less than 9 years: For example, what would be the trimmed version of the sequence 4 4 4 5 4 5 5 5 ?
Thank you for your time. I found out the problem was with bridge observations less than 10. When I deleted bridges with less than 10 observations your code worked.
Also, When I only replaced 10 in your code with 5 as follows, the code worked with observations less than 10 too. Does it affect my trimming in the wrong way if I do this?
if count ge 5 then do;
Thank you for your help.
Could you please give comment about the your code that I could learn from it, specially the array part and if-else part.
Thanks alot! I really appreciate it.
Here is a commented version of my code.
data trimmed;
/* Read the COND sequence for a given structure into array cond */
array cond{99} $23 _temporary_;
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
cond{count} = SUPERSTRUCTURE_COND_059;
end;
/* Check how many COND values need to be trimmed at the
beginning and end of the sequence */
if count ge 5 then do;
/* Scan backwards from 5 to trim the beginning */
do cb = 4 to 1 by -1;
if cond{cb} ne cond{5} then leave;
end;
begin = cb + 1;
/* Scan the last 5 entries to trim the end */
do ce = count-3 to count;
if cond{ce} ne cond{count-4} then leave;
end;
end = ce - 1;
end;
else do; /* Don't trim if sequence isn't long enough */
begin = 1;
end = count;
end;
/* Read the data from the same structure again, copy the untrimmed
entries to the output dataset */
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
if count >= begin and count <= end then output;
end;
drop cb ce count begin end;
run;
hth
I'd suggest reading the records for each bridge counting the size of each successive condition in variable _NCOND. If the first _NCOND is less than 5 then set a _keepstart variable to that number+1 (i.e. if the start is 1 1 2 2 2, then the first _NCOND is 2, and therefore _KEEPSTART=3 --- for record 3). Use a similar process for the last condition group. If its size is less than 5 then set _KEEPEND to total count (_NTOTAL) minus the group size (_NCOND). Otherwise _KEEPEND is set to _NTOTAL.
Then reread the records for the same bridge, keeping only those record numbers between _KEEPSTART and _KEEPEND.
data want (drop=_:);
_ntotal=0;
_keepstart=1;
do until (last.STRUCTURE_NUMBER_008);
do _ncond=1 by 1 until (last.SUPERSTRUCTURE_COND_059);
set psc_super_count02 ;
by STRUCTURE_NUMBER_008 SUPERSTRUCTURE_COND_059 notsorted;
end;
if _ntotal=0 and _ncond<5 then _keepstart=_ncond+1;
_ntotal=_ntotal+_ncond;
end;
if _ncond<5 then _keepend=_ntotal-_ncond;
else _keepend=_ntotal;
do _n=1 to _ntotal;
set psc_super_count02 ;
if _keepstart <= _n <= _keepend then output;
end;
run;
Thank you for you help. I run your code and the result was different from what I described in my first post. Please read my first post and let me know if there is need for clarification.
Thanks,
I think this would be more productive if you give an example of what you expected versus what you actually got from my program.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.