BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mmhxc5
Quartz | Level 8

Hi,

I have a dataset of records for bridges (attached), each bridge has its own structure number given as STRUCTURE_NUMBER_008, and the dataset is ordered based on STRUCTURE_NUMBER_008 and Inspection_year. The frequency each bridge is recorded is different as seen in the last column variable Freq.

The variable SUPERSTRUCTURE_COND_059 can take rating number of 4, 5, 6, 7, and 8. I would like to trim the dataset for each bridge based on the following condition.

 

If the first 5 rating or last five rating for each bridge for variable SUPERSTRUCTURE_COND_059 is identical (e.g., 4 4 4 4 4 or 6 6 6 6 6, ...) I want to retain all the observations with the middle observations. But if the first five rating for each bridge for variable SUPERSTRUCTURE_COND_059 is different (e.g., 4 4 4 4 5 or 7 7 6 6 6 or 8 7 7 7 7) I would like to delete the observations encountered first. For the example above I want to delete 4 4 4 4 or 7 7 or 8 and retain the remaining observation with the middle observations for the bridge. if the the last five observations for a bridge is (e.g., 4 4 4 4 5 or 7 7 6 6 6 or 8 7 7 7 7), I want to delete observations that is changed, which in this case I would delete 5, 6 6 6, or 7 7 7 7.

 

I would be grateful I anyone could help me on this. The data set is attached.

 

Thanks, 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Here is a commented version of my code.

 

data trimmed;

/* Read the COND sequence for a given structure into array cond */
array cond{99} $23 _temporary_;
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
    set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
    cond{count} = SUPERSTRUCTURE_COND_059;
    end;

/* Check how many COND values need to be trimmed at the 
   beginning and end of the sequence */
if count ge 5 then do;
    /* Scan backwards from 5 to trim the beginning */
    do cb = 4 to 1 by -1;
        if cond{cb} ne cond{5} then leave;
        end;
    begin = cb + 1;
    /* Scan the last 5 entries to trim the end */
    do ce = count-3 to count;
        if cond{ce} ne cond{count-4} then leave;
        end;
    end = ce - 1;
    end;
else do; /* Don't trim if sequence isn't long enough */
    begin = 1;
    end = count;
    end;

/* Read the data from the same structure again, copy the untrimmed 
   entries to the output dataset */
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
    set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
    if count >= begin and count <= end then output;
    end;

drop cb ce count begin end;
run;

hth

PG

View solution in original post

10 REPLIES 10
mkeintz
PROC Star

I understand you want to remove the first batch of identical-condition records if that string of records is less than 5.

 

Do you also want to remove the LAST batch of identical-condition records if the string is also less than 5? 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mmhxc5
Quartz | Level 8

Thank you for your reply.

I don't want to delete the first five and last five identical observations for a bridge. But, I want to delete observations that is changed within first five and last five observations. For example if the first five or last five observation for a bridge is ( 7 7 7 7 7) I will retain all of them. But if the first five observation for a bridge is (8 8 8 7 7), I want to delete the first three observations which is (8 8 8). Similarly, if the last five observation is (6 6 6 6 5), I want to delete the last observation which in this case is 5.

 

Thank you for your help!

PGStats
Opal | Level 21

Can be done most simply with an array :

 

data trimmed;
array cond{99} $23 _temporary_;
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
    set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
    cond{count} = SUPERSTRUCTURE_COND_059;
    end;

if count ge 10 then do;
    do cb = 4 to 1 by -1;
        if cond{cb} ne cond{5} then leave;
        end;
    begin = cb + 1;
    do ce = count-3 to count;
        if cond{ce} ne cond{count-4} then leave;
        end;
    end = ce - 1;
    end;
else do; /* Don't touch if begin and end sequence overlap */
    begin = 1;
    end = count;
    end;

do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
    set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
    if count >= begin and count <= end then output;
    end;

drop cb ce count begin end;
run;

I didn't do any trimming for sequences shorter than 10 because your rules don't cover such cases.

PG
mmhxc5
Quartz | Level 8

Thank you for your help. I run your code and the result was different from what I described in my first post. I would be happy if you could read my first post to revise your code.

 

Thanks,

PGStats
Opal | Level 21

Can you point a structure number where the trimmed sequence is not as expected, and describe in what what way it differs from your expectation.

 

PS. Your rules are ambiguous for sequences of less than 9 years: For example, what would be the trimmed version of the sequence 4 4 4 5 4 5 5 5 ?

PG
mmhxc5
Quartz | Level 8

Thank you for your time. I found out the problem was with bridge observations less than 10. When I deleted bridges with less than 10 observations your code worked.

Also, When I only replaced 10 in your code with 5 as follows, the code worked with observations less than 10 too. Does it affect my trimming in the wrong way if I do this?

if count ge 5 then do;

Thank you for your help.

Could you please give comment about the your code that I could learn from it, specially the array part and if-else part.

 

Thanks alot! I really appreciate it.

PGStats
Opal | Level 21

Here is a commented version of my code.

 

data trimmed;

/* Read the COND sequence for a given structure into array cond */
array cond{99} $23 _temporary_;
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
    set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
    cond{count} = SUPERSTRUCTURE_COND_059;
    end;

/* Check how many COND values need to be trimmed at the 
   beginning and end of the sequence */
if count ge 5 then do;
    /* Scan backwards from 5 to trim the beginning */
    do cb = 4 to 1 by -1;
        if cond{cb} ne cond{5} then leave;
        end;
    begin = cb + 1;
    /* Scan the last 5 entries to trim the end */
    do ce = count-3 to count;
        if cond{ce} ne cond{count-4} then leave;
        end;
    end = ce - 1;
    end;
else do; /* Don't trim if sequence isn't long enough */
    begin = 1;
    end = count;
    end;

/* Read the data from the same structure again, copy the untrimmed 
   entries to the output dataset */
do count = 1 by 1 until(last.STRUCTURE_NUMBER_008);
    set sasforum.psc_super_count02; by STRUCTURE_NUMBER_008;
    if count >= begin and count <= end then output;
    end;

drop cb ce count begin end;
run;

hth

PG
mkeintz
PROC Star

I'd suggest reading the records for each bridge counting the size of each successive condition in variable _NCOND.  If the first _NCOND is less than 5 then set a _keepstart variable to that number+1  (i.e. if the start is   1 1 2 2 2, then the first _NCOND is 2, and therefore _KEEPSTART=3    --- for record 3).   Use a similar process for the last condition group.  If its size is less than 5 then set _KEEPEND to total count (_NTOTAL) minus the group size (_NCOND).  Otherwise _KEEPEND is set to _NTOTAL.

 

Then reread the records for the same bridge, keeping only those record numbers between _KEEPSTART and _KEEPEND.

 

data want (drop=_:);
  _ntotal=0;
  _keepstart=1;
  do until (last.STRUCTURE_NUMBER_008);
    do _ncond=1 by 1 until (last.SUPERSTRUCTURE_COND_059);
      set psc_super_count02 ;
      by STRUCTURE_NUMBER_008 SUPERSTRUCTURE_COND_059 notsorted;
    end;

    if _ntotal=0 and _ncond<5 then _keepstart=_ncond+1;
    _ntotal=_ntotal+_ncond;
  end;

  if _ncond<5 then _keepend=_ntotal-_ncond;
  else _keepend=_ntotal;

  do _n=1 to _ntotal;
    set psc_super_count02 ;
    if _keepstart <= _n <= _keepend then output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mmhxc5
Quartz | Level 8

Thank you for you help. I run your code and the result was different from what I described in my first post. Please read my first post and let me know if there is need for clarification.

Thanks,

mkeintz
PROC Star

I think this would be more productive if you give an example of what you expected versus what you actually got from my program.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1371 views
  • 1 like
  • 3 in conversation