BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nonlinear999
Fluorite | Level 6

Hi Guys

I have a dataset structured in the following manner. 

data have;
input id date yymmdd6. code $ x $ price;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;
run;

I would like to remove the group variable (ID ) if there are 3(arbitrary- could be any number) consecutive missing prices  for that ID (forward-looking) and backward-looking .

Ideally the output would look like the following  (removing all ID group that has 3 consecutive forward or backward missing prices. 

data want; 
input id date yymmdd6. code $ x $ price;
datalines;

2 121101	CC Black 3
2 121102	CC Black 4
4 121110	KK Purple 100
4 121111	KK Purple 102
4 121111	KK Purple .
;
run;

 Not sure how complicated it is. Any help is greatly appreciated. Thanks. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Just use a double DOW loop.  The first one to find the longest run of missing prices.

The second one to re-read the data for that ID and output the values for the ID's you want to keep.

 

You can use the NOTSORTED keyword on the BY statement to allow you to group by ID PRICE.

data have;
  input id date :yymmdd6. code $ color $ price;
  format date yymmdd10.;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;

data want;
  do until (last.id);
    do n=1 by 1 until (last.price) ;
      set have ;
      by id price notsorted ;
    end;
    if missing(price) then maxrun=max(maxrun,n);
  end;
  do until (last.id);
    set have ;
    by id ;
    if maxrun < 3 then output;
  end;
  drop n maxrun ;
run;

Result:

Obs    id          date    code    color     price

 1      2    2012-11-01     CC     Black        3
 2      2    2012-11-02     CC     Black        4
 3      4    2012-11-10     KK     Purple     100
 4      4    2012-11-11     KK     Purple     102
 5      4    2012-11-11     KK     Purple       .

 

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

Just use a double DOW loop.  The first one to find the longest run of missing prices.

The second one to re-read the data for that ID and output the values for the ID's you want to keep.

 

You can use the NOTSORTED keyword on the BY statement to allow you to group by ID PRICE.

data have;
  input id date :yymmdd6. code $ color $ price;
  format date yymmdd10.;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;

data want;
  do until (last.id);
    do n=1 by 1 until (last.price) ;
      set have ;
      by id price notsorted ;
    end;
    if missing(price) then maxrun=max(maxrun,n);
  end;
  do until (last.id);
    set have ;
    by id ;
    if maxrun < 3 then output;
  end;
  drop n maxrun ;
run;

Result:

Obs    id          date    code    color     price

 1      2    2012-11-01     CC     Black        3
 2      2    2012-11-02     CC     Black        4
 3      4    2012-11-10     KK     Purple     100
 4      4    2012-11-11     KK     Purple     102
 5      4    2012-11-11     KK     Purple       .

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 643 views
  • 2 likes
  • 2 in conversation