BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nonlinear999
Fluorite | Level 6

Hi Guys

I have a dataset structured in the following manner. 

data have;
input id date yymmdd6. code $ x $ price;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;
run;

I would like to remove the group variable (ID ) if there are 3(arbitrary- could be any number) consecutive missing prices  for that ID (forward-looking) and backward-looking .

Ideally the output would look like the following  (removing all ID group that has 3 consecutive forward or backward missing prices. 

data want; 
input id date yymmdd6. code $ x $ price;
datalines;

2 121101	CC Black 3
2 121102	CC Black 4
4 121110	KK Purple 100
4 121111	KK Purple 102
4 121111	KK Purple .
;
run;

 Not sure how complicated it is. Any help is greatly appreciated. Thanks. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Just use a double DOW loop.  The first one to find the longest run of missing prices.

The second one to re-read the data for that ID and output the values for the ID's you want to keep.

 

You can use the NOTSORTED keyword on the BY statement to allow you to group by ID PRICE.

data have;
  input id date :yymmdd6. code $ color $ price;
  format date yymmdd10.;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;

data want;
  do until (last.id);
    do n=1 by 1 until (last.price) ;
      set have ;
      by id price notsorted ;
    end;
    if missing(price) then maxrun=max(maxrun,n);
  end;
  do until (last.id);
    set have ;
    by id ;
    if maxrun < 3 then output;
  end;
  drop n maxrun ;
run;

Result:

Obs    id          date    code    color     price

 1      2    2012-11-01     CC     Black        3
 2      2    2012-11-02     CC     Black        4
 3      4    2012-11-10     KK     Purple     100
 4      4    2012-11-11     KK     Purple     102
 5      4    2012-11-11     KK     Purple       .

 

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

Just use a double DOW loop.  The first one to find the longest run of missing prices.

The second one to re-read the data for that ID and output the values for the ID's you want to keep.

 

You can use the NOTSORTED keyword on the BY statement to allow you to group by ID PRICE.

data have;
  input id date :yymmdd6. code $ color $ price;
  format date yymmdd10.;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;

data want;
  do until (last.id);
    do n=1 by 1 until (last.price) ;
      set have ;
      by id price notsorted ;
    end;
    if missing(price) then maxrun=max(maxrun,n);
  end;
  do until (last.id);
    set have ;
    by id ;
    if maxrun < 3 then output;
  end;
  drop n maxrun ;
run;

Result:

Obs    id          date    code    color     price

 1      2    2012-11-01     CC     Black        3
 2      2    2012-11-02     CC     Black        4
 3      4    2012-11-10     KK     Purple     100
 4      4    2012-11-11     KK     Purple     102
 5      4    2012-11-11     KK     Purple       .

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 204 views
  • 2 likes
  • 2 in conversation