Hi Guys
I have a dataset structured in the following manner.
data have;
input id date yymmdd6. code $ x $ price;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;
run;
I would like to remove the group variable (ID ) if there are 3(arbitrary- could be any number) consecutive missing prices for that ID (forward-looking) and backward-looking .
Ideally the output would look like the following (removing all ID group that has 3 consecutive forward or backward missing prices.
data want; input id date yymmdd6. code $ x $ price; datalines; 2 121101 CC Black 3 2 121102 CC Black 4 4 121110 KK Purple 100 4 121111 KK Purple 102 4 121111 KK Purple . ; run;
Not sure how complicated it is. Any help is greatly appreciated. Thanks.
Just use a double DOW loop. The first one to find the longest run of missing prices.
The second one to re-read the data for that ID and output the values for the ID's you want to keep.
You can use the NOTSORTED keyword on the BY statement to allow you to group by ID PRICE.
data have;
input id date :yymmdd6. code $ color $ price;
format date yymmdd10.;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;
data want;
do until (last.id);
do n=1 by 1 until (last.price) ;
set have ;
by id price notsorted ;
end;
if missing(price) then maxrun=max(maxrun,n);
end;
do until (last.id);
set have ;
by id ;
if maxrun < 3 then output;
end;
drop n maxrun ;
run;
Result:
Obs id date code color price 1 2 2012-11-01 CC Black 3 2 2 2012-11-02 CC Black 4 3 4 2012-11-10 KK Purple 100 4 4 2012-11-11 KK Purple 102 5 4 2012-11-11 KK Purple .
Just use a double DOW loop. The first one to find the longest run of missing prices.
The second one to re-read the data for that ID and output the values for the ID's you want to keep.
You can use the NOTSORTED keyword on the BY statement to allow you to group by ID PRICE.
data have;
input id date :yymmdd6. code $ color $ price;
format date yymmdd10.;
datalines;
1 121201 BC Grey 2
1 121202 BC Blue .
1 121203 BC Blue .
1 121204 BC Blue .
2 121101 CC Black 3
2 121102 CC Black 4
3 121101 DD Green 15
3 121102 DD Green .
3 121103 DD Green .
3 121104 DD Green .
3 121105 DD Green 16
3 121107 DD Green .
4 121110 KK Purple 100
4 121111 KK Purple 102
4 121111 KK Purple .
5 121110 MM Red .
5 121111 MM Red .
5 121112 MM Red .
5 121113 MM Red 30
;
data want;
do until (last.id);
do n=1 by 1 until (last.price) ;
set have ;
by id price notsorted ;
end;
if missing(price) then maxrun=max(maxrun,n);
end;
do until (last.id);
set have ;
by id ;
if maxrun < 3 then output;
end;
drop n maxrun ;
run;
Result:
Obs id date code color price 1 2 2012-11-01 CC Black 3 2 2 2012-11-02 CC Black 4 3 4 2012-11-10 KK Purple 100 4 4 2012-11-11 KK Purple 102 5 4 2012-11-11 KK Purple .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.