Hi all -
I have a data set with variables dx1, dx2, dx3...dx25 (example below). I only want to keep observations with a value that begins with 2775 (that would be #1, 2, and 4). Note that not all values are the same length. Is there a way to do this without regex??
Conversely, how do I exclude values? For example, take out observations that have values starting with 727 (#2).
data;
input dx1 dx2 dx3;
cards;
27756 826 79883
78399 2775 7274
99462 62294 677
48268 54981 27752
18872 992 27274
;
Right now, all I have is a very clunky brute force method. It works but it's a pain in the butt every time I want to add or subtract a new criteria.
data sorted;
set data;
if dx1 in ('2775', '27751, '27752', '27753, '27754', '27755', '27756', '27757', '27758', '27759')
if dx2 in ('2775', '27751, '27752', '27753, '27754', '27755', '27756', '27757', '27758', '27759')
or dx3 in ('2775', '27751, '27752', '27753, '27754', '27755', '27756', '27757', '27758', '27759')
;
run;
Thanks in advance!
data have;
input dx1 dx2 dx3;
cards;
27756 826 79883
78399 2775 7274
99462 62294 677
48268 54981 27752
18872 992 27274
; run;
data want;
set have;
array dx dx1-dx3;
do i=1 to dim(dx);
numx = strip(put(dx(i),8.));
if length(numx) ge 4 and
substr(numx,1,4) = '2775' then flag=1;
end;
if flag;
run;
Is this what you are looking for?
Using Perl Regular Expressions in the DATA Step
data have ;
input dx1 $ dx2 $ dx3 $;
cards;
27756 826 79883
78399 2775 7274
99462 62294 677
48268 54981 27752
18872 992 27274
12345 12345 127750
;
data want ;
set have ;
x=prxmatch('/2775/', dx1) ;
y=prxmatch('/2775/', dx2) ;
z=prxmatch('/2775/', dx3) ;
a=sum(x,y,z) ;
put a= x= y= z=;
if a=0 then
delete ;
run ;
data have;
input dx1 dx2 dx3;
cards;
27756 826 79883
78399 2775 7274
99462 62294 677
48268 54981 27752
18872 992 27274
; run;
data want;
set have;
array dx dx1-dx3;
do i=1 to dim(dx);
numx = strip(put(dx(i),8.));
if length(numx) ge 4 and
substr(numx,1,4) = '2775' then flag=1;
end;
if flag;
run;
Ok now I see your comment "Is there a way to do this without regex??"
Why would you not use RegEx it's the easy way to solve this problem
Perhaps:
data have; input dx1 dx2 dx3; cards; 27756 826 79883 78399 2775 7274 99462 62294 677 48268 54981 27752 18872 992 27274 ; data want; set have; array d dx: ; do i=1 to dim(d); if put(d[i],best12. -L)=: '2775' then do; output; leave; end; end; drop i; run;
Almost any time you need to do the same thing with multiple variables an Array is a likely tool so you can write code that works for one instance and then use all the values in the array.
The -L in the put statement left justifies the result so we can use the =: , a "begins with" comparison with your fixed value of 2775. The explicit output only writes to the output data set when the value is found. The LEAVE instruction is used to quit the loop the first time the condition is true so you would only get one record even if multiple variables meet the condition.
I will leave the exclude as an exercise for the interested reader but the solution is very similar but uses DELETE instead of OUTPUT.
Though one does wonder why these values are numeric to begin with. You do not appear to be using them as numbers. If these are some sort of identifier then you likely should make sure the values are character instead and then you don't need the somewhat kludgy Put to make a comparison value.
I fully agree to @AMSAS: This is the job for a regex.
Please have a look at:
data want;
set have;
if prxmatch('/\b2775/', catx(' ', of dx1-dx3));
run;
data have; input dx1 dx2 dx3; cards; 27756 826 79883 78399 2775 7274 99462 62294 677 48268 54981 27752 18872 992 27274 ; data want; set have; array x{*} _numeric_; do i=1 to dim(x); if int( x{i}/10**(int(log10(x{i}))-3) )=2775 then do;output;leave;end; end; drop i; run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.