BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jsheu
Calcite | Level 5

Hi all -

I have a data set with variables dx1, dx2, dx3...dx25 (example below).  I only want to keep observations with a value that begins with 2775 (that would be #1, 2, and 4). Note that not all values are the same length.  Is there a way to do this without regex??

 

Conversely, how do I exclude values? For example, take out observations that have values starting with 727 (#2).

 

data;
input dx1 dx2 dx3;
cards;
27756 826 79883
78399 2775 7274
99462 62294 677
48268 54981 27752
18872 992 27274
;

 

Right now, all I have is a very clunky brute force method. It works but it's a pain in the butt every time I want to add or subtract a new criteria.

 

data sorted;
set data;
if dx1 in ('2775', '27751, '27752', '27753, '27754', '27755', '27756', '27757', '27758', '27759')
if dx2 in ('2775', '27751, '27752', '27753, '27754', '27755', '27756', '27757', '27758', '27759')
or dx3 in ('2775', '27751, '27752', '27753, '27754', '27755', '27756', '27757', '27758', '27759')
;
run;

 Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18
data have;
input dx1 dx2 dx3;
cards;
27756  826  79883
78399  2775  7274
99462  62294  677
48268  54981  27752
18872  992  27274
; run;

data want;
 set have;
     array dx dx1-dx3;
     do i=1 to dim(dx);
        numx = strip(put(dx(i),8.));
        if length(numx) ge 4 and
           substr(numx,1,4) = '2775' then flag=1;
     end;
     if flag;
run;

View solution in original post

6 REPLIES 6
AMSAS
SAS Super FREQ

Is this what you are looking for?

 

Using Perl Regular Expressions in the DATA Step 



data have ;
input dx1 $ dx2 $ dx3 $;
cards;
27756  826  79883
78399  2775  7274
99462  62294  677
48268  54981  27752
18872  992  27274
12345  12345 127750
;


data want ;
	set have ;
	x=prxmatch('/2775/', dx1) ;
	y=prxmatch('/2775/', dx2) ;
	z=prxmatch('/2775/', dx3) ;
	a=sum(x,y,z) ;
	put a= x= y= z=;
	if a=0 then
		delete ;
run ;
Shmuel
Garnet | Level 18
data have;
input dx1 dx2 dx3;
cards;
27756  826  79883
78399  2775  7274
99462  62294  677
48268  54981  27752
18872  992  27274
; run;

data want;
 set have;
     array dx dx1-dx3;
     do i=1 to dim(dx);
        numx = strip(put(dx(i),8.));
        if length(numx) ge 4 and
           substr(numx,1,4) = '2775' then flag=1;
     end;
     if flag;
run;
AMSAS
SAS Super FREQ

Ok now I see your comment "Is there a way to do this without regex??" 
Why would you not use RegEx it's the easy way to solve this problem

ballardw
Super User

Perhaps:

 

data have;
input dx1 dx2 dx3;
cards;
27756  826  79883
78399  2775  7274
99462  62294  677
48268  54981  27752
18872  992  27274
;

data want;
   set  have;
   array d dx: ;
   do i=1 to dim(d);
      if put(d[i],best12. -L)=: '2775' then do;
         output;
         leave;
      end;
   end;
   drop i;
run;

Almost any time you need to do the same thing with multiple variables an Array is a likely tool so you can write code that works for one instance and then use all the values in the array.

The -L in the put statement left justifies the result so we can use the =: , a "begins with" comparison with your fixed value of 2775.   The explicit output only writes to the output data set when the value is found. The LEAVE instruction is used to quit the loop the first time the condition is true so you would only get one record even if multiple variables meet the condition.

 

I will leave the exclude as an exercise for the interested reader but the solution is very similar but uses DELETE instead of OUTPUT.

 

Though one does wonder why these values are numeric to begin with. You do not appear to be using them as numbers. If these are some sort of identifier then you likely should make sure the values are character instead and then you don't need the somewhat kludgy Put to make a comparison value.

andreas_lds
Jade | Level 19

I fully agree to @AMSAS: This is the job for a regex.

 

Please have a look at:

data want;
   set have;
   if prxmatch('/\b2775/', catx(' ', of dx1-dx3));
run;

 

Ksharp
Super User
data have;
input dx1 dx2 dx3;
cards;
27756  826  79883
78399  2775  7274
99462  62294  677
48268  54981  27752
18872  992  27274
;
data want;
 set have;
 array x{*} _numeric_;
 do i=1 to dim(x);
  if int( x{i}/10**(int(log10(x{i}))-3) )=2775 then do;output;leave;end;
 end;
 drop i;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 644 views
  • 2 likes
  • 6 in conversation