BookmarkSubscribeRSS Feed
DOBBINHO
Obsidian | Level 7

Hi all,

 

I have been trying to find ways by which I could filter out my dataset based on the following requirement.

I have a variable named path (eg: wwb/dat/<ta>/<3digitnumber>/<3digitnumber>/) which should contain:

a) <ta> which could be ab, bd, ge, mf, ac.

b) either only the first field containing <3digitnumber> or both

 

Kindly advice on how I should approach this.....

13 REPLIES 13
Reeza
Super User

Use SCAN() to get the different parts of the path

 

USE IN to check if it matches your criteria

 

Demo code, untested :

 

letter_part = scan(word, 3, '/');

number_part1 = scan(word, 4, '/');

number_part2 = scan(word, 5, '/');

 

if letter_part in ('ab', 'bc', 'cd') and <some criteria about number_part1) and <some criteria about number_part2>;

DOBBINHO
Obsidian | Level 7
Well, the thing is, paths doesn't really have a fixed format. It may or may
not start that way, it may be starting off with , or may not contain it
at all.... I did try the method you said before approaching the forum for
suggestions. Thanks for trying though

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Try posting some examples so we can actually see what your working with/wanting out.  Post test data in the form of a datastep so we don't have to guess what you data structure is.

It looks a bit like HTML or XML, if so maybe use that to import (XML libname for instance).  But still just guessing.

DOBBINHO
Obsidian | Level 7

malai,luch,ab/007/007/stab/unblin/raw,old,xxxx,full,11/8/2004,16:14:43
malai,luch,ab/007/007/stab/unblin/raw,new,modify,read,11/8/2004,16:14:43
malai,malai,/wwb/data/bd/007/007/stab/unblin/views,new,add,read,11/10/2004,10:54:49
malai,malai,/ww/data/ge/007/007/stab/unblin/./views,old,xxxx,read,11/10/2004,10:57:49
malai,malai,/wwb/data/mf/007/007/stab/unblin/./views,new,modify,full,11/10/2004,10:57:49
malai,luch,/wwb/data/mf/007/007/stab/unblin/views,new,add,full,11/10/2004,10:59:35
malai,boco,/wwb/data/ac/007/007/stab/unblin/views,new,add,full,11/10/2004,10:59:35
malai,luch,/ww/data/mf/007/007/stab/unblin/views,old,xxxx,full,11/10/2004,11:0:7
malai,luch,/ww/data/mf/007/007/stab/unblin/views,new,modify,read,11/10/2004,11:0:7
malai,boco,/wwb/data/bd/007/007/stab/unblin/views,old,xxxx,full,11/10/2004,11:0:7
malai,boco,/wwbd/data/ab/007/007/stab/unblin/views,new,modify,read,11/10/2004,11:0:7
malai,luch,/wwb/data/ab/007/007/stab/unblin/views,old,xxxx,read,11/10/2004,11:0:39
malai,luch,/ww/data/bd/007/007/stab/unblin/views,new,delete,,11/10/2004,11:0:39
malai,boco,/wwb/data/mf/007/007/stab/unblin/views,old,xxxx,read,11/10/2004,11:0:39
malai,boco,/ww/data/ac/007/007/stab/unblin/views,new,delete,,11/10/2004,11:0:39
malai,malai,/wwb/data/ac/007/007/stab/blin/raw,new,add,full,11/10/2004,11:37:19
malai,malai,/apps/bin/toolsdev/ge,old,xxxx,full,11/10/2004,16:34:25
malai,malai,/apps/bin/toolsdev/ge,new,modify,full,11/10/2004,16:34:25
luch,luch,/wwb/data/ac/007/007/stab/blin/**bleep**,new,add,full,11/11/2004,9:44:37
boco,boco,/ww/data/bd/101/800/stab/blin/views,new,add,full,11/16/2004,17:32:57
boco,boco,/wwb/data/bd/101/800/stab/blin/views,old,xxxx,full,11/16/2004,17:37:43
gaise,freem,/wwb/clin/proja/emb,new,add,read,11/10/2005,15:10:26
gaise,hard,/wwb/cl/pro/emb,new,add,read,11/10/2005,15:10:38
gaise,hard,/wwb/cli/pr/emba,old,xxxx,read,11/10/2005,15:10:44
gaise,hard,/wwb/cl/pro/emb,new,modify,full,11/10/2005,15:10:44
gaise,perr,/wwb/cli/pr/embc,new,add,full,11/10/2005,15:10:51

 

The above is a sample of the data to be processed. Order as follows:

user1

user2

path

old/new

action

access

date

time

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Is that in one variable?  Posting test data in the form of a datastep makes things far simpler.  Anyways, if that is one string, then first step is to parse it out into its component parts.  Can't really help there as no idea what the component parts are, maybe it is comma delimited?  Once you have it in variables, then identify what set of filters to apply.

DOBBINHO
Obsidian | Level 7

This is comma delimited indeed. the variables are in the order mentioned earlier , i.e:

a)user1

b)user2

c)path

d)old/new

e)action

f)access

g)date

h)time

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, so if you want to filter on:

a) <ta> which could be ab, bd, ge, mf, ac.

 

 

From var C, something like:

data want;
  set have;
  length str code $200;
  str=scan(yourstring,2,",");
  code=ifc(scan(str,1,"/")="",scan(str,2,"/"),scan(str,1,"/"));
  if code in ("ab","bd","ge","mf","ac") then do;
  end;
run;

Its basically just about getting the items you want into variables so you can use them.  You can use scan with the given delimter to chop out the big string into smaller parts, and then scan the smaller parts etc.

ballardw
Super User

And output for that example "data" would be what?

DOBBINHO
Obsidian | Level 7

this is just sample of two observations:

 

STD                SNP      User1    User2     Path                                                          oldnew  acctype1 acctype2  date            time

AB007007Stabluchmalai

ab/007/007/stab/unblin/raw

oldxxxxfull

11/8/2004

16:14:43

BD007007stabmalaimalai

/wwb/data/bd/007/007/stab/unblin/views

newaddread

11/10/2004

10:54:49

Kurt_Bremser
Super User

If you can formulate a rule for the things you want to find, this rule can be put into code for the computer. If you cannot state such a rule, there's nothing we can do.

And whoever came up with

"paths doesn't really have a fixed format."

needs to find a profession that does NOT involve computers at all.

DOBBINHO
Obsidian | Level 7
Well if that were the case, our bosses would have been jobless instead of being where they are now..... 😛 😛
s_lassen
Meteorite | Level 14

Here is a case where Pearl Regular Expressions shine:

data want;
  set have;
  if prxmatch('/\b(ab|bd|ge|mf|ac)\/\d{3,3}\/\d{3,3}\//',path);
run;

A short explanation of the match string:

\b is a word boundary

(ab|bd|ge|mf|ac) means either of the letter combinations you are looking for

\/ means a slash ("/", which has to be escaped with a backslash)

\d{3,3} means exactly 3 digits

DOBBINHO
Obsidian | Level 7

thank you very much for making me understand the Pattern matching method in sas, but now I am facing another issue, where the path is having two consecutive paths included like so:

chadh,EO,/gb/pro/cli/da/bd/185/047/stab/blin/le0/td.sas7bdat /gb/pro/cl/da/bd/185/047/stab/blin/le0/tf.sas7bdat,old,xxxx,none,1/10/2008,12:13:53
chadh,EO,/gb/pro/cli/da/bd/185/047/stab/blin/le0/td.sas7bdat /gb/pro/cli/da/bd/185/047/stab/blin/le0/tf.sas7bdat,new,add,full,1/10/2008,12:13:53

what do I do in this case?

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1206 views
  • 2 likes
  • 6 in conversation