DATA Step, Macro, Functions and more

Regular Expressions: Performing String Extractions on Several Variables at Once

Reply
Frequent Contributor
Posts: 142

Regular Expressions: Performing String Extractions on Several Variables at Once

I have some SAS code that applies a Regular Expression to a URL string which pulls out a Category ID.

The Variable looks like this

URL
http://www.mywebsite.com/family/index.jsp?categoryId=61765546&cp=1766205&ab=en_US_MLP_SLOT_1_S1_SHOP
http://www.mywebsite.com/shop/index.jsp?categoryId=62593996&AB=en_US_HP_S2_Men_slot_1_S2_ShopNow

I apply this code:


data Want;
set have;
category_Id
=input(prxchange('s/.+categoryId=(\d+).+/$1/o',-1,URL), 12.);run;


I get this:


category_Id
61765546
62593996


The data set I have actually has many URL variables – URL_1, URL_2, URL_3….. and so on. It could go up to 80 URL vars or it could be 10.

What I need to do is pull the category IDs out of all the URL vars and then transpose them by Campaign_Name.

The data looks like this.

Campaign_NameURL_1URL_2URL_3URL_4
Summer Campaignhttp://www.mywebsite.com/family/index.jsp?categoryId=61765546http://www.mywebsite.com/shop/index.jsp?categoryId=62593996&AB=en_US_HP_S2_Men_slot_1_S2_ShopNowhttp://www.mywebsite.com/family/index.jsp?categoryId=62593846&cp=1766205&ab=ln_men_cs_thetrend:tropi...http://www.mywebsite.com/family/index.jsp?categoryId=62594006&cp=1766205

What I need is this:

Campaign_Namecategory_Id
Summer Campaign61765546
Summer Campaign62593996
Summer Campaign62593846
Summer Campaign62594006
Super User
Posts: 17,828

Re: Regular Expressions: Performing String Extractions on Several Variables at Once

You'll have to tweak this a bit to deal with when a URL is missing but here's the general idea.

Declare an array to hold the url. Find the Category ID, and then output the records so it transposes in one step. Only keep the variables you need.

data want;

set have;

array url_array(80) $ url_:;

do i=1 to 80;

category_Id=input(prxchange('s/.+categoryId=(\d+).+/$1/o',-1,URL_array(i)), 12.);

OUTPUT;

end;

keep campaign_name category_id;

run;

PS if your question relates to a previous question its sometimes useful to link back to it.

EDIT: Change to correctly reference array

Frequent Contributor
Posts: 142

Re: Regular Expressions: Performing String Extractions on Several Variables at Once

That didn't seem to work for some reason.  The category_id ID var came back all null. Is there something wrong with the array?

Super User
Posts: 17,828

Re: Regular Expressions: Performing String Extractions on Several Variables at Once

Yes, I modified the code above already.

Super Contributor
Posts: 275

Re: Regular Expressions: Performing String Extractions on Several Variables at Once

Try this:

category_Id=input(prxchange('s/.*(?<=categoryId=)(\d+).*/$1/',-1,string),12.);

Ask a Question
Discussion stats
  • 4 replies
  • 238 views
  • 3 likes
  • 3 in conversation