Hi,
I need to extract the category ID from a URL string.
The ID always follows "categoryId=" in the URL string, and it's always numeric. I can't use SCAN because the ID can be in a different place in the URL and is not always the same length.
Again, it's always the digits that follow "categoryId=" in the string. There may be other IDs in the URL that don't follow "categoryId=", so I need to make sure I'm not picking those up. This is the output I'm looking for:
Any assistance is greatly appreciated. Thanks!
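/* This one gives you the ID if your raw data still needs to be read in */
data have;
infile cards truncover dlm='&';
/* Jump to just past 'categoryId=', read the number up to the next '&', then go back to column 1 and read the whole line as url */
input @'categoryId=' id @1 url $200.;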
cards;
http://www.mywebsite.com/family/index.jsp?categoryId=61765546&cp=1766205&ab=en_US_MLP_SLOT_1_S1_SHOP
http://www.mywebsite.com/shop/index.jsp?categoryId=62593996&AB=en_US_HP_S2_Men_slot_1_S2_ShopNow
http://www.mywebsite.com/family/index.jsp?categoryId=56906456
http://www.mywebsite.com/shop/index.jsp?categoryId=62593866&cp=1766205&ab=ln_men_cs_theshirt
http://www.mywebsite.com/shop/index.jsp?categoryId=57155616
http://www.mywebsite.com/shop/index.jsp?categoryId=1766618&ab=tn_women_golfandTennis&cp=17666
;
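/* This one gives you the ID if your data is already in a table */
data want;
set have;
/* .* (rather than .+) also matches when nothing follows the digits; a URL without categoryId= is returned unchanged */
id_prx=prxchange('s/.*categoryId=(\d+).*/$1/o',-1,url);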
run;
data have;
infile cards truncover ;
input url $200.;
cards;
http://www.mywebsite.com/family/index.jsp?categoryId=61765546&cp=1766205&ab=en_US_MLP_SLOT_1_S1_SHOP
http://www.mywebsite.com/shop/index.jsp?categoryId=62593996&AB=en_US_HP_S2_Men_slot_1_S2_ShopNow
http://www.mywebsite.com/family/index.jsp?categoryId=62593846&cp=1766205&ab=ln_men_cs_thetrend:tropi...
http://www.mywebsite.com/family/index.jsp?categoryId=62594006&cp=1766205&AB=en_US_MLP_P_slot_10_S1_S...
http://www.mywebsite.com/family/index.jsp?categoryId=56906456
http://www.mywebsite.com/shop/index.jsp?categoryId=62593866&cp=1766205&ab=ln_men_cs_theshirt
http://www.mywebsite.com/family/index.jsp?categoryId=62593876&cp=1795710&AB=en_US_MLP__slot_2_S1_Sho...
http://www.mywebsite.com/shop/index.jsp?categoryId=57155616
http://www.mywebsite.com/shop/index.jsp?categoryId=1766618&ab=tn_women_golfandTennis&cp=17666
;
data want;
set have;
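/* Lookbehind: match only the digits immediately preceded by categoryId= (case-insensitive) */
pid= prxparse('/(?<=categoryId=)\d+/i');
call prxsubstr(pid, url, position, length);
/* position is 0 when the URL has no categoryId=, which leaves match blank */
if position ne 0 then do;
match = substr(url, position, length);
end;
drop pid position length;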
run;
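For comparison, here is a rough sketch of a non-regex approach. It is not from the replies above, and it assumes the same `have` table with a character variable `url`. The names `want_find`, `id_find`, and `pos` are made up for illustration. The idea is that SCAN can still be used once FIND has located the key, because the text starting right after "categoryId=" always has the ID as its first word.

```sas
/* Sketch only: find the key, then take everything up to the next '&' or blank */
data want_find;
set have;
length id_find $20;
/* 'i' makes the search case-insensitive; pos is 0 when the key is absent */
pos = find(url, 'categoryId=', 'i');
if pos > 0 then
  id_find = scan(substr(url, pos + length('categoryId=')), 1, '& ');
drop pos;
run;
```

Like the regex versions, this relies on the value after "categoryId=" being the ID itself. It does not check that the extracted text is all digits.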