BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
J111
Quartz | Level 8
 
Hello,
Please find below data Have and data Want.


Data Have ;
input data $100. ;
cards ;
data/dataflow/BPS.STQR/EZR/1.0/RER_QWE_MED.D.AUD.MED, RER_QWA_MED.D.CAD.WOW.WOW.OF00,  QWZ_PLO_WOW.D.POP
;
Run ;

 

Data Want ;
input x1 $60. ;
cards ;
RER_QWE_MED , RER_QWA_MED , QWZ_PLO_WOW
;
Run ;

We would like to extract X1 from the data according to the following rules for Type A and Type B:
 
Type A
 
1. it appears after the last back slash in the sentence
2. and also before the first dot following this word (if there is a dot)
3. and this word contains only capital letters or underscore in it (without small letters)
4. if such a word is not found write 'NOWORD'
 
Type B
 
here we have to extract more than one word and the logic is as follows:
1. it appears after the last back slash in the sentence
2. continue in the same row until we find a comma and then take the word which is between the comma and the next first dot.
recursively look for all the words between comma and next dot (as long as we are after the last backslash in the row). 
 
the purpose is to clean relevant words from garbage, for example '.CAD.' is garbage.
 
for type A we have the solution
  x1 = scan(scan(data,-1,"/"),1,'.') ; *after last back slash and before first period ;
  if count(x1,lowcase(x1)) = 1 then x1 = 'NOWORD' ; 
 
We need a solution for type B
 
best solution would handle both type A and type B in the same row.
 
------------------------------------------------------
BTW Next step we split the want data by delims.
 
Data Audit_Split ;
   set Audit_short ;
   delims = '+,' ;              
   Array s_ [40] $55 s1-s40 ; 
do i = 1 to 40 ;            
   s_[i] = left(scan(string, i, delims)) ; 
end ;
   drop delims i ; 
Run ; 
----------------------------------------------------------------------------------------------------------

 

Thanks in advance.
 
1 ACCEPTED SOLUTION

Accepted Solutions
J111
Quartz | Level 8

Hello,

 

Seems like this can work for TYPE B,

 

 
Data next ;
   set have ;
   xx = scan(data,-1,"/") ;
   Array x_ [40] $55 x1-x40 ; 
do i = 1 to 40 ;            
   x_[i] = scan(scan(xx,-i,','),1,'.') ; ; 
end ;
  drop i xx data ;
Run ; 

View solution in original post

2 REPLIES 2
J111
Quartz | Level 8

Hello,

 

Seems like this can work for TYPE B,

 

 
Data next ;
   set have ;
   xx = scan(data,-1,"/") ;
   Array x_ [40] $55 x1-x40 ; 
do i = 1 to 40 ;            
   x_[i] = scan(scan(xx,-i,','),1,'.') ; ; 
end ;
  drop i xx data ;
Run ; 
Patrick
Opal | Level 21

@J111 
You've got already a solution so below just for your consideration. 

You wrote: "best solution would handle both type A and type B in the same row". 

I made in below code the assumption that also type B should only select terms that only contain upper case letters and the underscore. If that's the case then you can "merge" your two type into a single rule.

Below should return your desired result based on my understanding of your rules.

Data Have ;
	infile datalines4 truncover;
	input source_string $600. ;
datalines4 ;
/availability/dataflow/*/*/*/*/-
data/dataflow/BPS.STQR/WAB/1.0/NER_PGTABC
/data/dataflow/BPS.STQR/EZR/1.0/RER_QWE_MED.D.AUD.MED.MED.OF00,RER_QWA_MED.D.CAD.WOW.WOW.OF00,QWZ_PLO_WOW.D.POP.WOW.MED.OF00
NOSLASH.AT.ALL
/SLASH_bUT_mIxed_Case.B.C
;;;;
Run;

data want(drop=_:);
	set have;
	/* substring after last forward slash if any, else just whole string */
	_s=scan(source_string,-1,'/');
	
	/* within _s iterate for each substring that ends with a comma or is the last substring */
	do _i=1 to countc(_s,',')+1;
		length _word $50 words $200;
		/* extract the first sub-string that ends with a full stop else the whole string */
		_word=scan(scan(_s,_i,','),1,'.');
		/* check that _word only contains upper case characters or the underscore */
		if prxmatch('/^[A-Z_]+$/',strip(_word))=1 then
			do;
				words=catx(',',words,_word);
			end;
	end;
	if missing(words) then words='NOWORD';
run;

proc print data=want;
run;

 Patrick_0-1737589930139.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1478 views
  • 0 likes
  • 2 in conversation