Hi,
I need filter my datas from name of files which are copied in the system.
These files are sometimes renamed and somtimes not.
These files are named like that "201905061636XXXXXXXX65562.xls" but the length of the name is not always the same.
In the beginning of the name, there is the date and the hour when the file was loaded so the length is always the same.
XXXXX is the name of the client and the length changes.
Then there are numbers at the end of the file and, sometimes, when the file have been manually modified, there letter like "Modify" or "test" or "testV2", "modifyV2" etc... after this last numeric string...
I would like to cut these strings to identify the date and time in a single variable then the name of the client and the the last number writen in the name file. I dont want words like "Modify" or "Test" at the end.
I can't find how to do that. I guess I can use Scan function or something like that but I have no idea how to use a sas function with that kind of variable. Of course I get the date and time with:
date=substr(name,1,12) as the length is always the same but I don't know what to do with the other parts I need.
thanks
Using regular expressions seems to be a good idea, if the name of a client always starts with da letter and ends with a letter.
data file_names;
length
filename $ 50
datetime $ 12
client $ 20
lastNumber $ 10
;
input filename;
_rx = prxparse('/(\d{6})(\D+)(\d+)/');
if prxmatch(_rx, filename) then do;
datetime = prxposn(_rx, 1, filename);
client = prxposn(_rx, 2, filename);
lastNumber = prxposn(_rx, 3, filename);
end;
datalines;
201905061636Pepsi65562.xls
201907041351Cisco12345Test.xls
201911050612Lenovo55003modv4.xls
;
run;
You can use the SUBSTR() function to extract the date/time, since you say this will always be the same length. To extract the name, you can use the ANYDIGIT() function to see where the next number begins and then SUBSTR() to extract the name. Then you could use ANYALPHA() to find where the next letter is and then SUBSTR() again to extract the number.
Thank you so much for your answers. I'll try them.
I continued to search over internet after I posted my message and I've found a piece of code I adapted to my need:
/* words to match delimited by "|" */
%let words = MODIFIE|MODIFIEV2|MODIFV2|MODIF|V2|test|new|rectifie|MODIFIE_KOMP|MODIFIE_KOMPV2|MODIFIE_KOMPv2|modifie|MODIF2|sansmauvaisedate|MODIFIEtestianpourjcc|v3|GA|GE|MODIFIEE;
data example_clean;
set testtri2;
if _n_ = 1 then do;
retain regex;
drop regex;
regex = prxparse("s/(&words)\s+//");
end;
string = prxchange(regex, 1, want2);
run;
I have to manually enter the words I want to delete at the end of the filename but I works.
I 'll try your solutions after lunch and I'll tell you the results
I couldn't wait so I tried Andreas_ids solution and it works very well!
I have to lear how to use regular expressions, it seems very powerful.
Thank you so much for your help!
data want;
input filename $80.;
datetime = scan(filename,1,,'kd');
client = scan(filename,1,,'d');
lastNumber = scan(filename,2,,'kd');
datalines;
201905061636Pepsi65562.xls
201907041351Cisco12345Test.xls
201911050612Lenovo55003modv4.xls
;
run;
That's thought process makes you stand out and SHARPPPPPPPPPPPPPPPPPPPPPP!!!!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.