BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
alepage
Barite | Level 11

Hello,

I am getting the path and file name using a unix command.

 

Then I would like to get the month (see two string examples below)

 

/.../prod1010.saa1fm.data/polices_sep2013.sas7bdat
/.../pa.pxbtxax.sabtxfaa.dec2013.data/pa.pxbtxax.sabtxfaa.data/police.sas7bdat

 

In the first example the month is equal to sep while into the second example the month is dec.

I could use the find function as below

 

data test1;

set text1;

 

if find(text, 'jan') gt 0 then month='jan'

...

 

if find(text, 'sep') gt 0 then month='sep'

...

if find(text, 'dec') gt 0 then month='dec'

run;

 

I wonder if there is a more efficient way to do that if I have the following month:

jan, fev, mar, avr, mai, jun, jui, aou, sep, oct , nov, dec.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Do yourself a favor, represent months as numeric variables, such a 1 2 3 and not as character strings, like 'jan' 'feb', etc. Numbers will sort numerically, while character strings will not sort properly, they will sort alphabetically, and so the first month of the year will be 'apr', which is not what most people want.

 

Do yourself a double favor, represent months as month and year in a numeric variable, so instead of 1 2 3, you have a representation such as January 2023 as a numeric variable. Make your programming easier by using the built-in SAS handling of dates.

 

data want;
    set have;
    datestr1=scan(scan(string,-3,'/'),-2,'.');
    datestr2=scan(string,-2,'._');
    monthyear=coalesce(input(datestr1,monyy7.),input(datestr2,monyy7.));
    format monthyear monyy7.;
run;

 

Now anything you do with these MONTHYEAR values will sort properly, and still work if the years go to something other than 2013.


General rule you should use from now on: calendar information, and clock information, should ALWAYS be numeric, not character.

--
Paige Miller

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

What do you mean by BEST?

If you know the PATTERN of the filenames then just pull out the part of the pattern that should contain the date.

In your first example it is the part before the last period.

datestr=scan(filename,-2,'._');

In your second example it appears to be the next to last part of the next to last directory name.

datestr=scan(scan(filename,-3,'/'),-2,'.');

 

PaigeMiller
Diamond | Level 26

Do yourself a favor, represent months as numeric variables, such a 1 2 3 and not as character strings, like 'jan' 'feb', etc. Numbers will sort numerically, while character strings will not sort properly, they will sort alphabetically, and so the first month of the year will be 'apr', which is not what most people want.

 

Do yourself a double favor, represent months as month and year in a numeric variable, so instead of 1 2 3, you have a representation such as January 2023 as a numeric variable. Make your programming easier by using the built-in SAS handling of dates.

 

data want;
    set have;
    datestr1=scan(scan(string,-3,'/'),-2,'.');
    datestr2=scan(string,-2,'._');
    monthyear=coalesce(input(datestr1,monyy7.),input(datestr2,monyy7.));
    format monthyear monyy7.;
run;

 

Now anything you do with these MONTHYEAR values will sort properly, and still work if the years go to something other than 2013.


General rule you should use from now on: calendar information, and clock information, should ALWAYS be numeric, not character.

--
Paige Miller
ChrisHemedinger
Community Manager

I agree that using real date values is better, but we cannot always control the inputs we get.

 

Here's another approach that uses regular expressions. I always use https://regex101.com/ as a tool to build and validate a regex before trying to use it at scale in a SAS program with the PRXMATCH function.

 

data have;
 length file $ 50;
 infile datalines dsd;
 input file;
datalines;
/.../prod1010.saa1fm.data/polices_sep2013.sas7bdat
/.../pa.pxbtxax.sabtxfaa.dec2013.data/pa.pxbtxax.sabtxfaa.data/police.sas7bdat
/.../pa.pxbtxax.sabtxfaa.JAN2013.data/pa.pxbtxax.sabtxfaa.data/police.sas7bdat
no dates here
;

data dates (drop=pos);
 set have;
 length date 8 month $ 3;
 format date monyy7.;
 pos = prxmatch('/((jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)+(\d\d\d\d))/i',file);
 if pos then do;
  date = input(substr(file,pos,7),monyy7.);
  month=put(date,monname3.);
 end;
run;

 

Output:

ChrisHemedinger_0-1725895390642.png

 

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!
alepage
Barite | Level 11
what's the purpose of +(\d\d\d\d))/i into the prxmatch function ?
ballardw
Super User

@alepage wrote:
what's the purpose of +(\d\d\d\d))/i into the prxmatch function ?

Since \d is the "search for a digit" instruction, why might one look for 4 digits in a row?

The +  is "the preceding subexpression"

   i.e. any of the 3-letter abbreviation of month names followed by 4 digits

The / is the close of the pattern started with the previous /

i is case insensitive

ChrisHemedinger
Community Manager

As I look at this again a better regex would be:

 

/((jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\d{4}))/i

Which means:

  • Match exactly one of the twelve alternates for month abbreviations
  • followed by 4 digits (0-9, represented by \d directive)
  • case-insensitive (the /i modifier)

The regex I supplied in my original answer works, but the above is more concise and limits the match for month name to "exactly one" instead of "one or more" (which is what the + modifier does).

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1391 views
  • 5 likes
  • 5 in conversation