BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
alepage
Barite | Level 11

Hello,

I am getting the path and file name using a unix command.

 

Then I would like to get the month (see two string examples below)

 

/.../prod1010.saa1fm.data/polices_sep2013.sas7bdat
/.../pa.pxbtxax.sabtxfaa.dec2013.data/pa.pxbtxax.sabtxfaa.data/police.sas7bdat

 

In the first example the month is equal to sep while into the second example the month is dec.

I could use the find function as below

 

data test1;

set text1;

 

if find(text, 'jan') gt 0 then month='jan'

...

 

if find(text, 'sep') gt 0 then month='sep'

...

if find(text, 'dec') gt 0 then month='dec'

run;

 

I wonder if there is a more efficient way to do that if I have the following month:

jan, fev, mar, avr, mai, jun, jui, aou, sep, oct , nov, dec.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Do yourself a favor, represent months as numeric variables, such a 1 2 3 and not as character strings, like 'jan' 'feb', etc. Numbers will sort numerically, while character strings will not sort properly, they will sort alphabetically, and so the first month of the year will be 'apr', which is not what most people want.

 

Do yourself a double favor, represent months as month and year in a numeric variable, so instead of 1 2 3, you have a representation such as January 2023 as a numeric variable. Make your programming easier by using the built-in SAS handling of dates.

 

data want;
    set have;
    datestr1=scan(scan(string,-3,'/'),-2,'.');
    datestr2=scan(string,-2,'._');
    monthyear=coalesce(input(datestr1,monyy7.),input(datestr2,monyy7.));
    format monthyear monyy7.;
run;

 

Now anything you do with these MONTHYEAR values will sort properly, and still work if the years go to something other than 2013.


General rule you should use from now on: calendar information, and clock information, should ALWAYS be numeric, not character.

--
Paige Miller

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

What do you mean by BEST?

If you know the PATTERN of the filenames then just pull out the part of the pattern that should contain the date.

In your first example it is the part before the last period.

datestr=scan(filename,-2,'._');

In your second example it appears to be the next to last part of the next to last directory name.

datestr=scan(scan(filename,-3,'/'),-2,'.');

 

PaigeMiller
Diamond | Level 26

Do yourself a favor, represent months as numeric variables, such a 1 2 3 and not as character strings, like 'jan' 'feb', etc. Numbers will sort numerically, while character strings will not sort properly, they will sort alphabetically, and so the first month of the year will be 'apr', which is not what most people want.

 

Do yourself a double favor, represent months as month and year in a numeric variable, so instead of 1 2 3, you have a representation such as January 2023 as a numeric variable. Make your programming easier by using the built-in SAS handling of dates.

 

data want;
    set have;
    datestr1=scan(scan(string,-3,'/'),-2,'.');
    datestr2=scan(string,-2,'._');
    monthyear=coalesce(input(datestr1,monyy7.),input(datestr2,monyy7.));
    format monthyear monyy7.;
run;

 

Now anything you do with these MONTHYEAR values will sort properly, and still work if the years go to something other than 2013.


General rule you should use from now on: calendar information, and clock information, should ALWAYS be numeric, not character.

--
Paige Miller
ChrisHemedinger
Community Manager

I agree that using real date values is better, but we cannot always control the inputs we get.

 

Here's another approach that uses regular expressions. I always use https://regex101.com/ as a tool to build and validate a regex before trying to use it at scale in a SAS program with the PRXMATCH function.

 

data have;
 length file $ 50;
 infile datalines dsd;
 input file;
datalines;
/.../prod1010.saa1fm.data/polices_sep2013.sas7bdat
/.../pa.pxbtxax.sabtxfaa.dec2013.data/pa.pxbtxax.sabtxfaa.data/police.sas7bdat
/.../pa.pxbtxax.sabtxfaa.JAN2013.data/pa.pxbtxax.sabtxfaa.data/police.sas7bdat
no dates here
;

data dates (drop=pos);
 set have;
 length date 8 month $ 3;
 format date monyy7.;
 pos = prxmatch('/((jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)+(\d\d\d\d))/i',file);
 if pos then do;
  date = input(substr(file,pos,7),monyy7.);
  month=put(date,monname3.);
 end;
run;

 

Output:

ChrisHemedinger_0-1725895390642.png

 

SAS Innovate 2025: Call for Content! Submit your proposals before Sept 25. Accepted presenters get amazing perks to attend the conference!
alepage
Barite | Level 11
what's the purpose of +(\d\d\d\d))/i into the prxmatch function ?
ballardw
Super User

@alepage wrote:
what's the purpose of +(\d\d\d\d))/i into the prxmatch function ?

Since \d is the "search for a digit" instruction, why might one look for 4 digits in a row?

The +  is "the preceding subexpression"

   i.e. any of the 3-letter abbreviation of month names followed by 4 digits

The / is the close of the pattern started with the previous /

i is case insensitive

ChrisHemedinger
Community Manager

As I look at this again a better regex would be:

 

/((jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(\d{4}))/i

Which means:

  • Match exactly one of the twelve alternates for month abbreviations
  • followed by 4 digits (0-9, represented by \d directive)
  • case-insensitive (the /i modifier)

The regex I supplied in my original answer works, but the above is more concise and limits the match for month name to "exactly one" instead of "one or more" (which is what the + modifier does).

SAS Innovate 2025: Call for Content! Submit your proposals before Sept 25. Accepted presenters get amazing perks to attend the conference!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 491 views
  • 5 likes
  • 5 in conversation