read underscore

SASPhile · Posted 01-19-2017 01:35 PM

there are text files of varying lengths. I would like to get the first part of files before processing them.
like if there are fles like this:

credit_scores_201501_usa.txt
mortgage_scores_201501_usa_st.txt
I only want credit_scores_201501 and mortgage_scores_201501.
There is either one underscore or two underscores before '.' that will give me the file name I'm interested in.

how to read number of underscores in a text file

SASPhile · Posted 01-19-2017 01:24 PM

there are text files of varying lengths. I would like to get the first part of files before processing them.
like if there are fles like this:

credit_scores_201501_usa.txt
mortgage_scores_201501_usa_st.txt
I only want credit_scores_201501 and mortgage_scores_201501.
There is either one underscore or two underscores before '.' that will give me the file name I'm interested in.

And how to read number of underscores in a text file?

Astounding · Posted 01-19-2017 01:43 PM

Counting the number of underscores is relatively straightforward:

n_underscores = countc(filename, '_');

That doesn't finish the job though. Are you comfortable with the rest of it, or do you need further help?

This might be easier:

new_filename = catx('_', scan(filename, 1, '_'), scan(filename, 2, '_'), scan(filename, 3, '_') );

SASPhile · Posted 01-19-2017 01:59 PM

There are not just three underscores in the file.

i wan the file names will have either 8 or 9 underscores. if it has 8 i want the the file name before the 7th underscore. if it has 9 I want file name before the 8th underscore.

Astounding · Posted 01-19-2017 02:18 PM

OK, working with that idea ...

n_underscores = countc(filename, '_');

new_filename = scan(filename, 1, '_');

do i=2 to n_underscores-1;

new_filename = catx('_', new_filename, scan(filename, i, '_') );

end;

If I'm understanding you correctly, this does it. But there is a possiblity you may need to change n_underscores-1 to n_underscores-2.

SASPhile · Posted 01-19-2017 02:45 PM

Thanks!

but the idea is to output the file name that is either one underscores or two underscores before '.' based on the length of file name.

Astounding · Posted 01-19-2017 03:16 PM

Now I'm confused. Are you saying that you want only the date piece, or all the pieces up to and including the date piece?

SASPhile · Posted 01-19-2017 03:51 PM

all the pieces up to and including the date piece

Astounding · Posted 01-19-2017 04:06 PM

Doesn't my code in message #4 do exactly that?

SASPhile · Posted 01-22-2017 10:47 AM

Its reading the first three words.

Astounding · Posted 01-22-2017 01:05 PM

Yes, my original response reads three words. But my second response in message #4 should read more than that.

Kurt_Bremser · Posted 01-19-2017 02:04 PM

countw(string,'_') - 1

will give you the number of underscores in string.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 01-22-2017 03:00 PM

If sometimes you remove two underscore-preceded "words" and sometimes you remove one, then getting a count of underscores would not seem to be the main issue. It's coming up with a criterion to stop removing words.

It appears that you really want to remove trailing words until the last word is all numeric. If that's the case then you want to see if the NOTDIGIT function applied to that last word returns a zero, i.e.

Keep removing last word until ( NOTDIGIT(last word of phrase)=0).

Getting the last underscore-separated word (trimmed of trailing blanks) is just the function trim(SCAN(phrase,-1,'_')). So the stopping criterion is:

until (NOTDIGIT(trim(scan(phrase,-1,'_')))=0)

This program uses that approach:

data have;
  input phrase $40. ;
datalines;
credit_scores_201501_usa.txt
mortgage_scores_201501_usa_st.txt
run;

data want;
  set have;
  length rslt $40;
  rslt=phrase;
  do until (notdigit(trim(scan(rslt,-1,'_')))=0);
    rslt=tranwrd(rslt,cats('_',scan(rslt,-1,'_')),'');
  end;
  put rslt=;
run;

The tranwrd function takes the full phrase in RSLT and "translates" the last word (including leading underscore) to a blank - i.e. it removes the last word.

This needs no further modification unless you expecte a trailing word you wish to remove to be exactly duplicated in some earlier word (i.e. if you had a phrase like "mortgage_scores_usa_201501_usa_st.txt"). If that were the case, then - inside the TRANWRD function - temporarily make the last word absolutely unique by appending something like an exclamation to it, and remove it in the TRANWRD function, as in

rslt=tranwrd(cats(rslt,'!'),cats('_',scan(rslt,-1,'_'),'!'),'');

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ChrisNZ · Posted 01-22-2017 05:56 PM

2 possible one-liners:

data WANT;
  input NAME : $30.;
  SHORT=substr(NAME,1,anydigit(NAME)+5);
  putlog SHORT=;
  SHORT=prxchange('s/(.*\d{6}).*/$1/',1,NAME);
  putlog SHORT=;
cards;
credit_scores_201501_usa.txt
mortgage_score_s_201501_usa_st.txt
run;

SHORT=credit_scores_201501
SHORT=credit_scores_201501
SHORT=mortgage_score_s_201501
SHORT=mortgage_score_s_201501

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 01-22-2017 06:01 PM

If you want to ensure that the date is surrounded by underscores:

  SHORT=prxchange('s/(.*_\d{6})_.*/$1/',1,NAME);

High-Performance SAS Coding - Third Edition

read underscore

read underscores in text file

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Re: read underscore

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away