How to parse compound class variable into separate component variables

Contributor das
Contributor
Posts: 44

Looking for help with data step coding.

I've imported a very large (~22K rows) dataset output by another program as a tab delimited file. I have no trouble importing the file which consists of a class variable and several measurement variables. Unfortunately, the values of the class variable are compound and need to be parsed into two or three new variables. Here is the format and an example:

compound class variable name = "Label"

example: "Stack:B-r2-t5"

  1. "Stack:" is irrelevant and no need to keep
  2. "B" is shorthand for the local thresholding method used (Bernsen)
  3. "r2" is the (r)adius variable, a parameter used by the Bernsen method and in this example has a value of 2
  4. "t5" is the (t)hreshold variable, a second parameter used by the Bernsen method and in this example has a value of 5

I don't need the method variable since I'm only looking at the Bernsen method right now.

I do need to create two new variables, call them "radius" and "threshold", which respectively take on the numeric values following 'r' and 't' in the compound value of "Label".

Thank you,

Dave


Accepted Solutions
Solution
‎02-27-2014 06:04 PM
Super User
Posts: 19,770

Re: How to parse compound class variable into separate component variables

It looks like you're trying to parse text fields out.

Some useful functions:

SCAN splits the value into its three or four component parts, i.e. Stack / B / r2 / t5, when you pass ':' and '-' as delimiters.

Substr can extract the numeric values from r2/t5.

Untested:

radius=substr(scan(label, 3, ':-'), 2,1);

threshold=substr(scan(label, 4, ':-'), 2,1);
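
To see what SCAN returns at each position, here is a minimal sketch. The dataset name "demo" and the part1-part4 variables are made up for illustration, and the two rows are sample values in the format described in the question, not real data.

```sas
/* Hypothetical test dataset; "demo" and part1-part4 are illustrative names */
data demo;
    input label :$20.;
    /* With ':' and '-' as delimiters, SCAN returns the pieces by position */
    part1 = scan(label, 1, ':-');   /* Stack                   */
    part2 = scan(label, 2, ':-');   /* method letter, e.g. B   */
    part3 = scan(label, 3, ':-');   /* radius piece, e.g. r2   */
    part4 = scan(label, 4, ':-');   /* threshold piece, e.g. t5 */
    put part1= part2= part3= part4=;   /* write the pieces to the log */
    datalines;
Stack:B-r2-t5
Stack:B-r20-t25
;
run;
```

Checking the log for the second row is a quick way to confirm that multi-digit values come through SCAN intact before SUBSTR is applied.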



Contributor das
Contributor
Posts: 44

Re: How to parse compound class variable into separate component variables

Reeza,

That's great and I'm sure it'll work in the end. At the moment having a little problem with length of the numbers extracted. Here is a screen clip of a problem area where you'll see radius=20 and threshold=25 are clipped to 2 and 2:

Capture.PNG

Here is a copy of my current code:

data bernsen ;
     set bernsen_import ;
     radius=substr(scan(label, 3, ':-'), 2,1);
     threshold=substr(scan(label, 4, ':-'), 2,1);
run;

I'm looking at the SAS Help on these functions but I'm not there yet, so I thought I'd put it back out there.

Dave

Contributor das
Contributor
Posts: 44

Re: How to parse compound class variable into separate component variables

OK, think I got it. Here is a screen capture of the trouble area:

Capture.PNG

And here is the code that produces it:

data bernsen ;
     set bernsen_import ;
     radius=substr(scan(label, 3, ':-'), 2 );
     threshold=substr(scan(label, 4, ':-'), 2 );
run;

I finally understood that the last argument to SUBSTR sets the length of the extracted substring and that it is optional. With it omitted, SUBSTR returns everything from the start position to the end of the string, so removing it fixed the problem.
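
One caveat: SUBSTR returns a character value, while the original question asked for numeric variables. A minimal sketch of one way to convert, assuming the digits after 'r' and 't' are always plain integers, is to wrap the expression in INPUT with a numeric informat. The dataset name "demo_numeric" and the two sample rows are hypothetical.

```sas
/* Hypothetical test dataset; "demo_numeric" is an illustrative name */
data demo_numeric;
    input label :$20.;
    /* SUBSTR from position 2 onward drops the leading letter;
       INPUT with the 8. informat converts the remaining digits to a number.
       Assumes each piece is one letter followed only by digits. */
    radius    = input(substr(scan(label, 3, ':-'), 2), 8.);
    threshold = input(substr(scan(label, 4, ':-'), 2), 8.);
    put label= radius= threshold=;   /* write the results to the log */
    datalines;
Stack:B-r2-t5
Stack:B-r20-t25
;
run;
```

In the real job the same two assignments would go in the data step that reads bernsen_import, in place of the character versions.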

Thank you so much for your amazingly fast help. I'll remember those useful functions because I do this all the time.

Dave
