Extracting numbers from a mixed variable

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

Extracting numbers from a mixed variable

Hi all,

I have a variable (Size_Cyst) recorded with height, length and width dimensions in a single variable. I want to extract them as 3 separate variables (height, length and width). The problem with the original variable is it is recorded as 261x 143 (x245)  to give you all an example. 261 corresponds to height, 143 for length and 245 for width.

 

All I want to do is to create 3 separate variables. The numbers before first x as height,  numbers after first x as length, i.e the middle set of numbers. And finally the numbers after second x as width, i.e in the parenthesis without x. Each of these numbers are not fixed, in the sense they may be 3 digit or 2 digit. It varies from ID to ID. However they range only from 2 digit to 3 digit ONLY. And also there are spacing issues between the numbers and X's, it is not captured consistently (e.g:  261x 143 (x245)  198 x 97 (x 160)

 

 

If I use compress I can get the numbers but I may lose the actual size. For 261x 143 (x245) i want it to be 261, 143 and 245 but not 26 1143 and 143...

 

Any help with the SAS code for this extraction would be greatly appreciated.

 

Thank you in advance!

 

Here is the example of my data set.

 

ID     Size_Cyst

1      261x 143 (x245)

2      23x 14 (x27)

3      23 x 14 (x 27)

4      243 x 14 (x234)

5      33 x 233 (x 67)

6      89x45x (x148)


Accepted Solutions
Solution
‎04-10-2018 10:53 AM
PROC Star
Posts: 1,814

Re: Extracting numbers from a mixed variable

data have;
input ID     Size_Cyst & $20.;
datalines;
1      261x 143 (x245)
2      23x 14 (x27)
3      23 x 14 (x 27)
4      243 x 14 (x234)
5      33 x 233 (x 67)
6      89x45x (x148)
;

data want;
set have;
t=compress(translate(Size_Cyst,' ','x'),' ','kd');
height=scan(t,1);
length=scan(t,2);
width=scan(t,3);
run;

You don;t need input to convert to numeric as sas will do auto conversion when you compute arithmetic

View solution in original post


All Replies
Super User
Posts: 23,729

Re: Extracting numbers from a mixed variable

Use SCAN with the X delimiter since that seems consistent.

Use COMPRESS to keep only digits

Use INPUT to convert to a numeric variable.

 

The last record seems to have an extra X, which may be problematic, but if you scan in a loop that should work still.

 


@sms1891 wrote:

Hi all,

I have a variable (Size_Cyst) recorded with height, length and width dimensions in a single variable. I want to extract them as 3 separate variables (height, length and width). The problem with the original variable is it is recorded as 261x 143 (x245)  to give you all an example. 261 corresponds to height, 143 for length and 245 for width.

 

All I want to do is to create 3 separate variables. The numbers before first x as height,  numbers after first x as length, i.e the middle set of numbers. And finally the numbers after second x as width, i.e in the parenthesis without x. Each of these numbers are not fixed, in the sense they may be 3 digit or 2 digit. It varies from ID to ID. However they range only from 2 digit to 3 digit ONLY. And also there are spacing issues between the numbers and X's, it is not captured consistently (e.g:  261x 143 (x245)  198 x 97 (x 160)

 

 

If I use compress I can get the numbers but I may lose the actual size. For 261x 143 (x245) i want it to be 261, 143 and 245 but not 26 1143 and 143...

 

Any help with the SAS code for this extraction would be greatly appreciated.

 

Thank you in advance!

 

Here is the example of my data set.

 

ID     Size_Cyst

1      261x 143 (x245)

2      23x 14 (x27)

3      23 x 14 (x 27)

4      243 x 14 (x234)

5      33 x 233 (x 67)

6      89x45x (x148)


 

Contributor
Posts: 22

Re: Extracting numbers from a mixed variable

Sorry for the typo...last record does not have the extra X. It should be 89x45 (x148)

Super User
Posts: 6,776

Re: Extracting numbers from a mixed variable

As character variables:

 

height = scan(size_cyst, 1, 'Xx( )');

length = scan(size_cyst, 2, 'Xx( )');

width = scan(size_cyst, 3, 'Xx( )');

 

To get these variables as numeric takes a little more work.  For example:

 

height = input(scan(size_cyst, 1, 'Xx( )'), 3.);

 

Also notice that the space in the third parameter of SCAN is required.

Solution
‎04-10-2018 10:53 AM
PROC Star
Posts: 1,814

Re: Extracting numbers from a mixed variable

data have;
input ID     Size_Cyst & $20.;
datalines;
1      261x 143 (x245)
2      23x 14 (x27)
3      23 x 14 (x 27)
4      243 x 14 (x234)
5      33 x 233 (x 67)
6      89x45x (x148)
;

data want;
set have;
t=compress(translate(Size_Cyst,' ','x'),' ','kd');
height=scan(t,1);
length=scan(t,2);
width=scan(t,3);
run;

You don;t need input to convert to numeric as sas will do auto conversion when you compute arithmetic

Super User
Posts: 10,784

Re: Extracting numbers from a mixed variable

data have;
input ID     Size_Cyst & $20.;
datalines;
1      261x 143 (x245)
2      23x 14 (x27)
3      23 x 14 (x 27)
4      243 x 14 (x234)
5      33 x 233 (x 67)
6      89x45x (x148)
;

data want;
set have;
do i=1 to countw( Size_Cyst,,'kd');
  value=scan( Size_Cyst,i,,'kd'); output;
end;
drop i;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 128 views
  • 1 like
  • 5 in conversation