turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Extracting numbers from a mixed variable

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-09-2018 05:02 PM

Hi all,

I have a variable (Size_Cyst) recorded with height, length and width dimensions in a single variable. I want to extract them as 3 separate variables (height, length and width). The problem with the original variable is it is recorded as 261x 143 (x245) to give you all an example. 261 corresponds to height, 143 for length and 245 for width.

All I want to do is to create 3 separate variables. The numbers before first x as height, numbers after first x as length, i.e the middle set of numbers. And finally the numbers after second x as width, i.e in the parenthesis without x. Each of these numbers are not fixed, in the sense they may be 3 digit or 2 digit. It varies from ID to ID. However they range only from 2 digit to 3 digit ONLY. And also there are spacing issues between the numbers and X's, it is not captured consistently (e.g: 261x 143 (x245) 198 x 97 (x 160)

If I use compress I can get the numbers but I may lose the actual size. For 261x 143 (x245) i want it to be 261, 143 and 245 but not 26 1143 and 143...

Any help with the SAS code for this extraction would be greatly appreciated.

Thank you in advance!

Here is the example of my data set.

ID Size_Cyst

1 261x 143 (x245)

2 23x 14 (x27)

3 23 x 14 (x 27)

4 243 x 14 (x234)

5 33 x 233 (x 67)

6 89x45x (x148)

Accepted Solutions

Solution

04-10-2018
10:53 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to sms1891

04-09-2018 05:30 PM

```
data have;
input ID Size_Cyst & $20.;
datalines;
1 261x 143 (x245)
2 23x 14 (x27)
3 23 x 14 (x 27)
4 243 x 14 (x234)
5 33 x 233 (x 67)
6 89x45x (x148)
;
data want;
set have;
t=compress(translate(Size_Cyst,' ','x'),' ','kd');
height=scan(t,1);
length=scan(t,2);
width=scan(t,3);
run;
```

You don;t need input to convert to numeric as sas will do auto conversion when you compute arithmetic

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to sms1891

04-09-2018 05:06 PM

Use SCAN with the X delimiter since that seems consistent.

Use COMPRESS to keep only digits

Use INPUT to convert to a numeric variable.

The last record seems to have an extra X, which may be problematic, but if you scan in a loop that should work still.

@sms1891 wrote:

Hi all,

I have a variable (Size_Cyst) recorded with height, length and width dimensions in a single variable. I want to extract them as 3 separate variables (height, length and width). The problem with the original variable is it is recorded as 261x 143 (x245) to give you all an example. 261 corresponds to height, 143 for length and 245 for width.

All I want to do is to create 3 separate variables. The numbers before first x as height, numbers after first x as length, i.e the middle set of numbers. And finally the numbers after second x as width, i.e in the parenthesis without x. Each of these numbers are not fixed, in the sense they may be 3 digit or 2 digit. It varies from ID to ID. However they range only from 2 digit to 3 digit ONLY. And also there are spacing issues between the numbers and X's, it is not captured consistently (e.g: 261x 143 (x245) 198 x 97 (x 160)

If I use compress I can get the numbers but I may lose the actual size. For 261x 143 (x245) i want it to be 261, 143 and 245 but not 26 1143 and 143...

Any help with the SAS code for this extraction would be greatly appreciated.

Thank you in advance!

Here is the example of my data set.

ID Size_Cyst

1 261x 143 (x245)

2 23x 14 (x27)

3 23 x 14 (x 27)

4 243 x 14 (x234)

5 33 x 233 (x 67)

6 89x45x (x148)

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to sms1891

04-09-2018 05:08 PM

Sorry for the typo...last record does not have the extra X. It should be 89x45 (x148)

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to sms1891

04-09-2018 05:12 PM

As character variables:

height = scan(size_cyst, 1, 'Xx( )');

length = scan(size_cyst, 2, 'Xx( )');

width = scan(size_cyst, 3, 'Xx( )');

To get these variables as numeric takes a little more work. For example:

height = input(scan(size_cyst, 1, 'Xx( )'), 3.);

Also notice that the space in the third parameter of SCAN is required.

Solution

04-10-2018
10:53 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to sms1891

04-09-2018 05:30 PM

```
data have;
input ID Size_Cyst & $20.;
datalines;
1 261x 143 (x245)
2 23x 14 (x27)
3 23 x 14 (x 27)
4 243 x 14 (x234)
5 33 x 233 (x 67)
6 89x45x (x148)
;
data want;
set have;
t=compress(translate(Size_Cyst,' ','x'),' ','kd');
height=scan(t,1);
length=scan(t,2);
width=scan(t,3);
run;
```

You don;t need input to convert to numeric as sas will do auto conversion when you compute arithmetic

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to sms1891

04-10-2018 09:30 AM

```
data have;
input ID Size_Cyst & $20.;
datalines;
1 261x 143 (x245)
2 23x 14 (x27)
3 23 x 14 (x 27)
4 243 x 14 (x234)
5 33 x 233 (x 67)
6 89x45x (x148)
;
data want;
set have;
do i=1 to countw( Size_Cyst,,'kd');
value=scan( Size_Cyst,i,,'kd'); output;
end;
drop i;
run;
```