turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- reading data, expanding to seperate variables

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 06:40 AM

Hi,

I appreciate any help/suggestions

I have a character variable list_of_codes ($50) which looks like this.

It is a list of codes in a shorthand notation, easy enough to read for a human

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

But is there a simple way to expand list_of_codes to the list of all the codes?

For example 'C01, C10.0, C10.2-3, C10.8-9, C11-12' should be expanded to:

C01.0 C10.0 C11.0 C12.0

C01.1 C10.2 C11.1 C12.1

C01.2 C10.3 C11.2 C12.2

C01.3 C10.8 C11.3 C12.3

C01.4 C10.9 C11.4 C12.4

C01.5 C11.5 C12.5

C01.6 C11.6 C12.6

C01.7 C11.7 C12.7

C01.8 C11.8 C12.8

C01.9 C11.9 C12.9

Columnar lay-out only for easier reading!

Many thanks!

Accepted Solutions

Solution

08-26-2014
01:18 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 01:18 PM

Try this:

**data** have;

infile cards dlm=",'";

input var :$20. @@;

if not missing(var);

cards;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

;

**data** want;

set have;

_varc=compress(scan(var,**1**,'-'),,'ka');

_var1=compress(scan(var,**1**,'-'),,'a');

_var2=compress(scan(var,**2**,'-'),,'a');

if findc(_var1,'.') > **0** then

do _var=_var1 to catx('.',int(_var1),_var2) by **0.1**;

new_var=cats(_varc,put(_var,z4.1));

output;

end;

else

do _var=_var1 to catx('.',coalescec(_var2,_var1),**9**) by **0.1**;

new_var=cats(_varc,put(_var,z4.1));

output;

end;

keep var new_var;

**run**;

Haikuo

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 07:35 AM

There is no ready function to do this (simple). The question is whether is easy to build one.

FCMP is meant to define user functions by your own. The feeling is to use that one.

The input string is defined. The output must also be a string and than with length 4k?

As the C is more an indicator together with the comma than the next thing are numbers 00 to 99? with a hyphen is should generate an inlcuding loop

The last thing is a number behind the dot. When the dot is missing it should generate a loop 0 to 9 when it is not missing is van be a number or a number with a hyphen (to be looped)

---->-- ja karman --<-----

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 08:24 AM

Jaap,

My plan is to create seperate variables for every distinct C code, but I guess one large string variables could also be used. See also the answer to data _null_

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 08:50 AM

Here is a solution which will write the original 'code' and its expansion to the log. It can be adapted to write to a file. Fortuitously it deals with the miscode spotted by Data _Null_. Next step named in honour.

Data _null_ ;

input codes $char60. ;

length code base range result $8 ;

codes = compress(codes, "'") ;

Do k = 1 to 99 ;

code = scan (codes, k, ', ') ;

if missing(code) then return ;

range = scan (code, 2, '.') ;

base = scan (code, 1, '.') ;

m = input (scan (base, 1, 'C-'), 2.) ;

n = input (scan (base, 2, 'C-'), 2.) ;

p = input (scan (range, 1, '-'), 1.) ;

q = input (scan (range, 2, '-'), 1.) ;

if missing (range) then

do ;

p = 0 ;

q = 9 ;

end ;

put code '> ' ;

do j = m to max (m, n) ;

base = cat ('C', put (j, z2.)) ;

do i = p to max (p, q) ;

result = catx ('.', base, i) ;

put result @ ;

end ;

end ;

put ;

end ;

drop codes i j k m n p q base range ;

Datalines ;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

run ;

Richard

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 09:01 AM

Well, I see I have missed a whole chunk of conversation whilst I was collating the below code, so will post anyways. This creates a variable with the expanded numerics, so you could use this to then expand, however not sure exactly what you output should look like so have left it at that (admittedly not the nicest bit of code ever).

data have;

attrib code format=$50.;

code="C01, C10.0, C10.2-3, C10.8-9, C11-12";

output;

run;

data want (keep=code loop);

attrib loop lo hi format=8.1;

set have;

i=1;

test=strip(scan(tranwrd(code,"C",""),i,','));

do until (test="");

if index(test,"-")=0 then do;

if index(test,".")=0 then test=strip(test)||".0";

loop=input(test,4.1);

output;

end;

else do;

loc=scan(test,1,"-");

hic=scan(test,2,"-");

if index(loc,".")=0 and index(hic,".")=0 then do;

loc=strip(loc)||'.0';

hic=strip(hic)||'.9';

end;

else do;

hic=scan(loc,1,".")||"."||strip(hic);

end;

lo=input(loc,4.1);

hi=input(hic,4.1);

do j=lo to hi by 0.1;

loop=j;

output;

end;

end;

i=i+1;

test=strip(scan(tranwrd(code,"C",""),i,','));

end;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 08:12 AM

The notation seems inconsistent to me. You have C76-C80 and in the next row C76-80. What does it mean? Are they different? Is it a typo?

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 08:22 AM

Thank you for your exact reading!

Unfortunately, these are actual codes from an excel file which i imported to SAS. They where compiled by hand. The total file gives about 500 of these lists, I only showed a few of the longer ones. There are also lists containing of 1 C-code.

The C codes are from the International Classification of Diseases, revision 10 for Oncology, and are given to group tumours. I had hoped to use the excel file to make an identical grouping in SAS. There also some other variables involved which I didn't show here.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 11:40 AM

You would probably need to do the following.

1) Have a complete list of the valid codes.

2) Use the strings you currently have to generate a WHERE condition to select records from the master list.

So C00-C26, C32-C69, C73, C75.4-9, C76-C80

Would become something like

WHERE CODE BETWEEN 'C00' and 'C26'

or CODE BETWEEN 'C32' and 'C69'

or CODE = 'C73'

or CODE BETWEEN 'C75.4' and 'C75.9'

or CODE BETWEEN 'C76' and 'C80'

;

A few notes.

You will probably need to clean up the lists to make them consistently have the full base values for ranges (C75.4-C75.49) or the logic for converting will be much more complex.

You might need to consider whether requests for 'C73' is only for an exact match to 'C73' or would it also include any sub-codes that start with 'C73'. This also applies to the ranges. So the way I have written the WHERE condition above a value like 'C00.1' would be included in the first range, but 'C26.1' would not. If you want it to include subcodes of C26 then you might make the upper bound for the range to 'C26~'.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 08:57 AM

Louis, If you want being helped with coding issues, that there is a need to be clear in expectations on what it should do. (soft skills) We are all nerds and need to understand each other.

That dumb computer (hope it keeps dumb) is doing exactly it is told to do, that is not the same as in human interactions what is intended.

I found http://www.who.int/classifications/icd/ICD10Volume2_en_2010.pdf?ua=1 An it looks my assumptions are correct (2 digits and 0-9 decimal)

New is that it is a subset of a more complete coding convention. There can be a dagger and asterisk *.

The coding of data _null_ is another variation to take care. IMO it is a better one the standardize on.

New is that you plan to generate variable lists from that that is nice and not nice.

- nice as the numbering approach is almost like SAS syntax defining variables.

- not nice as the decimal point is not part of this SAS syntax. IS the choic of using full 3 digits acceptable (assuming implied dot)?

SAS(R) 9.4 Language Reference: Concepts, Third Edition

Depending on your plans using separate values it could go into some other technical approaches.

---->-- ja karman --<-----

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 09:29 AM

What you should be expanded to :

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C00, C03-08, C15-17, C19-41, C44-76'

Solution

08-26-2014
01:18 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-26-2014 01:18 PM

Try this:

**data** have;

infile cards dlm=",'";

input var :$20. @@;

if not missing(var);

cards;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

;

**data** want;

set have;

_varc=compress(scan(var,**1**,'-'),,'ka');

_var1=compress(scan(var,**1**,'-'),,'a');

_var2=compress(scan(var,**2**,'-'),,'a');

if findc(_var1,'.') > **0** then

do _var=_var1 to catx('.',int(_var1),_var2) by **0.1**;

new_var=cats(_varc,put(_var,z4.1));

output;

end;

else

do _var=_var1 to catx('.',coalescec(_var2,_var1),**9**) by **0.1**;

new_var=cats(_varc,put(_var,z4.1));

output;

end;

keep var new_var;

**run**;

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-27-2014 01:35 PM

There are many active programmers in this community, I thank everyone of them for their time and effort!

As usual, more than one solution exists. RichardinOz's solution I can verify is working. RDW9's has a problem with the first code he's reading, but I sure that could be fixed.

Tom's comment on checking the existence of the generated code's I will surely take to heart.

But I'm really impressed with Haikuo's compact and elegant use of the SAS functions COMPRESS, CATS and CATX. He also circumvents the typo problem noted by Data _null_

A big thank you all!