BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LvanderHeijden
Calcite | Level 5


Hi,

I appreciate any help/suggestions

I have a character variable list_of_codes ($50) which looks like this.
It is a list of codes in a shorthand notation, easy enough to read for a human

'C00.3-5, C02-04, C05.0, C05.8-9, C06'
'C01, C10.0, C10.2-3, C10.8-9, C11-12'
'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'
'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'
'C44.5, C44.8-9, C47.6-9, C49.6-9'
'C00-C26, C32-C69, C73, C75.4-9, C76-C80'
'C00-50, C60-61, C63-69, C73-74, C76-80'
'C00, C03-08, C15-17, C19-41, C44-76'
'C00, C03-08, C15-17, C19-41, C44-76'

But is there a simple way to expand list_of_codes to the list of all the codes?

For example 'C01, C10.0, C10.2-3, C10.8-9, C11-12' should be expanded to:

C01.0 C10.0 C11.0 C12.0
C01.1   C10.2   C11.1   C12.1
C01.2   C10.3   C11.2   C12.2
C01.3   C10.8   C11.3   C12.3
C01.4   C10.9   C11.4   C12.4
C01.5           C11.5   C12.5
C01.6           C11.6   C12.6
C01.7           C11.7   C12.7
C01.8           C11.8   C12.8
C01.9           C11.9   C12.9

Columnar lay-out only for easier reading!

Many thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Try this:

data have;

     infile cards dlm=",'";

     input var :$20. @@;

     if not missing(var);

     cards;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

;

data want;

     set have;

     _varc=compress(scan(var,1,'-'),,'ka');

     _var1=compress(scan(var,1,'-'),,'a');

     _var2=compress(scan(var,2,'-'),,'a');

     if  findc(_var1,'.') > 0 then

           do _var=_var1 to catx('.',int(_var1),_var2) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     else

           do _var=_var1 to catx('.',coalescec(_var2,_var1),9) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     keep var new_var;

run;

Haikuo

View solution in original post

11 REPLIES 11
jakarman
Barite | Level 11

There is no ready function to do this (simple). The question is whether is easy to build one.

FCMP is meant to define user functions by your own.  The feeling is to use that one.

The input string is defined. The output must also be a string and than with length 4k?  

As the C is more an indicator together with the comma than the next thing are numbers 00 to 99? with a hyphen is should generate an inlcuding loop

The last thing is a number behind the dot. When the dot is missing it should generate a loop 0 to 9  when it is not missing is van be a number or a number with a hyphen (to be looped)      

---->-- ja karman --<-----
LvanderHeijden
Calcite | Level 5

Jaap,

My plan is to create seperate variables for every distinct C code, but I guess one large string variables could also be used. See also the answer to data _null_

RichardinOz
Quartz | Level 8

Here is a solution which will write the original 'code' and its expansion to the log.  It can be adapted to write to a file. Fortuitously it deals with the miscode spotted by Data _Null_.  Next step named in honour.

Data _null_ ;

  input codes $char60. ;

  length code base range result $8 ;

  codes = compress(codes, "'") ;

  Do k = 1 to 99 ;

  code = scan (codes, k, ', ') ;

  if missing(code) then return ;

  range = scan (code, 2, '.') ;

  base = scan (code, 1, '.') ;

  m = input (scan (base, 1, 'C-'), 2.) ;

  n = input (scan (base, 2, 'C-'), 2.) ;

  p = input (scan (range, 1, '-'), 1.) ;

  q = input (scan (range, 2, '-'), 1.) ;

  if missing (range) then

  do ;

  p = 0 ;

  q = 9 ;

  end ;

  put code '> ' ;

  do j = m to max (m, n) ;

  base = cat ('C', put (j, z2.)) ;

  do i = p to max (p, q) ;

  result = catx ('.', base, i) ;

  put result @ ;

  end ;

  end ;

  put  ;

  end ;

  drop codes i j k m n p q base range ;

Datalines ;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

run ;

Richard

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, I see I have missed a whole chunk of conversation whilst I was collating the below code, so will post anyways.  This creates a variable with the expanded numerics, so you could use this to then expand, however not sure exactly what you output should look like so have left it at that (admittedly not the nicest bit of code ever).

data have;
  attrib code format=$50.;
  code="C01, C10.0, C10.2-3, C10.8-9, C11-12";
  output;
run;

data want (keep=code loop);
  attrib loop lo hi format=8.1;
  set have;
  i=1;
  test=strip(scan(tranwrd(code,"C",""),i,','));
  do until (test="");
    if index(test,"-")=0 then do; 
      if index(test,".")=0 then test=strip(test)||".0";
      loop=input(test,4.1);
      output;
    end;
    else do; 
      loc=scan(test,1,"-");
      hic=scan(test,2,"-");
      if index(loc,".")=0 and index(hic,".")=0 then do;
        loc=strip(loc)||'.0';
        hic=strip(hic)||'.9';
      end;
      else do;
        hic=scan(loc,1,".")||"."||strip(hic);
      end;
      lo=input(loc,4.1);
      hi=input(hic,4.1);
      do j=lo to hi by 0.1;
        loop=j;
        output;
      end;
    end;
    i=i+1; 
    test=strip(scan(tranwrd(code,"C",""),i,','));
  end;
run;

data_null__
Jade | Level 19

The notation seems inconsistent to me.  You have C76-C80 and in the next row C76-80.  What does it mean? Are they different? Is it a typo?

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

LvanderHeijden
Calcite | Level 5

Thank you for your exact reading!

Unfortunately, these are actual codes from an excel file which i imported to SAS. They where compiled by hand. The total file gives about 500 of these lists, I only showed a few of the longer ones. There are also lists containing of 1 C-code.

The C codes are from the International Classification of Diseases, revision 10 for Oncology, and are given to group tumours. I had hoped to use the excel file to make an identical grouping in SAS. There also some other variables involved which I didn't show here.

Tom
Super User Tom
Super User

You would probably need to do the following.

1) Have a complete list of the valid codes.

2) Use the strings you currently have to generate a WHERE condition to select records from the master list.

So C00-C26, C32-C69, C73, C75.4-9, C76-C80

Would become something like

WHERE CODE BETWEEN 'C00' and 'C26'

   or CODE BETWEEN 'C32' and 'C69'

   or CODE = 'C73'

   or CODE BETWEEN 'C75.4' and 'C75.9'

   or CODE BETWEEN 'C76' and 'C80'

;

A few notes.

You will probably need to clean up the lists to make them consistently have the full base values for ranges (C75.4-C75.49) or the logic for converting will be much more complex.

You might need to consider whether requests for 'C73' is only for an exact match to 'C73' or would it also include any sub-codes that start with 'C73'.  This also applies to the ranges. So the way I have written the WHERE condition above a value like 'C00.1' would be included in the first range, but 'C26.1' would not.  If you want it to include subcodes of C26 then you might make the upper bound for the range to 'C26~'.

jakarman
Barite | Level 11

Louis, If you want being helped with coding issues, that there is a need to be clear in expectations on what it should do. (soft skills) We are all nerds and need to understand each other.

That dumb computer (hope it keeps dumb) is doing exactly it is told to do, that is not the same as in human interactions what is intended.
I found http://www.who.int/classifications/icd/ICD10Volume2_en_2010.pdf?ua=1 An it looks my assumptions are correct (2 digits and 0-9 decimal)
New is that it is a subset of a more complete coding convention.  There can be a dagger and asterisk *.

The coding of data _null_ is another variation to take care. IMO it is a better one the standardize on.  

New is that you plan to generate variable lists from that that is nice and not nice.
- nice as the numbering approach is almost like SAS syntax defining variables.
- not nice as the decimal point is not part of this SAS syntax. IS the choic of using full 3 digits acceptable (assuming implied dot)?

SAS(R) 9.4 Language Reference: Concepts, Third Edition

Depending on your plans using separate values it could go into some other technical approaches. 

---->-- ja karman --<-----
Ksharp
Super User

What you should be expanded to :

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'



'C00, C03-08, C15-17, C19-41, C44-76'

Haikuo
Onyx | Level 15

Try this:

data have;

     infile cards dlm=",'";

     input var :$20. @@;

     if not missing(var);

     cards;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

;

data want;

     set have;

     _varc=compress(scan(var,1,'-'),,'ka');

     _var1=compress(scan(var,1,'-'),,'a');

     _var2=compress(scan(var,2,'-'),,'a');

     if  findc(_var1,'.') > 0 then

           do _var=_var1 to catx('.',int(_var1),_var2) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     else

           do _var=_var1 to catx('.',coalescec(_var2,_var1),9) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     keep var new_var;

run;

Haikuo

LvanderHeijden
Calcite | Level 5

There are many active programmers in this community, I thank everyone of them for their time and effort!

As usual, more than one solution exists. RichardinOz's solution I can verify is working. RDW9's has a problem with the first code he's reading, but I sure that could be fixed.
Tom's comment on checking the existence of the generated code's I will surely take to heart.
But I'm really impressed with Haikuo's compact and elegant use of the SAS functions COMPRESS, CATS and CATX. He also circumvents the typo problem noted by Data _null_

A big thank you all!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 978 views
  • 4 likes
  • 8 in conversation