reading data, expanding to seperate variables

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

reading data, expanding to seperate variables


Hi,

I appreciate any help/suggestions

I have a character variable list_of_codes ($50) which looks like this.
It is a list of codes in a shorthand notation, easy enough to read for a human

'C00.3-5, C02-04, C05.0, C05.8-9, C06'
'C01, C10.0, C10.2-3, C10.8-9, C11-12'
'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'
'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'
'C44.5, C44.8-9, C47.6-9, C49.6-9'
'C00-C26, C32-C69, C73, C75.4-9, C76-C80'
'C00-50, C60-61, C63-69, C73-74, C76-80'
'C00, C03-08, C15-17, C19-41, C44-76'
'C00, C03-08, C15-17, C19-41, C44-76'

But is there a simple way to expand list_of_codes to the list of all the codes?

For example 'C01, C10.0, C10.2-3, C10.8-9, C11-12' should be expanded to:

C01.0 C10.0 C11.0 C12.0
C01.1   C10.2   C11.1   C12.1
C01.2   C10.3   C11.2   C12.2
C01.3   C10.8   C11.3   C12.3
C01.4   C10.9   C11.4   C12.4
C01.5           C11.5   C12.5
C01.6           C11.6   C12.6
C01.7           C11.7   C12.7
C01.8           C11.8   C12.8
C01.9           C11.9   C12.9

Columnar lay-out only for easier reading!

Many thanks!


Accepted Solutions
Solution
‎08-26-2014 01:18 PM
Respected Advisor
Posts: 3,156

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

Try this:

data have;

     infile cards dlm=",'";

     input var :$20. @@;

     if not missing(var);

     cards;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

;

data want;

     set have;

     _varc=compress(scan(var,1,'-'),,'ka');

     _var1=compress(scan(var,1,'-'),,'a');

     _var2=compress(scan(var,2,'-'),,'a');

     if  findc(_var1,'.') > 0 then

           do _var=_var1 to catx('.',int(_var1),_var2) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     else

           do _var=_var1 to catx('.',coalescec(_var2,_var1),9) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     keep var new_var;

run;

Haikuo

View solution in original post


All Replies
Trusted Advisor
Posts: 3,212

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

There is no ready function to do this (simple). The question is whether is easy to build one.

FCMP is meant to define user functions by your own.  The feeling is to use that one.

The input string is defined. The output must also be a string and than with length 4k?  

As the C is more an indicator together with the comma than the next thing are numbers 00 to 99? with a hyphen is should generate an inlcuding loop

The last thing is a number behind the dot. When the dot is missing it should generate a loop 0 to 9  when it is not missing is van be a number or a number with a hyphen (to be looped)      

---->-- ja karman --<-----
New Contributor
Posts: 4

Re: reading data, expanding to seperate variables

Jaap,

My plan is to create seperate variables for every distinct C code, but I guess one large string variables could also be used. See also the answer to data _null_

Super Contributor
Posts: 644

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

Here is a solution which will write the original 'code' and its expansion to the log.  It can be adapted to write to a file. Fortuitously it deals with the miscode spotted by Data _Null_.  Next step named in honour.

Data _null_ ;

  input codes $char60. ;

  length code base range result $8 ;

  codes = compress(codes, "'") ;

  Do k = 1 to 99 ;

  code = scan (codes, k, ', ') ;

  if missing(code) then return ;

  range = scan (code, 2, '.') ;

  base = scan (code, 1, '.') ;

  m = input (scan (base, 1, 'C-'), 2.) ;

  n = input (scan (base, 2, 'C-'), 2.) ;

  p = input (scan (range, 1, '-'), 1.) ;

  q = input (scan (range, 2, '-'), 1.) ;

  if missing (range) then

  do ;

  p = 0 ;

  q = 9 ;

  end ;

  put code '> ' ;

  do j = m to max (m, n) ;

  base = cat ('C', put (j, z2.)) ;

  do i = p to max (p, q) ;

  result = catx ('.', base, i) ;

  put result @ ;

  end ;

  end ;

  put  ;

  end ;

  drop codes i j k m n p q base range ;

Datalines ;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

run ;

Richard

Super User
Super User
Posts: 7,942

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

Well, I see I have missed a whole chunk of conversation whilst I was collating the below code, so will post anyways.  This creates a variable with the expanded numerics, so you could use this to then expand, however not sure exactly what you output should look like so have left it at that (admittedly not the nicest bit of code ever).

data have;
  attrib code format=$50.;
  code="C01, C10.0, C10.2-3, C10.8-9, C11-12";
  output;
run;

data want (keep=code loop);
  attrib loop lo hi format=8.1;
  set have;
  i=1;
  test=strip(scan(tranwrd(code,"C",""),i,','));
  do until (test="");
    if index(test,"-")=0 then do; 
      if index(test,".")=0 then test=strip(test)||".0";
      loop=input(test,4.1);
      output;
    end;
    else do; 
      loc=scan(test,1,"-");
      hic=scan(test,2,"-");
      if index(loc,".")=0 and index(hic,".")=0 then do;
        loc=strip(loc)||'.0';
        hic=strip(hic)||'.9';
      end;
      else do;
        hic=scan(loc,1,".")||"."||strip(hic);
      end;
      lo=input(loc,4.1);
      hi=input(hic,4.1);
      do j=lo to hi by 0.1;
        loop=j;
        output;
      end;
    end;
    i=i+1; 
    test=strip(scan(tranwrd(code,"C",""),i,','));
  end;
run;

Respected Advisor
Posts: 3,799

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

The notation seems inconsistent to me.  You have C76-C80 and in the next row C76-80.  What does it mean? Are they different? Is it a typo?

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

New Contributor
Posts: 4

Re: reading data, expanding to seperate variables

Posted in reply to data_null__

Thank you for your exact reading!

Unfortunately, these are actual codes from an excel file which i imported to SAS. They where compiled by hand. The total file gives about 500 of these lists, I only showed a few of the longer ones. There are also lists containing of 1 C-code.

The C codes are from the International Classification of Diseases, revision 10 for Oncology, and are given to group tumours. I had hoped to use the excel file to make an identical grouping in SAS. There also some other variables involved which I didn't show here.

Super User
Super User
Posts: 7,039

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

You would probably need to do the following.

1) Have a complete list of the valid codes.

2) Use the strings you currently have to generate a WHERE condition to select records from the master list.

So C00-C26, C32-C69, C73, C75.4-9, C76-C80

Would become something like

WHERE CODE BETWEEN 'C00' and 'C26'

   or CODE BETWEEN 'C32' and 'C69'

   or CODE = 'C73'

   or CODE BETWEEN 'C75.4' and 'C75.9'

   or CODE BETWEEN 'C76' and 'C80'

;

A few notes.

You will probably need to clean up the lists to make them consistently have the full base values for ranges (C75.4-C75.49) or the logic for converting will be much more complex.

You might need to consider whether requests for 'C73' is only for an exact match to 'C73' or would it also include any sub-codes that start with 'C73'.  This also applies to the ranges. So the way I have written the WHERE condition above a value like 'C00.1' would be included in the first range, but 'C26.1' would not.  If you want it to include subcodes of C26 then you might make the upper bound for the range to 'C26~'.

Trusted Advisor
Posts: 3,212

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

Louis, If you want being helped with coding issues, that there is a need to be clear in expectations on what it should do. (soft skills) We are all nerds and need to understand each other.

That dumb computer (hope it keeps dumb) is doing exactly it is told to do, that is not the same as in human interactions what is intended.
I found http://www.who.int/classifications/icd/ICD10Volume2_en_2010.pdf?ua=1 An it looks my assumptions are correct (2 digits and 0-9 decimal)
New is that it is a subset of a more complete coding convention.  There can be a dagger and asterisk *.

The coding of data _null_ is another variation to take care. IMO it is a better one the standardize on.  

New is that you plan to generate variable lists from that that is nice and not nice.
- nice as the numbering approach is almost like SAS syntax defining variables.
- not nice as the decimal point is not part of this SAS syntax. IS the choic of using full 3 digits acceptable (assuming implied dot)?

SAS(R) 9.4 Language Reference: Concepts, Third Edition

Depending on your plans using separate values it could go into some other technical approaches. 

---->-- ja karman --<-----
Super User
Posts: 10,023

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

What you should be expanded to :

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'



'C00, C03-08, C15-17, C19-41, C44-76'

Solution
‎08-26-2014 01:18 PM
Respected Advisor
Posts: 3,156

Re: reading data, expanding to seperate variables

Posted in reply to LouisvanderHeijden

Try this:

data have;

     infile cards dlm=",'";

     input var :$20. @@;

     if not missing(var);

     cards;

'C00.3-5, C02-04, C05.0, C05.8-9, C06'

'C01, C10.0, C10.2-3, C10.8-9, C11-12'

'C01, C05.1-2, C09, C10.0, C10.2-3, C10.8-9, C11-14'

'C44.0-4, C47.0, C49.0, C00-14, C30-32, C73'

'C44.5, C44.8-9, C47.6-9, C49.6-9'

'C00-C26, C32-C69, C73, C75.4-9, C76-C80'

'C00-50, C60-61, C63-69, C73-74, C76-80'

'C00, C03-08, C15-17, C19-41, C44-76'

'C00, C03-08, C15-17, C19-41, C44-76'

;

data want;

     set have;

     _varc=compress(scan(var,1,'-'),,'ka');

     _var1=compress(scan(var,1,'-'),,'a');

     _var2=compress(scan(var,2,'-'),,'a');

     if  findc(_var1,'.') > 0 then

           do _var=_var1 to catx('.',int(_var1),_var2) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     else

           do _var=_var1 to catx('.',coalescec(_var2,_var1),9) by 0.1;

                new_var=cats(_varc,put(_var,z4.1));

                output;

           end;

     keep var new_var;

run;

Haikuo

New Contributor
Posts: 4

Re: reading data, expanding to seperate variables

There are many active programmers in this community, I thank everyone of them for their time and effort!

As usual, more than one solution exists. RichardinOz's solution I can verify is working. RDW9's has a problem with the first code he's reading, but I sure that could be fixed.
Tom's comment on checking the existence of the generated code's I will surely take to heart.
But I'm really impressed with Haikuo's compact and elegant use of the SAS functions COMPRESS, CATS and CATX. He also circumvents the typo problem noted by Data _null_

A big thank you all!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 349 views
  • 4 likes
  • 8 in conversation