topic Data stored in one cell delimiting issues in SAS Programming

Data stored in one cell delimiting issues

mly — Mon, 30 Dec 2024 20:41:29 GMT

anyone familiar with data all stored in one cell with delimiters and successfully breaking them down into their own cell?

For example, we have all drugs that the patient started on in one cell in this format:

If there is a blank within the repeating block, no data will be written into the repeating block like below:

my issue is creating a repeating block that will split the cells that have missing information like above;

%macro repeating_block (input=have

var=Drug,

output=want) ;

proc sql noprint;

select max(countw(Drug_ALL,'|')) into : max from have;

quit;

data want;

set have;

array drug_{&max} $ 100;

array iteration{&max}8 _temporary_ ;

do i=1 to &max;

drug_clean=scan(drug_all, i, '|');

if drug_clean ne '' then do;

drug=scan(drug_clean, 1, '[');

iter_num=input(scan(scan(Drug_clean,2,'['), 1, ']'), 8.);

if iter_num>0 and iter_num<=&max then do;

Deug_[iter_num]=trim (drug);

iteration[iter_num] = iter_num;

if drug_clean='' then drug_[iter_num]="";

end;

drop iter_num drug_clean i;

run;

%mend repeating_block;

%repeating_block;

Re: Data stored in one cell delimiting issues

Tom — Mon, 30 Dec 2024 23:38:46 GMT

I don't really follow the language you are using. CELL is something I would use when referring to part of Excel Worksheet, not anything to do with actual datasets or variables.

What result do want from these two observations you shared?

Isoniazid[1] | Rifampin[2] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Rifapentine[6] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]
Isoniazid[1] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]

To me they both appear to be lists of pairs of values delimited with | character. Where the pairs of values consists of a string followed by a number in square brackets.

data have;
  infile datalines truncover;
  row+1;
  input drug_all $200.;
datalines;
Isoniazid[1] | Rifampin[2] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Rifapentine[6] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]
Isoniazid[1] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]
;

data want;
   set have;
   length pair $100 drug $100 position 8 ;
   do index=1 to countw(drug_all,'|');
      pair = left(scan(drug_all,index,'|'));
      drug = scan(pair,1,'[');
      position = input(scan(pair,-1,'[ ]'),32.);
      output;
   end;
   drop drug_all pair;
run;

Results

If for some reason you need to convert it into something less useful for analysis but easier to dump into spreadsheet then perhaps you could use PROC TRANSPOSE?

proc transpose data=want out=wide(drop=_name_) prefix=drug;
  by row;
  id position;
  var drug;
run;

Re: Data stored in one cell delimiting issues

Cynthia_sas — Tue, 31 Dec 2024 01:23:45 GMT

Hi:

I'm not sure why this is in a Macro program and I don't have a clear picture of your input data and your desired output. However, it seems to me that you have named the array as "DRUG_" but then later in the program, you refer to the array item as "DEUG_".

What do you see in the SAS log after you run this code? Can you post a better example of what you expect your output to look like. It seems to me that all you really want to do is put each drug into a numbered variable, where the number for any drug corresponds to the number in [] as in. But you only show one "cell" of data. Are the numbers in brackets always in order, as you show? Or, could the numbers in the brackets be out of order like this:

"Isoniazid[1] | Rifampin[2] | Streptomycin[4] | Montelukast[3]"

Is it possible for something like this to happen where there are only 10 drugs listed but the numbers in the brackets are more than 10? This could cause issues if your array is defined only to have 10 members, but your program tries to write to a member greater than 10:

I don't really understand the data and the desired results well enough to comment in more detail. My first suggestion is to fix the typo in the code. As always with Macro programs, my first suggestion is to always start with a working program that is NOT in a macro definition and make sure your program logic works correctly first before macro-izing your program.

Cynthia