BookmarkSubscribeRSS Feed
mly
Calcite | Level 5 mly
Calcite | Level 5

anyone familiar with data all stored in one cell with delimiters and successfully breaking them down into their own cell?

 

For example, we have all drugs that the patient started on in one cell in this format:

 

Isoniazid[1] | Rifampin[2] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Rifapentine[6] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]

 

If there is a blank within the repeating block, no data will be written into the repeating block like below:

 

Isoniazid[1] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]

 

 

my issue is creating a repeating block that will split the cells that have missing information like above;

 

%macro repeating_block (input=have
var=Drug, 
output=want) ; 
 
proc sql noprint;
select max(countw(Drug_ALL,'|')) into : max from have;
quit;
 
data want;
set have;
array drug_{&max} $ 100;
array iteration{&max}8 _temporary_ ;
do i=1 to &max;
drug_clean=scan(drug_all, i, '|'); 
if drug_clean ne '' then do; 
drug=scan(drug_clean, 1, '['); 
iter_num=input(scan(scan(Drug_clean,2,'['), 1, ']'), 8.); 
if iter_num>0 and iter_num<=&max then do;  
Deug_[iter_num]=trim (drug);
iteration[iter_num] = iter_num;
if drug_clean='' then drug_[iter_num]="";
end; 
end; 
end;
 
drop  iter_num drug_clean  i;
run;
%mend repeating_block;  
%repeating_block;
2 REPLIES 2
Tom
Super User Tom
Super User

I don't really follow the language you are using.  CELL is something I would use when referring to part of Excel Worksheet, not anything to do with actual datasets or variables.

 

What result do want from these two observations you shared?

Isoniazid[1] | Rifampin[2] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Rifapentine[6] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]
Isoniazid[1] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]

To me they both appear to be lists of pairs of values delimited with | character.  Where the pairs of values consists of a string followed by a number in square brackets.

data have;
  infile datalines truncover;
  row+1;
  input drug_all $200.;
datalines;
Isoniazid[1] | Rifampin[2] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Rifapentine[6] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]
Isoniazid[1] | Pyrazinamide[3] | Ethambutol[4] | Streptomycin[5] | Ethionamide[7] | Amikacin[8] | Kanamycin[9] | Streptomycin[10]
;

data want;
   set have;
   length pair $100 drug $100 position 8 ;
   do index=1 to countw(drug_all,'|');
      pair = left(scan(drug_all,index,'|'));
      drug = scan(pair,1,'[');
      position = input(scan(pair,-1,'[ ]'),32.);
      output;
   end;
   drop drug_all pair;
run;

Results

Tom_0-1735601589542.png

If for some reason you need to convert it into something less useful for analysis but easier to dump into spreadsheet then perhaps you could use PROC TRANSPOSE?

proc transpose data=want out=wide(drop=_name_) prefix=drug;
  by row;
  id position;
  var drug;
run;

Tom_0-1735601913933.png

 

Cynthia_sas
Diamond | Level 26

Hi:

  I'm not sure why this is in a Macro program and I don't have a clear picture of your input data and your desired output. However, it seems to me that you have named the array as "DRUG_" but then later in the program, you refer to the array item as "DEUG_".

  What do you see in the SAS log after you run this code? Can you post a better example of what you expect your output to look like. It seems to me that all you really want to do is put each drug into a numbered variable, where the number for any drug corresponds to the number in [] as in. But you only show one "cell" of data. Are the numbers in brackets always in order, as you show? Or, could the numbers in the brackets be out of order like this:

"Isoniazid[1] | Rifampin[2] | Streptomycin[4] | Montelukast[3]"

  Is it possible for something like this to happen where there are only 10 drugs listed but the numbers in the brackets are more than 10? This could cause issues if your array is defined only to have 10 members, but your program tries to write to a member greater than 10:

Isoniazid[11] | Rifampin[12] | Pyrazinamide[13] | Ethambutol[14] | Streptomycin[15] | Rifapentine[16] | Ethionamide[17] | Amikacin[18] | Kanamycin[19] | Streptomycin[20]

  I don't really understand the data and the desired results well enough to comment in more detail. My first suggestion is to fix the typo in the code. As always with Macro programs, my first suggestion is to always start with a working program that is NOT in a macro definition and make sure your program logic works correctly first before macro-izing your program.

Cynthia

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 440 views
  • 1 like
  • 3 in conversation