BookmarkSubscribeRSS Feed
Addison
Fluorite | Level 6

I received a raw data from other hospital and they combine individual their  all lab dates, lab names, lab values, and lab units into one column and using "|" as delimiter

 

I am trying combine array and scan to separate them into columns but it's not working for me,

 

here is my coding,

ARRAY LabName(*) Lab_Name1-Lab_Name79;
DO I= 1 TO dim(LabName);
LabName(I)=scan(Lab_Name,I,"|");
END;
DROP I;
RUN;

 

here is the raw data looks like

ID Lab Date Lab Name Lab Value Lab Unit
1 09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008 Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC 39.1|13.1|25.3|33.5|75.3|8.1|200|5.19|18.0|9.6|9.6 %|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|
2 10/12/2011|10/12/2011|10/12/2011|10/12/2011|10/12/2011|10/12/2011|10/12/2011|10/12/2011|10/12/2011|10/12/2011|10/12/2011 Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC 36.2|12.4|25.8|34.3|75.0|8.9|208|4.83|18.4|5.1|5.1 %|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|
3 10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011|10/05/2011 Cell Morphology|Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC Yes|40.0|13.7|25.4|34.1|74.6|8.7|214|5.37|19.0|5.6|5.6 |%|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|
4 09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012|09/28/2012 Cell Morphology|Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC Yes|40.2|13.2|24.6|32.8|74.8|9.2|194|5.38|18.7|8.7|8.7 |%|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|
5 11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012|11/19/2012 Cell Morphology|Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Differential Type|Instr WBC|LD Yes|39.1|12.8|23.4|32.7|71.5|8.3|243|5.47|21.1|4.8|Yes|4.8|190 |%|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|||unit/L
6 04/15/2013|04/15/2013|04/15/2013|04/15/2013|04/15/2013|04/15/2013|04/15/2013|04/15/2013|04/15/2013|04/15/2013|04/15/2013 Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC 38.0|12.5|24.1|33.0|73.0|8.6|173|5.21|19.6|4.9|4.9 %|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|
7 08/05/2013|08/05/2013|08/05/2013|08/05/2013|08/05/2013|08/05/2013|08/05/2013|08/05/2013|08/05/2013|08/05/2013|08/05/2013 Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC 37.9|12.6|24.4|33.1|73.7|9.3|179|5.15|19.4|3.9|3.9 %|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|
8 09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013|09/30/2013 Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC|LD 38.6|12.8|24.6|33.2|74.1|8.6|207|5.21|19.0|4.2|4.2|223 %|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL||unit/L
9 01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014|01/21/2014 Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC|Nucleated Red Blood Cells 37.9|12.8|24.0|33.9|70.9|8.5|181|5.34|18.7|5.0|5.0|0.2 %|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL||/100 WBC
10 06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014|06/02/2014 Albumin|Alkaline Phosphatase|ALT|Anion Gap|AST|Calcium|Carbon Dioxide|Chloride|Glucose Serum|Potassium|Sodium|Total Bilirubin|Urea Nitrogen|CREATININE|Protein Total|Cell Morphology|Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Differential Type|Instr WBC|Nucleated Red Blood Cells|Absolute Retic|Percent Retic|Retic RBC 3.8|83|23|6|16|8.8|27|104|117|3.9|137|0.5|12|0.80|7.9|Yes|38.6|13.0|23.8|33.7|70.7|8.2|201|5.47|19.4|4.1|Yes|4.1|0.1|109.4|2.0|5.47 g/dL|unit/L|unit/L||unit/L|mg/dL|mmol/L|mmol/L|mg/dL|mmol/L|mmol/L|mg/dL|mg/dL|mg/dL|g/dL||%|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|||/100 WBC|thou/mcL|%|x10(6)/mcL
14 REPLIES 14
andreas_lds
Jade | Level 19

Do you need to split LabDate and LabValue, too?

 

If yes, then creating one observation for each value in the lists is recommended. This leads to cleaner data-structure, which is often easier to use, than a datasets with 100+ variables.

 

data want;
   set have(
      rename=(
         LabDate = DateList
         LabName = NameList
         LabValue = ValueList
      )
   );

   length LabDate 8 LabName $ 50 LabValue 8;
   format LabDate mmddyyd10.;

   do i = 1 to countw(NameList, '|');
      LabDate = input(scan(DateList, i, '|'), mmddyy10.);
      LabName = scan(NameList, i, '|');
      LabValue = input(scan(ValueList, i, '|'), best32.);
      output;
   end;

   keep Id Lab:;
run;

 

Tom
Super User Tom
Super User

You cannot use the same name to mean two different things. 

Also make sure to define the new variables type (and length).

data want;
  set have;
  array _name $30 Lab_Name1-Lab_Name79;
  array _date 8 Lab_Date1-Lab_Date79;
  format Lab_Date1-Lab_Date79 yymmdd10.;
  array _value $30 Lab_Value1-Lab_Value79;
  do I= 1 do dim(_name);
    _name[i]=scan(Lab_Name,I,"|");
    _date[i]=input(scan(LabDate,i,'|'),mmddyy10.);
    _value[i]= scan(LabValue,i,'|');
  end;
  drop i LabName labDate labValue ;
run;
Reeza
Super User
Is the data originally in a text file or in a SAS data set? If the original file is a text file and this is the full structure you need to change how you import the data, not fix it after the fact.

If it is a SAS data set, I'm assuming this is one variable among others that are already read in correctly?
Addison
Fluorite | Level 6

The original raw data is .csv file. 

I use proc import to bring data to SAS.

Reeza
Super User
It's not CSV if the delimiter is a pipe which is why it wasn't read in correctly. Unless this is only one field among many others that were read in correctly.You need to use PROC IMPORT with the delimiter option or write your own code.


proc import out=want datafile='path to file' dbms=dlm replace;
delimiter='|';
guessingrows=max;*will be a bit slower;
run;
Addison
Fluorite | Level 6

Our affiliate IT team save those data files as csv format and using "|" as delimiter upload to our data warehouse.

 

If I directly import those .csv files to SAS, SAS will not read them correctly.

SAS will read those data files correctly only if I convert them to .xlsx format.

 

Reeza
Super User
Because they are not CSV. CSV means comma separated values, but your file doesn't have comma's, it has pipes as the delimiter.

Did you try the code I suggested? Or when importing via GUI, select as a text file and specify the delimiter. Excel is already converting the delimiter for you which is why SAS reads that correctly. But your file is not CSV and treating it as such is what's causing this issue.
Tom
Super User Tom
Super User

Actually the way the text posted in the original message looks it appears to be a TAB delimited file with three long character fields that have individual values delimited by pipe character.  But that might have been caused by copying a pasting from an Excel file instead copying lines from the original file.

 

If the long fields always have  the same number of pipe characters (instead of only including enough for the number of actual values) then you might be able to read the original file using two characters as the delimiters.

 

If you want help reading the original file then post the first few lines of the actual file (not the file as converted by being opened by Excel).  Open the file in a text editor (the SAS program editor should work) and copy a few lines. Then use the Insert Code icon in for forum editor to get a pop-up text box where you can paste the lines.  That way the forum editor won't try to convert the lines into paragraphs.

Addison
Fluorite | Level 6

I use notepad to copy couple raw data and attached here.

 

Thank you for your help!!

I am looking forward a good solution to import these raw data files. I will receive them monthly and I am tired of keeping covert them to excel file and import them.

ChrisNZ
Tourmaline | Level 20

That's a weird file you have here.

You can read the whole original file in one go:

data _null_;
  file "%sysfunc(pathname(WORK))\t.txt" lrecl=500;
  put '12312231';
  put '68879';
  put '09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/19/2008|09/18/2008' 
      '09'x
      'Hematocrit|Hemoglobin|MCH Blood|MCHC Blood|MCV Blood|MPV|Platelet Count|RBC|RDW|WBC|Instr WBC'
      '09'x
      '39.1|13.1|25.3|33.5|75.3|8.1|200|5.19|18.0|9.6|9.6'
      '09'x
      '%|g/dL|pg|g/dL|fL|fL|thou/mcL|x10(6)/mcL|%|thou/mcL|';
run;

data WANT;
  infile "%sysfunc(pathname(WORK))\t.txt" lrecl=500 pad dlm='|' _infile_=BUFF;
  input @ ;
  _infile_ =translate(BUFF,'|','09'x);    
  if char(BUFF,3)='/' then  
    input @1 (DATE1-DATE11) (:mmddyy10.) (M1-M11) (:$25.) (R1-R11) (:best.) (V1-V11) (:$20.);    
  else input @1 NUM best.; 
  format DATE1-DATE11 date9.;
run;

Not too sure the structure is useful though:

 

NUM DATE1 DATE2 M1 M2 R1 R2 V1 V2
12312231 . .     . .    
68879 . .     . .    
. 19SEP2008 19SEP2008 Hematocrit Hemoglobin 39.1 13.1 % g/dL

 

 

 

 

Patrick
Opal | Level 21

@Addison 

The attached sample data really helps to understand what you're dealing with. 

Below sample code reads your source data into a long format which is often an easier structure to deal with.

%let source_file=c:\temp\test file.txt;
data long(drop=_:);
  attrib 
    id          length=$15    informat=$15.
    fin         length=$15    informat=$15. 
    _lab_dates  length=$400   informat=$400.
    _lab_names  length=$400   informat=$400.
    _lab_values length=$400   informat=$400.
    _lab_units  length=$400   informat=$400.
    lab_date    length=8      informat=mmddyy10. format=date9.
    lab_name    length=$40
/*    lab_value   length=8*/
    lab_value   length=$40
    lab_unit    length=$40
    ;

  infile "&source_file" dlm='09'x dsd truncover firstobs=2 lrecl=1650;
  input id fin _lab_dates _lab_names _lab_values _lab_units;

  /* start: if rows without lab data not required then remove below code */ 
  if cmiss(_lab_dates, _lab_names, _lab_values, _lab_units)=4 then output;
  else
  /* end: of remove section */
  do;
    _loop_cnt=1+max(countc(_lab_dates,'|'), countc(_lab_names,'|'), countc(_lab_values,'|'), countc(_lab_units,'|'));
    do _i=1 to _loop_cnt;
      lab_date=input(scan(_lab_dates,_i,'|','M'),mmddyy10.);
      lab_name=scan(_lab_names,_i,'|','M');
/*      lab_value=input(scan(_lab_values,_i,'|'),best32.);*/
      lab_value=scan(_lab_values,_i,'|','M');
      lab_unit=scan(_lab_units,_i,'|','M');
      output;
    end;
  end;

run;

I wasn't sure if you also need the rows without Lab data. If not then remove the section as per comment in the code.

 

Capture.JPG

 

Changes - new code version

1. change type of lab_value to character because there are "yes" strings in the source data (is this a DQ issue?)

2. add 'M' modifier to scan() function to read data correctly in case of missing data elements

3. add '1' to loop counter to read all the pipe delimited data elements

Addison
Fluorite | Level 6
I try the code you suggest, looks like SAS don't think it's .CSV file so did not read record correctly.
proc import out=want datafile='Z:\lab_06032019.csv'
dbms=dlm replace;
delimiter='|';
guessingrows=max;
run;
Reeza
Super User
I don't know what htat means. Can you please explain? Did the code not work? What does the log say?
Reeza
Super User
Ok, first read it in as a tab delimited file.And then you'll need to parse it out in SAS.

proc import out=want datafile='path to file' dbms=tab replace;run;

Do you want the measures to be in their own columns (wide file) or output to their own rows for a long file?

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3722 views
  • 5 likes
  • 6 in conversation