Solved: Re: Select specific(varying size) columns from a pipe delimited text f...

Odyssey2001 · Posted 02-24-2019 08:44 AM

Hi,

SAS version: 9.4

I would like to know whether it is possible to select specific columns from a pipe delimited text file and insert them into a SAS dataset. I have a huge file (several gigabytes) with 100 columns (= variables from SAS point of view), but I need to read only specific columns from that file. For example: the 10th, 23rd, 42th, 51th and 75th columns from the file.

I believe that similar questions have already been asked but the answers I found while I was googling, were mostly for column aligned data with fixed lengths and as a result fixed start and end positions for each column, that is, using the @ operator with appropriate column number as the start reading position, followed by appropriate offset numbers from that point, to continue reading by skiping the unwanted columns.

But my problem is different in a sense that there is no assumption on a fixed equal length per column (apart from the max possible length) and it is the delimiter "|" which separates the data and it is the sole criterion that allows to distinguish columns one from another.

Therefore, I would like to ask whether it is possible to do this and select specific (varying size) columns from a pipe delimited text file and insert them into SAS.

Thanks in advance,

data_null__ · Posted 02-24-2019 10:54 AM

@Astounding wrote:
Yes you can but you have to program some of it. For example:

data want;
infile source dad dlm='!';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
Input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;

... etc.

The input statement has features that make "skipping" much easier than you suggest.

data skip3;
   infile cards dsd dlm='|';
   length dummy $1.;
   input var1:$8. 3*dummy var3;
   cards;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
;;;;
   run;
proc print;
   run;

View solution in original post

VDD · Posted 02-24-2019 08:57 AM

While google is a very nice tool, try using the search option in the communities.

https://communities.sas.com/t5/ODS-and-Base-Reporting/Read-Pipe-Delimited-File/m-p/11789#M3090

https://communities.sas.com/t5/SAS-Procedures/Splitting-a-delimited-column-into-multiple-columns/m-p...

SASJedi · Posted 02-24-2019 09:46 AM

When reading delimited files you have to read in all of the columns (up to the last one you want to keep) - there's no way of finding the correct data otherwise. But you don't have to keep all of the columns - just the ones you want:

/* This is just to create a suitable text file for testing */
proc sql noprint;
select name into :collist separated by ' '
   from dictionary.columns
   where libname='SASHELP' and memname='LEUTRAIN'
     and name <='x1500';
;
quit; 

data _null_;
   set sashelp.leutrain (obs=10 keep=&collist);
   file "test.pipe" dsd dlm='|';
   put &colprint;
run;

/* Here's the read program code: 
   Read in the text file, keep only columns 10, 23, 42, 51 and 75.*/
data want;
   keep v10 v23 v42 v51 v75;
   array  v[75] $ 32767;
   array  l[75] _temporary_;
   length thisvar $ 32767;
   infile "test.pipe" dsd dlm='|' end=last;
   input @;
   do i=1 to 75;
      input thisvar @;
      v[i]=thisvar;
      /* Keep track of longest length for each variable */
      l[i]=max(l[i],length(v[i]));
   end;
   if last then do i=1 to 75;
      call symputx(cats('v',i),l[i]);
   end;
run;

/* Now, the columns are WAY too large for the data 
   let's fix that:*/
proc sql;
alter table want 
   modify v10 char(&v10) 
   modify v23 char(&v23) 
   modify v42 char(&v42) 
   modify v51 char(&v51) 
   modify v75 char(&v75) 
;
quit;

Check out my Jedi SAS Tricks for SAS Users

Astounding · Posted 02-24-2019 09:50 AM

Yes you can but you have to program some of it. For example:

data want;
infile source dsd dlm='|';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;

... etc.

data_null__ · Posted 02-24-2019 10:54 AM

@Astounding wrote:
Yes you can but you have to program some of it. For example:

data want;
infile source dad dlm='!';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
Input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;

... etc.

The input statement has features that make "skipping" much easier than you suggest.

data skip3;
   infile cards dsd dlm='|';
   length dummy $1.;
   input var1:$8. 3*dummy var3;
   cards;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
;;;;
   run;
proc print;
   run;

Odyssey2001 · Posted 03-02-2019 08:13 PM

Hello

I apologize for the delay of replying.

Thank you very much and I thank everyone who participated for their time and their attention to my question.

Regards.

Astounding · Posted 02-24-2019 09:52 AM

Handhelds make typing difficult!

Dsd not dad.

Pipe character not exclamation point.

FreelanceReinh · Posted 02-24-2019 11:53 AM

Hi @Odyssey2001,

Another option is to scan the input record. Let's assume columns 23 and 51 contain character values (to be stored in variables CVAR1, CVAR2 of lengths, say, 11 and 12, respectively) and the other three specified columns contain numeric values (for variables NVAR1-NVAR3). Then your DATA step could look like this:

data want;
length cvar1 $11 cvar2 $12;
infile 'test.pipe';
input;
nvar1=input(scan(_infile_,10,'|'), 32.);
cvar1=      scan(_infile_,23,'|');
nvar2=input(scan(_infile_,42,'|'), 32.);
cvar2=      scan(_infile_,51,'|');
nvar3=input(scan(_infile_,75,'|'), 32.);
run;

(Thanks, @SASJedi, for the sample data.)

If all target variables are of the same type, e.g. numeric, you can use an array:

data want(drop=col i);
array x[*] these are your target varnames;
infile 'test.pipe';
input;
do col=10, 23, 42, 51, 75;
  i=sum(i,1);
  x[i]=input(scan(_infile_,col,'|'), 32.);
end;
run;

@data_null__: Nice trick! Thanks for sharing here and there.

Edit: Removed the unnecessary options from the INFILE statements. Also, note that in case of possibly quoted character values in the raw data you may want to apply the DEQUOTE function, e.g.

cvar1=dequote(scan(_infile_,23,'|'));

Select specific(varying size) columns from a pipe delimited text file & insert them into a dataset.

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Re: Select specific(varying size) columns from a pipe delimited text file & insert them into a d

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away