Hi,
SAS version: 9.4
I would like to know whether it is possible to select specific columns from a pipe delimited text file and insert them into a SAS dataset. I have a huge file (several gigabytes) with 100 columns (= variables from SAS point of view), but I need to read only specific columns from that file. For example: the 10th, 23rd, 42th, 51th and 75th columns from the file.
I believe that similar questions have already been asked but the answers I found while I was googling, were mostly for column aligned data with fixed lengths and as a result fixed start and end positions for each column, that is, using the @ operator with appropriate column number as the start reading position, followed by appropriate offset numbers from that point, to continue reading by skiping the unwanted columns.
But my problem is different in a sense that there is no assumption on a fixed equal length per column (apart from the max possible length) and it is the delimiter "|" which separates the data and it is the sole criterion that allows to distinguish columns one from another.
Therefore, I would like to ask whether it is possible to do this and select specific (varying size) columns from a pipe delimited text file and insert them into SAS.
Thanks in advance,
@Astounding wrote:
Yes you can but you have to program some of it. For example:
data want;
infile source dad dlm='!';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
Input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;
... etc.
The input statement has features that make "skipping" much easier than you suggest.
data skip3;
infile cards dsd dlm='|';
length dummy $1.;
input var1:$8. 3*dummy var3;
cards;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
;;;;
run;
proc print;
run;
While google is a very nice tool, try using the search option in the communities.
https://communities.sas.com/t5/ODS-and-Base-Reporting/Read-Pipe-Delimited-File/m-p/11789#M3090
When reading delimited files you have to read in all of the columns (up to the last one you want to keep) - there's no way of finding the correct data otherwise. But you don't have to keep all of the columns - just the ones you want:
/* This is just to create a suitable text file for testing */
proc sql noprint;
select name into :collist separated by ' '
from dictionary.columns
where libname='SASHELP' and memname='LEUTRAIN'
and name <='x1500';
;
quit;
data _null_;
set sashelp.leutrain (obs=10 keep=&collist);
file "test.pipe" dsd dlm='|';
put &colprint;
run;
/* Here's the read program code:
Read in the text file, keep only columns 10, 23, 42, 51 and 75.*/
data want;
keep v10 v23 v42 v51 v75;
array v[75] $ 32767;
array l[75] _temporary_;
length thisvar $ 32767;
infile "test.pipe" dsd dlm='|' end=last;
input @;
do i=1 to 75;
input thisvar @;
v[i]=thisvar;
/* Keep track of longest length for each variable */
l[i]=max(l[i],length(v[i]));
end;
if last then do i=1 to 75;
call symputx(cats('v',i),l[i]);
end;
run;
/* Now, the columns are WAY too large for the data
let's fix that:*/
proc sql;
alter table want
modify v10 char(&v10)
modify v23 char(&v23)
modify v42 char(&v42)
modify v51 char(&v51)
modify v75 char(&v75)
;
quit;
Yes you can but you have to program some of it. For example:
data want;
infile source dsd dlm='|';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;
... etc.
@Astounding wrote:
Yes you can but you have to program some of it. For example:
data want;
infile source dad dlm='!';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
Input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;
... etc.
The input statement has features that make "skipping" much easier than you suggest.
data skip3;
infile cards dsd dlm='|';
length dummy $1.;
input var1:$8. 3*dummy var3;
cards;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
;;;;
run;
proc print;
run;
Hello
I apologize for the delay of replying.
Thank you very much and I thank everyone who participated for their time and their attention to my question.
Regards.
Hi @Odyssey2001,
Another option is to scan the input record. Let's assume columns 23 and 51 contain character values (to be stored in variables CVAR1, CVAR2 of lengths, say, 11 and 12, respectively) and the other three specified columns contain numeric values (for variables NVAR1-NVAR3). Then your DATA step could look like this:
data want;
length cvar1 $11 cvar2 $12;
infile 'test.pipe';
input;
nvar1=input(scan(_infile_,10,'|'), 32.);
cvar1= scan(_infile_,23,'|');
nvar2=input(scan(_infile_,42,'|'), 32.);
cvar2= scan(_infile_,51,'|');
nvar3=input(scan(_infile_,75,'|'), 32.);
run;
(Thanks, @SASJedi, for the sample data.)
If all target variables are of the same type, e.g. numeric, you can use an array:
data want(drop=col i);
array x[*] these are your target varnames;
infile 'test.pipe';
input;
do col=10, 23, 42, 51, 75;
i=sum(i,1);
x[i]=input(scan(_infile_,col,'|'), 32.);
end;
run;
@data_null__: Nice trick! Thanks for sharing here and there.
Edit: Removed the unnecessary options from the INFILE statements. Also, note that in case of possibly quoted character values in the raw data you may want to apply the DEQUOTE function, e.g.
cvar1=dequote(scan(_infile_,23,'|'));
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.