BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Odyssey2001
Fluorite | Level 6

Hi,

 

SAS version: 9.4

 

I would like to know whether it is possible to select specific columns from a pipe delimited text file and insert them into a SAS dataset. I have a huge file (several gigabytes) with 100 columns (= variables from SAS point of view), but I need to read only specific columns from that file. For example: the 10th, 23rd, 42th, 51th and 75th columns from the file.

 

I believe that similar questions have already been asked but the answers I found while I was googling, were mostly for column aligned data with fixed lengths and as a result fixed start and end positions for each column, that is, using  the @ operator with appropriate column number as the start reading position, followed by appropriate offset numbers from that point, to continue reading by skiping the unwanted columns.

 

But my problem is different in a sense that there is no assumption on a fixed equal length per column (apart from the max possible length) and it is the delimiter "|" which separates the data and it is the sole criterion that allows to distinguish columns one from another.

 

Therefore, I would like to ask whether it is possible to do this and select specific (varying size) columns from a pipe delimited text file and insert them into SAS.

 

Thanks in advance,

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

@Astounding wrote:
Yes you can but you have to program some of it. For example:

data want;
infile source dad dlm='!';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
Input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;

... etc.

The input statement has features that make "skipping" much easier than you suggest.

 

data skip3;
   infile cards dsd dlm='|';
   length dummy $1.;
   input var1:$8. 3*dummy var3;
   cards;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
;;;;
   run;
proc print;
   run;

Capture.PNG

View solution in original post

7 REPLIES 7
SASJedi
SAS Super FREQ

When reading delimited files you have to read in all of the columns (up to the last one you want to keep)  - there's no way of finding the correct data otherwise. But you don't have to keep all of the columns - just the ones you want:

/* This is just to create a suitable text file for testing */
proc sql noprint;
select name into :collist separated by ' '
   from dictionary.columns
   where libname='SASHELP' and memname='LEUTRAIN'
     and name <='x1500';
;
quit; 

data _null_;
   set sashelp.leutrain (obs=10 keep=&collist);
   file "test.pipe" dsd dlm='|';
   put &colprint;
run;

/* Here's the read program code: 
Read in the text file, keep only columns 10, 23, 42, 51 and 75.*/ data want; keep v10 v23 v42 v51 v75; array v[75] $ 32767; array l[75] _temporary_; length thisvar $ 32767; infile "test.pipe" dsd dlm='|' end=last; input @; do i=1 to 75; input thisvar @; v[i]=thisvar; /* Keep track of longest length for each variable */ l[i]=max(l[i],length(v[i])); end; if last then do i=1 to 75; call symputx(cats('v',i),l[i]); end; run; /* Now, the columns are WAY too large for the data let's fix that:*/ proc sql; alter table want modify v10 char(&v10) modify v23 char(&v23) modify v42 char(&v42) modify v51 char(&v51) modify v75 char(&v75) ; quit;
Check out my Jedi SAS Tricks for SAS Users
Astounding
PROC Star

Yes you can but you have to program some of it. For example:

data want;
infile source dsd dlm='|';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;

... etc.

data_null__
Jade | Level 19

@Astounding wrote:
Yes you can but you have to program some of it. For example:

data want;
infile source dad dlm='!';
length dummy $1;
do k=1 to 9;
input dummy @;
end;
Input col10 @;
do k=11 to 22;
input dummy @;
end;
input col23 @;
do k=24 to 41;

... etc.

The input statement has features that make "skipping" much easier than you suggest.

 

data skip3;
   infile cards dsd dlm='|';
   length dummy $1.;
   input var1:$8. 3*dummy var3;
   cards;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
;;;;
   run;
proc print;
   run;

Capture.PNG

Odyssey2001
Fluorite | Level 6

Hello

 

I apologize for the delay of replying.

 

Thank you very much and I thank everyone who participated for their time and their attention to my question.

 

Regards.

Astounding
PROC Star
Handhelds make typing difficult!

Dsd not dad.

Pipe character not exclamation point.
FreelanceReinh
Jade | Level 19

Hi @Odyssey2001,

 

Another option is to scan the input record. Let's assume columns 23 and 51 contain character values (to be stored in variables CVAR1, CVAR2 of lengths, say, 11 and 12, respectively) and the other three specified columns contain numeric values (for variables NVAR1-NVAR3). Then your DATA step could look like this:

data want;
length cvar1 $11 cvar2 $12;
infile 'test.pipe';
input;
nvar1=input(scan(_infile_,10,'|'), 32.);
cvar1=      scan(_infile_,23,'|');
nvar2=input(scan(_infile_,42,'|'), 32.);
cvar2=      scan(_infile_,51,'|');
nvar3=input(scan(_infile_,75,'|'), 32.);
run;

(Thanks, @SASJedi, for the sample data.)

 

If all target variables are of the same type, e.g. numeric, you can use an array:

data want(drop=col i);
array x[*] these are your target varnames;
infile 'test.pipe';
input;
do col=10, 23, 42, 51, 75;
  i=sum(i,1);
  x[i]=input(scan(_infile_,col,'|'), 32.);
end;
run;

 

@data_null__: Nice trick! Thanks for sharing here and there.

 

Edit: Removed the unnecessary options from the INFILE statements. Also, note that in case of possibly quoted character values in the raw data you may want to apply the DEQUOTE function, e.g.

cvar1=dequote(scan(_infile_,23,'|'));

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1873 views
  • 6 likes
  • 6 in conversation