- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
is there a smart way of importing delimited files which have records in columns and variables in rows? Something like:
parameter subject1 subject2 subject3 numvar1 1232 24342 3.51 charvar1 afdf bsd cdfsgff numvar2 412123 5324 6.1
Importing columns as variables is possible but as some rows are numeric and some are character, I would need to run a bunch of input statements after import to get the numeric data back to numeric. Thank you for the feedback in advance.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you know how many rows there are and which ROWS have numeric values then I would suggest just reading into arrays and then looping over the array elements to write out observations. You can just set the upper limit on the size of the array to something that is larger than maximum you will ever get.
options parmcards=text ;
filename text temp;
parmcards;
parameter subject1 subject2 subject3
numvar1 1232 24342 3.51
charvar1 afdf bsd cdfsgff
numvar2 412123 5324 6.1
;
data want;
array p [100] $10 _temporary_;
array n1 [100] _temporary_;
array c1 [100] $8 _temporary_;
array n2 [100] _temporary_;
infile text dsd dlm=' ' truncover end=eof;
input varname :$32. @ ;
select (varname);
when ('parameter') do i=1 to 100;
input p[i] @;
end;
when ('numvar1') do i=1 to 100;
input n1[i] @;
end;
when ('charvar1') do i=1 to 100;
input c1[i] @;
end;
when ('numvar2') do i=1 to 100;
input n2[i] @;
end;
end;
input;
if eof then do;
length parameter $10 numvar1 8 charvar1 $8 numvar2 8 ;
do i=1 to 100;
parameter=p[i];
numvar1=n1[i];
charvar1=c1[i];
numvar2=n2[i];
if cmiss(of parameter -- numvar2)<4 then output;
end;
end;
keep i parameter -- numvar2;
rename i=column;
run;
proc print;
run;
If you wanted more flexibility then read the file into a tall dataset with ROW, COL and STRING variables. Resort and write back out as a real CSV file you can then run PROC IMPORT on.
data tall ;
infile text dsd dlm=' ' truncover column=cc length=ll ;
row+1;
do col=1 by 1 until(cc>ll);
input value :$100. @;
output;
end;
input;
run;
proc sort;
by col row ;
run;
filename csv temp;
data _null_;
file csv dsd ;
do until(last.col) ;
set tall;
by col row;
put value @ ;
end;
put;
run;
proc import datafile=csv dbms=csv out=want replace;
run;
proc print;
run;
Obs parameter numvar1 charvar1 numvar2 1 subject1 1232 afdf 412123 2 subject2 24342 bsd 5324 3 subject3 3.51 cdfsgff 6.1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
First, do you have a document that describes the content of that file? Such as the maximum length of values in any given column, explicitly state what the delimiter is, how a missing value may appear? Do any of the values in the Parameter column repeat?
How many columns/rows are involved?
One approach, much as I hate to say it, may be to import that data into a spreadsheet and use the spreadsheet tools to do a transposition of rows and columns. Highlight and copy the data you have, on a different sheet in the spreadsheet, paste transposed. Then export that sheet to a delimited file and read that.
Caution with this approach: Some values might be get changed by the spreadsheet when importing the delimited files unless you force all the columns to be character when read.
Then find out whoever created that file a meet them in a dark alley and whip them with wet spaghetti noodles until they promise not to do that again. Though I suspect the delimited file likely started on a spreadsheet somewhere.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you know how many rows there are and which ROWS have numeric values then I would suggest just reading into arrays and then looping over the array elements to write out observations. You can just set the upper limit on the size of the array to something that is larger than maximum you will ever get.
options parmcards=text ;
filename text temp;
parmcards;
parameter subject1 subject2 subject3
numvar1 1232 24342 3.51
charvar1 afdf bsd cdfsgff
numvar2 412123 5324 6.1
;
data want;
array p [100] $10 _temporary_;
array n1 [100] _temporary_;
array c1 [100] $8 _temporary_;
array n2 [100] _temporary_;
infile text dsd dlm=' ' truncover end=eof;
input varname :$32. @ ;
select (varname);
when ('parameter') do i=1 to 100;
input p[i] @;
end;
when ('numvar1') do i=1 to 100;
input n1[i] @;
end;
when ('charvar1') do i=1 to 100;
input c1[i] @;
end;
when ('numvar2') do i=1 to 100;
input n2[i] @;
end;
end;
input;
if eof then do;
length parameter $10 numvar1 8 charvar1 $8 numvar2 8 ;
do i=1 to 100;
parameter=p[i];
numvar1=n1[i];
charvar1=c1[i];
numvar2=n2[i];
if cmiss(of parameter -- numvar2)<4 then output;
end;
end;
keep i parameter -- numvar2;
rename i=column;
run;
proc print;
run;
If you wanted more flexibility then read the file into a tall dataset with ROW, COL and STRING variables. Resort and write back out as a real CSV file you can then run PROC IMPORT on.
data tall ;
infile text dsd dlm=' ' truncover column=cc length=ll ;
row+1;
do col=1 by 1 until(cc>ll);
input value :$100. @;
output;
end;
input;
run;
proc sort;
by col row ;
run;
filename csv temp;
data _null_;
file csv dsd ;
do until(last.col) ;
set tall;
by col row;
put value @ ;
end;
put;
run;
proc import datafile=csv dbms=csv out=want replace;
run;
proc print;
run;
Obs parameter numvar1 charvar1 numvar2 1 subject1 1232 afdf 412123 2 subject2 24342 bsd 5324 3 subject3 3.51 cdfsgff 6.1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, I modified your first approach and it worked as expected. As far as the format is concerned, it is quite spread in the next-generation sequencing field. Not sure why, but it is the reality I have to deal with.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So interesting and odd question .
option parmcards=t;
filename t temp;
parmcards;
parameter subject1 subject2 subject3
numvar1 1232 24342 3.51
charvar1 afdf bsd cdfsgff
numvar2 412123 5324 6.1
;
data _null_;
infile t end=last;
input;
retain max;
max=max(max,countw(_infile_,' '));
if last then call symputx('n',max);
run;
data temp;
infile t truncover;
input (t1-t&n.) (: $100.);
run;
proc transpose data=temp out=temp2(drop=_name_);
var _all_;
run;
filename x temp;
proc export data=temp2 outfile=x dbms=csv replace;
putnames=no;
run;
proc import datafile=x out=want dbms=csv replace;
guessingrows=max;
run;