BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
js5
Pyrite | Level 9 js5
Pyrite | Level 9

Hello,

 

is there a smart way of importing delimited files which have records in columns and variables in rows? Something like:

parameter subject1 subject2 subject3
numvar1 1232 24342 3.51
charvar1 afdf bsd cdfsgff
numvar2 412123 5324 6.1

Importing columns as variables is possible but as some rows are numeric and some are character, I would need to run a bunch of input statements after import to get the numeric data back to numeric. Thank you for the feedback in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you know how many rows there are and which ROWS have numeric values then I would suggest just reading into arrays and then looping over the array elements to write out observations.  You can just set the upper limit on the size of the array to something that is larger than maximum you will ever get.

 

options parmcards=text ;
filename text temp;
parmcards;
parameter subject1 subject2 subject3
numvar1 1232 24342 3.51
charvar1 afdf bsd cdfsgff
numvar2 412123 5324 6.1
;


data want;
  array p [100] $10 _temporary_;
  array n1 [100] _temporary_;
  array c1 [100] $8 _temporary_;
  array n2 [100] _temporary_;

  infile text dsd dlm=' ' truncover end=eof;
  input varname :$32. @ ;
  select (varname);
    when ('parameter') do i=1 to 100;
       input p[i] @;
    end;
    when ('numvar1') do i=1 to 100;
       input n1[i] @;
    end;
    when ('charvar1') do i=1 to 100;
       input c1[i] @;
    end;
    when ('numvar2') do i=1 to 100;
       input n2[i] @;
    end;
  end;
  input;
  if eof then do;

  length parameter $10 numvar1 8 charvar1 $8 numvar2 8 ;
    do i=1 to 100;
      parameter=p[i];
      numvar1=n1[i];
      charvar1=c1[i];
      numvar2=n2[i];
      if cmiss(of parameter -- numvar2)<4 then output;
    end;
  end;

  keep i parameter -- numvar2;
  rename i=column;
run;

proc print;
run;

If you wanted more flexibility then read the file into a tall dataset with ROW, COL and STRING variables.  Resort and write back out as a real CSV file you can then run PROC IMPORT on.

data tall ;
  infile text dsd dlm=' ' truncover column=cc length=ll ;
  row+1;
  do col=1 by 1 until(cc>ll);
    input value :$100. @;
    output;
  end;
  input;
run;

proc sort;
  by col row ;
run;

filename csv temp;

data _null_;
  file csv dsd ;
  do until(last.col) ;
    set tall;
    by col row;
    put value @ ;
  end;
  put;
run;

proc import datafile=csv dbms=csv out=want replace;
run;

proc print;
run;
Obs    parameter         numvar1    charvar1         numvar2

 1     subject1             1232    afdf              412123
 2     subject2            24342    bsd                 5324
 3     subject3             3.51    cdfsgff              6.1

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

First, do you have a document that describes the content of that file? Such as the maximum length of values in any given column, explicitly state what the delimiter is, how a missing value may appear? Do any of the values in the Parameter column repeat?

How many columns/rows are involved?

 

One approach, much as I hate to say it, may be to import that data into a spreadsheet and use the spreadsheet tools to do a transposition of rows and columns. Highlight and copy the data you have, on a different sheet in the spreadsheet, paste transposed. Then export that sheet to a delimited file and read that.

Caution with this approach: Some values might be get changed by the spreadsheet when importing the delimited files unless you force all the columns to be character when read.

 

Then find out whoever created that file a meet them in a dark alley and whip them with wet spaghetti noodles until they promise not to do that again. Though I suspect the delimited file likely started on a spreadsheet somewhere.

Tom
Super User Tom
Super User

If you know how many rows there are and which ROWS have numeric values then I would suggest just reading into arrays and then looping over the array elements to write out observations.  You can just set the upper limit on the size of the array to something that is larger than maximum you will ever get.

 

options parmcards=text ;
filename text temp;
parmcards;
parameter subject1 subject2 subject3
numvar1 1232 24342 3.51
charvar1 afdf bsd cdfsgff
numvar2 412123 5324 6.1
;


data want;
  array p [100] $10 _temporary_;
  array n1 [100] _temporary_;
  array c1 [100] $8 _temporary_;
  array n2 [100] _temporary_;

  infile text dsd dlm=' ' truncover end=eof;
  input varname :$32. @ ;
  select (varname);
    when ('parameter') do i=1 to 100;
       input p[i] @;
    end;
    when ('numvar1') do i=1 to 100;
       input n1[i] @;
    end;
    when ('charvar1') do i=1 to 100;
       input c1[i] @;
    end;
    when ('numvar2') do i=1 to 100;
       input n2[i] @;
    end;
  end;
  input;
  if eof then do;

  length parameter $10 numvar1 8 charvar1 $8 numvar2 8 ;
    do i=1 to 100;
      parameter=p[i];
      numvar1=n1[i];
      charvar1=c1[i];
      numvar2=n2[i];
      if cmiss(of parameter -- numvar2)<4 then output;
    end;
  end;

  keep i parameter -- numvar2;
  rename i=column;
run;

proc print;
run;

If you wanted more flexibility then read the file into a tall dataset with ROW, COL and STRING variables.  Resort and write back out as a real CSV file you can then run PROC IMPORT on.

data tall ;
  infile text dsd dlm=' ' truncover column=cc length=ll ;
  row+1;
  do col=1 by 1 until(cc>ll);
    input value :$100. @;
    output;
  end;
  input;
run;

proc sort;
  by col row ;
run;

filename csv temp;

data _null_;
  file csv dsd ;
  do until(last.col) ;
    set tall;
    by col row;
    put value @ ;
  end;
  put;
run;

proc import datafile=csv dbms=csv out=want replace;
run;

proc print;
run;
Obs    parameter         numvar1    charvar1         numvar2

 1     subject1             1232    afdf              412123
 2     subject2            24342    bsd                 5324
 3     subject3             3.51    cdfsgff              6.1

 

 

js5
Pyrite | Level 9 js5
Pyrite | Level 9

Thank you, I modified your first approach and it worked as expected. As far as the format is concerned, it is quite spread in the next-generation sequencing field. Not sure why, but it is the reality I have to deal with.

Ksharp
Super User

So interesting and odd question .

 

option parmcards=t;
filename t temp;
parmcards;
parameter subject1 subject2 subject3
numvar1 1232 24342 3.51
charvar1 afdf bsd cdfsgff
numvar2 412123 5324 6.1
;


data _null_;
 infile t end=last;
 input;
 retain max;
 max=max(max,countw(_infile_,' '));
 if last then call symputx('n',max);
run;
data temp;
infile t truncover;
input (t1-t&n.) (: $100.);
run; 
proc transpose data=temp out=temp2(drop=_name_);
var _all_;
run;
filename x temp;
proc export data=temp2 outfile=x dbms=csv replace;
putnames=no;
run;
proc import datafile=x out=want dbms=csv replace;
guessingrows=max;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 557 views
  • 1 like
  • 4 in conversation