BookmarkSubscribeRSS Feed
blackraven
Fluorite | Level 6

Hi.

I got stuck on TAB separated TXT files with 200,000 variables each. I need to import them and then process them and export them back. I can import the first row of variable names like this:

data varnames;

infile 'x'  dsd delimiter = '09'x recfm = v lrecl = 5000000 obs = 1;

input varname : $32. @@;

run;

However, the datalayout should be that the variable names are on the columns and the values on the rows. I need to import all 200,000 variables, and then export them as well.

How to do this?

8 REPLIES 8
ballardw
Super User

I would start with PROC IMPORT. You have most of the important bits already.
Proc import datafile='X' /* or what ever the path/name to your data*/

       out = dataout

       dbms=dlm

       replace; /* in case you need to run this again*/

       delimiter = '09'x;

       datarow = 2; /*this will assume that the first row has the variable names */

       guessingrows = 32767; /* sas will "guess" whether your variables are numeric (or date/time) or string by looking at up to this many rows*/

run;

Haikuo
Onyx | Level 15

If my memory serves me right, Proc import has the 'lrecl' limit as 32,767. While OP needs to import 200,000 variables. Not sure how to accommodate using proc import.

One thing you could try is to put all of the variable names into macro variables (you probably need more than one, as the length of macro variable also has the same limit) if they are of the same type (character or numeric) or ideally same informat, and then using data step to input them.

Regards,

Haikuo

FriedEgg
SAS Employee

filename x temp;

*build fake datafile;

data _null_;

file x;

put '6509660967'x;

put '4809490950'x;

run;

*collect variable names and number of variables;

data _null_;

infile x dlm='09'x obs=1;

input varname : $32. @@;

call symputx(cats('varname',_n_),varname);

call symputx('nobs',_n_);

run;

*input data, assuming all columns have the same informat $32.;

options mprint;

data foo;

infile x dlm='09'x firstobs=2;

input %macro vloop; ( %do i=1 %to &nobs; &&varname&i %end; ) (:$32.) %mend; %vloop ;

run;

JasonDiVirgilio
Quartz | Level 8

Can you provide us with a snapshot of the data? Maybe the first 5 rows of columns 1 to 80 to give us an idea of what you mean?

blackraven
Fluorite | Level 6

Many thanks. I noted it was much faster with the following:

data varnames;
infile "x" dsd delimiter = '09'x recfm = v lrecl = 5000000 pad obs = 1;
input varname : $32. @@;

call symput('nvars', _n_);
run;

data varnames (drop = i);
set varnames end = eof ;
if _n_ = 1 then delete;
do i = 1 to 1000; if _n_ >= 1000 * (i - 1) then mvar_cat = i; end;
if eof then call symput('ncats', mvar_cat);
run;

%macro mvarcat_loop(i); proc sql noprint; %global var&i; select varname into :var&i separated by ' ' from varnames where mvar_cat = &i; quit; %mend mvarcat_loop;
%macro mvarcat; %do i = 1 %to &ncats; %mvarcat_loop(&i); %end; %mend mvarcat; %mvarcat;

%macro mvarcat2;

data prework;
infile "x" dsd delimiter = '09'x recfm = v lrecl = 5000000 pad firstobs = 2 obs = 10;
input (%do i = 1 %to &ncats; &&var&i %end;) (:$2.);
run;
%mend mvarcat2;
%mvarcat2;

Tom
Super User Tom
Super User

How about letting PROC TRANSPOSE do it for you.

* 1) Read the names into a dataset.  I used _NAME_ for the variable name as this is the default for PROC TRANSPOSE. ;

* Generate NCOL macro variable to see how many columns to read for each row. ;

data varnames;

  length _name_ $32 ;

  infile tmpfile1 dlm='09'x dsd obs=1 lrecl=5000000 end=eof;

  input _name_ @@;

  if eof then call symputx('ncol',_n_);

run;

sasfile varnames open;

* 2) Create a data step view to read the actual values.  Generate a ROW number  ;

* Use POINT= option to get the _NAME_ for the column. ;

data vertical / view=vertical;

  infile tmpfile1 dlm='09'x dsd firstobs=2 truncover lrecl=5000000;

  row+1;

  do col=1 to &ncol;

    input value $ @;

    set varnames point=col;

    output;

  end;

run;

* 3) PROC TRANSPOSE ;

proc transpose data=vertical out=horizontal(drop=_name_);

  by row;

  var value;

run;

sasfile varnames close;

FriedEgg
SAS Employee

In my first response I did not think through the quantity of macro variables you would be creating.  It would not be a good approach.  Instead I would generate an include file, the following could be cleaned up:

filename x temp;

data _null_;

file x;

put '6509660967'x;

put '4809490950'x;

run;

filename codegen temp;

data _null_;

file codegen;

put '%macro vars;';

run;

data _null_;

file codegen mod;

infile x dlm='09'x obs=1 end=last;

input varname : $32. @@;

put varname;

run;

data _null_;

file codegen mod;

put '%mend;';

run;

%include "%sysfunc(pathname(codegen))" /source2;

data foo;

infile x dlm='09'x firstobs=2;

input ( %vars ) (:$32.);

run;

blackraven
Fluorite | Level 6

It works just fine, and I will something similar for the TXT-export as well.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1342 views
  • 0 likes
  • 6 in conversation