SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Export 2000000 variables into delimited file

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Export 2000000 variables into delimited file

Dear friends I need your help again.

I have data set with more than 2000000 variables

 

id snp1 snp2 snp3 ... snp2000000

 

i want to export it to any kind of delimited file, and preserve header.

I tried few stuff like:

 

PROC EXPORT DATA= bikovi (OBS=0)
OUTFILE= "C:\Users\Maja\Desktop\gv50.csv"
DBMS=CSV REPLACE;
PUTNAMES=YES;
RUN;


DATA _NULL_;
SET bikovi;
FILE "C:\Users\Maja\Desktop\gv50.csv" MOD DSD DLM=',' LRECL=5000000 ;
PUT (_ALL_) (Smiley Happy ;
RUN;

 

and while the data are fine, my header is stopping at snp3708 every time. 

What can be a reason?

I tried on smaller data set (40000 variables) and the story was the same.

 

My data are very simple

id is character value length 15 but if needed i can have it as numeric 1,2,3...671 while SNPs are numbers 0,1,2.

I tried also a robust macro from 

http://support.sas.com/kb/24/727.html but this one only complicated everything

Hope you guys can help me

Thanks!


Accepted Solutions
Solution
‎10-13-2016 03:12 AM
Super User
Super User
Posts: 7,039

Re: Export 2000000 variables into delimited file

[ Edited ]
Posted in reply to MajaFerencakovic

Just replace your PROC EXPORT with a data step to write the header.

 

filename snp  "C:\Users\Maja\Desktop\gv50.csv";

data _null_ ;
  file snp dsd lrecl=10000000 ;
  put 'id' @;
  do i=1 to 2000000;
    name = cats('snp',i);
    put name @;
  end;
  put;
run;

data _null_ ;
  file snp dsd lrecl=10000000 mod ;
  set bikovi;
  put ( id snp1-snp2000000 ) (+0) ;
run;

 

(or do it in the other data step)

 

View solution in original post


All Replies
Super User
Posts: 7,762

Re: Export 2000000 variables into delimited file

Posted in reply to MajaFerencakovic

First of all, I have the notion that a dataset with 2 million variables is the result of a design failure (holding data in structure), I guess that switching from a wide to a long format would be a big improvement in further handling the data.

 

With which application are you viewing your csv?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 40

Re: Export 2000000 variables into delimited file

Posted in reply to KurtBremser

well, here it is not usefull to discuss why my colegue need wide instead of long format. I have them in long. But as I said it, this is not up to me...

However those are genotype data, lot of info... This is kind of standard format.

 

TextPad opens this one nicely. No issues at all. If issues would apear I woul use long file text viewer or something to check. 

As I said, data are fine. Only header stops. 

No problem with exporting data. But guy needs header to... 

Super User
Posts: 5,424

Re: Export 2000000 variables into delimited file

Posted in reply to MajaFerencakovic

Why on earth do you have 2'' variables?

What is the purpose on such a layout?

General advice - transpose from wide to long.

Data never sleeps
Contributor
Posts: 40

Re: Export 2000000 variables into delimited file

guys, this is not helpful.
This is standard .ped format used in genetics. So can we skip the part why would I need to have 2000000 variables. I have them, I must deal with them...
Super User
Posts: 7,762

Re: Export 2000000 variables into delimited file

Posted in reply to MajaFerencakovic

The data step example you posted will not create a header line, only the proc export.

 

Now, proc export creates a data step that does the export, and runs that.

I have the suspicion that in this data step, SAS runs into the limitation for statements (32767) when it tries to write the header line.

 

I guess you might have to roll your own macro for this:

%macro output_file;
data _null_;
set dataset;
file "c:\outfile.csv" lrecl=1000000000 dlm=',';
if _n_ = 1
then do;
  put "ID"@;
%do i = 1 %to 2000000;
  put ",SNP&i"@;
%end;
end;
put id@;
%do i = 1 %to 2000000;
put snp&i.@;
%end;
run;
%mend;

This splits the big put for the header into single puts, as is done in the proc export-generated data step for the variables.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 40

Re: Export 2000000 variables into delimited file

Posted in reply to KurtBremser

looks promising.

I tried it on smaller data set (40000). 

But, after the last one header (snp40000), data starts to apear in the same line.

 

 

Super User
Posts: 7,762

Re: Export 2000000 variables into delimited file

Posted in reply to MajaFerencakovic

MajaFerencakovic wrote:

looks promising.

I tried it on smaller data set (40000). 

But, after the last one header (snp40000), data starts to apear in the same line.

 

 


Ups. Add a simple put; statement immediately before the end; of the if _n_ = 1 block, but after the %end;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 40

Re: Export 2000000 variables into delimited file

Posted in reply to KurtBremser

hm... still not working...

 

Smiley Sad

Super User
Posts: 11,343

Re: Export 2000000 variables into delimited file

Posted in reply to MajaFerencakovic

If you have that many variables and the column labels are longer than the values then you need to make sure that the output line is long enough to hold the text of the headers.

Contributor
Posts: 40

Re: Export 2000000 variables into delimited file

the file by itself is fine. All headers are in but data starts in the same line as headers

Super User
Posts: 7,762

Re: Export 2000000 variables into delimited file

Posted in reply to MajaFerencakovic

MajaFerencakovic wrote:

hm... still not working...

 

Smiley Sad


%macro output_file;
%let maxvar=2000000;
data _null_;
set dataset;
file "c:\outfile.csv" lrecl=1000000000 dlm=',';
if _n_ = 1
then do;
  put "ID"@;
%do i = 1 %to %eval(&maxvar-1);
  put ",SNP&i"@;
%end;
end;
put ",SNP&maxvar";
put id@;
%do i = 1 %to &maxvar;
put snp&i.@;
%end;
run;
%mend;

That should deal with the issue of the header line not terminating correctly.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 40

Re: Export 2000000 variables into delimited file

Posted in reply to KurtBremser
Works!!! Thank you! It is nice to have so many working solutions.
Valued Guide
Posts: 505

Re: Export 2000000 variables into delimited file

Posted in reply to KurtBremser
Actually handling fat 'blobs' is important and SAS has recforms f and n to handle these situations. Also useful with binary data.

Thanks for the question

HAVE 2,000,000 variables with values

 -- CHARACTER --
Variable          Len     Value

T00000001    C    10      0000000000
T00000002    C    10      0000000002
T00000003    C    10      0000000003
T00000004    C    10      0000000004
T00000005    C    10      0000000005
T00000006    C    10      0000000006
T00000007    C    10      0000000007
T00000008    C    10      0000000008
....
T02000000    C    10      0020000000


WANT CSV FILE 9 2,000,000 WIDE

"T0000000"  ,"T00000002","T00000003","T00000004","T00000005",..."T20000000"

"00000000"  ,"000000002","000000003","000000004","000000005",..."020000000"
"10000000"  ,"100000002","100000003","100000004","100000005",..."120000000"
"20000000"  ,"200000002","200000003","200000004","200000005",..."220000000"


SOLUTION (MIGHT BE EASIER WITH RECFM=N?)
=======================================
this is not a complete solution but it should scale.
Might get you started

FSLIST is critical to understanding


filename fixblk "d:/txt/fixblk.txt" lrecl=12 recfm=f;

data _null_;
  length varout $12;
  file fixblk;
  varout=' "T0000000"';
  put varout @;
  do var=2 to 1000;
    varout=cats(',"T',put(var,z8.),'"');
    put varout @;
  end;
  put '202020202020202020200D0A'x @;
  varout=put(0,z9.);
  put varout @;
  do var=2 to 1000;
    varout=cats(',',put(var,z10.));
    put varout;
  end;
  put '202020202020202020200D0A'x @;
run;quit;

* turn numbers on and hex on;
filename fixblk "d:/txt/fixblk.txt" lrecl=120 recfm=f;
proc fslist file=fixblk;
run;quit;

* import to check;
data fixblk;
  infile "d:/txt/fixblk.txt" firstobs=2 lrecl=32756;
  array t[1000] $10 t0001-t1000;
  input;
  do i=1 to 1000;
    t[i]=scan(_infile_,i,',');
  end;
  output;
run;quit;
Super User
Posts: 10,020

Re: Export 2000000 variables into delimited file

Posted in reply to MajaFerencakovic

Code should be right.

You must generate that file under UNIX and browse it under WINDOWS.

They have different end of row character , so you will see they are in the same row.

suggest you open that file by Notepad++ .

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 1100 views
  • 7 likes
  • 8 in conversation