BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rfarmenta
Obsidian | Level 7

I have a large SAS dataset with 201,000 observations and 2800 variables. I need to export it to multiple CSV or text files that have no more than 1000 columns each. The dataset contains two identifiers studyid and partnum that need to be included in each dataset to remerge them later. I have been trying to use proc export with different delimiters but keep getting a couple hundred extra columns. Is there a way to stip a potential delimiter from all the data to make sure I do not get these type of errors? Also, is there a way to export certain groups of variables, maybe alphabetically or something. Any suggestions would be greatly apprecaited.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Here is corrected code that will generate the delimiters for the header row manually in the code.  The code I tried before with the trailing @ would only work when putting a variable and not a string literal.  You could adapt the code to use teh variable list from the list used to generate the code, but then the generated code could not be used later as a stand alone program.

filename code temp;
data _null_;
 file code lrecl=80 ;
 do until (last.group);
   set files ;
   by group ;
   fname = cats('outfile',group,'.csv');
   if first.group then put
    / 'data _null_;'
    / '  set have; '
    / '  file ' fname :$quote. 'dsd lrecl=100000;'
    / '  if _n_=1 then put "studyid,partnum," ' @
   ;
   put  name :$quote. @ ;
   if last.group then put / '  ;' ;
   else put '"," ' @ ;
 end;
 do until (last.group);
   set files ;
   by group ;
   if first.group then put
      '  put ( studyid partno ' @
   ;
   put name @ ;
   if last.group then put
      ') (+0);'
    / 'run;'
   ;
 end;
run;

If you want special missing value to print as missing instead of the letter attached to them then you might need to add more code into the generated program to set the special missing to normal missing.  

 

Something like this would work even if the source data did not have any numeric variables.

 array __N _n_ _numeric_;
 do _n_=1 to dim(__N); if missing(__N(_n_)) then __N(_n_)=.; end;

Similarly if you want to trap CR and/or LF in the character variables you could add code like this to convert them into spaces. The first two lines purposely adds the variable _C_ to insure that the _CHARACTER_ variable list is not empty.  You might not need those two lines if your STUDYID variable is also character.

 retain _c_ ' ' ;
 drop _c_;
 array __C _character_;
 do _n_=1 to dim(__C); __C(_n_) = translate(__C(_n_),'  ','0D0A'x); end;

 

View solution in original post

22 REPLIES 22
ballardw
Super User

You have several options all somewhat ugly depending on your data set names.

One would be to use dataset options to list the variables used

Proc export data=mydataset(keep= idvar1 idvar2 othvar -- anothervar)

                   outfile ="outdata1.csv"

                   dbms=csv

                   replace;

run;

 

Proc export data=mydataset(keep= idvar1 idvar2 anothervarplus1 -- someothervar)

                   outfile ="outdata2.csv"

                   dbms=csv

                   replace;

run;

Proc export data=mydataset(keep= idvar1 idvar2 someothervarplus1 -- lastvar)

                   outfile ="outdata3.csv"

                   dbms=csv

                   replace;

run;

 

The -- says to keep all of the variables in order so if you have variables:

var thatvar anothervar somevar yetanothervar

and use Keep = var -- somevar then the result is to keep the first 4 of the above variables.

 

Shmuel
Garnet | Level 18

The easiest way shoul be:

 

 

%macro split_export(keys=, vars=, outnum=);
      proc export data=have(keep=&keys &vars);
          outfile "..path..name&outnum..csv"   /* adapt path and file name prefix */
     run;
%mend split_export;
%split_export(keys=studid partnum, vars=var1-var998 , 1);
%split_export(keys=studid partnum, vars=var999-var1996 ,2);

Of course, if your variable names have no constant prefix (like var...) you shall need more comlicated code,

but it is possible.

     

rfarmenta
Obsidian | Level 7

Thank you both for the proposed solutions. I will try both now. My variable names do not have a constant prefix so I will try to modify to accomodate this.

 

For my delimiter issue, is there something I should do that would help with this or is it a non-issue?

Shmuel
Garnet | Level 18

If you choose dbms=csv the the deimiter is a comma,

otherwise try define dlm=  your own delimiter.

 

If there is no text field that may contain commas then prefer the dbms=csv;

rfarmenta
Obsidian | Level 7

Thank you. Part of my problem is that there are text fields that contain commas and many other characters so it is difficult to know what delimiter might work best. Is there a stip a certain character from the entire dataset to make sure I can export correctly?

 

Thank you again for your help!

Shmuel
Garnet | Level 18

Usually text variables don't include tabs.

try dbms=tab or dlm='09'x;

Tom
Super User Tom
Super User

Why do you need to eliminate the delimiters from the data? SAS will automatically include quotes around values that contain delimiters.  What software are you using to read the files that does not understand that?

Tom
Super User Tom
Super User

1) CSV (or any delimited file) is easy to write with a data step. No need to get complex and try to use proc's to generate them.

2) SAS will put quotes around values that contain delimiters so there is no need to eliminate them from your data. You should check the tool that you are using to read the files and make sure that it understands this basic rule of delimited files.

3) You can use PROC CONTENTS or other methods to get a list of the variables in your file and then break them into sets of less than 1,000 variables. Then use those sets to generate the data steps to write your CSV files.

 

proc contents data=have out=contents(keep=varnum name) noprint ; run;
proc sort data=contents;
  where lowcase(name) not in ('studyid','partnum');
  by varnum;
run;
data files ;
  group+1;
  do col=3 to 1000 until (eof);
    set contents end=eof;
    output;
  end;
run;
filename code temp;
data _null_;
 file code lrecl=80 ;
 do until (last.group);
   set files ;
   by group ;
   fname = cats('outfile',group,'.csv');
   if first.group then put
    / 'data _null_;'
    / '  set have; '
    / '  file ' fname :$quote. 'dsd lrecl=100000;'
    / '  if _n_=1 then do;'
    / '    put "studyid" @;'
    / '    put "partnum" @;'
   ;
   put '    put ' name :$quote. '@;' ;
   if last.group then put
      '    put;'
    / '  end;'
   ;
 end;
 do until (last.group);
   set files ;
   by group ;
   if first.group then put
    / '  put ( studyid partnum ' @
   ;
   put name @ ;
   if last.group then put
      ') (+0);'
    / 'run;'
   ;
 end;
run;

then just use %INCLUDE to run the generated code.

%include code / source2 ;
rfarmenta
Obsidian | Level 7

I don't need to eliminate delimiters, I was just running into an issue of extra rows being created in the past so was trying to come up with a solution to fix this. I am trying a couple different options now that you all have suggested. Thank you for your promopt responses and help!

ballardw
Super User

@rfarmenta wrote:

I don't need to eliminate delimiters, I was just running into an issue of extra rows being created in the past so was trying to come up with a solution to fix this. I am trying a couple different options now that you all have suggested. Thank you for your promopt responses and help!


"Extra rows" likely because of carriage return or line feed characters in a data field. Proc export will only create one row of output per observation but if the data contains something that another application considers and end of row the result will look like that. If you have one or more data fields that are some sort of unstructed comment or note field you may consider investigating if the variable has such characters as part of the content and remove them prior to export.

 

 

rfarmenta
Obsidian | Level 7

Thank you! I tried that code that Tom sent and when I open the data in excel all of the variable names are in the first cell so when I read that data set into SAS is names all the variables var1-var1000. I am not sure what is going on with that. Also, is there a way to have sepcial missing values just be exported as missing since most other programs do not recognize special missing?

Shmuel
Garnet | Level 18

@rfarmenta wrote: "when I open the data in excel all of the variable names are in the first cell so when I read that data set into SAS is names all the variables var1-var1000. I am not sure what is going on with that."

 

It is not clear what have you run and what results you got. Please post log of your run and screenshot of the excel.

 

As to missing values, you can try add next statement before the proc export or the data steps:

options missing=' ';
Tom
Super User Tom
Super User

Here is corrected code that will generate the delimiters for the header row manually in the code.  The code I tried before with the trailing @ would only work when putting a variable and not a string literal.  You could adapt the code to use teh variable list from the list used to generate the code, but then the generated code could not be used later as a stand alone program.

filename code temp;
data _null_;
 file code lrecl=80 ;
 do until (last.group);
   set files ;
   by group ;
   fname = cats('outfile',group,'.csv');
   if first.group then put
    / 'data _null_;'
    / '  set have; '
    / '  file ' fname :$quote. 'dsd lrecl=100000;'
    / '  if _n_=1 then put "studyid,partnum," ' @
   ;
   put  name :$quote. @ ;
   if last.group then put / '  ;' ;
   else put '"," ' @ ;
 end;
 do until (last.group);
   set files ;
   by group ;
   if first.group then put
      '  put ( studyid partno ' @
   ;
   put name @ ;
   if last.group then put
      ') (+0);'
    / 'run;'
   ;
 end;
run;

If you want special missing value to print as missing instead of the letter attached to them then you might need to add more code into the generated program to set the special missing to normal missing.  

 

Something like this would work even if the source data did not have any numeric variables.

 array __N _n_ _numeric_;
 do _n_=1 to dim(__N); if missing(__N(_n_)) then __N(_n_)=.; end;

Similarly if you want to trap CR and/or LF in the character variables you could add code like this to convert them into spaces. The first two lines purposely adds the variable _C_ to insure that the _CHARACTER_ variable list is not empty.  You might not need those two lines if your STUDYID variable is also character.

 retain _c_ ' ' ;
 drop _c_;
 array __C _character_;
 do _n_=1 to dim(__C); __C(_n_) = translate(__C(_n_),'  ','0D0A'x); end;

 

rfarmenta
Obsidian | Level 7

Thank you, Tom! This worked wonderfully.

 

One other thing, is  there a way to get rid of all of the tabs in the data? Apparently the group this is being shared with needs the tabs to be removed for their ingestion of the data.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 5588 views
  • 7 likes
  • 4 in conversation