The SAS Output Delivery System and reporting techniques

split dataset and export into multiple files

Reply
Occasional Contributor
Posts: 13

split dataset and export into multiple files

Hi all!

 

I have a dataset which has shipment data, where a shipment can exist in multiple rows with additional data (such as return info, etc)

 

The dataset has 10000 shipments, the row count is more, but remains irrelevant, coz I need to split this dataset into splits of 100 shipments (column 2) and export these splits into xml files using ODS..

 

Dataset sample :

Rowcount    Shipment   Shipmentdeatails1  Shipmentdetails2

     1                  a                  x                               y

     2                  a                  x                               y

     3                  b                  x                               y

     4                  b                  x                               y

     5                  c                  x                               y

     6                  d                  x                               y

...

 

so in the above case, the first two rows are 1 shipment of 'a', and the row 5 is a single row and is the shipment 'c'

 

Could anyone please help me out with a code for this function.. thanks in advance!

Super User
Super User
Posts: 9,193

Re: split dataset and export into multiple files

Something like - and not tested as not typing in the test data - provide test data in the form of a datastep in future:

proc sort data=have out=loop nodupkey;
  by shipment;
run;

data _null_;
  set loop;
  retain fno 0;
  if _n_=1 or mod(_n_,100)=0 then do;
if _n_ > 1 then call execute('); run;'); fno=fno+1; call execute(cats('data want',put(fno,best.),'; set have; where shipment in (',quote(shipment))); end;
else call execute(' '||quote(strip(shipment))); run;

This shows how to split a dataset out into lots of datasets called wantX - X being incremental.  You could do something similar.

 

Trusted Advisor
Posts: 1,822

Re: split dataset and export into multiple files

You can use next step to split dataset as you want - just addapt number of datasets to gather into one output dataset:

data have;
  input shipment $;
cards;
a
a
a
b
b
c
d
d
d
e
;
run;
%let gather=2; /* number of shipments per dataset */
proc sort data=have; by shipment; run;
data temp;
 set have end=eof;
  by shipment;
     retain dsno 0   /* split dataset number */
            ship ;   /* shipping sequence    */
     if first.shipment then do;
        ship+1;
        if mod(ship, &gather) = 1 then dsno+1;
     end; 
     if eof then do;
        put ship= dsno=;
        call symput('maxship',strip(dsno));
     end;
run;
     
%macro split;
  %do i=1 %to &maxship.;
     data dsn_&i;
      set temp (where=(dsno = &i));
     run;
  %end;
%mend;
%split;
 
Occasional Contributor
Posts: 13

Re: split dataset and export into multiple files

thanks so much for your help! I was able to split it well to batch1, batch2.... so on Smiley Happy

so now.. is there a way to link the splits to a proc template program I've written to export? (..to xml in a certain hierarchy)

so as to get batch1.xml..batch2..etc

Trusted Advisor
Posts: 1,822

Re: split dataset and export into multiple files

You can combine the proc template into the macro as new step just after creating the splitted dataset.

 

Please post an example of your proc template, dealing with one dataset as an example,

something like:

 

%macro split;
  %do i=1 %to &maxship.;
     data dsn_&i;
      set temp (where=(dsno = &i));
     run;

    /* add here the proc template dealing with one dataset: dsn_&i */

  %end;
%mend;
Occasional Contributor
Posts: 13

Re: split dataset and export into multiple files

Hi Shmuel,

 

ok so then a simplified version of the proc template program goes like this for a single dataset named 'set0001' :

 

ods path(prepend) work.templat(update);

 

proc template;

/*tagset defined and named as xmlexport*/

define tagset tagsets.xmlexport /store=work.templat;

     define event doc;

     start:

     /*put required declarations in xml*/

     put  "<?xml version=""1.0""";

     putq " encoding=" encoding;

     put  " ?>";

     put  "<!--xml file to input the services data for price calculation-->";

     put "<shipments>";

     put "<header>";

     put "<sourceref>x0203p</sourceref>";

     put "<version>1.0</version>";

     put "</header>";

     eval $counter 0;

     eval $currentvalue "1";

     eval $previousvalue "2";

     eval $currentpk "3";

     eval $previouspk "4";

     finish:

     put "</pack>";

     put "</shipment>";

     put "</shipments>";

     end;

 

     /*execution of the event 'data' performed for each row*/

     define event row;

     start:

       trigger data /if cmp(section,"body");   

     end;

 

     /*step execution of columns within the main body of available data*/

     define event data;

     start:

     set $currentvalue value /if cmp(name,"shipnumber");

     do /if ^cmp($currentvalue, $previousvalue);

     put "</pack>" /if (cmp(name," shipnumber ") and ^cmp($previousvalue, "2"));

     put "</shipment>" /if (cmp(name," shipnumber ") and ^cmp($previousvalue, "2"));

     put "<shipment>" /if cmp(name," shipnumber ");

     put "< shipnumber >" value /if cmp(name," shipnumber ");

     put "</ shipnumber >" /if cmp(name," shipnumber ");

     eval $currentpk "3";

     eval $previouspk "4";

     done;

     set $currentpk value /if cmp(name,"pack");

     do /if (cmp($currentvalue, $previousvalue) and ^cmp($currentpk, $ previouspk)) or (^cmp($currentvalue, $previousvalue));

     put "</ pack >" /if (cmp(name," pack ")and cmp($currentvalue, $previousvalue));

     put "< pack >" /if cmp(name," pack ");

     put "<packnumber>" value /if cmp(name," pack ");

     put "</packnumber>" /if cmp(name," pack ");

     put "<volume>" value /if cmp(name,"volume");

     put "</volume>" /if cmp(name,"volume");

     put "<weight>" value /if cmp(name,"weight");

     put "</weight>" /if cmp(name,"weight");

     done;

     set $previousitem $currentitem /if cmp(name,"weight");

     set $previousvalue $currentvalue /if cmp(name," weight ");

     end;

end;                                                                       

run;

 

 

ods markup type=xmlexport file='/shared/data/test_ods.xml'; 

 

proc print data=work.set0001;

run;

ods markup close;

Trusted Advisor
Posts: 1,822

Re: split dataset and export into multiple files

I'm not familiar with proc template, but if  -  proc print data=work.set0001;  - is the only line to change

then replace the work.set0001 with  work.dsn_&i.

 

You should consider:

 - are there other lines or arguments addapted specifically to set0001 ?

If yes then you need add those arguments, somehow, to the macro,

otherwise just copy the whole code inside the macro - as I posted before - with the above change.

 

If need, you can add the export code following the proc template code inside the macro.

 

If possible just try it and check results.

Occasional Contributor
Posts: 13

Re: split dataset and export into multiple files

[ Edited ]

hi Shmuel,

 

no all the splits of the dataset would have identical structures and column names etc..so no adaptation issue..

 

and while I could replace the data=work.set0001 with  work.dsn_&i....but what about the xml file naming..?

 

 

Trusted Advisor
Posts: 1,822

Re: split dataset and export into multiple files

[ Edited ]

AS I said, I'm not familiar with proc template but if you point at the line or lines where is the xml file name

then you can:

- either change the xml file name to <prefix>_&i   

- or write each file in a separate subfolder:  <path>_&i./<xml-filename>   (use / in linux or \ in windows)
    which may need create the subfolders in advance - maybe programatically.

 

Just guessing - is next line containing the xml file name and is it the only one ?

  

ods markup type=xmlexport file='/shared/data/test_ods.xml'; 

if it is - change to:

  

ods markup type=xmlexport file="/shared/data/test_ods_&i.xml"; 

pay attention - I changed the single quotes into double quotes - in order that &i will be resolved.

Ask a Question
Discussion stats
  • 8 replies
  • 385 views
  • 3 likes
  • 3 in conversation