I have a dataset like this:

parent id   var1   var2   var3
1           a      b      c
1           a      b      c
1           a      b      c
2           a      b      c
3           a      b      c
1           a      b      c

which is about 350k rows in total. I want to split it into files of approximately 20k rows each, but I need to make certain that the parent id field is taken into account. In the example above, if the 20,000th record were the one with a parent id of 2, I would not want the split to occur there, because that record needs to stay with the record above it (1) and the one after it (3).

So I think the logic could be that SAS looks at the 20k record mark, advances until it finds a record where parent id is 1, then goes back one record. That would ensure I am getting a complete "group" (which could be either a single record with a parent id of 1, or a grouping like 1, 2 and 3).

I have no idea how to program this. It seems tougher because the program would also have to know that it didn't stop at row 20k the previous time, since it had to find a value of 1 (which it might find at row 20,002), then go back one row and include only rows 1-20,001. The next iteration would then have to know to start with row 20,002, go another ~20k rows, find a 1, then go back one row again. Perhaps this would be a two-step process: first build a table containing the cutoffs, then use those values in the splitting code?

In googling for splitting datasets in SAS, I found the following code, which I modified slightly and annotated for my own understanding. But I don't know how it can be modified so that the program itself finds the split point (as long as each new dataset has at least 20k rows) and passes the next start row on to the next iteration.

Thanks in advance for any help.

--------------------

%macro split1(num);

  data _null_;
    if 0 then set fin.output_0_all nobs=count;
    call symput('numobs', put(count, 8.));   /* 05/22/13 10:49 - reads total # of records in file */
  run;

  %let m = %sysevalf(&numobs/&num, ceil);    /* 05/22/13 10:50 - m = # of files we are going to end up with */

  data %do J=1 %to &m; fin.orig_&J %end; ;   /* 05/22/13 10:50 - create datasets orig_1 through orig_m */
    set fin.output_0_all;
    %do I=1 %to &m;                          /* 05/22/13 10:52 - do the following m times, meaning once for each output dataset */
      if %eval(&num*(&I-1)) < _n_ <= %eval(&num*&I) then output fin.orig_&I;
      /* meaning num*(I-1) < record number <= num*I, e.g. with num=20000 the first batch is 0 < _n_ <= 20000, the 2nd is 20000 < _n_ <= 40000 */
    %end;                                    /* 05/22/13 11:01 - end the 1 to m (output sets) loop */
  run;

%mend split1;

%split1(20000);
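For what it's worth, the two-step idea described above might be sketched roughly as follows. This is an untested sketch, not a definitive solution. It assumes the ID column is a numeric variable named parent_id and that a value of 1 always marks the start of a new group. The names work.tagged, file_no, rows_in_file, target, nfiles and split_by_group are made up for illustration. Instead of looking ahead and stepping back one row, it keeps a running row count and only starts a new file when it reaches a row with parent_id = 1 after the current file already holds at least the target number of rows.

```sas
/* Sketch only. Assumptions: the ID variable is numeric and named parent_id, */
/* and parent_id = 1 marks the first row of every group.                     */
%let target = 20000;

/* Step 1: tag each row with the number of the output file it belongs to. */
data work.tagged;
  set fin.output_0_all;
  retain file_no 1 rows_in_file 0;
  /* Switch to a new file only at a group boundary, and only once the */
  /* current file already holds at least the target number of rows.   */
  if parent_id = 1 and rows_in_file >= &target then do;
    file_no = file_no + 1;
    rows_in_file = 0;
  end;
  rows_in_file = rows_in_file + 1;
run;

/* Find out how many files the tagging step produced. */
proc sql noprint;
  select max(file_no) into :nfiles from work.tagged;
quit;
%let nfiles = &nfiles;   /* strips the leading blanks left by INTO */

/* Step 2: route each row to its file, dropping the helper variables. */
%macro split_by_group;
  data %do j = 1 %to &nfiles; fin.orig_&j (drop=file_no rows_in_file) %end; ;
    set work.tagged;
    select (file_no);
      %do j = 1 %to &nfiles;
      when (&j) output fin.orig_&j;
      %end;
      otherwise;
    end;
  run;
%mend split_by_group;

%split_by_group;
```

With this approach every file except possibly the last has at least &target rows, and a break can only fall immediately before a row whose parent_id is 1, so a 1, 2, 3 group is never divided. If parent_id is stored as character, the comparison would need to be parent_id = '1' instead.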