This "Output Data Step View" as shown by @FriedEgg looks interesting especially should it also work for Viya in-cas processing (=a view for loading data into CAS).
Reducing I/O is normally driven by performance requirements but it looks like the current implementation of these output views won't help with that (see log below).
As long as a table fits into memory I'd be using a hash object for such a requirement.
Code:
options fullstimer;
data work.cars(drop=_:);
do _i=1 to _nobs;
do _j=1 to 10000;
set sashelp.cars nobs=_nobs point=_i;
output;
end;
end;
stop;
run;
/* option 1: output view */
data cars_asia(where=(origin='Asia'))
cars_europe(where=(origin='Europe'))
cars_usa(where=(origin='USA'))
/
view=split_sort;
if 0 then set work.cars;
set split_sort;
run;
proc sort data=work.cars out=split_sort;
by msrp;
run;
/* option 2A: Hash table */
data _null_;
if 0 then set work.cars;
dcl hash h1(dataset:'work.cars', multidata:'y', ordered:'y');
h1.defineKey('origin');
h1.defineData(all:'y');
h1.defineDone();
h1.output(dataset:"cars_asia(where=(origin='Asia'))");
h1.output(dataset:"cars_asia(where=(origin='Europe'))");
h1.output(dataset:"cars_asia(where=(origin='USA'))");
run;
/* option 2B: Hash table with dynamic output */
data _null_;
dcl hash h1(dataset:'work.cars(obs=0)', multidata:'y', ordered:'y');
h1.defineKey('origin');
h1.defineData(all:'y');
h1.defineDone();
dcl hash h2(ordered:'y', multidata:'n');
h2.defineKey('origin');
h2.defineDone();
dcl hiter hh2('h2');
do _i=1 to _nobs;
set work.cars nobs=_nobs point=_i;
_rc=h1.add();
_rc=h2.ref();
end;
_rc=hh2.first();
do while(_rc=0);
_rc=h1.output(dataset: cats('cars_',origin,'(where=(origin="',origin,'"))') );
_rc=hh2.next();
end;
stop;
run;
Log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 options fullstimer;
70 data work.cars(drop=_:);
71 do _i=1 to _nobs;
72 do _j=1 to 10000;
73 set sashelp.cars nobs=_nobs point=_i;
74 output;
75 end;
76 end;
77 stop;
78 run;
NOTE: The data set WORK.CARS has 4280000 observations and 15 variables.
NOTE: DATA statement used (Total process time):
real time 0.86 seconds
user cpu time 0.55 seconds
system cpu time 0.31 seconds
memory 1962.25k
OS Memory 22948.00k
Timestamp 02/02/2025 01:41:13 AM
Step Count 158 Switch Count 5
Page Faults 0
Page Reclaims 234
Page Swaps 0
Voluntary Context Switches 24
Involuntary Context Switches 9
Block Input Operations 0
Block Output Operations 1272584
79
80 /* option 1: output view */
81 data cars_asia(where=(origin='Asia'))
82 cars_europe(where=(origin='Europe'))
83 cars_usa(where=(origin='USA'))
84 /
85 view=split_sort;
86
87 if 0 then set work.cars;
88 set split_sort;
89 run;
NOTE: DATA STEP view saved on file WORK.SPLIT_SORT.
NOTE: A stored DATA STEP view cannot run under a different operating system.
WARNING: The definition of an output DATA step view is an experimental feature in this release and is not intended for use in the
development of production applications.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 2415.43k
OS Memory 23208.00k
Timestamp 02/02/2025 01:41:13 AM
Step Count 159 Switch Count 2
Page Faults 0
Page Reclaims 305
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264
90
91 proc sort data=work.cars out=split_sort;
92 by msrp;
93 run;
NOTE: There were 4280000 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLIT_SORT has 4280000 observations and 15 variables.
NOTE: View WORK.SPLIT_SORT.VIEW used (Total process time):
real time 1:11.07
user cpu time 27.64 seconds
system cpu time 36.84 seconds
memory 821505.54k
OS Memory 845336.00k
Timestamp 02/02/2025 01:42:24 AM
Step Count 160 Switch Count 4280007
Page Faults 0
Page Reclaims 54606
Page Swaps 0
Voluntary Context Switches 8560583
Involuntary Context Switches 318
Block Input Operations 0
Block Output Operations 1267232
NOTE: The data set WORK.CARS_ASIA has 1580000 observations and 15 variables.
NOTE: The data set WORK.CARS_EUROPE has 1230000 observations and 15 variables.
NOTE: The data set WORK.CARS_USA has 1470000 observations and 15 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 1:11.13
user cpu time 27.64 seconds
system cpu time 36.89 seconds
memory 821505.54k
OS Memory 845336.00k
Timestamp 02/02/2025 01:42:24 AM
Step Count 160 Switch Count 4280010
Page Faults 0
Page Reclaims 54696
Page Swaps 0
Voluntary Context Switches 8560635
Involuntary Context Switches 328
Block Input Operations 0
Block Output Operations 1273920
94
95 /* option 2A: Hash table */
96 data _null_;
97 if 0 then set work.cars;
98 dcl hash h1(dataset:'work.cars', multidata:'y', ordered:'y');
99 h1.defineKey('origin');
100 h1.defineData(all:'y');
101 h1.defineDone();
102 h1.output(dataset:"cars_asia(where=(origin='Asia'))");
103 h1.output(dataset:"cars_asia(where=(origin='Europe'))");
104 h1.output(dataset:"cars_asia(where=(origin='USA'))");
105 run;
NOTE: There were 4280000 observations read from the data set WORK.CARS.
NOTE: The data set WORK.CARS_ASIA has 1580000 observations and 15 variables.
NOTE: The data set WORK.CARS_ASIA has 1230000 observations and 15 variables.
NOTE: The data set WORK.CARS_ASIA has 1470000 observations and 15 variables.
NOTE: DATA STEP stopped due to looping.
NOTE: DATA statement used (Total process time):
real time 1.77 seconds
user cpu time 1.18 seconds
system cpu time 0.58 seconds
memory 1000799.62k
OS Memory 1023376.00k
Timestamp 02/02/2025 01:42:26 AM
Step Count 161 Switch Count 9
Page Faults 0
Page Reclaims 7220
Page Swaps 0
Voluntary Context Switches 46
Involuntary Context Switches 21
Block Input Operations 0
Block Output Operations 1273120
106
107 /* option 2B: Hash table with dynamic output */
108 data _null_;
109 dcl hash h1(dataset:'work.cars(obs=0)', multidata:'y', ordered:'y');
110 h1.defineKey('origin');
111 h1.defineData(all:'y');
112 h1.defineDone();
113 dcl hash h2(ordered:'y', multidata:'n');
114 h2.defineKey('origin');
115 h2.defineDone();
116 dcl hiter hh2('h2');
117 do _i=1 to _nobs;
118 set work.cars nobs=_nobs point=_i;
119 _rc=h1.add();
120 _rc=h2.ref();
121 end;
122
123 _rc=hh2.first();
124 do while(_rc=0);
125 _rc=h1.output(dataset: cats('cars_',origin,'(where=(origin="',origin,'"))') );
126 _rc=hh2.next();
127 end;
128 stop;
129 run;
NOTE: There were 0 observations read from the data set WORK.CARS.
NOTE: The data set WORK.CARS_ASIA has 1580000 observations and 15 variables.
NOTE: The data set WORK.CARS_EUROPE has 1230000 observations and 15 variables.
NOTE: The data set WORK.CARS_USA has 1470000 observations and 15 variables.
NOTE: DATA statement used (Total process time):
real time 2.89 seconds
user cpu time 2.26 seconds
system cpu time 0.62 seconds
memory 769862.90k
OS Memory 792204.00k
Timestamp 02/02/2025 01:42:29 AM
Step Count 162 Switch Count 17
Page Faults 0
Page Reclaims 7212
Page Swaps 0
Voluntary Context Switches 71
Involuntary Context Switches 16
Block Input Operations 0
Block Output Operations 1273112
130
131
132 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
142
There aren't many use cases where splitting a table into multiple tables is the optimal design and for that reason I won't upvote the idea.
... View more