Hi All,
I have a dataset with contains million reocrds and i want to split the table into multiple tables based on one field with out using Data step.If we can do it Sql that is very much usefull for us.
why i'm preferring sql is in data step while splititng the data it is taking so much time but my intension is to reduce the execution time and workspace also.So can some one suggest if we have any other ways to split the data into mulitiple ways in a faster way.
SQL will not be any quicker that datastep. in fact, it is likely to be slower within SAS. I would suggest you start by posting your datastep code. There maybe ways to reduce the processing if we see your code. For instance have you tried, something like:
proc sql;
create table LOOP as
select distinct THE_SPLIT_VAR
from HAVE;
quit;
data _null_;
set loop;
call execute('data '||strip(the_split_var)||'; set have (where=(the_split_var="'||strip(the_strip_var)||'")); run;');
run;
The above will create a small datastep for each distinct split, with a small where on the read in.
1 million records should be nothing for SAS to process and you shouldn't need to split it up.
That being said, a hash method is usually the fastest to run.
See samples of codes of various methods here:
Split Data into Subsets - sasCommunity
If your data is sorted a call execute is another good way, but you should post your code that you've tried so far.
Below is the code i've used for spliting the table into multiple tables.
DATA ONE;
INPUT ID $ ;
CARDS;
1
2
3
4
;
RUN;
DATA TWO THREE FOUR;
SET ONE;
IF ID='1' THEN OUTPUT WORK.TWO;
IF ID='2' THEN OUTPUT WORK.THREE;
IF ID='3' THEN OUTPUT WORK.FOUR;
RUN;
Can some one suggest faster way to split the tables in Data step or in Sql.
That's basically how you would do it and nothing will be faster - except you would want to use "ELSE" as well and if you want to take it to the "extreme" you would order your "if.. then... else.." statements along "probability" having the most likely cases first.
...but as others already wrote: A million records is "nothing" so splitting up the data set feels like the wrong thing to do.
I don't know if data set options would run faster than if then else
DATA TWO (where=(id='1')) THREE (where=(id='2')) FOUR (where=(id='3'));
SET ONE;
RUN;
Or a slightly less verbose way to get the equivalent of if then else which gets better looking if doing 10 statements
DATA TWO THREE FOUR;
SET ONE;
select (id);
when('1') OUTPUT TWO;
when('2') OUTPUT THREE;
when('3') output FOUR;
otherwise Put "Unexpected value for ID" id;
end;
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.