07-27-2015 09:17 AM
I have a dataset with contains million reocrds and i want to split the table into multiple tables based on one field with out using Data step.If we can do it Sql that is very much usefull for us.
why i'm preferring sql is in data step while splititng the data it is taking so much time but my intension is to reduce the execution time and workspace also.So can some one suggest if we have any other ways to split the data into mulitiple ways in a faster way.
07-27-2015 09:55 AM
SQL will not be any quicker that datastep. in fact, it is likely to be slower within SAS. I would suggest you start by posting your datastep code. There maybe ways to reduce the processing if we see your code. For instance have you tried, something like:
create table LOOP as
select distinct THE_SPLIT_VAR
call execute('data '||strip(the_split_var)||'; set have (where=(the_split_var="'||strip(the_strip_var)||'")); run;');
The above will create a small datastep for each distinct split, with a small where on the read in.
07-27-2015 10:51 AM
1 million records should be nothing for SAS to process and you shouldn't need to split it up.
That being said, a hash method is usually the fastest to run.
See samples of codes of various methods here:
If your data is sorted a call execute is another good way, but you should post your code that you've tried so far.
07-28-2015 07:15 AM
Below is the code i've used for spliting the table into multiple tables.
INPUT ID $ ;
DATA TWO THREE FOUR;
IF ID='1' THEN OUTPUT WORK.TWO;
IF ID='2' THEN OUTPUT WORK.THREE;
IF ID='3' THEN OUTPUT WORK.FOUR;
Can some one suggest faster way to split the tables in Data step or in Sql.
07-28-2015 07:38 AM
That's basically how you would do it and nothing will be faster - except you would want to use "ELSE" as well and if you want to take it to the "extreme" you would order your "if.. then... else.." statements along "probability" having the most likely cases first.
...but as others already wrote: A million records is "nothing" so splitting up the data set feels like the wrong thing to do.
07-28-2015 12:03 PM
I don't know if data set options would run faster than if then else
DATA TWO (where=(id='1')) THREE (where=(id='2')) FOUR (where=(id='3'));
Or a slightly less verbose way to get the equivalent of if then else which gets better looking if doing 10 statements
DATA TWO THREE FOUR;
when('1') OUTPUT TWO;
when('2') OUTPUT THREE;
when('3') output FOUR;
otherwise Put "Unexpected value for ID" id;