BookmarkSubscribeRSS Feed
SonyTrinesh
Calcite | Level 5

Hi All,

I have a dataset with contains million reocrds and i want to split the table into multiple tables  based on one field with out using Data step.If we can do it Sql that is very much usefull for us.

why i'm preferring sql is in data step while splititng the data it is taking so much time but my intension is to reduce the execution time and workspace also.So can some one suggest if we have any other ways to split the data into mulitiple ways in a faster way.


7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

SQL will not be any quicker that datastep.  in fact, it is likely to be slower within SAS.  I would suggest you start by posting your datastep code.  There maybe ways to reduce the processing if we see your code.  For instance have you tried, something like:

proc sql;

     create table LOOP as

     select     distinct THE_SPLIT_VAR

     from        HAVE;

quit;

data _null_;

     set loop;

     call execute('data '||strip(the_split_var)||'; set have (where=(the_split_var="'||strip(the_strip_var)||'")); run;');

run;

The above will create a small datastep for each distinct split, with a small where on the read in.

Reeza
Super User

1 million records should be nothing for SAS to process and you shouldn't need to split it up.

That being said, a hash method is usually the fastest to run.

See samples of codes of various methods here:

Split Data into Subsets - sasCommunity

If your data is sorted a call execute is another good way, but you should post your code that you've tried so far.

SonyTrinesh
Calcite | Level 5

Below is the code i've used for spliting the table into multiple tables.

DATA ONE;

INPUT ID $ ;

CARDS;

1

2

3

4

;

RUN;

DATA TWO THREE FOUR;

SET ONE;

IF ID='1' THEN OUTPUT WORK.TWO;

IF ID='2' THEN OUTPUT WORK.THREE;

IF ID='3' THEN OUTPUT WORK.FOUR;

RUN;

Can some one suggest faster way to split the tables in Data step or  in Sql.

Patrick
Opal | Level 21

That's basically how you would do it and nothing will be faster - except you would want to use "ELSE" as well and if you want to take it to the "extreme" you would order your "if.. then... else.." statements along "probability" having the most likely cases first.

...but as others already wrote: A million records is "nothing" so splitting up the data set feels like the wrong thing to do.

ballardw
Super User

I don't know if data set options would run faster than if then else

DATA TWO (where=(id='1')) THREE (where=(id='2')) FOUR (where=(id='3'));

SET ONE;

RUN;

Or a slightly less verbose way to get the equivalent of if then else which gets better looking if doing 10 statements

DATA TWO THREE FOUR;

SET ONE;

select (id);

when('1') OUTPUT TWO;

when('2') OUTPUT THREE;

when('3') output  FOUR;

otherwise Put "Unexpected value for ID"  id;

end;

RUN;

Demarez
Calcite | Level 5
What if my variable that I want to distinct is a date ? Is there any specific program for that ?
Astounding
PROC Star
One additional step that sometimes helps: fix the data. You will have to check whether this applies here.

Depending on how the original data was created, it's possible that character variables use much more space than they should. For example, perhaps a variable named GENDER should be "F" or "M" but is actually defined as $200 characters long. That's a problem. Moving around data that is 200 times bigger than it needs to be takes significantly longer.

Compressing the data can reclaim the storage space, but actually takes longer to process. Each time you use the data it needs to be uncompressed, adding to the processing time. Also worth noting: even if the original data set is compressed, that doesn't mean that the subsets will be compressed.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 13315 views
  • 3 likes
  • 7 in conversation