Dear Team,
I want to split my DataFlux data into chunks. For example, how can I split 60 million rows into four tables of 15 million rows each in Data Management Studio?
Thank you
Shakti
Hey Shakti!
Have you tried using the Data Validation node? You can use that to filter your data. For example, you could filter on an expression like Profit > 1000. Let me know if that's the sort of thing you're trying to do.
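To illustrate how filtering can produce the requested chunks, here is a minimal sketch in plain Python. It is not DataFlux code. It assumes each row already carries a 1-based row number (the field name `row_num` is hypothetical) and applies four non-overlapping range filters, each standing in for one filter expression, so every row lands in exactly one chunk.

```python
from typing import List


def split_by_ranges(rows: List[dict], chunk_size: int, n_chunks: int) -> List[List[dict]]:
    """Route each row to the chunk whose row-number range contains it.

    Each range test plays the role of one filter expression such as
    row_num >= 1 and row_num <= chunk_size (row_num is a hypothetical field).
    """
    chunks: List[List[dict]] = []
    for i in range(n_chunks):
        low = i * chunk_size + 1      # first row number in this chunk
        high = (i + 1) * chunk_size   # last row number in this chunk
        chunks.append([r for r in rows if low <= r["row_num"] <= high])
    return chunks


# Scaled-down example: 60 rows into 4 chunks of 15 (instead of 60 million into 15 million).
rows = [{"row_num": n, "profit": n * 10} for n in range(1, 61)]
chunks = split_by_ranges(rows, chunk_size=15, n_chunks=4)
print([len(c) for c in chunks])  # [15, 15, 15, 15]
```

Each filter here scans every row, much like four separate filters reading the same input stream; that is simple to reason about but reads the data once per chunk.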
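It depends on whether the split should be random or based on the order in which rows are read. If it is the latter, add a Sequencer node to number each row, followed by an Expression node with something like:
// Number of output groups
integer groups
// Group number assigned to the current row
integer mygroup
groups = 4
// mysequencer is the row number produced by the Sequencer node;
// the remainder deals rows out in turn, so mygroup takes the values 0 to 3 in rotation
mygroup = mysequencer % groups
With 60 million rows this gives four groups of 15 million rows each, but the rows are interleaved rather than kept in consecutive blocks. You can then filter on mygroup (for example, with one Data Validation node per group) and write each group to its own table. If you need consecutive blocks instead (the first 15 million rows in one table, the next 15 million in the next, and so on), compute the group from the row number divided by the block size rather than from the remainder.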
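The difference between the remainder approach and consecutive blocks can be checked with a small Python sketch (again, not DataFlux code). It assumes the row number starts at 1, as a sequencer typically does, and scales 60 million rows down to 60. `round_robin_group` mirrors `mygroup = mysequencer % groups`; `contiguous_group` is an illustrative alternative based on integer division.

```python
def round_robin_group(seq: int, groups: int) -> int:
    """Mirror of `mygroup = mysequencer % groups`: rows are dealt out in turn."""
    return seq % groups


def contiguous_group(seq: int, block_size: int) -> int:
    """Illustrative alternative: consecutive blocks of block_size rows (seq is 1-based)."""
    return (seq - 1) // block_size


total, groups = 60, 4          # scaled down from 60 million rows
block_size = total // groups   # 15 rows per group

rr = [round_robin_group(s, groups) for s in range(1, total + 1)]
cb = [contiguous_group(s, block_size) for s in range(1, total + 1)]

print(rr[:6])                                # [1, 2, 3, 0, 1, 2]
print(cb[14:17])                             # [0, 1, 1]  (rows 15, 16, 17)
print([rr.count(g) for g in range(groups)])  # [15, 15, 15, 15]
print([cb.count(g) for g in range(groups)])  # [15, 15, 15, 15]
```

Both schemes give four equal groups; only the remainder version interleaves rows across groups.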
Data splitting means dividing data into two or more subsets. In a typical two-part split, one part is used to train the model and the other to test or evaluate it.
Data splitting is an important part of data science, particularly when building models from data. It helps ensure that data models, and the processes that rely on them, such as machine learning, are accurate.
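For the random case, a minimal Python sketch (not a DataFlux feature) shuffles a copy of the rows with a fixed seed and then cuts it into consecutive slices. The function name `random_split` and the default seed are illustrative assumptions. The same function covers a two-part train/test split and a four-way split.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_split(items: Sequence[T], fractions: Sequence[float], seed: int = 42) -> List[List[T]]:
    """Shuffle a copy of items with a fixed seed, then cut it into consecutive slices.

    fractions must sum to 1; the last slice takes whatever remains after rounding.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    parts: List[List[T]] = []
    start = 0
    for i, frac in enumerate(fractions):
        is_last = i == len(fractions) - 1
        end = len(shuffled) if is_last else start + round(frac * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts


data = list(range(100))
train, test = random_split(data, [0.8, 0.2])
print(len(train), len(test))          # 80 20

quarters = random_split(data, [0.25] * 4)
print([len(q) for q in quarters])     # [25, 25, 25, 25]
```

Using a seeded generator keeps the split repeatable between runs, which matters when the same training and test sets must be recreated later.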
This may help you,
Rachel Gomez