Exploring, predicting and reporting with SAS Visual Analytics and SAS Visual Statistics

Partitioning Large Dataset in SAS VA

Contributor
Posts: 24

Hi all,

I have a 160 GB dataset that I am trying to partition on one column.

I tried doing it through VA, but it becomes unresponsive and does not yield any result.

I tried Enterprise Guide, but the partition step ran the whole day. The work area (where the partitioned dataset was directed) gradually showed signs of filling up, but the data step never finished.

Is there a workaround in SAS VA/EG, such as splitting the dataset, partitioning the pieces, and then combining them?

Please let me know if anyone has any input on this.

Super User
Posts: 7,371

Re: Partitioning Large Dataset in SAS VA

If you did it with EG, could you please show us the code?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 24

Re: Partitioning Large Dataset in SAS VA

Hi Kurt,

This is the code (details modified a bit).

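/* HDFS library (SASHDAT engine) that will hold the partitioned table */
libname mylibhe sashdat path="/hps/expl/mylib" server="xyz560n.abc.xyz.net" install="/opt/sas/software/TKGrid";
/* LASR library (SASIOLA engine) holding the in-memory source table */
libname mylible sasiola port=10404 tag="hps.expl.mylib" host="xyz560n.abc.xyz.net";

/* Step 1: copy the in-memory table into WORK */
data myds_part;
    set mylible.myds;
run;

/* Step 2: write the WORK copy to HDFS, partitioned on my_col */
data mylibhe.myds_part(partition=(my_col));
    set work.myds_part;
run;

/* Step 3: sync the library's metadata and print a report */
/* (NOUPDATE: existing table definitions are not updated) */
proc metalib;
    omr (library="VA HDFS MyLib Explore");
    update_rule=(noupdate);
    report;
run;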

Super User
Posts: 7,371

Re: Partitioning Large Dataset in SAS VA


So both mylible and mylibhe reside on the remote server?

In that case, think of 150GB divided by network bandwidth, and you have your answer.
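
To put rough numbers on that, here is a minimal sketch of the arithmetic. The link speeds are hypothetical (nothing in this thread states the actual bandwidth), gigabytes are taken as decimal, and the function name `transfer_hours` is made up for illustration. It counts two passes because the posted code first pulls the table into WORK and then pushes it back out, which means two trips over the network if the SAS session runs on a different machine than the remote host. The result is only a lower bound: protocol overhead, disk speed and the partitioning work itself all come on top.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float, passes: int = 1) -> float:
    """Lower-bound transfer time in hours for `passes` trips over the link.

    Ignores protocol overhead, disk throughput and any processing time.
    """
    size_bits = size_gb * 1e9 * 8                      # decimal GB -> bits
    seconds = passes * size_bits / (link_mbit_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical link speeds, for illustration only.
    for mbit in (100, 1000, 10000):
        hours = transfer_hours(150, mbit, passes=2)
        print(f"{mbit:>6} Mbit/s: at least {hours:.2f} h for two passes")
```

If the real link is shared or slow, a step that runs for most of a day is plausible from the transfer alone.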

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 3,233

Re: Partitioning Large Dataset in SAS VA

What is the purpose of splitting it in the first place? First off, I would check how much free memory you have on the SAS VA server you are trying to load to. If you don't have at least 160 GB free, then partitioning won't reduce the space needed.

 

Also consider why you need to load such a large table. Can you reduce space by reducing character column lengths?

Contributor
Posts: 24

Re: Partitioning Large Dataset in SAS VA

Well, partitioning is done for faster access. We have ample space on the SAS VA server, so space is not an issue. Loading such large data is a business requirement. Reducing character column lengths would definitely reduce the dataset size, but the question remains: why does it take so long to partition 150+ GB of data, and what methods need to be applied to make sure the partitioning completes?