pratikjageera
Obsidian | Level 7

Hi all,

I have a 160 GB dataset that I am trying to partition on a column.

I tried doing it through VA, but it becomes unresponsive and does not yield any result.

I tried Enterprise Guide, but the partition step ran the whole day. The work area (where the partitioned dataset was directed) was gradually filling up, but the data step never finished.

Is there a workaround in SAS VA/EG, such as splitting the dataset, partitioning the pieces, and then combining them?

Please let me know if anyone has any input on this.

5 REPLIES
pratikjageera
Obsidian | Level 7

Hi Kurt,

This is the code (details modified a bit).


libname mylibhe sashdat path="/hps/expl/mylib" server="xyz560n.abc.xyz.net" install="/opt/sas/software/TKGrid";
libname mylible sasiola port=10404 tag="hps.expl.mylib" host="xyz560n.abc.xyz.net";

/* Copy the in-memory table from the LASR server into WORK */
data myds_part;
  set mylible.myds;
run;

/* Write the WORK copy to HDFS, partitioned by my_col */
data mylibhe.myds_part(partition=(my_col));
  set work.myds_part;
run;

/* Report on the library's metadata without updating it */
proc metalib;
  omr (library="VA HDFS MyLib Explore");
  update_rule=(noupdate);
  report;
run;

Kurt_Bremser
Super User

So both mylible and mylibhe reside on the remote server?

In that case, think of 150GB divided by network bandwidth, and you have your answer.
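
As a rough illustration of that division, here is a minimal sketch in a DATA step. The link speed is an assumed placeholder, not a measured value, and the two passes reflect the posted code, which first copies the table into WORK and then writes it out to HDFS. Protocol overhead and disk I/O are ignored, so treat the result as a lower bound.

```sas
/* Back-of-the-envelope transfer time; every figure here is an assumption */
data _null_;
  size_gb   = 160;  /* table size stated in the question */
  link_mbps = 100;  /* ASSUMPTION: effective throughput in megabits per second */
  passes    = 2;    /* LASR -> WORK, then WORK -> HDFS */

  /* gigabytes -> megabits (decimal units), divided by megabits per second */
  seconds = passes * size_gb * 8 * 1000 / link_mbps;
  hours   = seconds / 3600;

  put 'Estimated transfer time: ' hours 8.2 ' hours';
run;
```

Replace link_mbps with the throughput you actually observe between the workspace server and the remote host.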

SASKiwi
PROC Star

What is the purpose of splitting it in the first place? First, I would check how much free memory you have on the SAS VA server you are trying to load to. If you don't have at least 160 GB free, partitioning won't reduce the space needed.

Also consider why you need to load such a large table. Can you reduce space by reducing character column lengths?
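
One way to check whether a character column is wider than it needs to be is sketched below. The column name long_text_col and the length of 20 are hypothetical, for illustration only; work.myds_part is the WORK copy from the code posted above.

```sas
/* Find the longest value actually stored in a character column.
   long_text_col is a hypothetical column name. */
proc sql;
  select max(length(long_text_col)) as max_used_len
    from work.myds_part;
quit;

/* ASSUMPTION: the query above reported 20. Declaring the length before
   the SET statement makes the new, shorter length take effect. */
options varlenchk=nowarn;  /* suppress the multiple-lengths warning */
data work.myds_short;
  length long_text_col $ 20;
  set work.myds_part;
run;
options varlenchk=warn;    /* restore the default */
```

Only shorten a column after confirming the maximum used length, since a shorter length truncates any longer values.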

pratikjageera
Obsidian | Level 7
Well, partitioning is done for faster access. We have ample space on the SAS VA server, so space is not an issue. Loading such large data is a business requirement. Reducing character column lengths will definitely reduce the dataset size, but that leaves the question unanswered: why does it take so long to partition 150+ GB of data, and what methods should be applied to make sure the partitioning completes?
