pratikjageera
Obsidian | Level 7

Hi All,

 

I have a dataset of 160 GB that I am trying to partition on the basis of a column.

I tried doing it through VA, but it becomes unresponsive and does not yield any result.

I tried Enterprise Guide, but the partition step ran for a whole day. The work area (where the partitioned dataset was directed) was gradually filling up, but the data step never finished.

 

Is there a workaround in SAS VA/EG, such as splitting the dataset -> partitioning -> then combining?

 

Please let me know if anyone has any inputs on this.
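The split-first idea could look roughly like the sketch below: break the source table into smaller WORK datasets before loading each one. This is only an illustration; the chunk boundary and the output dataset names are made up, and splitting by row number does not by itself reduce the total volume that has to move.

```sas
/* Sketch of "split first": route rows of the source table into
   separate WORK datasets. The 100-million-row boundary is illustrative. */
data work.myds_a work.myds_b;
  set mylible.myds;
  if _n_ <= 100000000 then output work.myds_a;
  else output work.myds_b;
run;
```

Each chunk would then be loaded to the SASHDAT library with the same PARTITION= option, so the pieces land in the same partitioned table.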

5 REPLIES
pratikjageera
Obsidian | Level 7

Hi Kurt,

 

This is the code (details modified a bit).

 

libname mylibhe sashdat path="/hps/expl/mylib" server="xyz560n.abc.xyz.net" install="/opt/sas/software/TKGrid";
libname mylible sasiola port=10404 tag="hps.expl.mylib" host="xyz560n.abc.xyz.net";

/* Pull the LASR table down to local WORK - one full pass over the data */
data myds_part;
set mylible.myds;
run;

/* Write it back out as a SASHDAT table partitioned by my_col - a second full pass */
data mylibhe.myds_part(partition=(my_col));
set work.myds_part;
run;

/* Refresh metadata for the HDFS library so VA sees the new table */
proc metalib;
omr (library="VA HDFS MyLib Explore" );
update_rule=(noupdate);
report;
run;

Kurt_Bremser
Super User

@pratikjageera wrote:

Hi Kurt ,

 

This is the code (details modified a bit).

 

libname mylibhe sashdat path="/hps/expl/mylib" server="xyz560n.abc.xyz.net" install="/opt/sas/software/TKGrid" ;
libname mylible sasiola port=10404 tag="hps.expl.mylib" host="xyz560n.abc.xyz.net";

data myds_part;
set mylible.myds;
run;

data mylibhe.myds_part(partition=(my_col));
set work.myds_part;
run;

proc metalib;
omr (library="VA HDFS MyLib Explore" );
update_rule=(noupdate);
report;
run;


So both mylible and mylibhe reside on the remote server?

In that case, think of 150GB divided by network bandwidth, and you have your answer.
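As a back-of-envelope check (the 1 Gbit/s link speed below is an assumption; substitute your actual bandwidth):

```sas
/* Rough transfer-time estimate for one pass of the data over the network */
data _null_;
  size_gb     = 150;                     /* dataset size in GB          */
  link_gbit_s = 1;                       /* assumed network bandwidth   */
  seconds     = size_gb * 8 / link_gbit_s;
  minutes     = seconds / 60;            /* 1200 s = 20 minutes         */
  put minutes= 'per pass';
run;
```

Note that the posted code moves the data twice (remote LASR -> local WORK, then WORK -> remote HDFS), so double that figure before even counting disk I/O on the workspace server.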

SASKiwi
PROC Star

What is the purpose of splitting it in the first place? First off, I would check how much free memory you have in the SAS VA server you are trying to load to. If you don't have at least 160 GB free, then partitioning won't reduce the space needed.

 

Also consider why you need to load such a large table. Can you reduce space by reducing character column lengths?
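One way to check how much a character column could shrink is to measure the longest value actually stored, then re-create the table with a tighter length. A sketch (the column name my_char and the length 40 are illustrative):

```sas
/* Find the longest value actually used in the character column */
proc sql;
  select max(length(my_char)) as max_used
  from mylible.myds;
quit;

/* Re-create the dataset with the length set to the max_used found above */
data work.myds_small;
  length my_char $ 40;
  set mylible.myds;
run;
```

Declaring LENGTH before the SET statement is what makes the new, shorter length take effect.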

pratikjageera
Obsidian | Level 7
Well, partitioning is done for faster access. We have ample space on the SAS VA server; space is not an issue. Loading such large data is a business requirement. Reducing character column lengths will definitely reduce the dataset size, but the question remains unanswered: why does it take so long to partition 150+ GB of data, and what methods need to be applied to make sure the partition happens?
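Given Kurt's point about the data making two network passes, one variant worth trying (assuming the workspace session can read the LASR table and write the SASHDAT table in the same step, as the two libnames in the posted code suggest) is to skip the WORK round trip entirely:

```sas
/* Sketch: write the partitioned SASHDAT table directly from the LASR
   table in a single pass, avoiding the intermediate copy in WORK */
data mylibhe.myds_part(partition=(my_col));
  set mylible.myds;
run;
```

This still funnels the data through the workspace server, but it halves the number of passes and removes the WORK disk I/O.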

