AK23
Obsidian | Level 7

Without purge


40           execute (
41           drop table ext_test_default_ak
42           )by hadoop;
 
HADOOP_26: Executed: on connection 2
drop table ext_test_default_ak
 
43          disconnect from hadoop;
44         quit;

 

 

With purge

 


40           execute (
41           drop table ext_test_default_ak purge
42           )by hadoop;
 
HADOOP_26: Executed: on connection 2
drop table ext_test_default_ak purge
 
43          disconnect from hadoop;
44         quit;
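For reference, a complete explicit pass-through program that produces the log above might look like the sketch below; the connection options (server, port, schema) are placeholders to adjust for your own environment:

```sas
/* Explicit SQL pass-through: drop a Hive table with PURGE.     */
/* Connection options are placeholders - adjust the server,     */
/* port, and schema for your own Hive installation.             */
proc sql;
  connect to hadoop (server="hiveserver.example.com" port=10000
                     schema=default);
  execute (
    drop table ext_test_default_ak purge
  ) by hadoop;
  disconnect from hadoop;
quit;
```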

BrunoMueller
SAS Super FREQ

Thanks for the feedback. Since you use SQL pass-through, the trace shows exactly the same statement as the one you have written.

 

What impact did the "drop table ..." without the PURGE option have on the HDFS file? Was it deleted as well?

 

When you use PROC DELETE DATA=, a DROP TABLE statement is passed to Hive.
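A minimal sketch of that round trip, with the SASTRACE option turned on so the generated DROP TABLE statement shows up in the SAS log; the LIBNAME connection options are placeholders:

```sas
/* Placeholder LIBNAME to Hive - adjust server/schema for your site */
libname hive_lib hadoop server="hiveserver.example.com" schema=default;

/* Show the SQL that SAS/ACCESS sends to Hive in the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Deleting the member causes a DROP TABLE to be passed to Hive */
proc delete data=hive_lib.ext_test_default_ak;
run;
```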

AK23
Obsidian | Level 7

* proc delete / proc datasets - delete / proc sql drop table - none of them is deleting the underlying HDFS file *


I think there is an issue when using PROC SQL drop / PROC DELETE / PROC DATASETS with Hive tables.

 

My theory is below.

 

According to the Hive documentation:
If you drop an EXTERNAL table, the Hive engine drops the table metadata but does not delete the HDFS data.

If you drop a MANAGED table, the Hive engine drops the table metadata and deletes the HDFS data.

 

According to SAS documentation.

DBCREATE_TABLE_EXTERNAL=YES -> creates an external table—one that is stored outside of the Hive warehouse.

DBCREATE_TABLE_EXTERNAL=NO   -> creates a managed table—one that is managed within the Hive warehouse. 

Source : http://support.sas.com/documentation/cdl/en/acreldb/69580/HTML/default/viewer.htm#n0k3b8dw0vz3jxn1jj...

 

By default DBCREATE_TABLE_EXTERNAL is NO, which means SAS creates a managed table, so dropping it should delete both the metadata and the HDFS data. But that is not what I observe (at least in my case): with the default option, the SAS procedures drop the Hive table structure but not the underlying HDFS file.

 

It works with SQL pass-through when the PURGE option is used.

 

Note: In the LIBNAME statement I also have hive.warehouse.data.skipTrash set to true, and I also tried setting DBCREATE_TABLE_EXTERNAL=NO in the DATA step.
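As a sketch, the managed/external distinction can be controlled per table with the data set option; the table and library names below are illustrative only:

```sas
/* DBCREATE_TABLE_EXTERNAL=NO (the default) asks Hive for a     */
/* managed table; =YES asks for an external one. The hive_lib   */
/* libref and table names here are illustrative placeholders.   */
data hive_lib.managed_tbl (dbcreate_table_external=no);
  set sashelp.cars;
run;

data hive_lib.external_tbl (dbcreate_table_external=yes);
  set sashelp.cars;
run;
```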

 

BrunoMueller
SAS Super FREQ

So you are saying:

  • drop table tableName purge, does delete the Hive metadata as well as the HDFS file
  • drop table tableName, does delete the Hive metadata, BUT NOT the HDFS file

correct?

 

AK23
Obsidian | Level 7

SQL - Pass through

----------------------------

Case 1 : drop table tableName purge -- deletes the Hive metadata as well as the HDFS file. (WORKS !)

Case 2 : drop table tableName -- deletes the Hive metadata but not the HDFS file. (NOT WORKING)

 

Data Step

--------------

 

Case 1 :

data hive_lib.table;

set sashelp.cars;

run;

 

1. Proc delete  -- Deletes the hive metadata but not the HDFS file.(NOT WORKING)

2. Proc datasets - delete -- Deletes the hive metadata but not the HDFS file.(NOT WORKING)

3. Proc SQL  drop table -- Deletes the hive metadata but not the HDFS file.(NOT WORKING)

 

Case 2 :

proc sql;

create table hive_lib.table as select * from sashelp.cars;

quit;

 

1. Proc delete  -- Deletes the hive metadata but not the HDFS file.(NOT WORKING)

2. Proc datasets - delete -- Deletes the hive metadata but not the HDFS file.(NOT WORKING)

3. Proc SQL  drop table -- Deletes the hive metadata but not the HDFS file.(NOT WORKING)

 

Note : In all the above cases, skipTrash is set in hive library.

 

Summary: In the data step tests, both in case 1 and case 2, I am not able to delete the underlying HDFS file. I can successfully drop the Hive table in the Hive CLI / Beeline, so it is not a permission issue.
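The three deletion methods tested above can be sketched as follows, assuming hive_lib is a Hive LIBNAME as in the earlier steps:

```sas
/* 1. PROC DELETE on the Hive table */
proc delete data=hive_lib.table;
run;

/* 2. PROC DATASETS with the DELETE statement */
proc datasets lib=hive_lib nolist;
  delete table;
quit;

/* 3. PROC SQL DROP TABLE */
proc sql;
  drop table hive_lib.table;
quit;
```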

BrunoMueller
SAS Super FREQ

I would talk with your Hadoop person about why drop table tablename does not delete the HDFS data of a managed table.

Maybe you also need to contact SAS Tech Support for further analysis.

 

 

To work around this, use the SQL pass-through drop table tablename PURGE, as this does everything.
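One way to package that workaround is a small helper macro that issues the pass-through DROP ... PURGE; the macro name and connection options below are hypothetical placeholders:

```sas
/* Hypothetical helper macro: drop a Hive table with PURGE via   */
/* explicit pass-through. Connection options are placeholders    */
/* for your own Hive server, port, and schema.                   */
%macro hive_drop_purge(table);
  proc sql;
    connect to hadoop (server="hiveserver.example.com" port=10000
                       schema=default);
    execute (
      drop table &table. purge
    ) by hadoop;
    disconnect from hadoop;
  quit;
%mend hive_drop_purge;

%hive_drop_purge(ext_test_default_ak)
```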

 

Bruno

AK23
Obsidian | Level 7
Sure Bruno, I will raise this with SAS support.

This limitation means developers have to rewrite all the SAS jobs using SQL pass-through and will not be able to use DATA steps / PROC SQL.

SQL pass-through has its own limitations: analysts have to remember Hive commands, and SAS developers cannot seamlessly migrate the existing code base to the Hadoop engine or use DATA step / PROC SQL with Hive datasets.

Thanks for your help again ! Appreciate it.


ChrisNZ
Tourmaline | Level 20

SQL - Pass through

Case 2 : drop table tableName -- deletes the Hive metadata but not the HDFS file. (NOT WORKING)

 

shows a Hadoop issue. Once it's resolved all the other code should work.

AK23
Obsidian | Level 7

This issue happens only when dropping a table in an encrypted zone, which removes the table metadata but not its underlying data.

 

In Hive, there is an option to use purge in the drop table command.

 

In SAS, the PURGE option can be used with SQL pass-through, but what if the user uses PROC SQL / DATA step?

 

1. Setting Hive properties (skipTrash to true) in the LIBNAME does not make this error go away in an encrypted zone.

2. Setting auto.purge to true in TBLPROPERTIES when creating the table in PROC SQL gives the same outcome: the HDFS file is not deleted.

 

AFAIK, there is no other Hive option apart from including the PURGE keyword in DROP TABLE statements, but how do I set this option in a DATA step?
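For completeness, the auto.purge attempt from point 2 can also be expressed through pass-through roughly as below; note the discussion above reports this still did not delete the HDFS file in an encrypted zone, and the connection options are placeholders:

```sas
/* Set auto.purge=true in TBLPROPERTIES via pass-through.        */
/* Per the discussion above, this still did not delete the HDFS  */
/* file in an encrypted zone. Connection options are placeholders.*/
proc sql;
  connect to hadoop (server="hiveserver.example.com" port=10000
                     schema=default);
  execute (
    alter table ext_test_default_ak
      set tblproperties ('auto.purge'='true')
  ) by hadoop;
  disconnect from hadoop;
quit;
```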

 

ChrisNZ
Tourmaline | Level 20

So the PURGE keyword is always needed, but SAS implicit pass-through will not insert it in its queries?

 

My next guess is to contact Tech Support to see if there is an undocumented option somewhere (unlikely), and whether R&D can either

1- Always add PURGE after DROP TABLE (if there is no downside)

2- Add PURGE only when a new option is turned on

 

Please update this post. If encrypted zones don't behave the same as non-encrypted zones, Hadoop users here will be interested for sure.

 

AK23
Obsidian | Level 7

Known Defect. Suggested hotfix

 

http://support.sas.com/kb/58727
