BookmarkSubscribeRSS Feed
jrsousa2
Obsidian | Level 7

I did a google search to see if it was possible for SAS to read XML files and was really happy to learn that it is.

However, I have a complicator.

My XML file is compressed in a gz file, which is only 340Mb (there's a single file in this gz, which is the XML).

However, uncompressed it's more than 2GB.

 

So, the question I have is, is it possible to read the gz, uncompress it and then read the XML out of the uncompressed file?

Is it possible to read the XML straight from the gz? That would be even better!

I plan to do this from EG on a remote server, that's why I need to work with the gz file, instead of the XML file, which is very big.

 

PS I clicked on the topic and for an instant I thought it would've erased my text, thank God it didn't lol.

9 REPLIES 9
ChrisHemedinger
Community Manager

You will need the available 2GB of space to hold the uncompressed file -- no getting around that.  

 

If you intend to read the XML file using the XMLv2 libname engine, then you'll definitely need to extract the complete file to a location where you can point the engine path directly to the plain XML file.  I assume you saw my example in this post.

 

If using the GZIP method on FILENAME ZIP, you need SAS 9.4 Maint 5 -- that's where this capability was added.  I think that the SAS OnDemand environment is at least at that level -- you indicated in a another post that you're working there.  However, the 2GB of scratch space you need might not be available to your account.  The SAS OnDemand for Academics environment is not designed for such large data and account usage -- it's primarily an environment for learning and research.  SAS University Edition might work though -- that's a version you install and file space isn't such a constraint.

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!
jrsousa2
Obsidian | Level 7

Thank you Chris.

 

I just ran a test on a small XML file generated from a playlist in itunes (which usually stores info such as file name, path, artist, album artist, title, year, rating, cover art flag, number of plays, etc.)

 

However, what I'm seeing after importing this is non-sense, it gave me a lot of individual tables, as opposed to the single table I was hoping for. 

How do I get this XML import to create a useful table, instead of a bunch of non-sensical files?

Have to say I got really disappointed with this output.

 

Here the code I used:

filename Youtube '/home/jrsousa2/my_content/iTunes.xml';
filename SXLEMAP '/home/jrsousa2/my_content/iTunes.map';
libname Youtube xml xmlmap=SXLEMAP access=READONLY;

 

proc copy in=Youtube out=work;
run;

 

Below the tables I got from importing this test file:

 

#NameMember TypeFile SizeLast Modified
1ARRAYDATA256KB08/25/2019 20:14:29
2ARRAY1DATA256KB08/25/2019 20:14:36
3DATEDATA3MB08/25/2019 20:14:48
4DICTDATA256KB08/25/2019 20:14:56
5DICT1DATA256KB08/25/2019 20:15:03
6DICT2DATA1MB08/25/2019 20:15:12
7DICT3DATA256KB08/25/2019 20:15:20
8DICT4DATA28MB08/25/2019 20:15:39
9INTEGERDATA256KB08/25/2019 20:15:46
10INTEGER1DATA11MB08/25/2019 20:16:14
11KEYDATA256KB08/25/2019 20:16:21
12KEY1DATA1MB08/25/2019 20:16:31
13KEY2DATA22MB08/25/2019 20:17:22
14KEY3DATA256KB08/25/2019 20:17:29
15PLISTDATA256KB08/25/2019 20:17:36
16REGSTRYITEMSTOR32KB08/25/2019 17:50:11
17SASGOPTCATALOG12KB08/25/2019 18:17:11
18SASMAC1CATALOG96KB08/25/2019 17:50:30
19SASMAC4CATALOG20KB08/25/2019 20:46:03
20STRINGDATA256KB08/25/2019 20:17:44
21STRING1DATA9MB08/25/2019 20:18:08
22STRING2DATA256KB08/25/2019 20:18:15
23TRUEDATA1MB08/25/2019 20:18:24



Patrick
Opal | Level 21

@jrsousa2 

A single XML file can describe and hold data for a whole bunch of tables. Looks like that's the case with your iTunes file and the SAS xml engine did actually quite a good job in extracting all these tables at once.

Btw: I'd be using the XMLv2 engine instead of the XML engine. It's newer.

 

jrsousa2
Obsidian | Level 7

Mmm, not really.

 

What I'm trying to do as I said is not to see the metadata of the dimensions, but to build a single useful\meaningful table.

 

I'm using the SAS XML Mapper tool.

 

It seems I need to create a map for this purpose. Would you know a good reference or guide for using the XML mapper to extract data from the XML?

jrsousa2
Obsidian | Level 7

Ok, I just watched a video tutorial from SAS, and noticed that unlike in their example, my iTunes XML doesn't seem to display

meaningful values for each one of the separate tables (most of the tables have numbers, not the strings that I would expect.)

 

Have a look at the values showing in Full view, in the next picture (condensed view), and to the right (auto generated), the tables

extracted from the XML don't show any of the alpha values I would expect (what a bummer!, learning new things always have these

disappointments, it's never easy at first.)

 

What is the issue here?If any of you experts could solve this mystery for me, I really appreciate it.

 

Full viewFull viewMost tables have numeric fields, not the values I would expect from the full view.Most tables have numeric fields, not the values I would expect from the full view.

ChrisHemedinger
Community Manager

The XML representation, as you discovered, is not a single table.  Instead -- in the best case -- it's a collection of normalized tables that you can then combine/join using common keys to create the table you need.

 

I have a simple example here.  You can use the XML Mapper application to create a map to tell SAS how to read/assemble the data, but that might not be enough.  When you have some parts of the data that contain 1:Many relationships, you will need to post-process the tables to combine as needed.  Same is true when you have levels of indirection (field A contains an ID that's a lookup value for a key that then relates to field B in another table).  Such relationships can be resolved in a multitable SQL join, but not the XML Map.

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!
jrsousa2
Obsidian | Level 7

I already know all of that. 

 

What I'm asking now is why I'm not seeing any of the alpha values that are supposed to be contained in the data, only numbers. Where did all the alpha values go when SAS created the relational tables?

 

Have a look at the pics that I attached (the first pic has the alpha, the 2nd has mostly numbers, making it impossible for me to obtain the data I want.).

Do you know how to solve this issue?

ChrisHemedinger
Community Manager

Are you wondering why the automap doesn't have the values you want? 

 

Usually, I don't rely on the automap for XML.  I build the output table I want by starting from scratch in the XML Mapper app.  It looks like you do have some string values there (dict->string).  But I'd need to see the XML file myself to know for sure.

 

It's also possible that this particular XML isn't well structured for transformation to a table.  I've encountered similar files that require two passes: one to read the data into records in SAS, and a second to examine each record and its metadata and convert to values I want to store in a table.

 

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!
jrsousa2
Obsidian | Level 7

The pics I attached are from the XML Mapper, I start with the automap to view the tables, and then I create my own map with the fields

that interest me.

 

Then I erase the automap and build my own "schema" or map, so that SAS takes care of the joins, and brings in only the fields that I want to select.

 

I'm not sure how to recover the various alpha values here though. Btw, for this case, if it's even possible, do I have to use the full view or the condensed view to drag and drop the fields I need?

 

I don't think the condensed view is useful here since it's only giving me numeric values and the alphas are all gone. But oddly, when I drag fields from the full view to create a customized map, it seems all of the fields are resulting in the location of the file.

 

Not sure if this XML is convertible.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 3396 views
  • 2 likes
  • 3 in conversation