BookmarkSubscribeRSS Feed
morglum
Quartz | Level 8

Hi everyone,

I would like to download the entire table by mother tongue, age, sex and dissemination area found on this page.   I am not sure how to import the resulting data into a SAS database though.

I tried using

libname myxml xml "filepath/filename.xml", but SAS crashed after a little more than an hour.

Any idea?

thanks,

20 REPLIES 20
jakarman
Barite | Level 11

The links do not work, probably because they are having your session/user in it.
Anyway there is not information you wanted to show us that is needing for any ideas....

---->-- ja karman --<-----
morglum
Quartz | Level 8

Interesting - I tried the links on a different computer and they worked in IE, Firefox and Chrome.  They also worked on a third computer, an iPad.  Looks like your environment may be too much secure Smiley Wink

Seriously though, this is very interesting and free data.  I would really like to get it into SAS, but can't seem to get it to work under SAS 9.2.  Any help (or idea) would be greatly appreciated.

jakarman
Barite | Level 11

It was a script blocker active or some old version checking, indeed old version chrome is giving the data.
Have downloaded that and seeing after unzipping two files. One named generic_98 (15Gb) and one named Structure_98 (47Mb).

There are many XML-file conventions and for that to read a xml-file you sometimes need to use a mapfile (structure) the XMLmapper is found at SAS (Atlas)

at SAS Support Downloads Browse.  When the structure file is the same as the mapper it must be.

The XML users guide is mentioning this.  SAS(R) 9.4 XML LIBNAME Engine: User's Guide  

As far I can see the structure-xml file is not a mapper file but real data.

---->-- ja karman --<-----
morglum
Quartz | Level 8

Thanks for the link, I'll see what I can come up with.  cheers!

Reeza
Super User

If you download the IVT you can then use Beyond20/20, which is free, to export to CSV.

morglum
Quartz | Level 8

Reeza, that's clever, I didnt think about that.  good thinking, I'll do just that.

Thanks Jaap for looking into this.  I've fired an email to statscan, if I get an answer I'll post it here.

cheers all.

S.

morglum
Quartz | Level 8

I tried exporting the IVT to CSV but ran into a limit of Beyond 2020 : only the first 18 733 lines have been exported (about a third)..

ballardw
Super User

You really expect us to download a 166MB file at that first link??

The second link has download options, I typically would select the CSV format and start with Proc Import on that.

morglum
Quartz | Level 8

Hi Ballard,

The 16 KB CSV file doesnt contain the geography I want (dissemination block), only the aggregated  data at the national level.

I'm surprised that 150 MB looks so huge, it's only a couple minutes download on my admittedly slow DSL internet. thanks anyway!

jakarman
Barite | Level 11

the structure described at http://sdmx.org/ is describing a rather a specific XML structure used by census and governments. I tried the XMLmapper but he doesn't like the sizing of the XMLfiles. Would go for another approach (IVT) unless you want to be the first one solving this interface.

---->-- ja karman --<-----
tebillings
New User | Level 1

The SAS XML Mapper cannot import very large files. Somewhere between 30 and 50 meg it stops working (9.4 version). 

IF your XML file is setup in a way that a subset of the file will include all or most of the tags:  <whatever> and closing tags </whatever>.  I was able to make a 30 meg text extract (from a 500 meg xml file)  - the 1st 30 meg of lines, paying attention to get full tag sets - and closed it with a single extra hand-typed </tag>.  Presto, the XML Mapper could parse it and read it and produce the xmlmap.  I then used that map to read the entire 500 meg of xml.  A few character fields are too short but it is easy to modify the xmlmap and rerun to fix that.

I'm not thrilled with the xmlmap - it introduced ordinals that may be unnecessary, but I'm relieved that I could read 500 meg of what will eventually be 9-10 Gig of input XML files. Now I can take the files and reconstruct the desired data.

However, if the XML file structure in your huge files cannot be reduced to a small extract, you can't read it in the XML Mapper.

An alternative:  try the XML handling features of the Groovy language, available via PROC GROOVY (yes, it's a real language and a real SAS PROC).

morglum
Quartz | Level 8

Hi Tebillings,

Thanks for writing this up!   I managed to export everything from beyond 20/20 this time around, but next time I would be very curious to learn about a language with such a funny name.. Smiley Happy

Vince28_Statcan
Quartz | Level 8

Hi Morglum,

I am glad to see non governmental users tap into our dissemination data. I am not posting on behalf of Statscan but myself only. Fereeza kindly pointed out the issue via e-mail. Our dissemination formats for complete files are indeed not the simplest to load directly into SAS. The beyond20/20 option is a good alternative for users that may not have the corporate IT infrastructure to use external XML tools and expertise to handle SDMX.

However, from personal experience on a project a few months ago where I was waiting for a database access and ran out of patience and extracted a SDMX which contained the specific information I needed at that point in time, I can say that it is feasible to import the SDMX files directly into SAS. Our dissemination website provides links to SDMX documentation but the basic gist is that the STRUCTURE file is metadata about the main file namely in terms of codesets definitions.

The SAS xml engine uses sequential access and thus shouldn't have any issue handling large files. As a matter of fact, I modified slightly my .map file from a few months back and extracted the ~35M records from the data file into SAS via only the XML engine before coming here to comment. SDMX however does not comply to the GENERIC xml in the xml libname engine documentation hence why you specifically need to build a custom XMLMAP for the job and the libname statement in your original post did not work.

A good place to get started in a scenario like this is to read the XML file as a text file via INFILE statement, put that into a large enough variable and limit yourself with OBS= infile option (I use between 100-1000). Then in another datastep output that into a different file to check the xml structure without crashing windows from out of memory exception because most default xml viewers would load the entire 15gb file into memory. Use a simple text viewer to see the structure and build your custom XMLMAP with it. There are different ways to map the file and then transform it into what you wish to obtain.

If you have tried the XML map option at some point but that it failed and that failure wasn't caused by a faulty .map file, it might be because of the XML file encoding or because you did not expand logical record length (LRECL= option) for the file and the XML structure broke down from truncation.

Proc groovy is a layer to call java from within SAS. There probably exists a free java class online that reads SDMX - I wouldn't know where to begin to look. The data step component object Java Object could've been used in a similar way but again that requires some external java expertise.

I am glad to see you managed to import the data. I just came by to share that it is possible to import into SAS via custom xmlmap with some effort.

Vincent

ptimusk
Obsidian | Level 7

I went to the web site for SDMX standards and downloaded all their scheme and created an XML map for each scheme using  a method of automapping suggested by user @BrunoMuellerin this question's thread

To import XML file (Hierarchical structured) into SAS

 

I also use sample 41880 to find all the xsd files in the folder.

http://support.sas.com/kb/41/880.html

 

Here is my program that finds all xsd files and makes maps of them. Unfinished: I am not sure what map to use for what XML dataset yet.

 

.


filename DIRLIST pipe 'dir "C:\Users\ptimusk\Documents\SDMX_ML\SDMX_2-1-1_SECTION_3B_SDMX_ML_Schemas_Samples_201108\schemas\*.xsd" ';               

 
data dirlist ;                                               
infile dirlist lrecl=200 truncover;                          
input line $200.;                                            
if input(substr(line,1,10), ?? mmddyy10.) = . then delete;   
length file_name $ 100;                                      
file_name=scan(line,-1," ");                    
keep file_name;                                                                               
   run;                                                          
data _null_;                                                 
set dirlist end=end;                                         
count+1;                                                     
call symput('read'||left(count),left(trim(file_name)));      
call symput('xsdname'||left(count),substr(file_name,1,length(file_name)-4));     
if end then call symput('max',count);                        
run;                                                         
 %put _user_;
 
options mprint symbolgen;                                    
%macro readinschemesmakemaps();                                               
%do i=1 %to &max;                                            
                  



filename myxml "C:\Users\ptimusk\Documents\SDMX_ML\SDMX_2-1-1_SECTION_3B_SDMX_ML_Schemas_Samples_201108\schemas\&&read&i..";
filename mymap "C:\Users\ptimusk\Documents\SDMX_ML\SDMX_2-1-1_SECTION_3B_SDMX_ML_Schemas_Samples_201108\schemas\&&xsdname&i...map";

libname myxml xmlv2  automap=replace xmlmap=mymap;
proc contents data=myxml._all_;
run;
%end;
%mend readinschemesmakemaps();
%readinschemesmakemaps();

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 3318 views
  • 10 likes
  • 9 in conversation