BookmarkSubscribeRSS Feed
Sean_OConnor
Fluorite | Level 6

Folks,

 

I'm currently processing very large json files in SAS, using SAS json parsing engine. For efficiency I'm doing the following.

 

1. Breaking up the very large json file into batches of 100,000

2. Using a macro running each individual json file through the SAS json parsing enginge to create a flat SAS data set.

3. Stacking all N datasets back together as I need to carry out some aggregation steps to create a usable dataset for others in my office.

 

I have two questions I'm hoping people can help me with.

 

2. In some of the json files there is address information, in others there is not. Within my macro I need to put in a clause to check if a dataset called address_addresslines is blank then to skip over a certain part and move on. How could I do this?

 

3. When I'm stacking n datasets back together I need to ensure that my character variables do not get truncated. Unfortunately, the data suppliers have not given any information in regards to the maximum length any single character variable can take. Initially, what I was doing was choosing an abritary length say $100. and stacking all datasets. I'm aware however that this can increase the size of the datasets unnecessarily. Is there some way that I could check all n of my mini SAS datasets and choose the largest length specified in each variable? 

 

I'm attaching some code for reference.

 

%macro json_map(DirectoryAndDataset= );
	/*Firstly we read in the dataset called DatasetsInADirectory and keep one record each time depending on what iteration we are in*/
	data DatasetsInADirectory_1;
		set DatasetsInADirectory;
		where DirectoryAndDataset="&DirectoryAndDataset";
	run;

	/*Use the filename function to read in the individual json file which we have broken into N batches*/
	filename json3 "\\location\loc\Migration_03\Datasets\Rev\MOD\Rec_&date\pay\&DirectoryAndDataset..json";

	/*The next three steps initialises the SAS JSON engine. real_map refers to the custom json map we create which puts the 
	parsed data into an easier to handle SAS dataset. When the json engine stops parsing the data it outputs it to the libname called json3*/
	libname json3 JSON fileref=json3;
	filename real_map "\\location\loc\Migration_03\Datasets\Rev\MODt\full_json_map_with_labels.map";
	libname json3 json fileref=json3 map=real_map;

	/* SAS is unable to directly write the parsed json files to another location. Therefore use proc copy to write the datasets
	you want into the location out_json. Exclude the dataset called alldata as it is not needed
	We are left with two datasets after this*/
	proc copy in=json3  out=out_json memtype=data;
		exclude alldata;
	run;

	/*It appears that supplying address information is not mandatory so in some of the n batches the out_json.address_addresslines will 
	be blank. Put in a clause to skip this step if there is no information;

        Need help here!!!!

	/*For the dataset out_json.address_addresslines we transpose the address information into one variable*/
	proc transpose data=out_json.address_addresslines out=addresses prefix=add_;
		by ordinal_root;
		var addressLine;
	run;

	data address;
		set addresses;
		full_address=catx(',', of add_:);
	run;

	/*Use SQL to join the two datasets called out_json.everything and work.address to create the dataset out_json.data_of_interest
	This should create a replica of what appears in the json file i.e. each payslip record in the json file will have the same record in the SAS file;
	We use Ordinal root as the linking key;*/
	proc sql;
		create table out_json.data_of_interest as select *

		from

			out_json.everything

		full join

			work.address(rename=(ordinal_root=ordinal_temp))

			on everything.ordinal_root=address.ordinal_temp;
	quit;


	/*Finally we use the proc datasets option to delete the temporary datasets address_addresslines and everything.
	We rename the dataset out_json.data_of_interest to &DirectoryAndDataset (this depends on what Json you're
	reading into) so when the macro runs through each dataset it doesn't 
	write over the previous dataset*/
	proc datasets lib=out_json nodetails nolist;
		delete address_addresslines everything;
		change data_of_interest=&DirectoryAndDataset;
	run;

	data out_json.&DirectoryAndDataset;
		length json_file $40.;
		set out_json.&DirectoryAndDataset;
		json_file="&DirectoryAndDataset";

		/*Also create a variable called json_file so you know from which record each json file comes from. This allows us to go
		back into the raw json file to check if there is an issue with the actual data or just how it was proccessed*/
	run;

%mend json_map;
4 REPLIES 4
Kurt_Bremser
Super User

Say your datasets are in LIB1, their names start with DS, and your variable is called X1, do this:

proc sql noprint;
select max(length) into :max length
from dictionary.columns
where libname = 'LIB1' and memname like 'DS%'
and upcase(name) = 'X1';
quit;

This will give you the longest defined length of a given variable in macrovar &maxlength.

Sean_OConnor
Fluorite | Level 6

Hi Kurt,

 

I'm just trying to utilize this. All my datasets are begin with the PAYE_

 

proc sql noprint;
select max(length) into :max length
from dictionary.columns
where libname = 'OUT_JSON' and memname like 'PAYE_%'
and upcase(name) = 'X1';
quit;

I'm getting the following error.

 

GOPTIONS ACCESSIBLE;
26         proc sql noprint;
27         select max(length) into :max length
                                        ______
                                        22
                                        76
ERROR 22-322: Syntax error, expecting one of the following: ',', -, FROM, SEPARATED, THROUGH, THRU, TRIMMED.  

ERROR 76-322: Syntax error, statement will be ignored.

28         from dictionary.columns
29         where libname = 'OUT_JSON' and memname like 'PAYE_%'
30         and upcase(name) = 'X1';
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

I'm slightly confused about this. Note that I need to do this for all character variables. If I'm correct your example above just does it for the variable called 'X1'?

 

 

Kurt_Bremser
Super User

Sorry for the typo. There must not be a blank in the macro variable name, I guess the autocorrect of Safari got the better of me 😞

It has to be

proc sql noprint;
select max(length) into :maxlength
from dictionary.columns
where libname = 'OUT_JSON' and memname like 'PAYE_%'
and upcase(name) = 'X1';
quit;
Sean_OConnor
Fluorite | Level 6

I've added the following do statement to my code but I'm getting an error. Does anyone have any ideas?

 

%macro json_map(DirectoryAndDataset= );
	/*Firstly we read in the dataset called DatasetsInADirectory and keep one record each time depending on what iteration we are in*/
	data DatasetsInADirectory_1;
		set DatasetsInADirectory;
		where DirectoryAndDataset="&DirectoryAndDataset";
	run;

	/*Use the filename function to read in the individual json file which we have broken into N batches*/
	filename json3 "\\location\loc\Migration_03\Datasets\Rev\MOD\Rec_&date\pay\&DirectoryAndDataset..json";

	/*The next three steps initialises the SAS JSON engine. real_map refers to the custom json map we create which puts the 
	parsed data into an easier to handle SAS dataset. When the json engine stops parsing the data it outputs it to the libname called json3*/
	libname json3 JSON fileref=json3;
	filename real_map "\\location\loc\Migration_03\Datasets\Rev\MODt\full_json_map_with_labels.map";
	libname json3 json fileref=json3 map=real_map;

	/* SAS is unable to directly write the parsed json files to another location. Therefore use proc copy to write the datasets
	you want into the location out_json. Exclude the dataset called alldata as it is not needed
	We are left with two datasets after this*/
	proc copy in=json3  out=out_json memtype=data;
		exclude alldata;
	run;

	/*It appears that supplying address information is not mandtatory so in some of the n batches the out_json.address_addresslines will 
	be blank. Put in a clause to skip this step if there is no information*/;
	proc sql noprint;
select nobs into :nobs from dictionary.tables
where libname = "OUT_JSON" and memname = "ADDRESS_ADDRESSLINES";
quit;

%if &nobs = 0 %then %do;


	/*Also create a variable called json_file so you know from which record each json file comes from. This allows us to go
	back into the raw json file to check if there is an issue with the actual data or just how it was proccessed*/

	data out_json.&DirectoryAndDataset;
		length json_file $40;
		set out_json.everything;
		json_file="&DirectoryAndDataset";
		run;

	proc datasets lib=out_json nodetails nolist;
		delete address_addresslines everything;
	run;

	%end;

%else %do; 

		/*For the dataset out_json.address_addresslines we transpose the address information into one variable*/
		proc transpose data=out_json.address_addresslines out=addresses prefix=add_;
			by ordinal_root;
			var addressLine;
		run;

		data address;
			set addresses;
			full_address=catx(',', of add_:);
		run;

		/*Use SQL to join the two datasets called out_json.everything and work.address to create the dataset out_json.data_of_interest
		This should create a replica of what appears in the json file i.e. each payslip record in the json file will have the same record in the SAS file;
		We use Ordinal root as the linking key;*/
		proc sql;
			create table out_json.data_of_interest as select *

			from

				out_json.everything

			full join

				work.address(rename=(ordinal_root=ordinal_temp))

				on everything.ordinal_root=address.ordinal_temp;
		quit;

		/*Finally we use the proc datasets option to delete the temporary datasets address_addresslines and everything.
		We rename the dataset out_json.data_of_interest to &DirectoryAndDataset (this depends on what Json you're
		reading into) so when the macro runs through each dataset it doesn't 
		write over the previous dataset*/
		proc datasets lib=out_json nodetails nolist;
			delete address_addresslines everything;
			change data_of_interest=&DirectoryAndDataset;
		run;

		data out_json.&DirectoryAndDataset;
			length json_file $40;
			set out_json.&DirectoryAndDataset;
			json_file="&DirectoryAndDataset";

			/*Also create a variable called json_file so you know from which record each json file comes from. This allows us to go
				back into the raw json file to check if there is an issue with the actual data or just how it was proccessed*/
		run;

				%end;
%mend json_map;

 

MLOGIC(JSON_MAP):  %IF condition &nobs = 0 is TRUE
MPRINT(JSON_MAP):   data out_json.paye_86;
MPRINT(JSON_MAP):   length json_file $40;
MPRINT(JSON_MAP):   set out_json.everything;
MPRINT(JSON_MAP):   json_file="paye_86";
MPRINT(JSON_MAP):   run;

ERROR: Invalid sequence of commands for file OUT_JSON.EVERYTHING.DATA.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 632 views
  • 0 likes
  • 2 in conversation