mmm7
Calcite | Level 5

I utilize an API to pull a JSON file from an application. I then use the JSON engine and map syntax to read in the JSON file and create a SAS dataset. However, all of the data from the JSON file are being mapped as character, which is not ideal. Is there any way to adjust the syntax so that numeric data columns are read in as numeric from the JSON file?

 

%let Path = ;

*** Project- and user-specific token ***;
%let mytoken = ;

filename my_in "&Path\API\api_parameter_json.txt";
*** JSON output file to contain the exported data ***;
filename my_out "&Path\Raw Data\redcap_data_&sysdate9..json"; 

*** Output file to contain PROC HTTP status information returned from
REDCap API (this is optional) ***;
filename status "&Path\API\redcap_status_&sysdate9..txt";

/**********************************************************
Step 2. Request all observations (CONTENT=RECORD) with one
row per record (TYPE=FLAT).
**********************************************************/
*** Create the text file to hold the API parameters. ***;
data _null_;
	file my_in;
	put "%NRStr(content=record&type=flat&format=json&token=)&mytoken";
run;
*** PROC HTTP call. Everything except HEADEROUT= is required. ***;
proc http
	in=my_in
	out=my_out
	headerout=status
	url=""
	method="post";
run;

* Define file to contain a map of the JSON file from REDCap;
filename map "&Path\API\sb_map_&sysdate9..map";
* Create a JSON library (the libref matches the MY_OUT fileref, so the engine reads that file);
libname my_out json map=map automap=reuse;
* Construct the SAS dataset from the JSON data;
data RC_Data;
	set my_out.root;
run;

 

9 REPLIES
Tom
Super User

Once you have the MAP file, edit it to change the way the variable is defined. To get help, provide an example of the JSON file and the MAP file that SAS generated for it.
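For example, if the generated map defines the variable like this (the name and path here are made up for illustration, not taken from your file), changing "TYPE": "CHARACTER" to "TYPE": "NUMERIC" tells the engine to read it as a number:

{
  "NAME": "age",
  "TYPE": "NUMERIC",
  "PATH": "/root/age"
}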

 

But if the JSON engine determined the variable is being used as character in the JSON file, why do you want to read it as a number? You can always just convert it once you have it as a SAS dataset.

data want;
  set have;
  * INPUT() converts the character value to a number (32. reads up to 32 characters);
  new_number = input(old_char,32.);
run;
ChrisHemedinger
Community Manager

I have done this both ways: created a JSON map to tell the engine how to read the data, and also post-process the data to convert the fields as needed. Often I need to post-process the data anyway so it's just as easy to do it there as @Tom suggests. But if it's going to be hands-off after you read the data, try a map. You can specify the type, informat, format, and length all at once.  Example in this article.
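As a rough sketch of what such a map entry can look like (the variable name and path here are invented for illustration), a date stored as text in the JSON could be read as a numeric SAS date with the informat and format set in one place:

{
  "NAME": "visit_date",
  "TYPE": "NUMERIC",
  "PATH": "/root/visit_date",
  "INFORMAT": [ "yymmdd", 10, 0 ],
  "FORMAT": [ "date", 9, 0 ]
}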

mmm7
Calcite | Level 5

@Tom @ChrisHemedinger Thanks for your responses! I have previously implemented both of your suggestions. However, I manage numerous projects, and it's not feasible to spend a vast amount of time editing the map or using the INPUT function on projects that contain several hundred data variables.

 

I did explore using the API to pull a CSV data file. However, most projects contain text data with carriage returns or line feeds. I found some system commands to attempt to remove the CR and LF characters from the CSV prior to import, but I was losing observations and stopped troubleshooting there.

 

Tom
Super User

Depending on how the CSV files are created, it is not hard to automate fixing the embedded CR or LF issue. See any of the 10,000 times I have answered that question on this forum.

 

But that does not really solve your problem. A CSV file is just a text file; there is nothing in it to tell you whether a field should be numeric or character, nor, if character, the proper maximum length to use.

 

So it sounds like you need some source that describes the intended type, length, label, format, etc. for the fields in your source files (JSON or CSV). Once you have such metadata, you will have something to use to automate creating the dataset with the structure you want.
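As a rough sketch of that idea, suppose you had a metadata dataset META with (hypothetical) columns VARNAME and VARTYPE describing each field; you could then generate the conversions instead of typing them:

/* Sketch only: META, VARNAME, and VARTYPE are assumed names, and the
   real variable names are assumed short enough to take an _N suffix */
data _null_;
  set meta end=last;
  if _n_=1 then call execute('data want; set rc_data;');
  if vartype='num' then do;
    /* generates e.g.: age_n=input(age,32.); drop age; rename age_n=age; */
    call execute(cats(varname,'_n=input(',varname,',32.);'));
    call execute('drop ' || varname || ';');
    call execute('rename ' || cats(varname,'_n=',varname) || ';');
  end;
  if last then call execute('run;');
run;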

mmm7
Calcite | Level 5

@Tom I have referenced your post on that topic. When I run the code, after 500 columns or so, the variable names are removed from the first row (the data appear to be intact), which is problematic when you have a dataset containing 2,000 columns. I know it's not ideal, but it's my reality. I would expect 1,181 rows in both CSV files (which includes the first row of variable names), but that doesn't seem to match the log.

 

filename old "&Path\Raw Data\old_file1.csv";
filename new "&Path\Raw Data\new_file.csv";
data _null_ ;
  if eof then put 'NOTE: Records read=' newn 'Records with missing quotes=' missq ;
  infile old lrecl=10000 end=eof ;
  file new lrecl=10000;
  nq=0;
  * keep reading physical lines until the quotes balance (a complete CSV row);
  do until (mod(nq,2)=0 or eof );
     input;
     newn+1;
     nq = nq + countc(_infile_,'"');
     put _infile_ @;
     * an odd quote count means the row continues, so join the pieces with |;
     if mod(nq,2) then do;
       missq+1;
       put '|' @;
     end;
  end;
  put;
run;

[Screenshot of the SAS log]

 

Tom
Super User

Why did you only keep the first 10,000 bytes of each line?
Change that to 2 million or 10 million (if your system will allow) instead.
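In the step above, that is just the two LRECL= values, for example (2 million shown; use whatever your system allows):

infile old lrecl=2000000 end=eof ;
file new lrecl=2000000;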

 

If you need longer lines than that, you will probably have trouble reading in the resulting CSV file, but you should be able to adapt the data step to process the data character by character instead, keeping track of the "real" end of each line itself instead of letting SAS determine what constitutes a "line".

mmm7
Calcite | Level 5

@Tom Unfortunately, my system allows a max byte size of 32767. 🙄 Do you have any syntax you can share that can reach into a csv and delete specified columns? 

Tom
Super User

@mmm7 wrote:

@Tom Unfortunately, my system allows a max byte size of 32767. 🙄 Do you have any syntax you can share that can reach into a csv and delete specified columns? 


The maximum length that the INFILE and FILE statements support is much larger than 32,767.  But the maximum length for a character variable is 32,767.  If your INPUT lines are longer than that then you cannot use the _INFILE_ automatic variable in the version you posted.  So you will need to process the file byte by byte instead. 

 

So perhaps something like this, which removes CRs that are inside quotes and replaces LFs that are inside quotes with |.

data _null_ ;
  if eof then put 'NOTE: Number of Linefeeds replaced =' linefeeds '. Number of Carriage Returns removed =' crs;
  * read one byte per record and write a raw byte stream with no added line breaks;
  infile old lrecl=1 recfm=f end=eof;
  file new recfm=n;
  input ch $char1. @ ;
  * NQ=1 while inside an open quote, 0 otherwise;
  nq + ch='"';
  nq = mod(nq,2);
  * replace a linefeed inside quotes with |;
  if ch='0A'x and nq then do;
    linefeeds+1;
    ch='|';
  end;
  * drop a carriage return inside quotes, otherwise copy the byte through;
  if nq and ch='0D'x then crs+1;
  else put ch $char1.;
run;

Note also that if your header row is larger than 32,767 bytes then it can cause trouble for PROC IMPORT. But it is possible to read a CSV file without PROC IMPORT, since it is just a text file. For example, you could read it all into a TALL dataset with ROW #, COLumn #, and cell VALUE with something like this.

data tall;
  * skip the header row and use LENGTH= and COLUMN= to track line length and pointer position;
  infile new firstobs=2 length=ll column=cc truncover dsd ;
  row+1;
  * read fields one at a time until the column pointer passes the end of the line;
  do col=1 by 1 until(cc>ll);
     input value :$32767. @;
     output;
  end;
run;
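A possible companion step, sketched under the same assumptions, reads just the header record the same way so that each COL number in TALL can be matched back to a variable name:

data header;
  * read only the first record (the header row) and output one name per column;
  infile new obs=1 length=ll column=cc truncover dsd;
  do col=1 by 1 until(cc>ll);
     input name :$32. @;
     output;
  end;
run;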
ballardw
Super User

@mmm7 wrote:

@Tom @ChrisHemedinger Thanks for your responses! I have previously implemented both of your suggestions. However, I manage numerous projects, and it's not feasible to spend a vast amount of time editing the map or using the INPUT function on projects that contain several hundred data variables.

 

I did explore using the API to pull a CSV data file. However, most projects contain text data with carriage returns or line feeds. I found some system commands to attempt to remove the CR and LF characters from the CSV prior to import, but I was losing observations and stopped troubleshooting there.

 


Does the source change the structure of the data frequently? If not, build the map file one time and reuse it. I was wondering why you had code rebuilding a map file each time.
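For example (a one-line sketch using the filerefs from the original post), once the map has been edited by hand, point the engine at it without AUTOMAP= so it is never regenerated:

libname my_out json map=map;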

 

If they do change the structure of the data frequently then you will be biting the bullet somewhere and adjusting things constantly anyway. Think of it as job security.
