DATA Step, Macro, Functions and more

import stata or xlsx file with massive text fields

Reply
Occasional Contributor
Posts: 14

import stata or xlsx file with massive text fields

Hi experts,

 

I received a xlsx dataset with some text fields that are pretty massive, for example.. brief narratives or diagnosis history from a doctors visit. I can import it to stata 14 without problems, but I can't import the xlsx to SAS with either the proc import, infile or import wizard. Any idea why? 

 

And because SAS does not import Stata 14 files, even if I replace it with the older Stata version 12, it only saves those text fields to lengths of 244. 

Thanks for any advice or SAS code! 

Super User
Super User
Posts: 9,840

Re: import stata or xlsx file with massive text fields

SAS varaibles have a limit, 2000 characters.  I would suggest dropping the data into plain text file format.  Then you can read the plain text file character after character and output to strings.  I.e.

data want;
  infile "thetextfile.txt";
  length str $2000;
  input a $char1. @@;  /*might need the at's might not */
  if lengthn(cat(str,a)) > 2000 then do;
    output;
    str=a;
  end;
  else str=cat(str,a);
run;

The question is what are you going to do with that data, its pretty much useless for anything programming or stats related.

Occasional Contributor
Posts: 14

Re: import stata or xlsx file with massive text fields

Thanks for your help, but I can't get this to work. When I proc print, the narrative field comes out empty. I tried with or without @@.

My file has 3 variables cid 8. Encid 8. narrative_assess_imp . I tried inputting all the variables, and that didn't work either. I think I'm confused what the a variable is in your code?



Does the following log help at all?


8473 data want;
8474 infile "C:\\narrative.csv";
8475 length narrative_assess_imp $2000;
8476 input a $char1. ; /*might need the at's might not */
8477 if lengthn(cat(narrative_assess_imp,a)) > 2000 then do;
8478 output;
8479 narrative_assess_imp=a;
8480 end;
8481 else narrative_assess_imp=cat(narrative_assess_imp,a);
8482 run;

NOTE: The infile "C:\narrative.csv" is:

Filename=C:\narrative.csv,
RECFM=V,LRECL=32767,
File Size (bytes)=181558671,
Last Modified=25Apr2018:10:06:11,
Create Time=25Apr2018:10:06:08

NOTE: 397651 records were read from the infile "C:\narrative.csv".
The minimum record length was 12.
The maximum record length was 2061.
NOTE: The data set WORK.WANT has 397651 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 19.19 seconds
cpu time 2.01 seconds




Super User
Super User
Posts: 9,840

Re: import stata or xlsx file with massive text fields

So your file is a csv.  Well, that adds more complexity to the whole thing then.  Its hard to say when we can't see the file (or an example).  My code was presented to run through a text file character by character, read it in, append it to a string variable and if the length is 2000 then output as a row.  I.e. cutting the string up into blocks of 2000 characters.  Not a good fit for what you now say.  I would need to know something about what the file looks like and what you intend to be the output from it as each "cell" in a dataset can only have a maximum of 2000 characters, so you will need to cut it out into blocks of this somehow, but is that data really useful to anyone or any process?

Occasional Contributor
Posts: 14

Re: import stata or xlsx file with massive text fields

Yeah, I tried exporting the variables i needed to a csv using stata and trying to import it into sas that way. there's 3 variables in the file, 2 unique ids and the character variable which is ultimately a bunch of sentences. I plan to search this text variable for key words for race, marital status, etc..  

Does that help at all?

Super User
Super User
Posts: 8,279

Re: import stata or xlsx file with massive text fields

Did you use the XLSX engine?  It should not have a problem with your file.  The limit for a character variable in a SAS dataset is 32K bytes.

libname in xlsx 'myfile.xlsx';
proc copy inlib=in outlib=work;
run;
Occasional Contributor
Posts: 14

Re: import stata or xlsx file with massive text fields

Hi, I'm trying this now, I've never used this code before, so I hope it works, but it seems like SAS is going to time out, it's been churning this whole time. I'll keep you posted. Thanks for your help!

Ask a Question
Discussion stats
  • 6 replies
  • 136 views
  • 0 likes
  • 3 in conversation