When I export a dataset to Stata format using `PROC EXPORT`, SAS 9.4 automatically adds an extra (empty) byte to every observation of every string variable. For example, in this data set:

    data test1;
        input cust_id $ 1
              month 3-8
              category $ 10-12
              status $ 14-14
        ;
    datalines;
    A 200003 ABC C
    A 200004 DEF C
    A 200006 XYZ 3
    B 199910 ASD X
    B 199912 ASD C
    ;
    run;

    proc export data = test1
        file = "test1.dta"
        dbms = stata replace;
    run;

the variables `cust_id`, `category`, and `status` should be `str1`, `str3`, and `str1` in the final Stata file, and thus take up 1 byte, 3 bytes, and 1 byte, respectively, for every observation. However, SAS adds an extra empty byte to each observation, which expands their data types to `str2`, `str4`, and `str2` in the exported Stata file.

This is extremely problematic because that's an extra byte added to *every* observation of *every* string variable. For large datasets (I have some with ~530 million observations and numerous string variables), this can add several gigabytes to the exported file.

Once the file is loaded into Stata, the `compress` command can remove these empty bytes and shrink the file. For large datasets, though, `PROC EXPORT` adds so many extra bytes that I don't always have enough memory to load the dataset into Stata at all.

Is there a way to stop SAS from padding the string variables in the first place? When I export a file with a one-character string variable (for example), I want that variable stored as a one-character string variable in the output file.
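
For reference, here is a small sanity-check sketch on the `test1` example. The first step lists the stored lengths on the SAS side, to confirm that the extra byte comes from the export and not from the dataset itself; with column input I would expect lengths of 1, 3, and 1 for the three character variables. The second step is a rough estimate of the padding overhead. The count of three string variables is only an illustrative assumption, since my real datasets vary.

```sas
/* List variable names, types, and stored lengths for the example dataset */
proc contents data = test1 varnum;
run;

/* Rough overhead estimate: one extra byte per observation per string variable */
data _null_;
    n_obs      = 530e6;   /* approximate observation count mentioned above */
    n_str_vars = 3;       /* hypothetical number of string variables */
    extra_gb   = n_obs * n_str_vars / 1e9;
    put extra_gb=;        /* written to the SAS log */
run;
```

Under those assumptions the padding alone comes to roughly 1.6 GB, and it scales linearly with the number of string variables.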