fortranso
Calcite | Level 5

When I export a dataset to Stata format using `PROC EXPORT`, SAS 9.4 automatically adds an extra (empty) byte to every observation of every string variable. For example, in this data set:

    data test1;
        input cust_id   $ 1
              month       3-8
              category  $ 10-12
              status    $ 14-14
    ;
    datalines;
    A 200003 ABC C
    A 200004 DEF C
    A 200006 XYZ 3
    B 199910 ASD X
    B 199912 ASD C
    ;
    run;

    proc export data = test1
        file = "test1.dta"
        dbms = stata replace;
    run;

the variables `cust_id`, `category`, and `status` should be `str1`, `str3`, and `str1` in the final Stata file, and thus take up 1 byte, 3 bytes, and 1 byte, respectively, for every observation. However, SAS adds an extra empty byte to each value, so they are written as `str2`, `str4`, and `str2` in the exported Stata file.

This is extremely problematic because that's an extra byte added to *every* observation of *every* string variable. For large datasets (I have some with ~530 million observations and numerous string variables), this can add several gigabytes to the exported file.
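
A rough sketch of that arithmetic, in Python rather than SAS: one padding byte per string value, multiplied across observations and string variables. The observation count is the one quoted above; the count of 10 string variables is an assumed figure for illustration only, and the function name is made up for this example.

```python
def padding_overhead_bytes(n_obs: int, n_string_vars: int, pad_bytes: int = 1) -> int:
    """Extra bytes written when every string value grows by pad_bytes."""
    return n_obs * n_string_vars * pad_bytes


n_obs = 530_000_000   # observation count quoted in the post
n_string_vars = 10    # hypothetical: the post only says "numerous"

extra = padding_overhead_bytes(n_obs, n_string_vars)
print(f"{extra / 1e9:.1f} GB of padding")  # prints "5.3 GB of padding"
```

Under those assumptions the padding alone comes to about 5.3 GB, before any of the actual data is counted.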

Once the file is loaded into Stata, the `compress` command in Stata can automatically remove these empty bytes and shrink the file, but for large datasets, `PROC EXPORT` adds so many extra bytes to the file that I don't always have enough memory to load the dataset into Stata in the first place.

Is there a way to stop SAS from padding the string variables in the first place? When I export a file with a one character string variable (for example), I want that variable stored as a one character string variable in the output file.

Tom
Super User

You should raise this with SAS Support. It looks to be part of the design of how PROC EXPORT converts strings for Stata. There might be differences in how Stata stores strings that make the extra byte necessary to prevent data loss for all strings.
