BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

Lets say I have a sas data set with 2 million rows and 200 columns.

Lets say that I consider perform transpose so there will be Long structure with 3 fields: Custt_ID,var_name, value

(so will have 400 million rows with 3 columns)

My question- Will the resulted data set have same size (in giga) as the orignal data set?

Note-

The data set include information of credit risk score model.

Each customer has geneeral information and also information of explenatory varaibles.

There are diffferent sets of explenatory varaibles depents on the model that the customer passed.

I want to create a panel that help users to analyse specific customer.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Amethyst | Level 16

It is not rocket science to do some simple estimations. You could do it yourself...

/*
Let's assume we have 2 million rows and 201 columns, and
custt_ID is $ 32, and  other are (for simplicity) all numeric.
So single observation is roughly 32+8*200
Everything times 2 millions:
*/

%let custID=32;

data _null_;
x = 2000000 * (&custid.+8*200) ;
GB = x / 1024**3;
put "1) " GB=;
run;

/*
After transposition, let's assume value stays numeric
but varname is $ 32 (to cover all possible variable names values)
every row changes into 200 rows, so with 2 millions: 
*/

%let varname=32; 

data _null_;
  x = 2000000 * 200 * (&custid. + &varname. + 8) ;
  GB = x / 1024**3;
  put "2) " GB=;
run;

%let varname=8; /* if varname is shorter, say 8 */

data _null_;
  x = 2000000 * 200 * (&custid. + &varname. + 8) ;
  GB = x / 1024**3;
  put "3) " GB=;
run;

LOG:

1) GB=3.0398368835

2) GB=26.822090149

3) GB=17.881393433

 

Bart

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Lets say that I consider perform transpose so there will be Long structure with 3 fields: Custt_ID,var_name, value

 

I don't know the answer to your question about file size, but this arrangement of the data after the transpose will be a nightmare to use in further programming. Furthermore, when we talk about long data sets in SAS, what you describe isn't what we mean (although in non-SAS usage of the word "long" it certainly is long).

--
Paige Miller
LinusH
Tourmaline | Level 20

You might find some formulas that calculates this. But the easiest way is to take a subset and test yourself.

Things might affect result if you have compression on your data and indexing.

I'm usually pro long dataset over wide, but this is an extreme, key-value-pair model which is usually not useful for human usage.

What is the purpose of this re-modelling?

Data never sleeps
yabwon
Amethyst | Level 16

It is not rocket science to do some simple estimations. You could do it yourself...

/*
Let's assume we have 2 million rows and 201 columns, and
custt_ID is $ 32, and  other are (for simplicity) all numeric.
So single observation is roughly 32+8*200
Everything times 2 millions:
*/

%let custID=32;

data _null_;
x = 2000000 * (&custid.+8*200) ;
GB = x / 1024**3;
put "1) " GB=;
run;

/*
After transposition, let's assume value stays numeric
but varname is $ 32 (to cover all possible variable names values)
every row changes into 200 rows, so with 2 millions: 
*/

%let varname=32; 

data _null_;
  x = 2000000 * 200 * (&custid. + &varname. + 8) ;
  GB = x / 1024**3;
  put "2) " GB=;
run;

%let varname=8; /* if varname is shorter, say 8 */

data _null_;
  x = 2000000 * 200 * (&custid. + &varname. + 8) ;
  GB = x / 1024**3;
  put "3) " GB=;
run;

LOG:

1) GB=3.0398368835

2) GB=26.822090149

3) GB=17.881393433

 

Bart

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



qingy
Fluorite | Level 6

You may be interested in the compress option (link to SAS documentation page) if you are concerned about the size of the file. It would help claw back some of the wasted disk space reserved for blank cells that will be inevitably generated in the conversion if in an uncompressed format.

 

You can also estimate the size of your uncompressed table by taking the width of your table (set manually or calculated using SQL as shown in my reply to your question, and multiply by the sum of the size of the repeated variables you want to include (I would default to 8 bytes for each numeric variable and 200 for each character variable), then multiply by the number of distinct IDs.

PaigeMiller
Diamond | Level 26

COMPRESS option does save space, but it requires more CPU time.

 

Regarding setting the length to 3, if all the values of every one of the 200 variables wind up in the new dataset value column, you can't have length 3, you will most likely need length 8 unless ALL original variables fit in that length.

 

IMHO, estimating the new size of the dataset is a waste of time, as I said, the new data set is almost useless in that form.

--
Paige Miller
AhmedAl_Attar
Ammonite | Level 13

Hi @Ronein 

One technique I've been using to reduce my SAS data set size, is setting my Integer numeric variable to length of 3 instead of the default 8!

Where these integer values are typically used to store Boolean flags (0/1, True/False) for modeling purposes. But if you intent to use these integers for aggregations, then keeping them with the default length of 8, would preserve their precision.

 

Check this SAS on-line help SAS Help Center: Numeric Precision for more details.

 

Hope this helps

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 376 views
  • 5 likes
  • 7 in conversation