SIgnificatif
Quartz | Level 8

Hi all

 

I have a table that I need to transpose; the format of the table is like this:

 

ID Varname value
5642121 age 50
5642121 city New york
5642121 firstname john
5642121 lastname smith
5642125 age 40
5642125 city New jersey
5642125 firstname Paul
5642125 lastname lennon
5642125 children yes
5642125 car ferrari
............
......

 

and so on

 

Each ID has a different number of variables; one subject could have 10, another one 30, etc.

 

  • I use sorting and transposing successfully.

But after creating the table, I found that disk space was running low: the initial table was 2 GB, while the 'transposed' table was 20-25 GB (because there are many 'empty variable values').

 

  • I format the variables after the transposition to preserve the length.

I cut the big table into small ones for better analysis.

 

Are there other ways (queries / SQL in this case?), or is transposition the only option in this situation?
The only purpose is to save disk space (even if I can just delete the table after creation and processing).

 

Interestingly, when I export the data to Excel, it takes only 5 MB.

 

Thank you for your answers.

 

5 REPLIES
andreas_lds
Jade | Level 19

@SIgnificatif wrote:

[...]

Please post a small excerpt of the dataset before and after transposing, using a datalines statement and with the types and lengths you have in your datasets.

 

Which analyses are performed that require transposing the data at all? If many null values exist after transposing, the transposition should be questioned.

Kurt_Bremser
Super User

@andreas_lds the source dataset is a name-value list, and the transposition probably looks like this:

data have;
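/* the & modifier lets VALUE keep single embedded blanks, e.g. "New york" */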
input ID $ Varname :$32. value & :$32.;
datalines;
5642121 age 50
5642121 city New york
5642121 firstname john
5642121 lastname smith
5642125 age 40
5642125 city New jersey
5642125 firstname Paul
5642125 lastname lennon
5642125 children yes
5642125 car ferrari
;
run;

proc transpose data=have out=want (drop=_name_ compress=yes);
by id;
var value;
id varname;
run;

If the number of values in varname is limited, it would be better to write a data step where variable types can be set:

data want (compress=yes);
set have;
by id;
format firstname lastname city $32. age 3. car $32. children 1.;
retain firstname lastname city age car children;
/* start a fresh set of values for each ID */
if first.id
then call missing (firstname,lastname,city,age,car,children);
select (varname);
  when ('firstname') firstname = value;
  when ('lastname') lastname = value;
  when ('city') city = value;
  when ('age') age = input(value,32.);
  when ('car') car = value;
  when ('children') children = ifn(value='yes',1,0);
  otherwise; /* ignore any varname not listed above */
end;
/* output one observation per ID, without the original name-value columns */
if last.id;
drop varname value;
run;

The compress option should take care of space wasted by empty strings. The "manual" method also lets one define correct lengths for short string values (e.g. if value is defined as $254 but contains only Y or N for a certain varname, you'd waste 253 bytes with the automatic transpose).

Astounding
PROC Star
Try adding this statement at the beginning of your program:

options compress=yes;
Tom
Super User

Try using the COMPRESS option when creating the dataset. Without that option, every observation takes the full amount of space defined for each variable. With compression you can frequently reduce the size by a lot.
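A minimal sketch of both places the option can go (HAVE and WANT are just placeholder names here):

/* as a system option, affects every dataset created from here on */
options compress=yes;

/* or as a dataset option on a single output table */
data want (compress=yes);
   set have;
run;

/* PROC CONTENTS reports whether the dataset was actually compressed */
proc contents data=want;
run;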

 

An XLSX file is essentially a ZIP file and so has compression already. Plus other features of how it stores the data can also reduce the size as empty cells are not stored and repeated values are stored just once and referenced instead.

 

Note also that changing the FORMAT attached to a variable does nothing to change the LENGTH used to store the variable. Make sure that your process for defining the new variables defines them with the appropriate length. Also, if you are attaching $ formats to character variables, make sure the widths match the lengths of the variables; otherwise it can cause a lot of confusion and possibly wrong analyses.
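A small sketch of that difference (the variable and dataset names are only examples): the LENGTH statement controls how many bytes are stored, while the FORMAT statement only controls how the value is displayed.

data demo;
   length city $20;   /* 20 bytes are stored for CITY          */
   format city $10.;  /* but only 10 characters are displayed  */
   city = 'New york';
run;

proc contents data=demo;  /* Len shows 20, Format shows $10.   */
run;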

RichardDeVen
Barite | Level 11

The original table design is called an "entity attribute value" (EAV) table. It is very suitable for cases where additional attributes may be needed, but a little more difficult when you want the transpose. Presuming a TRANSPOSE that is

    BY ID;
    ID VARNAME;
    VAR VALUE;

the caveats are that TRANSPOSE does not minimize the width of the columns created for each attribute, nor does it evaluate values for conversion to useful types and formats. So, in the worst case, VALUE is something like $4096 and you have hundreds of attributes over all the IDs... the result will be hundreds of columns that are $4096.
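If you want to check how wide the transposed columns actually ended up, one option (assuming the output table is WORK.WANT) is to query the column metadata:

proc sql;
   select name, type, length
      from dictionary.columns
      where libname = 'WORK' and memname = 'WANT';
quit;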

 

Kurt's reply is the best for the tightest pivot transformation of your EAV data.
