BookmarkSubscribeRSS Feed
0 Likes

When running PROC TRANSPOSE on a wide dataset that is sparse, it would be nice if the output could be filtered so rows with zero values, e.g., are excluded.

 

This sample does what I want, but I'd like to add something like an "if value <> 0" or "where value ne 0" to the proc transpose, so the output is limited to rows where value is not zero. Otherwise the resulting output will be huge, especially if I have about 50 "key" columns in the input and millions of records to transpose.

 

data sample_data;
    input id $ a b c d e f g;
    datalines;
01 1 0  0 0 0 0 0
02 0 1  0 0 0 0 0
03 0 0 -1 0 0 0 0
04 0 3  0 0 0 0 0
05 0 0  0 0 0 0 5
;

proc sort data = sample_data;
    by id a b c d e f g;
run;

proc transpose data = sample_data
               out  = sample_transpose(rename=(_NAME_=key
                                                 COL1=value));
    var a b c d e f g;
    by id;
run;

 

2 Comments
Quentin
Super User

You can add a WHERE option to your PROC TRANSPOSE out= dataset.  Is this what you want? :

 

proc transpose data = sample_data
               out  = sample_transpose(rename=(_NAME_=key
                                                 COL1=value)
                                       where=(value ne 0)
                                       );
    var a b c d e f g;
    by id;
run;

If you have millions of records to transpose, this wouldn't help with the efficiency of the actual transposing (I would think all those values of 0 would still be transposed when the PROC runs), but at least they would not be written to the output dataset, so you would save time in writing the output. 

LinusH
Tourmaline | Level 20

An option in the same area could be to opt for excluding missing values, this would IMO make some sense as pre defined options.

Other filtering should be fine using where= ds options as @Quentin suggested.