Re: Need ideas to transpose a dataset using minimum coding effort

munitech4u · Posted 06-30-2014 04:56 AM

I have a dataset that contain data related to a particular company in a row, over the period of time, say 20 years. So it has observed values of say 10 variables over period of 20 years.

So Like a1-a20 variables for 20 years data and then say b1-b20 variables for 20 years and so on. Like this there are rows related to n companies.

I want to transpose in such a way that data related to "a" transpose and keeps under "a" for 20 years.then next variable "b" will have 20 years of data transposed and so on. The company say C1 will get repeated 20 times for 20 years of data.

Suggestions?

Let me know if you have any questions.

RW9 · Posted 06-30-2014 05:12 AM

Sorry, test data and output would probably clarify somewhat. You want something like:

id a1 a2 a3...a20 b1 b2...b20 c1..

If so then transpose data for the A variables, then the B variables, then the C variables and merge them together.

munitech4u · Posted 06-30-2014 05:27 AM

Transposing all variables using n datasteps for n variables and then merging is quite some time taking task, which I have on my last option. I was looking for any better method.

RW9 · Posted 06-30-2014 05:32 AM

Well, the only other thing I can think of is array processing with by groups. So:

data want;

set have;

by id;

array a{20};

array b{20};

...

process result into next available array based on spec for a or b etc.

if last.id then output;

run;

Reeza · Posted 06-30-2014 10:39 AM

Here's a good reference for that.

Example 3 is your situation.

SAS Learning Module: Reshaping wide to long using a data step

Scott_Mitchell · Posted 07-01-2014 12:16 AM

I am not entirely sure if the example you have given reflects the actual state of play in regards to your data or whether it is merely a best case scenario. The following will work if each set of variable follows the convention you have displayed in your example, chiefly that there will always be 20 variables for each grouping A1-20 B1-20 and so on.

PROC SQL NOPRINT;

SELECT

DISTINCT SUBSTR(NAME,1,1) INTO : OUTVARS SEPARATED BY " "

FROM DICTIONARY.COLUMNS

WHERE UPCASE(MEMNAME) = "HAVE"

AND NAME NOT IN ("COMPANY")

;

SELECT

DISTINCT NAME INTO : VARNAMES SEPARATED BY " "

FROM DICTIONARY.COLUMNS

WHERE UPCASE(MEMNAME) = "HAVE"

AND NAME NOT IN ("COMPANY")

ORDER BY NAME

;

QUIT;

%LET DIM1 = %SYSFUNC(LENGTH(%SYSFUNC(COMPRESS(&OUTVARS.))));

%LET DIM2 = %EVAL(%SYSFUNC(COUNTW(&VARNAMES.))/&DIM1.);

DATA WANT;

RETAIN COMPANY PERIOD;

LENGTH PERIOD $2;

SET HAVE;

ARRAY OUTVARS (&DIM1.) &OUTVARS.;

ARRAY INVARS (&DIM1.,&DIM2.) &VARNAMES.;

DO J = 1 TO &DIM2.;

DO I = 1 TO &DIM1.;

OUTVARS{I} = INVARS{I,J};

IF I = 1 THEN PERIOD = COMPRESS(VNAME(INVARS{I,J}),'0123456789','K');

END;

OUTPUT;

END;

KEEP COMPANY PERIOD &OUTVARS.;

RUN;

Tom · Posted 07-01-2014 01:14 AM

If the variables are all of the same type (numeric vs character) then just use a two dimensional array.

So if you know the variables are a1-a10, b1-b10, c1-c10 then code like this will work.

data want ;

set have ;

array inx(3,10) a1-a10 b1-b10 c1-c10 ;

array outx a b c ;

do year = 1 to 10 ;

do var=1 to 3;

outx(var) = inx(var,year) ;

end;

output;

end;

keep company year a b c ;

run;

SAS Training: Just a Click Away