DATA Step, Macro, Functions and more

DATA step transpose question

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

DATA step transpose question

I've been tasked with doing a complicated transpose of a dataset that I have never done before and keep getting hung up.

I need to do this transpose for over 600 different datasets and then stack them on top of one another for later use.

Each dataset has about 15 unique variables but it varies across data sets.

Currently the data looks like this

key
report date
state
animal
cost
homes
rcosttohome
startdate
123401/01/01CAcats2030.5002/03/04
2356

01/01/01

NCdogs3034.7503/04/06
128801/01/01TXcows4086.8307/08/09

I need to to look like this

keyreport_date
VAR_NAME
ALL_CHARACTER
ALL_Number
ALL_RATIO
ALL_DATE
1234

01/01/01

stateCA...
123401/01/01animalcats...
123401/01/01cost.20
123401/01/01homes30
123401/01/01rcosttohome.5
123401/01/01startdate02/03/04

and it repeats for each key.  The data table will eventually be very very long.

I already have created a macro variable &varlist. that identifies all the variables that need to become rows in the transpose.  But I am having a hard time consolidating like values into their respective columns.  

Thank you for any help you can offer!


Accepted Solutions
Solution
‎07-03-2012 01:51 AM
Super User
Posts: 10,023

Re: DATA step transpose question

Hohooo. That is really not easy, especially you need so many datasets to process.

HaiKuo has pointed you a right direction .

data have;
input key report_date :mmddyy10. state$    animal$    cost homes    rcosttohome    startdate:mmddyy8.;
format report_date startdate :mmddyy10.;
cards;
1234    01/01/01    CA    cats    20    30    .50    02/03/04
2356    01/01/01    NC    dogs    30    34    .75    03/04/06
1288    01/01/01    TX    cows    40    86    .83    07/08/09
;
run;
data want(keep=key report_date var_name all_character all_ratio all_date all_number where=(var_name not in ('key' 'report_date') ));
 set have;
 array _a{*} $ _character_;
 array _b{*}  _numeric_;
 length var_name all_character $ 400;
 do i=1 to dim(_a);
  var_name=vname(_a{i});
  all_character=_a{i};
  output; all_character=' ';
 end;
 do i=1 to dim(_b);
  var_name=vname(_b{i});
  if _b{i} gt 0 and  _b{i} lt 1 then do;
                                       all_ratio=_b{i};
                                       output;all_ratio=.;
                                      end;
    else if vformat(_b{i}) in: ('MMDDYY' 'YYMMDD' 'DDMMYY' 'DATE') then do;
                                                                  all_date=_b{i};
                                                                  output;all_date=.;
                                                                end;
      else do;
             all_number=_b{i};
             output; all_number=.;
           end;
 end;
 format all_date yymmdd10.;
run;






Ksharp

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: DATA step transpose question

Hi,

You mentioned that you have over 600 different datasets with different variable names. if they share 'key' and 'report_date', or if they don't share any variables consistently, but the first two variables will remain not transposed, then there maybe a dynamic solution available involving metadata and macro variable. Otherwise, it probably needs a lot hard coding.

The following code uses array() to hopefully get you started, and after this step, there are many ways to stack them together, 'datastep set', 'proc append', 'proc datasets', to name a few.

data have;

input key report_date :mmddyy10. state$    animal$    cost homes    rcosttohome    startdate:mmddyy8.;

format report_date startdate :mmddyy10.;

cards;

1234    01/01/01    CA    cats    20    30    .50    02/03/04

2356    01/01/01    NC    dogs    30    34    .75    03/04/06

1288    01/01/01    TX    cows    40    86    .83    07/08/09

;

data want (keep=key report_date var_name allSmiley Happy;

set have;

format all_date mmddyy10.;

array c _character_; length var_name $20.;

array n cost--startdate;

do over c;

  var_name=vname(c);

  all_character=c;

  output;      call missing(all_character);

end; 

do over n;

   var_name=vname(n);

   dt=substr(var_name,length(var_name)-3,4);

   if dt='date' then do;

       all_date=n;

       output;     call missing(all_date);

    end;

   else if n>=1 then do;

            all_number=n;

            output;    call missing(all_number);

    end;

    else do;

            all_ratio=n;

            output;    call missing(all_ratio);

    end;

   

end;

run;

proc print;run;

Haikuo

New Contributor
Posts: 2

Re: DATA step transpose question

Thank you!  This helped me get pointed in the right direction.

They all share a common key variable and report date. 

Super User
Posts: 5,503

Re: DATA step transpose question

Here's the hard-coded example, for the data you provided.  You'll have to make sure that the lengths for the character variables are long enough.

data want (keep=key report_date var_name all_character all_number all_ratio all_date);

   set have;

   length var_name all_character $ 32;

   var_name='state';

   all_character=state;

   output;

   var_name='animal';

   all_character=animal;

   output;

   all_character=' ';

   var_name='cost';

   all_number=cost;

   output;

   var_name='homes';

   all_number=homes;

   output;

   all_number=.;

   var_name='rcosttohome';

   var_ratio = rcosttohome;

   output;

   all_ratio=.;

   var_name='startdate';

   all_date = startdate;

   output;

run;

Variables need to be set to missing (such as all_number) when the next variable to be processed falls into a different column.

Good luck.

Solution
‎07-03-2012 01:51 AM
Super User
Posts: 10,023

Re: DATA step transpose question

Hohooo. That is really not easy, especially you need so many datasets to process.

HaiKuo has pointed you a right direction .

data have;
input key report_date :mmddyy10. state$    animal$    cost homes    rcosttohome    startdate:mmddyy8.;
format report_date startdate :mmddyy10.;
cards;
1234    01/01/01    CA    cats    20    30    .50    02/03/04
2356    01/01/01    NC    dogs    30    34    .75    03/04/06
1288    01/01/01    TX    cows    40    86    .83    07/08/09
;
run;
data want(keep=key report_date var_name all_character all_ratio all_date all_number where=(var_name not in ('key' 'report_date') ));
 set have;
 array _a{*} $ _character_;
 array _b{*}  _numeric_;
 length var_name all_character $ 400;
 do i=1 to dim(_a);
  var_name=vname(_a{i});
  all_character=_a{i};
  output; all_character=' ';
 end;
 do i=1 to dim(_b);
  var_name=vname(_b{i});
  if _b{i} gt 0 and  _b{i} lt 1 then do;
                                       all_ratio=_b{i};
                                       output;all_ratio=.;
                                      end;
    else if vformat(_b{i}) in: ('MMDDYY' 'YYMMDD' 'DDMMYY' 'DATE') then do;
                                                                  all_date=_b{i};
                                                                  output;all_date=.;
                                                                end;
      else do;
             all_number=_b{i};
             output; all_number=.;
           end;
 end;
 format all_date yymmdd10.;
run;






Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 293 views
  • 3 likes
  • 4 in conversation