BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mbutton
Calcite | Level 5

I've been tasked with doing a complicated transpose of a dataset that I have never done before and keep getting hung up.

I need to do this transpose for over 600 different datasets and then stack them on top of one another for later use.

Each dataset has about 15 unique variables but it varies across data sets.

Currently the data looks like this

key
report date
state
animal
cost
homes
rcosttohome
startdate
123401/01/01CAcats2030.5002/03/04
2356

01/01/01

NCdogs3034.7503/04/06
128801/01/01TXcows4086.8307/08/09

I need to to look like this

keyreport_date
VAR_NAME
ALL_CHARACTER
ALL_Number
ALL_RATIO
ALL_DATE
1234

01/01/01

stateCA...
123401/01/01animalcats...
123401/01/01cost.20
123401/01/01homes30
123401/01/01rcosttohome.5
123401/01/01startdate02/03/04

and it repeats for each key.  The data table will eventually be very very long.

I already have created a macro variable &varlist. that identifies all the variables that need to become rows in the transpose.  But I am having a hard time consolidating like values into their respective columns.  

Thank you for any help you can offer!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Hohooo. That is really not easy, especially you need so many datasets to process.

HaiKuo has pointed you a right direction .

data have;
input key report_date :mmddyy10. state$    animal$    cost homes    rcosttohome    startdate:mmddyy8.;
format report_date startdate :mmddyy10.;
cards;
1234    01/01/01    CA    cats    20    30    .50    02/03/04
2356    01/01/01    NC    dogs    30    34    .75    03/04/06
1288    01/01/01    TX    cows    40    86    .83    07/08/09
;
run;
data want(keep=key report_date var_name all_character all_ratio all_date all_number where=(var_name not in ('key' 'report_date') ));
 set have;
 array _a{*} $ _character_;
 array _b{*}  _numeric_;
 length var_name all_character $ 400;
 do i=1 to dim(_a);
  var_name=vname(_a{i});
  all_character=_a{i};
  output; all_character=' ';
 end;
 do i=1 to dim(_b);
  var_name=vname(_b{i});
  if _b{i} gt 0 and  _b{i} lt 1 then do;
                                       all_ratio=_b{i};
                                       output;all_ratio=.;
                                      end;
    else if vformat(_b{i}) in: ('MMDDYY' 'YYMMDD' 'DDMMYY' 'DATE') then do;
                                                                  all_date=_b{i};
                                                                  output;all_date=.;
                                                                end;
      else do;
             all_number=_b{i};
             output; all_number=.;
           end;
 end;
 format all_date yymmdd10.;
run;






Ksharp

View solution in original post

4 REPLIES 4
Haikuo
Onyx | Level 15

Hi,

You mentioned that you have over 600 different datasets with different variable names. if they share 'key' and 'report_date', or if they don't share any variables consistently, but the first two variables will remain not transposed, then there maybe a dynamic solution available involving metadata and macro variable. Otherwise, it probably needs a lot hard coding.

The following code uses array() to hopefully get you started, and after this step, there are many ways to stack them together, 'datastep set', 'proc append', 'proc datasets', to name a few.

data have;

input key report_date :mmddyy10. state$    animal$    cost homes    rcosttohome    startdate:mmddyy8.;

format report_date startdate :mmddyy10.;

cards;

1234    01/01/01    CA    cats    20    30    .50    02/03/04

2356    01/01/01    NC    dogs    30    34    .75    03/04/06

1288    01/01/01    TX    cows    40    86    .83    07/08/09

;

data want (keep=key report_date var_name all:);

set have;

format all_date mmddyy10.;

array c _character_; length var_name $20.;

array n cost--startdate;

do over c;

  var_name=vname(c);

  all_character=c;

  output;      call missing(all_character);

end; 

do over n;

   var_name=vname(n);

   dt=substr(var_name,length(var_name)-3,4);

   if dt='date' then do;

       all_date=n;

       output;     call missing(all_date);

    end;

   else if n>=1 then do;

            all_number=n;

            output;    call missing(all_number);

    end;

    else do;

            all_ratio=n;

            output;    call missing(all_ratio);

    end;

   

end;

run;

proc print;run;

Haikuo

mbutton
Calcite | Level 5

Thank you!  This helped me get pointed in the right direction.

They all share a common key variable and report date. 

Astounding
PROC Star

Here's the hard-coded example, for the data you provided.  You'll have to make sure that the lengths for the character variables are long enough.

data want (keep=key report_date var_name all_character all_number all_ratio all_date);

   set have;

   length var_name all_character $ 32;

   var_name='state';

   all_character=state;

   output;

   var_name='animal';

   all_character=animal;

   output;

   all_character=' ';

   var_name='cost';

   all_number=cost;

   output;

   var_name='homes';

   all_number=homes;

   output;

   all_number=.;

   var_name='rcosttohome';

   var_ratio = rcosttohome;

   output;

   all_ratio=.;

   var_name='startdate';

   all_date = startdate;

   output;

run;

Variables need to be set to missing (such as all_number) when the next variable to be processed falls into a different column.

Good luck.

Ksharp
Super User

Hohooo. That is really not easy, especially you need so many datasets to process.

HaiKuo has pointed you a right direction .

data have;
input key report_date :mmddyy10. state$    animal$    cost homes    rcosttohome    startdate:mmddyy8.;
format report_date startdate :mmddyy10.;
cards;
1234    01/01/01    CA    cats    20    30    .50    02/03/04
2356    01/01/01    NC    dogs    30    34    .75    03/04/06
1288    01/01/01    TX    cows    40    86    .83    07/08/09
;
run;
data want(keep=key report_date var_name all_character all_ratio all_date all_number where=(var_name not in ('key' 'report_date') ));
 set have;
 array _a{*} $ _character_;
 array _b{*}  _numeric_;
 length var_name all_character $ 400;
 do i=1 to dim(_a);
  var_name=vname(_a{i});
  all_character=_a{i};
  output; all_character=' ';
 end;
 do i=1 to dim(_b);
  var_name=vname(_b{i});
  if _b{i} gt 0 and  _b{i} lt 1 then do;
                                       all_ratio=_b{i};
                                       output;all_ratio=.;
                                      end;
    else if vformat(_b{i}) in: ('MMDDYY' 'YYMMDD' 'DDMMYY' 'DATE') then do;
                                                                  all_date=_b{i};
                                                                  output;all_date=.;
                                                                end;
      else do;
             all_number=_b{i};
             output; all_number=.;
           end;
 end;
 format all_date yymmdd10.;
run;






Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1021 views
  • 3 likes
  • 4 in conversation