DATA Step, Macro, Functions and more

Flatten a Data File with Dynamic Variable Names

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

Flatten a Data File with Dynamic Variable Names

Hello. I'm stumped by the following situation and would be grateful for any help.

I have a dataset (D1) similar to this.

Metric     Open   BD1     BD2

MetricA      10     20         30

Metric2      40     50         60

and I want to transform it into another (D2) which contains one row like this.

MetricA_Open      MetricA_BD1     MetricA_BD2     Metric2_Open     Metric2_BD1     Metric2_BD2

10                         20                    30                    40                         50                    60

My actual dataset is much larger with many rows and columns, so I can't hard code it. I'm currently creating a dataset for each metric and then joining each in SQL, but this doesn't scale well.

I feel it is probably possible to do this in one pass through the data and without creating tons of intermediate datasets, but I haven't been able to find a solution. Any help would be appreciated.

Thanks,

Mike


Accepted Solutions
Solution
‎05-03-2013 10:04 PM
Super User
Super User
Posts: 6,502

Re: Flatten a Data File with Dynamic Variable Names

Here is one way.  Convert it to vertical format and use PROC TRANSPOSE to generate the resulting variable names.

data have;

  input Metric $ Open BD1 BD2 @@;

cards;

MetricA 10 20 30 Metric2 40 50 60

run;

data temp / view=temp;

  set have;

  array _n _numeric_;

  do _n_=1 to dim(_n);

     _name_ = vname(_n(_n_));

     col1 = _n(_n_);

     output;

  end;

  keep metric _name_ col1 ;

run;

proc transpose data=temp out=want delimiter=_;

  id metric _name_ ;

  var col1;

run;

View solution in original post


All Replies
Super User
Posts: 17,912

Re: Flatten a Data File with Dynamic Variable Names

How would you handle the remaining rows and columns? It seems strange to want to have all of the data in one column. Can you explain some more about what you're looking for?

Contributor
Posts: 22

Re: Flatten a Data File with Dynamic Variable Names

Additional rows would lead to additional columns whose names would begin with the value in the Metric field.

If the next row was:

Metric3       70          80          90

then D2 would have the additional columns of

Metric3_Open     Metric3_BD1     Metric3_BD2

To clarify, I want the resulting dataset to have one row, but it will have as many columns as required by the number of observations in the first dataset.

The reason I'm doing this is because I have hundreds and hundreds of datasets like D1 that each correspond to a particular date. I want to flatten each one out and then append them to each other to foster time series analysis. The number of columns in D1 is actually always going to be twenty some, so my solution can assume a limited and fixed number of columns in D1. For the sake of this question, we can just assume that Open, BD1, and BD2 are the only columns that we have to transform.

Super Contributor
Posts: 275

Re: Flatten a Data File with Dynamic Variable Names

data have;

input Metric $   Open   BD1     BD2;

cards;

MetricA      10     20         30

Metric2      40     50         60

;

run;

proc sort data=have;

  by metric;

run;

proc transpose data=have out=have1 name=trans_name;

   by metric;

run;

data have2;

  set have1;

  new_name=catx('_',metric,trans_name);

  drop metric t_name;

run;

proc transpose data=have2 out=want(drop=_name_);

  id new_name;

  run;

proc print;

run;

Contributor
Posts: 22

Re: Flatten a Data File with Dynamic Variable Names

Another great solution. Thank you.

Solution
‎05-03-2013 10:04 PM
Super User
Super User
Posts: 6,502

Re: Flatten a Data File with Dynamic Variable Names

Here is one way.  Convert it to vertical format and use PROC TRANSPOSE to generate the resulting variable names.

data have;

  input Metric $ Open BD1 BD2 @@;

cards;

MetricA 10 20 30 Metric2 40 50 60

run;

data temp / view=temp;

  set have;

  array _n _numeric_;

  do _n_=1 to dim(_n);

     _name_ = vname(_n(_n_));

     col1 = _n(_n_);

     output;

  end;

  keep metric _name_ col1 ;

run;

proc transpose data=temp out=want delimiter=_;

  id metric _name_ ;

  var col1;

run;

Contributor
Posts: 22

Re: Flatten a Data File with Dynamic Variable Names

This is amazing. Thank you so much for taking the time to help me out.

Mike Anas

Super User
Posts: 9,691

Re: Flatten a Data File with Dynamic Variable Names

or pure data step code.

data have;
  input Metric $ Open BD1 BD2 @@;
cards;
MetricA 10 20 30 Metric2 40 50 60
;
run;
proc sql noprint;
create table x as
select * from
 (select distinct metric from have),
 (select name from dictionary.columns where libname='WORK' and memname='HAVE');
 quit;
 data ;
  set x end=last;
  by metric ;
  if _n_ eq 1 then call execute('data want;merge ');
  if first.metric then call execute(cats('have(where=(metric="',metric,'") rename=('));
   else call execute(cats(name,'=',metric,'_',name));
  if last.metric then call execute('))');
  if last then call execute(';drop metric;run;');
run;

Ksharp

Contributor
Posts: 22

Re: Flatten a Data File with Dynamic Variable Names

Thank you Ksharp.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 409 views
  • 6 likes
  • 5 in conversation