- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello. I'm stumped by the following situation and would be grateful for any help.
I have a dataset (D1) similar to this.
Metric Open BD1 BD2
MetricA 10 20 30
Metric2 40 50 60
and I want to transform it into another (D2) which contains one row like this.
MetricA_Open MetricA_BD1 MetricA_BD2 Metric2_Open Metric2_BD1 Metric2_BD2
10 20 30 40 50 60
My actual dataset is much larger with many rows and columns, so I can't hard code it. I'm currently creating a dataset for each metric and then joining each in SQL, but this doesn't scale well.
I feel it is probably possible to do this in one pass through the data and without creating tons of intermediate datasets, but I haven't been able to find a solution. Any help would be appreciated.
Thanks,
Mike
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is one way. Convert it to vertical format and use PROC TRANSPOSE to generate the resulting variable names.
data have;
input Metric $ Open BD1 BD2 @@;
cards;
MetricA 10 20 30 Metric2 40 50 60
run;
data temp / view=temp;
set have;
array _n _numeric_;
do _n_=1 to dim(_n);
_name_ = vname(_n(_n_));
col1 = _n(_n_);
output;
end;
keep metric _name_ col1 ;
run;
proc transpose data=temp out=want delimiter=_;
id metric _name_ ;
var col1;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How would you handle the remaining rows and columns? It seems strange to want to have all of the data in one column. Can you explain some more about what you're looking for?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Additional rows would lead to additional columns whose names would begin with the value in the Metric field.
If the next row was:
Metric3 70 80 90
then D2 would have the additional columns of
Metric3_Open Metric3_BD1 Metric3_BD2
To clarify, I want the resulting dataset to have one row, but it will have as many columns as required by the number of observations in the first dataset.
The reason I'm doing this is because I have hundreds and hundreds of datasets like D1 that each correspond to a particular date. I want to flatten each one out and then append them to each other to foster time series analysis. The number of columns in D1 is actually always going to be twenty some, so my solution can assume a limited and fixed number of columns in D1. For the sake of this question, we can just assume that Open, BD1, and BD2 are the only columns that we have to transform.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Metric $ Open BD1 BD2;
cards;
MetricA 10 20 30
Metric2 40 50 60
;
run;
proc sort data=have;
by metric;
run;
proc transpose data=have out=have1 name=trans_name;
by metric;
run;
data have2;
set have1;
new_name=catx('_',metric,trans_name);
drop metric t_name;
run;
proc transpose data=have2 out=want(drop=_name_);
id new_name;
run;
proc print;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Another great solution. Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is one way. Convert it to vertical format and use PROC TRANSPOSE to generate the resulting variable names.
data have;
input Metric $ Open BD1 BD2 @@;
cards;
MetricA 10 20 30 Metric2 40 50 60
run;
data temp / view=temp;
set have;
array _n _numeric_;
do _n_=1 to dim(_n);
_name_ = vname(_n(_n_));
col1 = _n(_n_);
output;
end;
keep metric _name_ col1 ;
run;
proc transpose data=temp out=want delimiter=_;
id metric _name_ ;
var col1;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is amazing. Thank you so much for taking the time to help me out.
Mike Anas
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
or pure data step code.
data have; input Metric $ Open BD1 BD2 @@; cards; MetricA 10 20 30 Metric2 40 50 60 ; run; proc sql noprint; create table x as select * from (select distinct metric from have), (select name from dictionary.columns where libname='WORK' and memname='HAVE'); quit; data ; set x end=last; by metric ; if _n_ eq 1 then call execute('data want;merge '); if first.metric then call execute(cats('have(where=(metric="',metric,'") rename=(')); else call execute(cats(name,'=',metric,'_',name)); if last.metric then call execute('))'); if last then call execute(';drop metric;run;'); run;
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Ksharp.