I've been tasked with doing a complicated transpose of a dataset that I have never done before and keep getting hung up.
I need to do this transpose for over 600 different datasets and then stack them on top of one another for later use.
Each dataset has about 15 unique variables but it varies across data sets.
Currently the data looks like this
key | report date | state | animal | cost | homes | rcosttohome | startdate |
---|---|---|---|---|---|---|---|
1234 | 01/01/01 | CA | cats | 20 | 30 | .50 | 02/03/04 |
2356 | 01/01/01 | NC | dogs | 30 | 34 | .75 | 03/04/06 |
1288 | 01/01/01 | TX | cows | 40 | 86 | .83 | 07/08/09 |
I need to to look like this
key | report_date | VAR_NAME | ALL_CHARACTER | ALL_Number | ALL_RATIO | ALL_DATE |
---|---|---|---|---|---|---|
1234 | 01/01/01 | state | CA | . | . | . |
1234 | 01/01/01 | animal | cats | . | . | . |
1234 | 01/01/01 | cost | . | 20 | ||
1234 | 01/01/01 | homes | 30 | |||
1234 | 01/01/01 | rcosttohome | .5 | |||
1234 | 01/01/01 | startdate | 02/03/04 |
and it repeats for each key. The data table will eventually be very very long.
I already have created a macro variable &varlist. that identifies all the variables that need to become rows in the transpose. But I am having a hard time consolidating like values into their respective columns.
Thank you for any help you can offer!
Hohooo. That is really not easy, especially you need so many datasets to process.
HaiKuo has pointed you a right direction .
data have; input key report_date :mmddyy10. state$ animal$ cost homes rcosttohome startdate:mmddyy8.; format report_date startdate :mmddyy10.; cards; 1234 01/01/01 CA cats 20 30 .50 02/03/04 2356 01/01/01 NC dogs 30 34 .75 03/04/06 1288 01/01/01 TX cows 40 86 .83 07/08/09 ; run; data want(keep=key report_date var_name all_character all_ratio all_date all_number where=(var_name not in ('key' 'report_date') )); set have; array _a{*} $ _character_; array _b{*} _numeric_; length var_name all_character $ 400; do i=1 to dim(_a); var_name=vname(_a{i}); all_character=_a{i}; output; all_character=' '; end; do i=1 to dim(_b); var_name=vname(_b{i}); if _b{i} gt 0 and _b{i} lt 1 then do; all_ratio=_b{i}; output;all_ratio=.; end; else if vformat(_b{i}) in: ('MMDDYY' 'YYMMDD' 'DDMMYY' 'DATE') then do; all_date=_b{i}; output;all_date=.; end; else do; all_number=_b{i}; output; all_number=.; end; end; format all_date yymmdd10.; run;
Ksharp
Hi,
You mentioned that you have over 600 different datasets with different variable names. if they share 'key' and 'report_date', or if they don't share any variables consistently, but the first two variables will remain not transposed, then there maybe a dynamic solution available involving metadata and macro variable. Otherwise, it probably needs a lot hard coding.
The following code uses array() to hopefully get you started, and after this step, there are many ways to stack them together, 'datastep set', 'proc append', 'proc datasets', to name a few.
data have;
input key report_date :mmddyy10. state$ animal$ cost homes rcosttohome startdate:mmddyy8.;
format report_date startdate :mmddyy10.;
cards;
1234 01/01/01 CA cats 20 30 .50 02/03/04
2356 01/01/01 NC dogs 30 34 .75 03/04/06
1288 01/01/01 TX cows 40 86 .83 07/08/09
;
data want (keep=key report_date var_name all:);
set have;
format all_date mmddyy10.;
array c _character_; length var_name $20.;
array n cost--startdate;
do over c;
var_name=vname(c);
all_character=c;
output; call missing(all_character);
end;
do over n;
var_name=vname(n);
dt=substr(var_name,length(var_name)-3,4);
if dt='date' then do;
all_date=n;
output; call missing(all_date);
end;
else if n>=1 then do;
all_number=n;
output; call missing(all_number);
end;
else do;
all_ratio=n;
output; call missing(all_ratio);
end;
end;
run;
proc print;run;
Haikuo
Thank you! This helped me get pointed in the right direction.
They all share a common key variable and report date.
Here's the hard-coded example, for the data you provided. You'll have to make sure that the lengths for the character variables are long enough.
data want (keep=key report_date var_name all_character all_number all_ratio all_date);
set have;
length var_name all_character $ 32;
var_name='state';
all_character=state;
output;
var_name='animal';
all_character=animal;
output;
all_character=' ';
var_name='cost';
all_number=cost;
output;
var_name='homes';
all_number=homes;
output;
all_number=.;
var_name='rcosttohome';
var_ratio = rcosttohome;
output;
all_ratio=.;
var_name='startdate';
all_date = startdate;
output;
run;
Variables need to be set to missing (such as all_number) when the next variable to be processed falls into a different column.
Good luck.
Hohooo. That is really not easy, especially you need so many datasets to process.
HaiKuo has pointed you a right direction .
data have; input key report_date :mmddyy10. state$ animal$ cost homes rcosttohome startdate:mmddyy8.; format report_date startdate :mmddyy10.; cards; 1234 01/01/01 CA cats 20 30 .50 02/03/04 2356 01/01/01 NC dogs 30 34 .75 03/04/06 1288 01/01/01 TX cows 40 86 .83 07/08/09 ; run; data want(keep=key report_date var_name all_character all_ratio all_date all_number where=(var_name not in ('key' 'report_date') )); set have; array _a{*} $ _character_; array _b{*} _numeric_; length var_name all_character $ 400; do i=1 to dim(_a); var_name=vname(_a{i}); all_character=_a{i}; output; all_character=' '; end; do i=1 to dim(_b); var_name=vname(_b{i}); if _b{i} gt 0 and _b{i} lt 1 then do; all_ratio=_b{i}; output;all_ratio=.; end; else if vformat(_b{i}) in: ('MMDDYY' 'YYMMDD' 'DDMMYY' 'DATE') then do; all_date=_b{i}; output;all_date=.; end; else do; all_number=_b{i}; output; all_number=.; end; end; format all_date yymmdd10.; run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.