Help with kind-of-transpose problem

Accepted Solution Solved
Reply
Super Contributor
Posts: 371
Accepted Solution

Help with kind-of-transpose problem

Hello Everyone,

I have a file with ID, variable_name and associated values.

I want to "transpose" into a file where variable_name become name of variable and values become value of variables.

The file have and file want as below.

Could you please help me with that?

Thank you very much for your help.

Have a nice weekend.

HHC

data have;
input ID variable_name1 $ variable_name2 $ variable_name3 $ value1 value2 value3;
datalines;
1 aa bb cc 1 2 3
2 aa bb dd 3 4 5
3 kk bb ee 30 40 50
;;
run;

file want

idaabbccddeekk
1123...
234.5..
340..5030

Accepted Solutions
Solution
‎02-07-2014 06:38 PM
Super User
Super User
Posts: 6,500

Re: Help with kind-of-transpose problem

You need to transform it twice.  If you do it right then PROC TRANSPOSE can do the last step for you.

I have shortened names of the variables that hold the target variable names to make the code easier to type and read.

data have;

input ID name1 $ name2 $ name3 $ value1 value2 value3;

datalines;

1 aa bb cc 1 2 3

2 aa bb dd 3 4 5

3 kk bb ee 30 40 50

;;;;

data tall ;

  set have ;

  array _name name1-name3 ;

  array _val  value1-value3 ;

  do obs=1 to dim(_name);

    _name_ = _name(obs);

    _value_ = _val(obs);

    output;

  end;

run;

proc transpose data=tall out=want (drop=_name_) ;

  by id;

  id _name_;

  var _value_;

run;

View solution in original post


All Replies
Super Contributor
Posts: 307

Re: Help with kind-of-transpose problem

What you've described is not a transposition. You just need to drop the unneeded varialbles. Or, do I not understand what you are proposing?

Super Contributor
Posts: 371

Re: Help with kind-of-transpose problem

Hi Fugue,

You are right, it isn't a transposition but it is a restructure of data.

I am struggling with it.

HHC

PROC Star
Posts: 7,363

Re: Help with kind-of-transpose problem

It can be solved as a transpose problem.  The following will work with your example data:

data have;

input ID variable_name1 $ variable_name2 $ variable_name3 $ value1 value2 value3;

datalines;

1 aa bb cc 1 2 3

2 aa bb dd 3 4 5

3 kk bb ee 30 40 50

;

proc transpose data=have out=long;

  by id;

  var value1-value3;

  copy variable_name1-variable_name3;

run;

data long (drop=variable: namesSmiley Happy;

  set long;

  array names(3) $;

  retain names:;

  by id;

  if first.id then do;

    names(1)=variable_name1;

    names(2)=variable_name2;

    names(3)=variable_name3;

  end;

  _name_=names(compress(_name_,,'kd')); 

run;

proc transpose data=long out=want (drop=_Smiley Happy;

  by id;

  id _name_;

run;

data want;

  retain id aa bb cc dd ee kk;

  set want;

run;

Super Contributor
Posts: 307

Re: Help with kind-of-transpose problem

I see you've modified your description of the output. In light of the new information, just dropping variables will not work. Stay tuned.

Solution
‎02-07-2014 06:38 PM
Super User
Super User
Posts: 6,500

Re: Help with kind-of-transpose problem

You need to transform it twice.  If you do it right then PROC TRANSPOSE can do the last step for you.

I have shortened names of the variables that hold the target variable names to make the code easier to type and read.

data have;

input ID name1 $ name2 $ name3 $ value1 value2 value3;

datalines;

1 aa bb cc 1 2 3

2 aa bb dd 3 4 5

3 kk bb ee 30 40 50

;;;;

data tall ;

  set have ;

  array _name name1-name3 ;

  array _val  value1-value3 ;

  do obs=1 to dim(_name);

    _name_ = _name(obs);

    _value_ = _val(obs);

    output;

  end;

run;

proc transpose data=tall out=want (drop=_name_) ;

  by id;

  id _name_;

  var _value_;

run;

Super Contributor
Posts: 371

Re: Help with kind-of-transpose problem

Hi Arthur and Tom,

Thank you for your kind help. Both codes work perfectly.

I am also think of a solution and almost being able to implement it.

You mentor me for sometime with that approach, can you help me to correct that error, Tom?

My thought is that. I create a file called "name"

data name;

input aa bb cc dd ee kk;

datalines;

1000 1001 1002 1003 1004 1005

;;run;

(It is silly somehow to give random number but I have to do it for SAS to run "from t=1 to nobs")

Then I will use the DO to check if

name{i}=vname(x{j}) then put value to it.

The problem now is that, in the final file, the last available value is "carried forward". For example, since bb has value in the Have file, in the want file, it is ok. However, for aa, the last value is 3 in ID=2 is "carried forward" to ID=3. Same thing happen with cc since cc=3 is carried for all 3 ID.

Somehow, I have to get SAS to resent each round.

Here is the main code:

data want;
set name;

do t=1 to nobs ;

set have point=t nobs=nobs;

array name(*) name1-name3 ;
array value(*) value1-value3 ;
array x(*) aa bb cc dd ee kk;

do i=1 to dim(name);
do j=1 to dim(x);
if name{i}=vname(x{j}) then do;
x{j}=value{i};

output;
end;
end;
end;
end;
run;

data want; set want;
drop c1-c3 v1-v3 i j;run;

data want; set want;

by id;

if last.id;

run;

Super Contributor
Posts: 307

Re: Help with kind-of-transpose problem

See my post on using an array approach to solve your problem, which seems to be what you are now attempting.

Super Contributor
Posts: 371

Re: Help with kind-of-transpose problem

Thank you, Fugue,

You posted it while I was writing mine last message so I did not read it.

Yes, it is what I am trying to do.

I try to simplify by manually make that Name file and still I cannot fix the problem of value carried forward as in my code.

HHC

Super User
Super User
Posts: 6,500

Re: Help with kind-of-transpose problem

The values are carrying forward because you are preventing the normal clearing that occurs when the data step restarts. Eliminate the DO loop and POINT= option to process the input data set.  Let the normal data step looping do that for you.

Super Contributor
Posts: 371

Re: Help with kind-of-transpose problem

When I eliminate that first DO LOOP, SAS only read the first ID.

I am not sure I understand your suggestion correctly.

data want;
set name;

*do t=1 to nobs ;

set have /*point=t nobs=nobs*/;

array name(*) name1-name3 ;
array value(*) value1-value3 ;
array x(*) aa bb cc dd ee kk;

do i=1 to dim(name);
do j=1 to dim(x);
if name{i}=vname(x{j}) then do;
x{j}=value{i};

*end;

end;
end;output;
end;
run;

Super User
Super User
Posts: 6,500

Re: Help with kind-of-transpose problem

Here is a way to get the names from an existing dataset rather than hard coding them in the ARRAY statement.

You will need to clear the variables on each pass because they are "retained" because they are from an input dataset  ;


data want;

* get the names ;

if 0 then set name;

array x _all_;

call missing(of x(*));


* Get the data ;

set have /*point=t nobs=nobs*/;

array name name1-name3 ;

array value value1-value3 ;

* Find the right variable based on the name.;

do i=1 to dim(name);

  do j=1 to dim(x);

    if upcase(name{i})=upcase(vname(x{j})) then do;

      x{j}=value{i};

    end;

  end;

end;


run;

Super Contributor
Posts: 371

Re: Help with kind-of-transpose problem

Thanks, Tom.

I see, with this section "array x(*) aa bb cc dd ee kk;" manually added, there is no need for such Name file.

HHC

-----------------------

data have;

input ID name1 $ name2 $ name3 $ value1 value2 value3;

datalines;

1 aa bb cc 1 2 3

2 aa bb dd 3 4 5

3 kk bb ee 30 40 50

;;;;

data name;

input aa bb cc dd ee kk;

datalines;

1000 1001 1002 1003 1004 1005

;;

run;

data want;

set have ;

array name(*) name1-name3 ;

array value(*) value1-value3 ;

array x(*) aa bb cc dd ee kk;

do i=1 to dim(name);

  do j=1 to dim(x);

    if name{i}=vname(x{j}) then do;

      x{j}=value{i};

    end;

  end;

end;

run;

Super User
Super User
Posts: 6,500

Re: Help with kind-of-transpose problem

You need to use some code generation to get the new names.  Using macro variables is one way.

Here is an example where I build the macro variable VARS to contain the list of names.  Also generate the macro variable NAMES to contain the quoted list of names to make it easier to use the WHICHC() function to avoid manually searching for the name.

data have;

input ID name1 $ name2 $ name3 $ value1 value2 value3;

datalines;

1 aa bb cc 1 2 3

2 aa bb dd 3 4 5

3 kk bb ee 30 40 50

;;;;

proc sql noprint ;

  select distinct name

       , quote(trim(name))

  into :vars separated by ' '

     , :names separated by ','

  from (select name1 as name from have

  union select name2 as name from have

  union select name3 as name from have

       )

  ;

%let n=&sqlobs ;

quit;

data want ;

  set have ;

  array out (&n) 8 &vars ;

  array name name1-name3 ;

  array value  value1-value3 ;

  do i=1 to dim(name);

    out(whichc(name(i),&names)) = value(i);

  end;

  keep id &vars ;

run;

Super Contributor
Posts: 307

Re: Help with kind-of-transpose problem

Tom and Arthur are both correct in the PROC TRANSPOSE can be used quite effectively to solve your problem (although, strictly speaking, this is not a transposition problem). Here is another approach:

/* create a macro with the unique listing of variable names to create */

proc sql;

      select distinct var_names

            , count ( distinct ( var_names ) )

      into :var_list separated by " "

            , :var_count

      from

      (

      select distinct ( variable_name1 ) as var_names

      from have

      union

      select distinct ( variable_name2 ) as var_names

      from have

      union

      select distinct ( variable_name3 ) as var_names

      from have

      )

      order by var_names

;

quit;

%put list of variable names to output is . . . &var_list ;

%put count of variables is . . . &var_count ;

/* restructure the data */

data want ( keep = id &var_list );

      set have ;

      format &var_list best.;

      array var_list(*) &var_list;

      array var_name(*) $ variable_name1-variable_name3;

      array var_value(*) value1-value3;

            do i = 1 to dim(var_list) ;

                  do k = 1 to dim(var_name) ;

                        if vname(var_list(i))=var_name(k) then var_list(i)=var_value(k);

                  end;

            end;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 422 views
  • 7 likes
  • 5 in conversation