SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

A bit more than PROC TRANSPOSE, but what ?

Accepted Solution Solved
Reply
Highlighted
Contributor
Posts: 58
Accepted Solution

A bit more than PROC TRANSPOSE, but what ?

Hi lovely people  here is a challenge

 

I have a dataset (huge) that I want to transform. I built a little dummy dataset to explain, please see the test.xlsx attachement

Say we have a few people (ID) who took part in an event and are displayed in one columne.  Now, all people with the note  WINNER need to feature as a single line, all other people in that columne get their own columne, unless there are two or more of the same value, then they get moved together into their (own) columne. 

In this example ID 5, 7 and 8 build the backbone of the dataset, the rest is built around them,  that is behind it as long as they have the same ID 

So every ID with Number 5 , 7 or 8 is essentially in one line as displayed in the transformed example in the test.xlsx

 

So it not really a transformation, but rather a case where you  arrange different attributes of a variable into other columns. 

How do you tackle something like this? 

 

I am workng with Windows10 and SAS 9.4 

 

Many thanks for any help and advice. 

Regards, Anna


Accepted Solutions
Solution
‎06-07-2017 10:40 PM
Trusted Advisor
Posts: 1,015

Re: A bit more than PROC TRANSPOSE, but what ?

Instead of proc transpose, you might be better off with a double "do until (last.", as below.  It works well if you know iniadvance which values of C you are looking for (i.e.'v', 'm', 'l', 'r'):

 

data have;
input a b c :$1.;
datalines;
1	12	v
1	12	r
1	16	v
2	16	v
2	23	l
2	23	m
2	45	m
3	36	l
3	22	v
4	44	v
4	33	l
4	54	m
4	67	r
run;

data want (drop=nv);
  do nv=0 by 0 until (last.a);
    set have ;
    by a;
    length d_l e_m f_r $1;
    select (c);
      when ('v') nv=nv+1;
      when ('l') d_l='l';
      when ('m') e_m='m';
      when ('r') f_r='r';
      otherwise;
    end;
  end;
  if nv>0 then c='v';
  do until(last.a);
    set have (keep=a b);
    by a;
    if nv>0 then output;
  end;
run;

View solution in original post


All Replies
Super User
Posts: 11,338

Re: A bit more than PROC TRANSPOSE, but what ?

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Contributor
Posts: 22

Re: A bit more than PROC TRANSPOSE, but what ?

Are you looking for something like this?

Have:

Obsab
11Winner
22Middle
33Loser
44Middle
55Winner
66Other
77Dropout
88Winner

 

Want:

Obsbval1val2val3
1Dropout7..
2Loser3..
3Middle24.
4Other6..
5Winner158

 

This code produces the above if you comment back in the lines I commented out. I like to see how it is building the result before I keep just the last record in the group.

 

 

data have;

     length b $ 10;

     input a b $;

     datalines;

1 Winner

2 Middle

3 Loser

4 Middle

5 Winner

6 Other

7 Dropout

8 Winner

      ;

run;

 

proc sort data = have;

     by b;

run;

 

proc print data = have;

     var a b;

run;

 

data want;

     set have;

     by b;

     array val[3] val1 val2 val3;

     retain val: lastval j;

     if first.b then do;

       do i = 1 to 3;

         val(i) = .;

       end;

       j = 1;

               lastval = b;

     end;

     if b = lastval then do;

               val(j) = a;

       j = j + 1;

     end;

          lastval = b;

/* I comment these out while developing so I can see how it is processing each record */

/* if last.b then output;*/

/* keep b val: */

run;

 

proc print data = want;

run;

Super User
Posts: 11,338

Re: A bit more than PROC TRANSPOSE, but what ?

Posted in reply to sschleede

Since this code:

proc transpose data=have out=want (drop=_name_) prefix=Val;
   by b;
   var a;
run;

Generates your want data set as shown with the example data you would need to provide a more complete example of the input and output if the actual result is supposed to be more complex. You initial description sounded like that any other values of 5 for varaible A are supposed to have some special handling but I cannot determine what you may have meant as there is no example of that behavior to examine.

 

 

And the code you provided was exactly what we prefer for data. Thank you. It helps to post into a code box opened using the forum {i} or "run" icons as the main forum windows are known to reformat code and on some occasions non-visible characters have been introduced that will cause errors with data step.

Contributor
Posts: 58

Re: A bit more than PROC TRANSPOSE, but what ?

Have:

a 	b	c

1	12	v
1	12	r
1	16	v
2	16	v
2	23	l
2	23	m
2	45	m
3	36	l
3	22	v
4	44	v
4	33	l
4	54	m
4	67	r


Want: 
a	b	c	d_l	e_m	f_r

1	12	v			 r
1	12	v			 r
1	16	v			 r
2	16	v	l	m	
2	23	v	l	m	
2	23	v	l	m	
2	45	v	l	m	
3	36	v	l		
3	22	v	l		
4	44	v	l	m	 r
4	33	v	l	m	 r
4	54	v	l	m	 r
4	67	v	l	m	 r

Many thanks for the contributions. Close to what I need, but still need some input.

In essence, I am trying to build new rows (d_l   e_m   f_r) from values occurring in column c and fill the other rows in column c (which have become empty) with information = v

All repeated values in column a  build one unit (same Ids so to say).

 

So maybe it is possible to first transpose the data, and then fill the spaces

 

Not only have spaces to be filled for column c, but if there was any value other than v in column c and this value has been moved to another column(d_l , e_m or f_r), then this value needs to show up for all the IDs with the same value of  column a

 

The example should clarify what I mean. .

Many thanks for any input.

Anna

 

 

Solution
‎06-07-2017 10:40 PM
Trusted Advisor
Posts: 1,015

Re: A bit more than PROC TRANSPOSE, but what ?

Instead of proc transpose, you might be better off with a double "do until (last.", as below.  It works well if you know iniadvance which values of C you are looking for (i.e.'v', 'm', 'l', 'r'):

 

data have;
input a b c :$1.;
datalines;
1	12	v
1	12	r
1	16	v
2	16	v
2	23	l
2	23	m
2	45	m
3	36	l
3	22	v
4	44	v
4	33	l
4	54	m
4	67	r
run;

data want (drop=nv);
  do nv=0 by 0 until (last.a);
    set have ;
    by a;
    length d_l e_m f_r $1;
    select (c);
      when ('v') nv=nv+1;
      when ('l') d_l='l';
      when ('m') e_m='m';
      when ('r') f_r='r';
      otherwise;
    end;
  end;
  if nv>0 then c='v';
  do until(last.a);
    set have (keep=a b);
    by a;
    if nv>0 then output;
  end;
run;
Contributor
Posts: 58

Re: A bit more than PROC TRANSPOSE, but what ?

Hi

 

Although I liked the last contribution and it addressed ythe problem, it seemed all a bit complicated,  and I finally found an elegant solution using the left join function.

I thougt I share this here:

from http://www2.sas.com/proceedings/sugi25/25/cc/25p109.pdf

 

Table LEFT: A LEFT JOIN on PATIENT between DOSING ('left' data set) and EFFICACY ('right' data set).

 

PROC SQL;

CREATE TABLE LEFT1 AS

SELECT A.*, B.EFFIC_ID, B.VISIT,

B.SCORE

FROM DOSING A LEFT JOIN EFFICACY B

ON A.PATIENT = B.PATIENT ORDER BY PATIENT;

QUIT;

 

Many thanks, Anna

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 399 views
  • 3 likes
  • 4 in conversation