BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AnnaNZ
Quartz | Level 8

Hi lovely people  here is a challenge

 

I have a dataset (huge) that I want to transform. I built a little dummy dataset to explain, please see the test.xlsx attachement

Say we have a few people (ID) who took part in an event and are displayed in one columne.  Now, all people with the note  WINNER need to feature as a single line, all other people in that columne get their own columne, unless there are two or more of the same value, then they get moved together into their (own) columne. 

In this example ID 5, 7 and 8 build the backbone of the dataset, the rest is built around them,  that is behind it as long as they have the same ID 

So every ID with Number 5 , 7 or 8 is essentially in one line as displayed in the transformed example in the test.xlsx

 

So it not really a transformation, but rather a case where you  arrange different attributes of a variable into other columns. 

How do you tackle something like this? 

 

I am workng with Windows10 and SAS 9.4 

 

Many thanks for any help and advice. 

Regards, Anna

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Instead of proc transpose, you might be better off with a double "do until (last.", as below.  It works well if you know iniadvance which values of C you are looking for (i.e.'v', 'm', 'l', 'r'):

 

data have;
input a b c :$1.;
datalines;
1	12	v
1	12	r
1	16	v
2	16	v
2	23	l
2	23	m
2	45	m
3	36	l
3	22	v
4	44	v
4	33	l
4	54	m
4	67	r
run;

data want (drop=nv);
  do nv=0 by 0 until (last.a);
    set have ;
    by a;
    length d_l e_m f_r $1;
    select (c);
      when ('v') nv=nv+1;
      when ('l') d_l='l';
      when ('m') e_m='m';
      when ('r') f_r='r';
      otherwise;
    end;
  end;
  if nv>0 then c='v';
  do until(last.a);
    set have (keep=a b);
    by a;
    if nv>0 then output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

sschleede
Obsidian | Level 7

Are you looking for something like this?

Have:

Obsab
11Winner
22Middle
33Loser
44Middle
55Winner
66Other
77Dropout
88Winner

 

Want:

Obsbval1val2val3
1Dropout7..
2Loser3..
3Middle24.
4Other6..
5Winner158

 

This code produces the above if you comment back in the lines I commented out. I like to see how it is building the result before I keep just the last record in the group.

 

 

data have;

     length b $ 10;

     input a b $;

     datalines;

1 Winner

2 Middle

3 Loser

4 Middle

5 Winner

6 Other

7 Dropout

8 Winner

      ;

run;

 

proc sort data = have;

     by b;

run;

 

proc print data = have;

     var a b;

run;

 

data want;

     set have;

     by b;

     array val[3] val1 val2 val3;

     retain val: lastval j;

     if first.b then do;

       do i = 1 to 3;

         val(i) = .;

       end;

       j = 1;

               lastval = b;

     end;

     if b = lastval then do;

               val(j) = a;

       j = j + 1;

     end;

          lastval = b;

/* I comment these out while developing so I can see how it is processing each record */

/* if last.b then output;*/

/* keep b val: */

run;

 

proc print data = want;

run;

ballardw
Super User

Since this code:

proc transpose data=have out=want (drop=_name_) prefix=Val;
   by b;
   var a;
run;

Generates your want data set as shown with the example data you would need to provide a more complete example of the input and output if the actual result is supposed to be more complex. You initial description sounded like that any other values of 5 for varaible A are supposed to have some special handling but I cannot determine what you may have meant as there is no example of that behavior to examine.

 

 

And the code you provided was exactly what we prefer for data. Thank you. It helps to post into a code box opened using the forum {i} or "run" icons as the main forum windows are known to reformat code and on some occasions non-visible characters have been introduced that will cause errors with data step.

AnnaNZ
Quartz | Level 8
Have:

a 	b	c

1	12	v
1	12	r
1	16	v
2	16	v
2	23	l
2	23	m
2	45	m
3	36	l
3	22	v
4	44	v
4	33	l
4	54	m
4	67	r


Want: 
a	b	c	d_l	e_m	f_r

1	12	v			 r
1	12	v			 r
1	16	v			 r
2	16	v	l	m	
2	23	v	l	m	
2	23	v	l	m	
2	45	v	l	m	
3	36	v	l		
3	22	v	l		
4	44	v	l	m	 r
4	33	v	l	m	 r
4	54	v	l	m	 r
4	67	v	l	m	 r

Many thanks for the contributions. Close to what I need, but still need some input.

In essence, I am trying to build new rows (d_l   e_m   f_r) from values occurring in column c and fill the other rows in column c (which have become empty) with information = v

All repeated values in column a  build one unit (same Ids so to say).

 

So maybe it is possible to first transpose the data, and then fill the spaces

 

Not only have spaces to be filled for column c, but if there was any value other than v in column c and this value has been moved to another column(d_l , e_m or f_r), then this value needs to show up for all the IDs with the same value of  column a

 

The example should clarify what I mean. .

Many thanks for any input.

Anna

 

 

mkeintz
PROC Star

Instead of proc transpose, you might be better off with a double "do until (last.", as below.  It works well if you know iniadvance which values of C you are looking for (i.e.'v', 'm', 'l', 'r'):

 

data have;
input a b c :$1.;
datalines;
1	12	v
1	12	r
1	16	v
2	16	v
2	23	l
2	23	m
2	45	m
3	36	l
3	22	v
4	44	v
4	33	l
4	54	m
4	67	r
run;

data want (drop=nv);
  do nv=0 by 0 until (last.a);
    set have ;
    by a;
    length d_l e_m f_r $1;
    select (c);
      when ('v') nv=nv+1;
      when ('l') d_l='l';
      when ('m') e_m='m';
      when ('r') f_r='r';
      otherwise;
    end;
  end;
  if nv>0 then c='v';
  do until(last.a);
    set have (keep=a b);
    by a;
    if nv>0 then output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
AnnaNZ
Quartz | Level 8

Hi

 

Although I liked the last contribution and it addressed ythe problem, it seemed all a bit complicated,  and I finally found an elegant solution using the left join function.

I thougt I share this here:

from http://www2.sas.com/proceedings/sugi25/25/cc/25p109.pdf

 

Table LEFT: A LEFT JOIN on PATIENT between DOSING ('left' data set) and EFFICACY ('right' data set).

 

PROC SQL;

CREATE TABLE LEFT1 AS

SELECT A.*, B.EFFIC_ID, B.VISIT,

B.SCORE

FROM DOSING A LEFT JOIN EFFICACY B

ON A.PATIENT = B.PATIENT ORDER BY PATIENT;

QUIT;

 

Many thanks, Anna

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1124 views
  • 3 likes
  • 4 in conversation