DATA Step, Macro, Functions and more

Arrange dataset variable in group

Accepted Solution Solved
Reply
Regular Contributor
Posts: 215
Accepted Solution

Arrange dataset variable in group

[ Edited ]
Hi All,
I never had a dataset like this:
 
ID     Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
 
I always worked with dataset like the following. Can someone tell me the code how to convert the above dataset like below. My original dataset has 35 groups, so I need a code that is efficient.
 
ID     English  Biology   Chemistry
101  Eng2      Bio1       
101  Eng1                    Che2 
102  Eng1      Bio1
102  Eng2      Bio2
102                              Che1
102                              Che2
103  Eng3      Bio1
103  Eng1      Bio2
 
 
Thank you.
 

Accepted Solutions
Solution
‎06-05-2016 02:28 PM
Super User
Posts: 9,691

Re: Arrange dataset variable in group

Here is :
Just change 
var group-- 
into
var group:  ;
for referring to variables.







data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group: ;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;





View solution in original post


All Replies
Trusted Advisor
Posts: 1,131

Re: Arrange dataset variable in group

Please try the transpose approach

 

data have;
infile cards missover;
input ID     Name$   Group1$     Group1_ID1$   Group1_D2$   Group2$     Group2_ID1$   Group2_D2$  Group3$    Group3_ID1$   Group3_D2$;
cards;
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;

proc transpose data=have out=trans1;
by id;
id group1 ;
var  Group1_ID1 Group1_D2 ;
run;

proc transpose data=have out=trans2;
by id;
id group2 ;
var  Group2_ID1 Group2_D2 ;
run;

proc transpose data=have out=trans3;
by id;
id group3 ;
var  Group3_ID1 Group3_D2 ;
run;

data all(drop=_name_);
merge trans1 trans2 trans3;
run;
Thanks,
Jag
Regular Contributor
Posts: 215

Re: Arrange dataset variable in group

Thanks Jaga. I have total 35 group. So transposing 35 times is not efficient. Can you please wtite a code that is more efficient please.
Trusted Advisor
Posts: 1,131

Re: Arrange dataset variable in group

[ Edited ]

we could automate with do loop and call execute as below

 

option mprint;
%macro test(i);
proc transpose data=have out=trans&i;
where group&i ne '';
by id;
id group&i ;
var  Group&i._ID1 Group&i._D2 ;
run;
%mend;

Data _null_;
do i = 1 to 35;
call execute('%test('||i||')');
end;
run;

data all(drop=_name_);
merge trans: ;
run;

I am not sure if trans: (colon) will work. could you please test it.

Thanks,
Jag
Super User
Posts: 9,691

Re: Arrange dataset variable in group

I love this question. Why the output couldn't be :

ID	English	Biology	Chemistry
101	Eng1	 	 
101	Eng2	Bio1	 
102	Eng2	Bio2	Che2
102	Eng1	Bio1	Che1
103	Eng1	Bio2	 
103	Eng3	Bio1	 




data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group1--Group3_D2;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;


Solution
‎06-05-2016 02:28 PM
Super User
Posts: 9,691

Re: Arrange dataset variable in group

Here is :
Just change 
var group-- 
into
var group:  ;
for referring to variables.







data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group: ;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;





Respected Advisor
Posts: 4,662

Re: Arrange dataset variable in group

It is a good idea indeed to simplify your data structure. Another transposition approach:

 

data have;
length Group1 Group1_ID1 Group1_D2 Group2 Group2_ID1 Group2_D2 Group3 Group3_ID1 Group3_D2 $20;
infile cards missover;
input ID Name$ Group1 Group1_ID1 Group1_D2 Group2 Group2_ID1 Group2_D2 Group3 Group3_ID1 Group3_D2;
cards;
101 Mike English Eng2 Eng1 Biology Bio1 
102 Julia English Eng1 Eng2 Biology Bio1 Bio2 Chemistry Che1 Che2 
103 Luc English Eng3 Eng1 Biology Bio1 Bio2 
;

data list0;
length gName gExt gValue $20;
set have;
array g group:;
do i = 1 to dim(g);
    if not missing(g{i}) then do;
        gName = scan(vname(g{i}), 1, "_");        
        gExt = scan(vname(g{i}), 2, "_");
        gValue = g{i};
        output;
        end;
    end;
keep id gName gExt gValue;
run;

proc sort data=list0; by id gName gExt; run;

data list1;
do until(last.gName);
    set list0; by id gName;
    if first.gName then var = gValue;
    else output;
    end;
run;

proc sort data=list1; by id gExt;

proc transpose data=list1 out=want(drop=_name_);
by id gExt;
var gValue;
id var;
run;

proc print data=want noobs; run;
PG
Super User
Posts: 17,960

Re: Arrange dataset variable in group

I'd actually suggest you go one step further and include the subject as a variable. 

 

ID Subject Level

1 English Eng1

1 English Eng2

1 Chemistry Chem2

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 436 views
  • 4 likes
  • 5 in conversation