Solved: Re: Arrange dataset variable in group

mlogan · Posted 06-04-2016 10:22 PM

Hi All,

I never had a dataset like this:

ID Name Group1 Group1_ID1 Group1_D2 Group2 Group2_ID1 Group2_D2 Group3 Group3_ID1 Group3_D2

101 Mike English Eng2 Eng1 Biology Bio1

102 Julia English Eng1 Eng2 Biology Bio1 Bio2 Chemistry Che1 Che2

103 Luc English Eng3 Eng1 Biology Bio1 Bio2

I always worked with dataset like the following. Can someone tell me the code how to convert the above dataset like below. My original dataset has 35 groups, so I need a code that is efficient.

ID English Biology Chemistry

101 Eng2 Bio1

101 Eng1 Che2

102 Eng1 Bio1

102 Eng2 Bio2

102 Che1

102 Che2

103 Eng3 Bio1

103 Eng1 Bio2

Thank you.

Ksharp · Posted 06-05-2016 12:16 AM

Here is :
Just change 
var group-- 
into
var group:  ;
for referring to variables.







data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group: ;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;

View solution in original post

Jagadishkatam · Posted 06-04-2016 11:37 PM

Please try the transpose approach

data have;
infile cards missover;
input ID     Name$   Group1$     Group1_ID1$   Group1_D2$   Group2$     Group2_ID1$   Group2_D2$  Group3$    Group3_ID1$   Group3_D2$;
cards;
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;

proc transpose data=have out=trans1;
by id;
id group1 ;
var  Group1_ID1 Group1_D2 ;
run;

proc transpose data=have out=trans2;
by id;
id group2 ;
var  Group2_ID1 Group2_D2 ;
run;

proc transpose data=have out=trans3;
by id;
id group3 ;
var  Group3_ID1 Group3_D2 ;
run;

data all(drop=_name_);
merge trans1 trans2 trans3;
run;

Thanks,
Jag

mlogan · Posted 06-04-2016 11:52 PM

Thanks Jaga. I have total 35 group. So transposing 35 times is not efficient. Can you please wtite a code that is more efficient please.

Jagadishkatam · Posted 06-05-2016 01:27 AM

we could automate with do loop and call execute as below

option mprint;
%macro test(i);
proc transpose data=have out=trans&i;
where group&i ne '';
by id;
id group&i ;
var  Group&i._ID1 Group&i._D2 ;
run;
%mend;

Data _null_;
do i = 1 to 35;
call execute('%test('||i||')');
end;
run;

data all(drop=_name_);
merge trans: ;
run;

I am not sure if trans: (colon) will work. could you please test it.

Thanks,
Jag

Ksharp · Posted 06-05-2016 12:13 AM

I love this question. Why the output couldn't be :

ID	English	Biology	Chemistry
101	Eng1	 	 
101	Eng2	Bio1	 
102	Eng2	Bio2	Che2
102	Eng1	Bio1	Che1
103	Eng1	Bio2	 
103	Eng3	Bio1	 




data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group1--Group3_D2;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;

Ksharp · Posted 06-05-2016 12:16 AM

Here is :
Just change 
var group-- 
into
var group:  ;
for referring to variables.







data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group: ;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;

PGStats · Posted 06-05-2016 12:48 AM

It is a good idea indeed to simplify your data structure. Another transposition approach:

data have;
length Group1 Group1_ID1 Group1_D2 Group2 Group2_ID1 Group2_D2 Group3 Group3_ID1 Group3_D2 $20;
infile cards missover;
input ID Name$ Group1 Group1_ID1 Group1_D2 Group2 Group2_ID1 Group2_D2 Group3 Group3_ID1 Group3_D2;
cards;
101 Mike English Eng2 Eng1 Biology Bio1 
102 Julia English Eng1 Eng2 Biology Bio1 Bio2 Chemistry Che1 Che2 
103 Luc English Eng3 Eng1 Biology Bio1 Bio2 
;

data list0;
length gName gExt gValue $20;
set have;
array g group:;
do i = 1 to dim(g);
    if not missing(g{i}) then do;
        gName = scan(vname(g{i}), 1, "_");        
        gExt = scan(vname(g{i}), 2, "_");
        gValue = g{i};
        output;
        end;
    end;
keep id gName gExt gValue;
run;

proc sort data=list0; by id gName gExt; run;

data list1;
do until(last.gName);
    set list0; by id gName;
    if first.gName then var = gValue;
    else output;
    end;
run;

proc sort data=list1; by id gExt;

proc transpose data=list1 out=want(drop=_name_);
by id gExt;
var gValue;
id var;
run;

proc print data=want noobs; run;

PG

Reeza · Posted 06-05-2016 04:34 AM

I'd actually suggest you go one step further and include the subject as a variable.

ID Subject Level

1 English Eng1

1 English Eng2

1 Chemistry Chem2

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away