BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mlogan
Lapis Lazuli | Level 10
Hi All,
I never had a dataset like this:
 
ID     Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
 
I always worked with dataset like the following. Can someone tell me the code how to convert the above dataset like below. My original dataset has 35 groups, so I need a code that is efficient.
 
ID     English  Biology   Chemistry
101  Eng2      Bio1       
101  Eng1                    Che2 
102  Eng1      Bio1
102  Eng2      Bio2
102                              Che1
102                              Che2
103  Eng3      Bio1
103  Eng1      Bio2
 
 
Thank you.
 
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Here is :
Just change 
var group-- 
into
var group:  ;
for referring to variables.







data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group: ;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;





View solution in original post

7 REPLIES 7
Jagadishkatam
Amethyst | Level 16

Please try the transpose approach

 

data have;
infile cards missover;
input ID     Name$   Group1$     Group1_ID1$   Group1_D2$   Group2$     Group2_ID1$   Group2_D2$  Group3$    Group3_ID1$   Group3_D2$;
cards;
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;

proc transpose data=have out=trans1;
by id;
id group1 ;
var  Group1_ID1 Group1_D2 ;
run;

proc transpose data=have out=trans2;
by id;
id group2 ;
var  Group2_ID1 Group2_D2 ;
run;

proc transpose data=have out=trans3;
by id;
id group3 ;
var  Group3_ID1 Group3_D2 ;
run;

data all(drop=_name_);
merge trans1 trans2 trans3;
run;
Thanks,
Jag
mlogan
Lapis Lazuli | Level 10
Thanks Jaga. I have total 35 group. So transposing 35 times is not efficient. Can you please wtite a code that is more efficient please.
Jagadishkatam
Amethyst | Level 16

we could automate with do loop and call execute as below

 

option mprint;
%macro test(i);
proc transpose data=have out=trans&i;
where group&i ne '';
by id;
id group&i ;
var  Group&i._ID1 Group&i._D2 ;
run;
%mend;

Data _null_;
do i = 1 to 35;
call execute('%test('||i||')');
end;
run;

data all(drop=_name_);
merge trans: ;
run;

I am not sure if trans: (colon) will work. could you please test it.

Thanks,
Jag
Ksharp
Super User
I love this question. Why the output couldn't be :

ID	English	Biology	Chemistry
101	Eng1	 	 
101	Eng2	Bio1	 
102	Eng2	Bio2	Che2
102	Eng1	Bio1	Che1
103	Eng1	Bio2	 
103	Eng3	Bio1	 




data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group1--Group3_D2;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;


Ksharp
Super User
Here is :
Just change 
var group-- 
into
var group:  ;
for referring to variables.







data have;
infile cards truncover;
input ID     (Name    Group1     Group1_ID1   Group1_D2   Group2     Group2_ID1   Group2_D2  Group3    Group3_ID1   Group3_D2) (: $20.);
cards;   
101   Mike     English      Eng2               Eng1             Biology     Bio1                         
102   Julia     English      Eng1               Eng2             Biology     Bio1               Bio2              Chemistry Che1             Che2             
103   Luc      English      Eng3                Eng1             Biology     Bio1              Bio2 
;
run;

proc transpose data=have out=temp;
by id;
var group: ;
run;
data data(index=(x=(id g2)) drop=_name_) group_name(keep=_name_ col1);
 set temp;
 length g1 g2 $ 40;
 g1=upcase(scan(_name_,1,'_'));
 g2=upcase(scan(_name_,2,'_'));
 if not missing(g2) then output data;
 if missing(g2) and not missing(col1) then output group_name;
run;


proc sort data=group_name nodupkey;by _name_ col1;run;
proc sql noprint;
 select cats(_name_,'=',col1) into : rename separated by ' '
  from group_name;
quit;
%put &rename;


proc sql noprint;
 select distinct catt('data(where=(g1="',g1,'") rename=(col1=',g1,'))') 
  into : merge separated by ' '
   from data;
quit;
%put &merge;
data want(rename=(&rename));
 merge &merge;
 by id g2;
 drop g1 g2;
run;





PGStats
Opal | Level 21

It is a good idea indeed to simplify your data structure. Another transposition approach:

 

data have;
length Group1 Group1_ID1 Group1_D2 Group2 Group2_ID1 Group2_D2 Group3 Group3_ID1 Group3_D2 $20;
infile cards missover;
input ID Name$ Group1 Group1_ID1 Group1_D2 Group2 Group2_ID1 Group2_D2 Group3 Group3_ID1 Group3_D2;
cards;
101 Mike English Eng2 Eng1 Biology Bio1 
102 Julia English Eng1 Eng2 Biology Bio1 Bio2 Chemistry Che1 Che2 
103 Luc English Eng3 Eng1 Biology Bio1 Bio2 
;

data list0;
length gName gExt gValue $20;
set have;
array g group:;
do i = 1 to dim(g);
    if not missing(g{i}) then do;
        gName = scan(vname(g{i}), 1, "_");        
        gExt = scan(vname(g{i}), 2, "_");
        gValue = g{i};
        output;
        end;
    end;
keep id gName gExt gValue;
run;

proc sort data=list0; by id gName gExt; run;

data list1;
do until(last.gName);
    set list0; by id gName;
    if first.gName then var = gValue;
    else output;
    end;
run;

proc sort data=list1; by id gExt;

proc transpose data=list1 out=want(drop=_name_);
by id gExt;
var gValue;
id var;
run;

proc print data=want noobs; run;
PG
Reeza
Super User

I'd actually suggest you go one step further and include the subject as a variable. 

 

ID Subject Level

1 English Eng1

1 English Eng2

1 Chemistry Chem2

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1194 views
  • 4 likes
  • 5 in conversation