Trying to convert a table in this format:
ID | ON_OFF | CATEGORY | Group |
1 | OFF | P | A |
1 | ON | G | A |
2 | OFF | C | A |
2 | OFF | C | B |
3 | OFF | O | A |
3 | OFF | R | B |
4 | OFF | C | A |
4 | ON | G | B |
5 | OFF | S | A |
5 | OFF | R | A |
5 | ON | G | C |
6 | ON | S | A |
6 | ON | S | B |
6 | ON | R | A |
To a single record per id table in this format:
ID | ON_OFF | CATEGORY | Group |
1 | OFF_ON | P_G | A |
2 | OFF | C | A_B |
3 | OFF | O_R | A_B |
4 | OFF_ON | C_G | A_B |
5 | OFF_ON | S_R_G | A_A_C |
6 | ON | S_R | A_B_A |
Note that in some cases, I just need the unique instances of the variable (such as with On_Off) but in other cases I need every instance (such as with Group). I have been trying first./last. processing with by statements, but I can only get that to work with one variable at a time. Is there a way to do this for N variables within the same data step? Any insights are greatly appreciated!
Data Want;
set Have;
length cat source $500;
retain cat source;
by ID notsorted category on_off;
if first.ID then call missing(cat);
if first.category then cat=catx('_',cat,category);
if first.ID then call missing(source);
if first.on_off then source=catx('_',source,on_off);
if last.;
run;
Just DO UNTIL(last.ID) will be required to achieve this
Data have;
infile datalines dlm=',' dsd missover;
input ID ON_OFF :$8. CATEGORY :$8. Group:$8.;
datalines;
1,OFF,P,A
1,ON,G,A
2,OFF,C,A
2,OFF,C,B
3,OFF,O,A
3,OFF,R,B
4,OFF,C,A
4,ON,G,B
5,OFF,S,A
5,OFF,R,A
5,ON,G,C
6,ON,S,A
6,ON,S,B
6,ON,R,A
;
run;
proc sort data=have;
by id;
run;
data want(keep=id ON_OFF CATEGORY Group);
Format ON_OFF CATEGORY Group $15.;
do until(last.id);
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
ON_OFF=tranwrd(Strip(strip(ON_OFF)||' '||strip(ON_OFF_))," ","_");
CATEGORY=tranwrd(strip(strip(CATEGORY)||' '||strip(CATEGORY_))," ","_");
Group=tranwrd(strip(strip(Group)||' '||strip(Group_))," ","_");
end;
run;
Alternatively,
data test(drop=category_ Group_ ON_OFF_);
Format ON_OFF CATEGORY Group $15.;
retain on_off category group;
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
if first.id then do;
ON_OFF=" ";
CATEGORY=" ";
Group=" ";
end;
on_off=catx("_",on_off_,on_off);
CATEGORY=catx("_",CATEGORY,CATEGORY_);
Group=catx('_',Group,Group_);
if last.id;
run;
Just DO UNTIL(last.ID) will be required to achieve this
Data have;
infile datalines dlm=',' dsd missover;
input ID ON_OFF :$8. CATEGORY :$8. Group:$8.;
datalines;
1,OFF,P,A
1,ON,G,A
2,OFF,C,A
2,OFF,C,B
3,OFF,O,A
3,OFF,R,B
4,OFF,C,A
4,ON,G,B
5,OFF,S,A
5,OFF,R,A
5,ON,G,C
6,ON,S,A
6,ON,S,B
6,ON,R,A
;
run;
proc sort data=have;
by id;
run;
data want(keep=id ON_OFF CATEGORY Group);
Format ON_OFF CATEGORY Group $15.;
do until(last.id);
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
ON_OFF=tranwrd(Strip(strip(ON_OFF)||' '||strip(ON_OFF_))," ","_");
CATEGORY=tranwrd(strip(strip(CATEGORY)||' '||strip(CATEGORY_))," ","_");
Group=tranwrd(strip(strip(Group)||' '||strip(Group_))," ","_");
end;
run;
Alternatively,
data test(drop=category_ Group_ ON_OFF_);
Format ON_OFF CATEGORY Group $15.;
retain on_off category group;
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
if first.id then do;
ON_OFF=" ";
CATEGORY=" ";
Group=" ";
end;
on_off=catx("_",on_off_,on_off);
CATEGORY=catx("_",CATEGORY,CATEGORY_);
Group=catx('_',Group,Group_);
if last.id;
run;
Data have;
infile datalines dlm=',' dsd missover;
input ID ON_OFF :$8. CATEGORY :$8. Group:$8.;
datalines;
1,OFF,P,A
1,ON,G,A
2,OFF,C,A
2,OFF,C,B
3,OFF,O,A
3,OFF,R,B
4,OFF,C,A
4,ON,G,B
5,OFF,S,A
5,OFF,R,A
5,ON,G,C
6,ON,S,A
6,ON,S,B
6,ON,R,A
;
run;
data want;
array x1{999} $ 100 _temporary_;
array x2{999} $ 100 _temporary_;
array x3{999} $ 100 _temporary_;
call missing(of x1{*} x2{*} x3{*});
i1=0;i2=0;i3=0;
do until(last.id);
set have;
by id;
if ON_OFF not in x1 then do; i1+1;x1{i1}=ON_OFF;end;
if CATEGORY not in x2 then do; i2+1;x2{i2}=CATEGORY;end;
i3+1;x3{i3}=Group;
end;
new_ON_OFF=catx('_',of x1{*});
new_CATEGORY=catx('_',of x2{*});
new_Group=catx('_',of x3{*});
drop ON_OFF CATEGORY Group i1 i2 i3;
run;
Alternatively you could try the below code
data want;
length new_on_off new_CATEGORY new_group $100;
set have;
by id ;
retain new_on_off new_CATEGORY new_group;
if first.id then new_on_off='';
if first.id then new_CATEGORY='';
if first.id then new_group='';
if on_off ne '' and new_on_off ne on_off then new_on_off=catx('_',new_on_off,on_off);
if CATEGORY ne '' and new_CATEGORY ne CATEGORY then new_CATEGORY=catx('_',new_CATEGORY,CATEGORY);
if Group ne '' and new_group ne Group then new_group=catx('_',new_group,Group);
if last.id;
keep new_on_off new_CATEGORY new_group;
run;
For the cases where you just need the unique values do you need them in any order? The order the first appear? Lexical order?
You can use @Ksharp 's trick of temporary arrays to support all of your combinations. Then if you did want the distinct values ordered you could add a line to use CALL SORTC() to sort the distinct values. So for example to list the CATEGORY values in lexical order and the ON_OFF values in whatever order they appear you program might look like this:
data want;
array x1(999) $8 _temporary_;
array x2(999) $8 _temporary_;
array x3(999) $8 _temporary_;
call missing(of x1(*) x2(*) x3(*));
do i=1 by 1 until(last.id);
set have;
by id;
if ON_OFF not in x1 then x1(i)=ON_OFF ;
if CATEGORY not in x2 then x2(i)=CATEGORY ;
x3(i)=Group;
end;
length new_ON_OFF new_CATEGORY new_GROUP $200;
call sortc(of x2(*));
new_ON_OFF=catx('_',of x1(*));
new_CATEGORY=catx('_',of x2(*));
new_Group=catx('_',of x3(*));
drop ON_OFF CATEGORY Group i ;
run;
Data Want;
Set Have;
Retain On_Off2 Cat2 Group2 Count;
By Id Notsorted;
Groups = Lag(Group);
If First.Id Then
Do;
On_Off2 = On_Off;
Cat2 = Category;
Group2 = Group;
Count = 1;
End;
Else
Do;
If Group ^= Groups Then Count + 1;
If On_Off ^= On_Off2 Then On_Off2 = Catx('_', On_Off2, On_Off);
If Category ^= Cat2 Then Cat2 = Catx('_', Cat2, Category);
Group2 = Catx('_', Group2, Group);
End;
If Last.Id;
If Count = 1 Then Group2 = Substr(Group2, 1, 1);
Keep ID On_Off2 Cat2 Group2;
Rename On_Off2 = On_Off Cat2 = Category Group2 = Group;
Run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.