Trying to convert a table in this format:
ID | ON_OFF | CATEGORY | Group |
1 | OFF | P | A |
1 | ON | G | A |
2 | OFF | C | A |
2 | OFF | C | B |
3 | OFF | O | A |
3 | OFF | R | B |
4 | OFF | C | A |
4 | ON | G | B |
5 | OFF | S | A |
5 | OFF | R | A |
5 | ON | G | C |
6 | ON | S | A |
6 | ON | S | B |
6 | ON | R | A |
To a single record per id table in this format:
ID | ON_OFF | CATEGORY | Group |
1 | OFF_ON | P_G | A |
2 | OFF | C | A_B |
3 | OFF | O_R | A_B |
4 | OFF_ON | C_G | A_B |
5 | OFF_ON | S_R_G | A_A_C |
6 | ON | S_R | A_B_A |
Note that in some cases, I just need the unique instances of the variable (such as with On_Off) but in other cases I need every instance (such as with Group). I have been trying first./last. processing with by statements, but I can only get that to work with one variable at a time. Is there a way to do this for N variables within the same data step? Any insights are greatly appreciated!
Data Want;
set Have;
length cat source $500;
retain cat source;
by ID notsorted category on_off;
if first.ID then call missing(cat);
if first.category then cat=catx('_',cat,category);
if first.ID then call missing(source);
if first.on_off then source=catx('_',source,on_off);
if last.;
run;
Just DO UNTIL(last.ID) will be required to achieve this
Data have;
infile datalines dlm=',' dsd missover;
input ID ON_OFF :$8. CATEGORY :$8. Group:$8.;
datalines;
1,OFF,P,A
1,ON,G,A
2,OFF,C,A
2,OFF,C,B
3,OFF,O,A
3,OFF,R,B
4,OFF,C,A
4,ON,G,B
5,OFF,S,A
5,OFF,R,A
5,ON,G,C
6,ON,S,A
6,ON,S,B
6,ON,R,A
;
run;
proc sort data=have;
by id;
run;
data want(keep=id ON_OFF CATEGORY Group);
Format ON_OFF CATEGORY Group $15.;
do until(last.id);
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
ON_OFF=tranwrd(Strip(strip(ON_OFF)||' '||strip(ON_OFF_))," ","_");
CATEGORY=tranwrd(strip(strip(CATEGORY)||' '||strip(CATEGORY_))," ","_");
Group=tranwrd(strip(strip(Group)||' '||strip(Group_))," ","_");
end;
run;
Alternatively,
data test(drop=category_ Group_ ON_OFF_);
Format ON_OFF CATEGORY Group $15.;
retain on_off category group;
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
if first.id then do;
ON_OFF=" ";
CATEGORY=" ";
Group=" ";
end;
on_off=catx("_",on_off_,on_off);
CATEGORY=catx("_",CATEGORY,CATEGORY_);
Group=catx('_',Group,Group_);
if last.id;
run;
Just DO UNTIL(last.ID) will be required to achieve this
Data have;
infile datalines dlm=',' dsd missover;
input ID ON_OFF :$8. CATEGORY :$8. Group:$8.;
datalines;
1,OFF,P,A
1,ON,G,A
2,OFF,C,A
2,OFF,C,B
3,OFF,O,A
3,OFF,R,B
4,OFF,C,A
4,ON,G,B
5,OFF,S,A
5,OFF,R,A
5,ON,G,C
6,ON,S,A
6,ON,S,B
6,ON,R,A
;
run;
proc sort data=have;
by id;
run;
data want(keep=id ON_OFF CATEGORY Group);
Format ON_OFF CATEGORY Group $15.;
do until(last.id);
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
ON_OFF=tranwrd(Strip(strip(ON_OFF)||' '||strip(ON_OFF_))," ","_");
CATEGORY=tranwrd(strip(strip(CATEGORY)||' '||strip(CATEGORY_))," ","_");
Group=tranwrd(strip(strip(Group)||' '||strip(Group_))," ","_");
end;
run;
Alternatively,
data test(drop=category_ Group_ ON_OFF_);
Format ON_OFF CATEGORY Group $15.;
retain on_off category group;
set have(rename=( ON_OFF=ON_OFF_ CATEGORY=CATEGORY_ Group=Group_));
by id;
if first.id then do;
ON_OFF=" ";
CATEGORY=" ";
Group=" ";
end;
on_off=catx("_",on_off_,on_off);
CATEGORY=catx("_",CATEGORY,CATEGORY_);
Group=catx('_',Group,Group_);
if last.id;
run;
Data have;
infile datalines dlm=',' dsd missover;
input ID ON_OFF :$8. CATEGORY :$8. Group:$8.;
datalines;
1,OFF,P,A
1,ON,G,A
2,OFF,C,A
2,OFF,C,B
3,OFF,O,A
3,OFF,R,B
4,OFF,C,A
4,ON,G,B
5,OFF,S,A
5,OFF,R,A
5,ON,G,C
6,ON,S,A
6,ON,S,B
6,ON,R,A
;
run;
data want;
array x1{999} $ 100 _temporary_;
array x2{999} $ 100 _temporary_;
array x3{999} $ 100 _temporary_;
call missing(of x1{*} x2{*} x3{*});
i1=0;i2=0;i3=0;
do until(last.id);
set have;
by id;
if ON_OFF not in x1 then do; i1+1;x1{i1}=ON_OFF;end;
if CATEGORY not in x2 then do; i2+1;x2{i2}=CATEGORY;end;
i3+1;x3{i3}=Group;
end;
new_ON_OFF=catx('_',of x1{*});
new_CATEGORY=catx('_',of x2{*});
new_Group=catx('_',of x3{*});
drop ON_OFF CATEGORY Group i1 i2 i3;
run;
Alternatively you could try the below code
data want;
length new_on_off new_CATEGORY new_group $100;
set have;
by id ;
retain new_on_off new_CATEGORY new_group;
if first.id then new_on_off='';
if first.id then new_CATEGORY='';
if first.id then new_group='';
if on_off ne '' and new_on_off ne on_off then new_on_off=catx('_',new_on_off,on_off);
if CATEGORY ne '' and new_CATEGORY ne CATEGORY then new_CATEGORY=catx('_',new_CATEGORY,CATEGORY);
if Group ne '' and new_group ne Group then new_group=catx('_',new_group,Group);
if last.id;
keep new_on_off new_CATEGORY new_group;
run;
For the cases where you just need the unique values do you need them in any order? The order the first appear? Lexical order?
You can use @Ksharp 's trick of temporary arrays to support all of your combinations. Then if you did want the distinct values ordered you could add a line to use CALL SORTC() to sort the distinct values. So for example to list the CATEGORY values in lexical order and the ON_OFF values in whatever order they appear you program might look like this:
data want;
array x1(999) $8 _temporary_;
array x2(999) $8 _temporary_;
array x3(999) $8 _temporary_;
call missing(of x1(*) x2(*) x3(*));
do i=1 by 1 until(last.id);
set have;
by id;
if ON_OFF not in x1 then x1(i)=ON_OFF ;
if CATEGORY not in x2 then x2(i)=CATEGORY ;
x3(i)=Group;
end;
length new_ON_OFF new_CATEGORY new_GROUP $200;
call sortc(of x2(*));
new_ON_OFF=catx('_',of x1(*));
new_CATEGORY=catx('_',of x2(*));
new_Group=catx('_',of x3(*));
drop ON_OFF CATEGORY Group i ;
run;
Data Want;
Set Have;
Retain On_Off2 Cat2 Group2 Count;
By Id Notsorted;
Groups = Lag(Group);
If First.Id Then
Do;
On_Off2 = On_Off;
Cat2 = Category;
Group2 = Group;
Count = 1;
End;
Else
Do;
If Group ^= Groups Then Count + 1;
If On_Off ^= On_Off2 Then On_Off2 = Catx('_', On_Off2, On_Off);
If Category ^= Cat2 Then Cat2 = Catx('_', Cat2, Category);
Group2 = Catx('_', Group2, Group);
End;
If Last.Id;
If Count = 1 Then Group2 = Substr(Group2, 1, 1);
Keep ID On_Off2 Cat2 Group2;
Rename On_Off2 = On_Off Cat2 = Category Group2 = Group;
Run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.