I want the list of visits (1) to remove any duplicate visits and (2) to list in order.
data DSN;
infile datalines dlm= '*';
label ID= "Participant ID"
AVSLST= "All Visits Completed"
PPCENSVSx= "Censored Visits (have)"
PPCENSVS= "Censored Visits (want)"
;
length AVSLST PPCENSVSx PPCENSVS $30.;
input ID AVSLST $ PPCENSVSx $ PPCENSVS $;
datalines;
1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10 * 06, 10, 12
2 * 01, 02, 03, 04, 05, 06, 07 * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
3 * 01, 02, 03, 05, 06, 07, 08 * 07, 08, 03, 07, 04 * 03, 04, 07, 08
;
run;
I ended up doing a variation of this. Since this is a subject-level dataset, it can't stay in long format.
data have;
infile datalines dlm= '*' truncover;
label
ID = "Participant ID"
AVSLST = "All Visits Completed"
PPCENSVSx = "Censored Visits (have)"
PPCENSVS = "Censored Visits (want)"
;
length AVSLST PPCENSVSx PPCENSVS $30.;
input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
datalines;
1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10 * 06, 10, 12
2 * 01, 02, 03, 04, 05, 06, 07 * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
3 * 01, 02, 03, 05, 06, 07, 08 * 07, 08, 03, 07, 04 * 03, 04, 07, 08
4
5 *01
6 * *01
;
data long;
set have;
length VISNO $2.;
do i= 1 to countc(PPCENSVSx, ',')+1;
VISNO= scan(PPCENSVSx, i);
output;
end;
proc sort data= long out= long2 nodupkeys; by ID VISNO;
data want;
length PPCENSVS $54.;
do until(last.ID);
set long2;
by ID;
PPCENSVS= catx(', ', PPCENSVS, VISNO);
end;
keep ID PPCENSVS;
run;
Transpose to a long layout, then sort with NODUPKEY.
Why do you have multiple values in a single variable at all? Generally a poor idea.
What are we supposed to compare to 'remove duplicate visits'? From which variable?
What is the maximum number of "visits" that need to be considered? Are ALL the "visits" no more than 2 digits and always numeric? The number is critical as new variables when created really should have a defined length and can't tell what may be needed.
Here is one way to just sort such as string. I am picking on the PPCENSVSx variable. Basically read each value into a variable in an array, sort the array, then stack back together.
There are other way if you tell us that ALL of the values separated by comma space will ALWAYS be the equivalent of an integer (not stated).
I would drop the i, j and V array variables after verifying things are working.
data sortstring; set dsn; array v (15) $ 2 ; do i= 1 to countw(ppcensvsx,','); v[i]= strip(scan(ppcensvsx,i,',')); end; call sortc(of v[*]); length ppsort $ 48; do j= (dim(v) -i + 2) to dim(v); ppsort = catx(', ',ppsort,v[j]); end; run;
This is an approach that uses the "visit" number as an index and will only use the index once in building a string:
data sortstring2; set dsn; array v (15) ; do i= 1 to countw(ppcensvsx,','); v[input(strip(scan(ppcensvsx,i,',')),f4.)]=1; end; length ppsort $ 48; do j= 1 to dim(v); if v[j]=1 then ppsort = catx(', ',ppsort,put(j,z2.)); end; run;
@mariko5797 wrote:
I want the list of visits (1) to remove any duplicate visits and (2) to list in order.
data DSN; infile datalines dlm= '*'; label ID= "Participant ID" AVSLST= "All Visits Completed" PPCENSVSx= "Censored Visits (have)" PPCENSVS= "Censored Visits (want)" ; length AVSLST PPCENSVSx PPCENSVS $30.; input ID AVSLST $ PPCENSVSx $ PPCENSVS $; datalines; 1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10 * 06, 10, 12 2 * 01, 02, 03, 04, 05, 06, 07 * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07 3 * 01, 02, 03, 05, 06, 07, 08 * 07, 08, 03, 07, 04 * 03, 04, 07, 08 ; run;
Like @Kurt_Bremser suggested: Transform your data to a long structure which is much easier to work with.
data have;
infile datalines dlm= '*' truncover;
label
ID = "Participant ID"
AVSLST = "All Visits Completed"
PPCENSVSx = "Censored Visits (have)"
PPCENSVS = "Censored Visits (want)"
;
length AVSLST PPCENSVSx PPCENSVS $30.;
input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
datalines;
1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10 * 06, 10, 12
2 * 01, 02, 03, 04, 05, 06, 07 * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
3 * 01, 02, 03, 05, 06, 07, 08 * 07, 08, 03, 07, 04 * 03, 04, 07, 08
4
5 *01
6 * *01
;
run;
data long;
set have;
length visit_no 8 visit_completed_flg visit_censored_flg 3;
length _vno $5 _visits $200;
_visits=catx(',',AVSLST,PPCENSVSx);
do i=1 to countc(_visits,',')+1;
_vno=scan(_visits,i);
visit_no=input(_vno,best32.);
visit_completed_flg = (findw(AVSLST,strip(_vno))>0);
visit_censored_flg = (findw(PPCENSVSx,strip(_vno))>0);
output;
end;
keep id visit_:;
run;
proc sort data=long out=want nodupkey;
by id visit_no;
run;
I ended up doing a variation of this. Since this is a subject-level dataset, it can't stay in long format.
data have;
infile datalines dlm= '*' truncover;
label
ID = "Participant ID"
AVSLST = "All Visits Completed"
PPCENSVSx = "Censored Visits (have)"
PPCENSVS = "Censored Visits (want)"
;
length AVSLST PPCENSVSx PPCENSVS $30.;
input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
datalines;
1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10 * 06, 10, 12
2 * 01, 02, 03, 04, 05, 06, 07 * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
3 * 01, 02, 03, 05, 06, 07, 08 * 07, 08, 03, 07, 04 * 03, 04, 07, 08
4
5 *01
6 * *01
;
data long;
set have;
length VISNO $2.;
do i= 1 to countc(PPCENSVSx, ',')+1;
VISNO= scan(PPCENSVSx, i);
output;
end;
proc sort data= long out= long2 nodupkeys; by ID VISNO;
data want;
length PPCENSVS $54.;
do until(last.ID);
set long2;
by ID;
PPCENSVS= catx(', ', PPCENSVS, VISNO);
end;
keep ID PPCENSVS;
run;
I think there have been discussions about this, please take a look at: Double value elimination on one cell
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.