Hi all, I am trying to summarise some visit data.
This is the data I have:
USUBJID VISIT DATE MODULE
101 SCREENING 2000-01-02
101 SCREENING 2000-01-02
101 UNSCHEDULED 2000-01-02 VS
101 UNSCHEDULED 2000-01-02 VS
101 UNSCHEDULED 2000-01-02 LB
101 UNSCHEDULED 2000-01-04 EG
102 SCREENING 2000-01-05
102 VISIT 1 2000-01-06
102 UNSCHEDULED 2000-01-06 VS
102 UNSCHEDULED 2000-01-07 LB
103 VISIT 1 2000-10-06
103 VISIT 1 2000-10-06
This is what I would like. I would like one row per USUBJID/VISIT/DATE. However MODULE needs to be concatenated with commas, only listing distinct values for that USUBJID/VISIT/DATE
USUBJID VISIT DATE MODULE
101 SCREENING 2000-01-02
101 UNSCHEDULED 2000-01-02 VS, LB
101 UNSCHEDULED 2000-01-04 EG
102 SCREENING 2000-01-05
102 VISIT 1 2000-01-06
102 UNSCHEDULED 2000-01-06 VS
102 UNSCHEDULED 2000-01-07 LB
103 VISIT 1 2000-10-06
I would appreciate any help.
Thanks!
Do you really have strings that look like dates instead of dates? I guess not.
I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:
data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;
proc sort data=have;
by usubjid visit date;
run;
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
What is your question?
I just added the question, thanks.
First sort:
proc sort data=have;
by usubjid visit date;
run;
then use a data step with a new retained variable:
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
Codes are untested; for tested code, supply example data in a (working!) data step with datalines.
Thankyou Kurt. I will try your code soon and flag as the solution.
Hi Kurt
I tried your code. However I end up with duplicate values of MODULE e.g. VS,VS,LB. I would only need the non-duplicate values e.g. VS,LB.
I've supplied some datalines here to investigate:
data have;
INFILE DATALINES DLM="," missover;
input USUBJID :$20. VISIT :$20. DATE :$20. MODULE :$20.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;
proc sort data=have;
by usubjid visit date;
run;
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
Do you really have strings that look like dates instead of dates? I guess not.
I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:
data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;
proc sort data=have;
by usubjid visit date;
run;
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.