Hi all, I am trying to summarise some visit data.
This is the data I have:
USUBJID VISIT DATE MODULE
101 SCREENING 2000-01-02
101 SCREENING 2000-01-02
101 UNSCHEDULED 2000-01-02 VS
101 UNSCHEDULED 2000-01-02 VS
101 UNSCHEDULED 2000-01-02 LB
101 UNSCHEDULED 2000-01-04 EG
102 SCREENING 2000-01-05
102 VISIT 1 2000-01-06
102 UNSCHEDULED 2000-01-06 VS
102 UNSCHEDULED 2000-01-07 LB
103 VISIT 1 2000-10-06
103 VISIT 1 2000-10-06
This is what I would like. I would like one row per USUBJID/VISIT/DATE. However MODULE needs to be concatenated with commas, only listing distinct values for that USUBJID/VISIT/DATE
USUBJID VISIT DATE MODULE
101 SCREENING 2000-01-02
101 UNSCHEDULED 2000-01-02 VS, LB
101 UNSCHEDULED 2000-01-04 EG
102 SCREENING 2000-01-05
102 VISIT 1 2000-01-06
102 UNSCHEDULED 2000-01-06 VS
102 UNSCHEDULED 2000-01-07 LB
103 VISIT 1 2000-10-06
I would appreciate any help.
Thanks!
Do you really have strings that look like dates instead of dates? I guess not.
I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:
data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;
proc sort data=have;
by usubjid visit date;
run;
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
What is your question?
I just added the question, thanks.
First sort:
proc sort data=have;
by usubjid visit date;
run;
then use a data step with a new retained variable:
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
Codes are untested; for tested code, supply example data in a (working!) data step with datalines.
Thankyou Kurt. I will try your code soon and flag as the solution.
Hi Kurt
I tried your code. However I end up with duplicate values of MODULE e.g. VS,VS,LB. I would only need the non-duplicate values e.g. VS,LB.
I've supplied some datalines here to investigate:
data have;
INFILE DATALINES DLM="," missover;
input USUBJID :$20. VISIT :$20. DATE :$20. MODULE :$20.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;
proc sort data=have;
by usubjid visit date;
run;
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
Do you really have strings that look like dates instead of dates? I guess not.
I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:
data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;
proc sort data=have;
by usubjid visit date;
run;
data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.