BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kalbo
Obsidian | Level 7

Hi all, I am trying to summarise some visit data.

 

This is the data I have:

 

USUBJID  VISIT                   DATE            MODULE
101           SCREENING       2000-01-02
101           SCREENING       2000-01-02
101           UNSCHEDULED 2000-01-02   VS

101           UNSCHEDULED 2000-01-02   VS
101           UNSCHEDULED 2000-01-02   LB
101           UNSCHEDULED 2000-01-04   EG
102           SCREENING       2000-01-05
102           VISIT 1                2000-01-06
102           UNSCHEDULED 2000-01-06   VS
102           UNSCHEDULED 2000-01-07   LB
103           VISIT 1                2000-10-06
103           VISIT 1                2000-10-06

 

This is what I would like. I would like one row per USUBJID/VISIT/DATE. However MODULE needs to be concatenated with commas, only listing distinct values for that USUBJID/VISIT/DATE


USUBJID     VISIT                     DATE             MODULE
101              SCREENING         2000-01-02
101              UNSCHEDULED   2000-01-02    VS, LB
101              UNSCHEDULED   2000-01-04    EG
102              SCREENING         2000-01-05
102              VISIT 1                  2000-01-06
102              UNSCHEDULED   2000-01-06     VS
102              UNSCHEDULED   2000-01-07     LB
103              VISIT 1                   2000-10-06

 

I would appreciate any help.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Do you really have strings that look like dates instead of dates? I guess not.

I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:

data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;

proc sort data=have;
by usubjid visit date;
run;

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

What is your question?

kalbo
Obsidian | Level 7

I just added the question, thanks.

Kurt_Bremser
Super User

First sort:

proc sort data=have;
by usubjid visit date;
run;

then use a data step with a new retained variable:

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

Codes are untested; for tested code, supply example data in a (working!) data step with datalines.

kalbo
Obsidian | Level 7

Thankyou Kurt. I will try your code soon and flag as the solution.

kalbo
Obsidian | Level 7

Hi Kurt

 

I tried your code. However I end up with duplicate values of MODULE e.g. VS,VS,LB. I would only need the non-duplicate values e.g. VS,LB.

 

I've supplied some datalines here to investigate:


data have;
INFILE DATALINES DLM="," missover;
input USUBJID :$20. VISIT :$20. DATE :$20. MODULE :$20.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;

proc sort data=have;
by usubjid visit date;
run;

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

Kurt_Bremser
Super User

Do you really have strings that look like dates instead of dates? I guess not.

I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:

data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;

proc sort data=have;
by usubjid visit date;
run;

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1096 views
  • 1 like
  • 3 in conversation