BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kalbo
Obsidian | Level 7

Hi all, I am trying to summarise some visit data.

 

This is the data I have:

 

USUBJID  VISIT                   DATE            MODULE
101           SCREENING       2000-01-02
101           SCREENING       2000-01-02
101           UNSCHEDULED 2000-01-02   VS

101           UNSCHEDULED 2000-01-02   VS
101           UNSCHEDULED 2000-01-02   LB
101           UNSCHEDULED 2000-01-04   EG
102           SCREENING       2000-01-05
102           VISIT 1                2000-01-06
102           UNSCHEDULED 2000-01-06   VS
102           UNSCHEDULED 2000-01-07   LB
103           VISIT 1                2000-10-06
103           VISIT 1                2000-10-06

 

This is what I would like. I would like one row per USUBJID/VISIT/DATE. However MODULE needs to be concatenated with commas, only listing distinct values for that USUBJID/VISIT/DATE


USUBJID     VISIT                     DATE             MODULE
101              SCREENING         2000-01-02
101              UNSCHEDULED   2000-01-02    VS, LB
101              UNSCHEDULED   2000-01-04    EG
102              SCREENING         2000-01-05
102              VISIT 1                  2000-01-06
102              UNSCHEDULED   2000-01-06     VS
102              UNSCHEDULED   2000-01-07     LB
103              VISIT 1                   2000-10-06

 

I would appreciate any help.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Do you really have strings that look like dates instead of dates? I guess not.

I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:

data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;

proc sort data=have;
by usubjid visit date;
run;

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

What is your question?

kalbo
Obsidian | Level 7

I just added the question, thanks.

Kurt_Bremser
Super User

First sort:

proc sort data=have;
by usubjid visit date;
run;

then use a data step with a new retained variable:

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

Codes are untested; for tested code, supply example data in a (working!) data step with datalines.

kalbo
Obsidian | Level 7

Thankyou Kurt. I will try your code soon and flag as the solution.

kalbo
Obsidian | Level 7

Hi Kurt

 

I tried your code. However I end up with duplicate values of MODULE e.g. VS,VS,LB. I would only need the non-duplicate values e.g. VS,LB.

 

I've supplied some datalines here to investigate:


data have;
INFILE DATALINES DLM="," missover;
input USUBJID :$20. VISIT :$20. DATE :$20. MODULE :$20.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;

proc sort data=have;
by usubjid visit date;
run;

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,_module,",") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

Kurt_Bremser
Super User

Do you really have strings that look like dates instead of dates? I guess not.

I had to dow two tweaks to the FINDW function: stripping the blanks off _module, and using a blank as alternate delimiter:

data have;
infile datalines dlm="," truncover;
input usubjid :$20. visit :$20. date :yymmdd10. module :$20.;
format date yymmdd10.;
datalines;
101,SCREENING,2000-01-02,EG1
101,SCREENING,2000-01-02
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,VS
101,UNSCHEDULED,2000-01-02,LB
101,UNSCHEDULED,2000-01-04,EG
102,SCREENING,2000-01-05
102,VISIT 1,2000-01-06
102,UNSCHEDULED,2000-01-06,VS
102,UNSCHEDULED,2000-01-07,LB
103,VISIT 1,2000-10-06,VS
103,VISIT 1,2000-10-06,VS
;

proc sort data=have;
by usubjid visit date;
run;

data want;
set have (rename=(module=_module));
by usubjid visit date;
length module $10; * set as needed for maximum number of modules;
retain module;
if first.date then module = "";
if findw(module,strip(_module),", ") = 0 then module = catx(",",module,_module);
if last.date;
drop _module;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1284 views
  • 1 like
  • 3 in conversation