Solved: Re: Extract VARIABLE NAME and concatenate with other variable names based on condition

saslove · Posted 04-12-2019 06:30 PM

I have a dataset that looks like this:

currentstudy  Study1ID  Study2ID  Study3ID  Study4ID
001           180       780       278       560
002           .         790       .         520
003           .         .         231       .

As you can see, each patient can be enrolled in multiple other studies.

What I need is a dataset that looks like the one below. Basically, concatenate the column names (study names) of the other studies that the same patient is enrolled in.

currentstudy  Other_enrolled_studies
001           Study1,Study2,Study3,Study4
002           Study2,Study4
003           Study3

and so on for several hundred patients across multiple studies.

Thanks very much,

Please help.

Reeza · Posted 04-12-2019 06:35 PM

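data want;
    set have;

    /* Define the array before STUDIES exists, so that the study: prefix
       list only picks up the numeric study variables read from HAVE */
    array study(*) study: ;

    length studies $200;

    do i = 1 to dim(study);
        /* append the variable name whenever the study ID is populated */
        if not missing(study(i)) then do;
            studies = catx(", ", studies, vname(study(i)));
        end;
    end;
run;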

1. Declare an array of all the study variables.

2. Loop through it and find the non-missing entries.

3. Get the variable name of each non-missing entry (vname).

4. Append that name to the list in 'studies'.
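One thing to watch: vname returns the full variable name (Study1ID), while the requested output shows Study1. Below is a minimal sketch that strips the suffix, assuming every study variable name ends in "ID". The DATA step that builds have and the helper variable name are only there for illustration; the variables are listed explicitly rather than with a prefix list.

```sas
/* Sample data reconstructed from the question; a period is a missing value */
data have;
    input currentstudy $ Study1ID Study2ID Study3ID Study4ID;
    datalines;
001 180 780 278 560
002 . 790 . 520
003 . . 231 .
;
run;

data want;
    set have;
    array ids(*) Study1ID Study2ID Study3ID Study4ID;
    length Other_enrolled_studies $200 name $32;

    do i = 1 to dim(ids);
        if not missing(ids(i)) then do;
            name = vname(ids(i));
            /* assumption: the name ends in "ID", so drop the last two characters */
            name = substr(name, 1, length(name) - 2);
            Other_enrolled_studies = catx(',', Other_enrolled_studies, name);
        end;
    end;

    keep currentstudy Other_enrolled_studies;
run;
```

If some names might not end in "ID", a check on the last two characters before the substr call would be safer.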

saslove · Posted 04-12-2019 06:44 PM

That worked very well. Thanks Reeza

Jagadishkatam · Posted 04-13-2019 04:29 AM

An alternative way

data want;
set have;
array var1(*) Study1ID Study2ID Study3ID Study4ID;
array var2(*)$ Study1IDc Study2IDc Study3IDc Study4IDc;
do i = 1 to dim(var1);
if var1(i) ne . then var2(i)=vname(var1(i));
newvar=catx(',',of var2(*));
end;
run;

Thanks,
Jag

Reeza · Posted 04-15-2019 09:06 PM

@Jagadishkatam wrote:

An alternative way

data want;
set have;
array var1(*) Study1ID Study2ID Study3ID Study4ID;
array var2(*)$ Study1IDc Study2IDc Study3IDc Study4IDc;
do i = 1 to dim(var1);
if var1(i) ne . then var2(i)=vname(var1(i));
newvar=catx(',',of var2(*));
end;
run;

@Jagadishkatam should the newvar line be outside of the loop?

Jagadishkatam · Posted 04-16-2019 07:51 AM

@Reeza, thanks for your response. I tested the code on sashelp.class and it seems to be working. I'd appreciate your thoughts.

data want;
set sashelp.class;
array var1(*) age height weight ;
array var2(*)$ agec heightc weightc;
do i = 1 to dim(var1);
if var1(i) ne . then var2(i)=vname(var1(i));
newvar=catx(',',of var2(*));
end;
run;

Thanks,
Jag

Reeza · Posted 04-16-2019 10:52 AM

Technically it works, because newvar gets replaced at each iteration of the i loop, including the last, which gives the same result as assigning it once after the loop is finished. It's just not really logical to do it when it's not needed. If the data set is small this has no impact; if the data set is larger, it's an inefficiency that slows things down.
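For reference, a sketch of the same approach with the catx call moved after the loop, so the list is built once per row. The explicit $32 length and the drop statement are additions for illustration, not part of the code posted above.

```sas
data want;
    set have;
    array var1(*) Study1ID Study2ID Study3ID Study4ID;
    /* $32 leaves room for any valid variable name (an assumption; the
       original relied on the default character length) */
    array var2(*) $ 32 Study1IDc Study2IDc Study3IDc Study4IDc;

    do i = 1 to dim(var1);
        if var1(i) ne . then var2(i) = vname(var1(i));
    end;

    /* concatenate once, after every name has been filled in;
       catx skips the elements that are still blank */
    newvar = catx(',', of var2(*));

    drop i Study1IDc Study2IDc Study3IDc Study4IDc;
run;
```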
