BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mariko5797
Pyrite | Level 9

I want the list of visits (1) to remove any duplicate visits and (2) to list in order.

data DSN;
 infile datalines dlm= '*'; 
 label 	ID= "Participant ID"
 	  	AVSLST= "All Visits Completed"
		PPCENSVSx= "Censored Visits (have)"
		PPCENSVS= "Censored Visits (want)"
		;
 length	AVSLST PPCENSVSx PPCENSVS $30.;
 input 	ID AVSLST $ PPCENSVSx $ PPCENSVS $;
 datalines;
	1	* 01, 02, 03, 04, 05, 06, 07, 08	* 06, 12, 10 				* 06, 10, 12
	2	* 01, 02, 03, 04, 05, 06, 07		* 04, 05, 06, 07, 04, 02	* 02, 04, 05, 06, 07
	3	* 01, 02, 03, 05, 06, 07, 08		* 07, 08, 03, 07, 04		* 03, 04, 07, 08
	;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
mariko5797
Pyrite | Level 9

I ended up doing a variation of this. Since this is a subject-level dataset, it can't stay in long format.

data have;
  infile datalines dlm= '*' truncover; 
  label
  ID        = "Participant ID"
  AVSLST    = "All Visits Completed"
  PPCENSVSx = "Censored Visits (have)"
  PPCENSVS  = "Censored Visits (want)"
  ;
  length AVSLST PPCENSVSx PPCENSVS $30.;
  input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
  datalines;
	1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10             * 06, 10, 12
	2 * 01, 02, 03, 04, 05, 06, 07     * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
	3 * 01, 02, 03, 05, 06, 07, 08     * 07, 08, 03, 07, 04     * 03, 04, 07, 08
	4
	5 *01
	6 * *01
	;
data long;
 set have;
 length VISNO $2.;
	 do i= 1 to countc(PPCENSVSx, ',')+1;
	 	VISNO= scan(PPCENSVSx, i);
		output;
	 end;
proc sort data= long out= long2 nodupkeys; by ID VISNO;
data want;
 length PPCENSVS $54.;
 	do until(last.ID);
		set long2;
		by ID;
			PPCENSVS= catx(', ', PPCENSVS, VISNO);
	end;
 keep ID PPCENSVS;
run;

View solution in original post

5 REPLIES 5
ballardw
Super User

Why do you have multiple values in a single variable at all? Generally a poor idea.

What are we supposed to compare to 'remove duplicate visits'? From which variable?

What is the maximum number of "visits" that need to be considered? Are ALL the "visits" no more than 2 digits and always numeric? The number is critical as new variables when created really should have a defined length and can't tell what may be needed.

 

Here is one way to just sort such as string. I am picking on the PPCENSVSx variable. Basically read each value into a variable in an array, sort the array, then stack back together.

There are other way if you tell us that ALL of the values separated by comma space will ALWAYS be the equivalent of an integer (not stated).

 

I would drop the i, j and V array variables after verifying things are working.

 

data sortstring;
   set dsn;
   array v (15) $ 2 ;
   do i= 1 to countw(ppcensvsx,',');
      v[i]= strip(scan(ppcensvsx,i,','));
   end;
   call sortc(of v[*]);
   length ppsort $ 48;
   do j= (dim(v) -i + 2) to dim(v);
      ppsort = catx(', ',ppsort,v[j]);
   end;
run;

 

 

This is an approach that uses the "visit" number as an index and will only use the index once in building a string:

data sortstring2;
   set dsn;
   array v (15)  ;
   do i= 1 to countw(ppcensvsx,',');
      v[input(strip(scan(ppcensvsx,i,',')),f4.)]=1;
   end;
   length ppsort $ 48;
   do j= 1 to dim(v);
      if v[j]=1 then ppsort = catx(', ',ppsort,put(j,z2.));
   end;
run;

 


@mariko5797 wrote:

I want the list of visits (1) to remove any duplicate visits and (2) to list in order.

data DSN;
 infile datalines dlm= '*'; 
 label 	ID= "Participant ID"
 	  	AVSLST= "All Visits Completed"
		PPCENSVSx= "Censored Visits (have)"
		PPCENSVS= "Censored Visits (want)"
		;
 length	AVSLST PPCENSVSx PPCENSVS $30.;
 input 	ID AVSLST $ PPCENSVSx $ PPCENSVS $;
 datalines;
	1	* 01, 02, 03, 04, 05, 06, 07, 08	* 06, 12, 10 				* 06, 10, 12
	2	* 01, 02, 03, 04, 05, 06, 07		* 04, 05, 06, 07, 04, 02	* 02, 04, 05, 06, 07
	3	* 01, 02, 03, 05, 06, 07, 08		* 07, 08, 03, 07, 04		* 03, 04, 07, 08
	;
run;

 

Patrick
Opal | Level 21

Like @Kurt_Bremser suggested: Transform your data to a long structure which is much easier to work with.

data have;
  infile datalines dlm= '*' truncover; 
  label
  ID        = "Participant ID"
  AVSLST    = "All Visits Completed"
  PPCENSVSx = "Censored Visits (have)"
  PPCENSVS  = "Censored Visits (want)"
  ;
  length AVSLST PPCENSVSx PPCENSVS $30.;
  input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
  datalines;
1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10             * 06, 10, 12
2 * 01, 02, 03, 04, 05, 06, 07     * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
3 * 01, 02, 03, 05, 06, 07, 08     * 07, 08, 03, 07, 04     * 03, 04, 07, 08
4
5 *01
6 * *01
;
run;

data long;
  set have;
  length visit_no 8 visit_completed_flg visit_censored_flg 3;
  length _vno $5 _visits $200;
  _visits=catx(',',AVSLST,PPCENSVSx);
  do i=1 to countc(_visits,',')+1;
    _vno=scan(_visits,i);
    visit_no=input(_vno,best32.);
    visit_completed_flg = (findw(AVSLST,strip(_vno))>0);
    visit_censored_flg  = (findw(PPCENSVSx,strip(_vno))>0);
    output;
  end;
  keep id visit_:;
run;

proc sort data=long out=want nodupkey;
  by id visit_no;
run;

 

mariko5797
Pyrite | Level 9

I ended up doing a variation of this. Since this is a subject-level dataset, it can't stay in long format.

data have;
  infile datalines dlm= '*' truncover; 
  label
  ID        = "Participant ID"
  AVSLST    = "All Visits Completed"
  PPCENSVSx = "Censored Visits (have)"
  PPCENSVS  = "Censored Visits (want)"
  ;
  length AVSLST PPCENSVSx PPCENSVS $30.;
  input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
  datalines;
	1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10             * 06, 10, 12
	2 * 01, 02, 03, 04, 05, 06, 07     * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
	3 * 01, 02, 03, 05, 06, 07, 08     * 07, 08, 03, 07, 04     * 03, 04, 07, 08
	4
	5 *01
	6 * *01
	;
data long;
 set have;
 length VISNO $2.;
	 do i= 1 to countc(PPCENSVSx, ',')+1;
	 	VISNO= scan(PPCENSVSx, i);
		output;
	 end;
proc sort data= long out= long2 nodupkeys; by ID VISNO;
data want;
 length PPCENSVS $54.;
 	do until(last.ID);
		set long2;
		by ID;
			PPCENSVS= catx(', ', PPCENSVS, VISNO);
	end;
 keep ID PPCENSVS;
run;
whymath
Lapis Lazuli | Level 10

I think there have been discussions about this, please take a look at: Double value elimination on one cell 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 836 views
  • 2 likes
  • 5 in conversation