Solved: Sorting within a string

mariko5797 · Posted 05-08-2024 04:13 PM

I want the list of visits (1) to remove any duplicate visits and (2) to list in order.

data DSN;
 infile datalines dlm= '*'; 
 label 	ID= "Participant ID"
 	  	AVSLST= "All Visits Completed"
		PPCENSVSx= "Censored Visits (have)"
		PPCENSVS= "Censored Visits (want)"
		;
 length	AVSLST PPCENSVSx PPCENSVS $30.;
 input 	ID AVSLST $ PPCENSVSx $ PPCENSVS $;
 datalines;
	1	* 01, 02, 03, 04, 05, 06, 07, 08	* 06, 12, 10 				* 06, 10, 12
	2	* 01, 02, 03, 04, 05, 06, 07		* 04, 05, 06, 07, 04, 02	* 02, 04, 05, 06, 07
	3	* 01, 02, 03, 05, 06, 07, 08		* 07, 08, 03, 07, 04		* 03, 04, 07, 08
	;
run;

mariko5797 · Posted 05-09-2024 11:48 AM

I ended up doing a variation of this. Since this is a subject-level dataset, it can't stay in long format.

data have;
  infile datalines dlm= '*' truncover; 
  label
  ID        = "Participant ID"
  AVSLST    = "All Visits Completed"
  PPCENSVSx = "Censored Visits (have)"
  PPCENSVS  = "Censored Visits (want)"
  ;
  length AVSLST PPCENSVSx PPCENSVS $30.;
  input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
  datalines;
	1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10             * 06, 10, 12
	2 * 01, 02, 03, 04, 05, 06, 07     * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
	3 * 01, 02, 03, 05, 06, 07, 08     * 07, 08, 03, 07, 04     * 03, 04, 07, 08
	4
	5 *01
	6 * *01
	;
data long;
 set have;
 length VISNO $2.;
	 do i= 1 to countc(PPCENSVSx, ',')+1;
	 	VISNO= scan(PPCENSVSx, i);
		output;
	 end;
proc sort data= long out= long2 nodupkeys; by ID VISNO;
data want;
 length PPCENSVS $54.;
 	do until(last.ID);
		set long2;
		by ID;
			PPCENSVS= catx(', ', PPCENSVS, VISNO);
	end;
 keep ID PPCENSVS;
run;

View solution in original post

Kurt_Bremser · Posted 05-08-2024 05:35 PM

Transpose to a long layout, then sort with NODUPKEY.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 05-08-2024 05:58 PM

Why do you have multiple values in a single variable at all? Generally a poor idea.

What are we supposed to compare to 'remove duplicate visits'? From which variable?

What is the maximum number of "visits" that need to be considered? Are ALL the "visits" no more than 2 digits and always numeric? The number is critical as new variables when created really should have a defined length and can't tell what may be needed.

Here is one way to just sort such as string. I am picking on the PPCENSVSx variable. Basically read each value into a variable in an array, sort the array, then stack back together.

There are other way if you tell us that ALL of the values separated by comma space will ALWAYS be the equivalent of an integer (not stated).

I would drop the i, j and V array variables after verifying things are working.

data sortstring;
   set dsn;
   array v (15) $ 2 ;
   do i= 1 to countw(ppcensvsx,',');
      v[i]= strip(scan(ppcensvsx,i,','));
   end;
   call sortc(of v[*]);
   length ppsort $ 48;
   do j= (dim(v) -i + 2) to dim(v);
      ppsort = catx(', ',ppsort,v[j]);
   end;
run;

This is an approach that uses the "visit" number as an index and will only use the index once in building a string:

data sortstring2;
   set dsn;
   array v (15)  ;
   do i= 1 to countw(ppcensvsx,',');
      v[input(strip(scan(ppcensvsx,i,',')),f4.)]=1;
   end;
   length ppsort $ 48;
   do j= 1 to dim(v);
      if v[j]=1 then ppsort = catx(', ',ppsort,put(j,z2.));
   end;
run;

@mariko5797 wrote:

I want the list of visits (1) to remove any duplicate visits and (2) to list in order.

data DSN;
 infile datalines dlm= '*'; 
 label 	ID= "Participant ID"
 	  	AVSLST= "All Visits Completed"
		PPCENSVSx= "Censored Visits (have)"
		PPCENSVS= "Censored Visits (want)"
		;
 length	AVSLST PPCENSVSx PPCENSVS $30.;
 input 	ID AVSLST $ PPCENSVSx $ PPCENSVS $;
 datalines;
	1	* 01, 02, 03, 04, 05, 06, 07, 08	* 06, 12, 10 				* 06, 10, 12
	2	* 01, 02, 03, 04, 05, 06, 07		* 04, 05, 06, 07, 04, 02	* 02, 04, 05, 06, 07
	3	* 01, 02, 03, 05, 06, 07, 08		* 07, 08, 03, 07, 04		* 03, 04, 07, 08
	;
run;

Patrick · Posted 05-08-2024 06:48 PM

Like @Kurt_Bremser suggested: Transform your data to a long structure which is much easier to work with.

data have;
  infile datalines dlm= '*' truncover; 
  label
  ID        = "Participant ID"
  AVSLST    = "All Visits Completed"
  PPCENSVSx = "Censored Visits (have)"
  PPCENSVS  = "Censored Visits (want)"
  ;
  length AVSLST PPCENSVSx PPCENSVS $30.;
  input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
  datalines;
1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10             * 06, 10, 12
2 * 01, 02, 03, 04, 05, 06, 07     * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
3 * 01, 02, 03, 05, 06, 07, 08     * 07, 08, 03, 07, 04     * 03, 04, 07, 08
4
5 *01
6 * *01
;
run;

data long;
  set have;
  length visit_no 8 visit_completed_flg visit_censored_flg 3;
  length _vno $5 _visits $200;
  _visits=catx(',',AVSLST,PPCENSVSx);
  do i=1 to countc(_visits,',')+1;
    _vno=scan(_visits,i);
    visit_no=input(_vno,best32.);
    visit_completed_flg = (findw(AVSLST,strip(_vno))>0);
    visit_censored_flg  = (findw(PPCENSVSx,strip(_vno))>0);
    output;
  end;
  keep id visit_:;
run;

proc sort data=long out=want nodupkey;
  by id visit_no;
run;

mariko5797 · Posted 05-09-2024 11:48 AM

I ended up doing a variation of this. Since this is a subject-level dataset, it can't stay in long format.

data have;
  infile datalines dlm= '*' truncover; 
  label
  ID        = "Participant ID"
  AVSLST    = "All Visits Completed"
  PPCENSVSx = "Censored Visits (have)"
  PPCENSVS  = "Censored Visits (want)"
  ;
  length AVSLST PPCENSVSx PPCENSVS $30.;
  input ID :$6. AVSLST $ PPCENSVSx $ PPCENSVS $;
  datalines;
	1 * 01, 02, 03, 04, 05, 06, 07, 08 * 06, 12, 10             * 06, 10, 12
	2 * 01, 02, 03, 04, 05, 06, 07     * 04, 05, 06, 07, 04, 02 * 02, 04, 05, 06, 07
	3 * 01, 02, 03, 05, 06, 07, 08     * 07, 08, 03, 07, 04     * 03, 04, 07, 08
	4
	5 *01
	6 * *01
	;
data long;
 set have;
 length VISNO $2.;
	 do i= 1 to countc(PPCENSVSx, ',')+1;
	 	VISNO= scan(PPCENSVSx, i);
		output;
	 end;
proc sort data= long out= long2 nodupkeys; by ID VISNO;
data want;
 length PPCENSVS $54.;
 	do until(last.ID);
		set long2;
		by ID;
			PPCENSVS= catx(', ', PPCENSVS, VISNO);
	end;
 keep ID PPCENSVS;
run;

whymath · Posted 05-08-2024 10:19 PM

I think there have been discussions about this, please take a look at: Double value elimination on one cell

Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Re: Sorting within a string

Registration is open

SAS Training: Just a Click Away