Solved: How to remove a fixed string from the text?

ybz12003 · Posted 10-02-2020 07:00 PM

Hello Experts,

I would like to remove all the strings which contains 'UGESTAGE' in the sample dataset 'Have.' I am looking for the result like the dataset 'Want.' Please help me to approach it, thanks.

data Have;
      infile datalines delimiter='#';
  input Comments : $200.  ;
datalines;
	UGESTAGE=34 5/7 WKS #
	CHOLESTASIS,     UGESTAGE = 35WKS,  ANEMIA#
	UGESTAGE=33.4 WKS , ROP #
	ASD, UGESTAGE= 24WKS
;
run;

data Want;
      infile datalines delimiter='/';
  input Disease : $300.  ;
datalines;
	/ /
	CHOLESTASIS, ANEMIA /
	ROP /
	/ASD /
;
run;

jimbarbour · Posted 10-06-2020 12:20 AM

No, this isn't a replacement; this is a concatenation. The CATX function is concatenating the strings together separated by a comma. With CATX, the first argument specifies the separator. The slash is your end of record delimiter, but a comma separates each "token" in your records.

The program first determines how many string tokens there are by counting the comma separators (and adding one since there is no comma after the final string token).

_Strings					=	COUNT(Comments, ',')	+	1;

Then the program goes through the tokens, one at a time. Each token will be concatenated into the output record unless the token contains 'UGESTAGE' in which case it will be skipped.

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

Jim

View solution in original post

jimbarbour · Posted 10-02-2020 07:23 PM

The SCAN function should come in handy in this case; code example, below.

Do you really want to keep the record if all strings are removed? I deleted it, but the code can be easily modified to output the record even if all of the contents have been deleted.

I didn't include it in the below SAS code, but of course if your data is anything but upper case, you would need to use lowcase() or upcase() to make sure that the comparison is case insensitive.

data Have;
      infile datalines delimiter='#';
  input Comments : $200.  ;
datalines;
	UGESTAGE=34 5/7 WKS #
	CHOLESTASIS,     UGESTAGE = 35WKS,  ANEMIA#
	UGESTAGE=33.4 WKS , ROP #
	ASD, UGESTAGE= 24WKS
;
run;

Data	Want(RENAME=(Output_Comment=Comments));
	DROP	_:;
	DROP	Comments;
	LENGTH	Output_Comment	$256;
	SET	Have;
	_Strings					=	COUNT(Comments, ',')	+	1;

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

	IF	LENGTHN(Output_Comment)	=	0	THEN
		DELETE;
	ELSE
		Output_Comment			=	CAT(STRIP(Output_Comment), ' /');
run;

Results:

Jim

ybz12003 · Posted 10-03-2020 10:24 AM

In your result, the first row is missing. It should be blank for the first obs.

jimbarbour · Posted 10-03-2020 11:09 AM

Yes, it was deleted. See my comments in my first response.

So you really need it? No problem. Just comment out the DELETE logic like the below. See results below that. How does that look?

Data	Want(RENAME=(Output_Comment=Comments));
	DROP	_:;
	DROP	Comments;
	LENGTH	Output_Comment	$256;
	SET	Have;
	_Strings					=	COUNT(Comments, ',')	+	1;

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

/*	IF	LENGTHN(Output_Comment)	=	0	THEN*/
/*		DELETE;*/
/*	ELSE*/
		Output_Comment			=	CAT(STRIP(Output_Comment), ' /');
run;

Results:

Jim

ybz12003 · Posted 10-05-2020 04:22 PM

Hello,

In your codes below, is the STRIP(Output_Comment) used to replace the ' STRIP(SCAN(Comments, _i, ','))'?

CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));

jimbarbour · Posted 10-06-2020 12:20 AM

No, this isn't a replacement; this is a concatenation. The CATX function is concatenating the strings together separated by a comma. With CATX, the first argument specifies the separator. The slash is your end of record delimiter, but a comma separates each "token" in your records.

The program first determines how many string tokens there are by counting the comma separators (and adding one since there is no comma after the final string token).

_Strings					=	COUNT(Comments, ',')	+	1;

Then the program goes through the tokens, one at a time. Each token will be concatenated into the output record unless the token contains 'UGESTAGE' in which case it will be skipped.

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

Jim

How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Re: How to remove a fixed string from the text?

Registration is open

SAS Training: Just a Click Away