BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello Experts,

 

I would like to remove all the strings which contains 'UGESTAGE' in the sample dataset 'Have.'    I am looking for the result like the dataset 'Want.'   Please help me to approach it, thanks.

 

data Have;
      infile datalines delimiter='#';
  input Comments : $200.  ;
datalines;
	UGESTAGE=34 5/7 WKS #
	CHOLESTASIS,     UGESTAGE = 35WKS,  ANEMIA#
	UGESTAGE=33.4 WKS , ROP #
	ASD, UGESTAGE= 24WKS
;
run;

data Want;
      infile datalines delimiter='/';
  input Disease : $300.  ;
datalines;
	/ /
	CHOLESTASIS, ANEMIA /
	ROP /
	/ASD /
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

No, this isn't a replacement; this is a concatenation.  The CATX function is concatenating the strings together separated by a comma.  With CATX, the first argument specifies the separator.  The slash is your end of record delimiter, but a comma separates each "token" in your records.

 

The program first determines how many string tokens there are by counting the comma separators (and adding one since there is no comma after the final string token).

 

_Strings					=	COUNT(Comments, ',')	+	1;

 

 

Then the program goes through the tokens, one at a time.  Each token will be concatenated into the output record unless the token contains 'UGESTAGE' in which case it will be skipped.

 

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

 

 

Jim

View solution in original post

5 REPLIES 5
jimbarbour
Meteorite | Level 14

The SCAN function should come in handy in this case; code example, below.

 

Do you really want to keep the record if all strings are removed?  I deleted it, but the code can be easily modified to output the record even if all of the contents have been deleted.

 

I didn't include it in the below SAS code, but of course if your data is anything but upper case, you would need to use lowcase() or upcase() to make sure that the comparison is case insensitive.

data Have;
      infile datalines delimiter='#';
  input Comments : $200.  ;
datalines;
	UGESTAGE=34 5/7 WKS #
	CHOLESTASIS,     UGESTAGE = 35WKS,  ANEMIA#
	UGESTAGE=33.4 WKS , ROP #
	ASD, UGESTAGE= 24WKS
;
run;

Data	Want(RENAME=(Output_Comment=Comments));
	DROP	_:;
	DROP	Comments;
	LENGTH	Output_Comment	$256;
	SET	Have;
	_Strings					=	COUNT(Comments, ',')	+	1;

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

	IF	LENGTHN(Output_Comment)	=	0	THEN
		DELETE;
	ELSE
		Output_Comment			=	CAT(STRIP(Output_Comment), ' /');
run;

Results:

jimbarbour_0-1601681120823.png

 

Jim

ybz12003
Rhodochrosite | Level 12
In your result, the first row is missing. It should be blank for the first obs.
jimbarbour
Meteorite | Level 14

Yes, it was deleted.  See my comments in my first response.

 

So you really need it?  No problem.  Just comment out the DELETE logic like the below.  See results below that.  How does that look?

Data	Want(RENAME=(Output_Comment=Comments));
	DROP	_:;
	DROP	Comments;
	LENGTH	Output_Comment	$256;
	SET	Have;
	_Strings					=	COUNT(Comments, ',')	+	1;

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

/*	IF	LENGTHN(Output_Comment)	=	0	THEN*/
/*		DELETE;*/
/*	ELSE*/
		Output_Comment			=	CAT(STRIP(Output_Comment), ' /');
run;

Results:

jimbarbour_0-1601737741808.png

 

Jim

 

ybz12003
Rhodochrosite | Level 12

Hello,

 

In your codes below, is the STRIP(Output_Comment) used to replace the ' STRIP(SCAN(Comments, _i, ','))'?

 

CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));

jimbarbour
Meteorite | Level 14

No, this isn't a replacement; this is a concatenation.  The CATX function is concatenating the strings together separated by a comma.  With CATX, the first argument specifies the separator.  The slash is your end of record delimiter, but a comma separates each "token" in your records.

 

The program first determines how many string tokens there are by counting the comma separators (and adding one since there is no comma after the final string token).

 

_Strings					=	COUNT(Comments, ',')	+	1;

 

 

Then the program goes through the tokens, one at a time.  Each token will be concatenated into the output record unless the token contains 'UGESTAGE' in which case it will be skipped.

 

	DO	_i						=	1	TO	_Strings;
		IF	NOT	INDEX(SCAN(Comments, _i), 'UGESTAGE')	THEN
			DO;
				Output_Comment	=	CATX(', ', STRIP(Output_Comment), STRIP(SCAN(Comments, _i, ',')));
			END;
	END;

 

 

Jim

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1570 views
  • 1 like
  • 2 in conversation