Solved: Looking for single symbols in the texts?

ybz12003 · Posted 11-17-2022 08:58 AM

Hello,

I have a sample dataset listed below. I'd like my final result to list each distinct special symbol only once (the data set Want below shows the expected output). Is there a way to accomplish that?

data Charge_Details;  
	length Charge $200; 
	infile datalines delimiter='$'; 
	input Charge;  
	datalines;                     
	HB ACETAMINOPHEN 160MG/5ML TYLEN ADULT/PED > 4.0$
	HB ALBUTEROL 0.83MG/ML INH SOL(P$
	HB 5%DEXTROSE/1/2NS+KCL 10MEQ/L 1$
	HB SUPERVISED COUGH & SUCTION - 1/2 HR$
;   

data Symbol;
	set Charge_Details;
	symbol=compress(strip(upcase(Charge)), ' ' ,'A');
run;

data Want;  
	length Symbols $200; 
	infile datalines delimiter='$'; 
	input Symbols;  
	datalines;                     
	/>(%+&-$
;

Kurt_Bremser · Posted 11-17-2022 09:27 AM

Just an idea:

data Symbol;
  set Charge_Details;
  symbol = compress(strip(upcase(Charge)),'/>(%+&-','k');
run;
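
The COMPRESS call above keeps, for each row, only the listed characters (the 'k' modifier means "keep" instead of "remove"), so the Symbol data set still has one value per row, for example //> for the first sample line. A minimal sketch of a follow-up step that collapses those per-row values into a single line listing each symbol once, assuming the Symbol data set from the step above; the output name Distinct_Symbols is hypothetical:

```sas
/* Collapse the per-row COMPRESS results into one line that lists  */
/* each symbol once, in order of first appearance.                 */
/* Assumes data set Symbol (from the step above) with variable     */
/* "symbol"; the output name Distinct_Symbols is hypothetical.     */
data Distinct_Symbols;
  set Symbol end=last;
  length Symbols $200 c $1;
  retain Symbols '';
  do i = 1 to lengthn(symbol);
    c = char(symbol, i);
    /* Append the character only if it is not in the list yet */
    if indexc(Symbols, c) = 0 then Symbols = cats(Symbols, c);
  end;
  /* One output row, written after the last input row */
  if last then output;
  keep Symbols;
run;
```

With the four sample rows this should give /> (%+&- without the space, i.e. the same string as in Want, because characters are appended in the order they first appear.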

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 11-17-2022 09:26 AM

So you want a single line with all the distinct special characters from all lines in the input data, correct? 🙂

The DATA to DATA Step Macro
Blog: SASnrd

ybz12003 · Posted 11-17-2022 09:52 AM

Yes

Kurt_Bremser · Posted 11-17-2022 11:33 AM

So you need to do this:

1. Define all the "special" characters you are looking for.
2. Scan each line for those characters and mark the ones found.
3. At the end of the dataset, output the characters that were found.

I would define two temporary arrays: one with elements of length $1 that holds the characters to look for, and a numeric array of the same size initialized to 0. In each observation, scan with FINDC and set the corresponding element of the numeric array to 1.

When you are through the dataset, output all characters whose corresponding numeric element is 1.
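
The two-array approach described above could look roughly like the sketch below. The seven candidate characters are an assumption taken from the expected result in Want, and the output name Symbol_List is hypothetical. Elements of _temporary_ arrays are retained across observations, which is what lets the flags accumulate until the last row (end=last).

```sas
/* Two temporary arrays: candidate symbols and found-flags.        */
/* The candidate list is an assumption (taken from Want); edit it  */
/* to match your own definition of "special".                      */
data Symbol_List;
  set Charge_Details end=last;
  length Symbols $200;
  array sym{7} $1 _temporary_ ('/' '>' '(' '%' '+' '&' '-');
  /* _temporary_ elements are retained across observations */
  array found{7} _temporary_ (0 0 0 0 0 0 0);
  /* Flag every candidate that occurs anywhere in this row */
  do i = 1 to dim(sym);
    if findc(Charge, sym{i}) > 0 then found{i} = 1;
  end;
  /* After the last row, output the flagged symbols as one string */
  if last then do;
    do i = 1 to dim(sym);
      if found{i} = 1 then Symbols = cats(Symbols, sym{i});
    end;
    output;
  end;
  keep Symbols;
run;
```

Because the flags are stored by position, the result lists the symbols in the order of the candidate array, not in the order they appear in the data.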

ybz12003 · Posted 11-17-2022 09:54 AM

I have over 100 thousand lines in the actual dataset, so I won't be able to review each line to determine which special symbols there are.

mkeintz · Posted 11-17-2022 10:08 AM

@ybz12003 wrote:
I have over 100 thousand lines in the actual dataset, so I won't be able to review each line to determine which special symbols there are.

You don't need to review all the lines.

But you do need to have foreknowledge of either (1) all symbols that might be classified as "special" (whether or not they are encountered in the data), or (2) all symbols that are not classified as "special".
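
Option (2) avoids having to know the special symbols in advance: you only describe what is not special and let the data reveal the rest. A minimal sketch, assuming that letters, digits, whitespace and the period (which occurs in values such as 4.0 and 0.83 but is absent from the posted Want) are the non-special characters; the output name Discovered_Symbols is hypothetical. COMPRESS with the 'a', 'd' and 's' modifiers removes alphabetic characters, digits and whitespace, and the second argument adds the period to the removal list.

```sas
/* Option (2): define what is NOT special and let the data reveal   */
/* the rest. Assumption: letters, digits, whitespace and the period */
/* are not special. Output name Discovered_Symbols is hypothetical. */
data Discovered_Symbols;
  set Charge_Details end=last;
  length Symbols rest $200 c $1;
  retain Symbols '';
  /* 'a' = letters, 'd' = digits, 's' = whitespace; '.' is added    */
  /* explicitly. Whatever COMPRESS leaves behind is "special".      */
  rest = compress(Charge, '.', 'ads');
  do i = 1 to lengthn(rest);
    c = char(rest, i);
    /* Record each character the first time it is seen */
    if indexc(Symbols, c) = 0 then Symbols = cats(Symbols, c);
  end;
  if last then output;
  keep Symbols;
run;
```

One pass over the data is enough, however many rows there are. Any symbol nobody anticipated simply shows up in the result; if it contains something you consider ordinary, add it to the second argument of COMPRESS.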

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ybz12003 · Posted 11-17-2022 10:21 AM

Still, I would need to go all the way down the whole column to see which are included or not.

Tom · Posted 11-17-2022 10:42 AM

@ybz12003 wrote:
Still, I would need to go all the way down the whole column to see which are included or not.

Which WHAT? Please clarify what you are trying to do.

Are you trying to count how many observations have ANY "special" characters? Are you trying to count how many have EACH of the individual special characters? Or are you just trying to generate a list of the special characters that appear anywhere?

Are you trying to subset the data? How? Keep the records with special characters? Keep the records without special characters?

Are you trying to remove the special characters?

Replace them in some way?
