ybz12003
Rhodochrosite | Level 12

Hello, 

I have a sample dataset, listed below. I would like my final result to list each symbol (special character) that appears in the data exactly once. Is there a way to accomplish that?

data Charge_Details;  
	length Charge $200; 
	infile datalines delimiter='$'; 
	input Charge;  
	datalines;                     
	HB ACETAMINOPHEN 160MG/5ML TYLEN ADULT/PED > 4.0$
	HB ALBUTEROL 0.83MG/ML INH SOL(P$
	HB 5%DEXTROSE/1/2NS+KCL 10MEQ/L 1$
	HB SUPERVISED COUGH & SUCTION - 1/2 HR$
;   

data Symbol;
	set Charge_Details;
	symbol=compress(strip(upcase(Charge)), ' ' ,'A');
run;

data Want;  
	length Symbols $200; 
	infile datalines delimiter='$'; 
	input Symbols;  
	datalines;                     
	/>(%+&-$
;   
PeterClemmensen
Tourmaline | Level 20

So you want a single line with all the distinct special characters from all lines in the input data, correct? 🙂

ybz12003
Rhodochrosite | Level 12
Yes
Kurt_Bremser
Super User

So you need to do this:

  1. define all the "special" characters you are looking for
  2. scan each line for the characters and mark those found
  3. at the end of the dataset, output those that were found

I would define two temporary arrays, one with elements of length $1 which keeps your characters to look for, and another numeric array of same size initialized to 0s. In each observation, scan with FINDC, and set the corresponding element in the numeric array to 1.

Once you are through the dataset, output all characters that have a 1 in their corresponding numeric element.
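The steps above could be sketched roughly as follows. This is only an illustration, not tested against the real data: the eight candidate characters in the chars array are an assumption taken from the sample lines, and the names chars, found, done, and i are made up for the example.

```sas
data Want (keep=Symbols);
	set Charge_Details end=done;
	length Symbols $200;

	/* Candidate list is an assumption; extend it to cover your data. */
	array chars {8} $1 _temporary_ ('/' '>' '(' '%' '+' '&' '-' '.');
	/* Flags, one per candidate; temporary arrays are retained automatically. */
	array found {8} _temporary_ (8*0);

	/* Mark every candidate that occurs in this observation. */
	do i = 1 to dim(chars);
		if findc(Charge, chars{i}) > 0 then found{i} = 1;
	end;

	/* After the last observation, write out the characters that were marked. */
	if done then do;
		do i = 1 to dim(chars);
			if found{i} then Symbols = cats(Symbols, chars{i});
		end;
		output;
	end;
run;
```

Because of the explicit OUTPUT statement, Want gets a single observation, and the characters appear in the order they are listed in the chars array.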

 

ybz12003
Rhodochrosite | Level 12
I have over 100 thousand lines in the actual dataset, so I won't be able to review each line to determine which special symbols are present.
mkeintz
PROC Star

@ybz12003 wrote:
I have over 100 thousand lines in the actual dataset, so I won't be able to review each line to determine which special symbols are present.

You don't need to review all the lines.

But you do need to know in advance either (1) all symbols that might be classified as "special" (whether or not they will be encountered in the data), or (2) all symbols that are not classified as "special".
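As a rough sketch of option (2), assuming that letters, digits, and whitespace are the only characters that are not "special", COMPRESS can strip those out and whatever remains can be collected once per distinct character. The variable names leftover, c, done, and i are made up for the example.

```sas
data Want (keep=Symbols);
	set Charge_Details end=done;
	length Symbols $200 leftover $200 c $1;
	retain Symbols;

	/* Assumption: letters (a), digits (d), and whitespace (s) are "not special". */
	leftover = compress(Charge, , 'ads');

	/* Append each remaining character only if it has not been collected yet. */
	do i = 1 to lengthn(leftover);
		c = char(leftover, i);
		if findc(Symbols, c) = 0 then Symbols = cats(Symbols, c);
	end;

	/* Write the accumulated list once, after the last observation. */
	if done then output;
run;
```

With this definition, no line has to be reviewed by hand, but every character outside the "not special" set is reported. For the sample data that would presumably include the period in values such as 4.0, which the posted Want dataset does not list.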

ybz12003
Rhodochrosite | Level 12
Still, I need to go through all the way down the whole column to look for which are included or not.
Tom
Super User

@ybz12003 wrote:
Still, I need to go through all the way down the whole column to look for which are included or not.

Which WHAT? Please clarify what you are trying to do.

 

Are you trying to count how many observations have ANY "special" characters? Are you trying to count how many have EACH of the individual special characters? Are you just trying to generate a list of special characters that appear anywhere?

 

Are you trying to subset the data?  How?  Keep the records with special characters?  Keep the records without special characters?

 

Are you trying to remove the special characters? 

Replace them in some way?

 
