data with optional variables

Posts: 0

Hi, everyone.

I have a text file with millions of observations. Each observation has three variables: name, subname, and number.
However, subname is optional, and nothing in the file marks where it is missing.

Part of the data:

John (tech) (43) Johnson(econ) (32) Julian (24) Justin (34) Jo (math) (32)
Julia(econ) (33) June (93)

How can I manipulate this data?

Thank you for your time.

Super User
Posts: 5,256

Re: data with optional variables

This can be done in a number of ways. One is to read your data into three variables, then check whether the second variable contains any digits (using the ANYDIGIT function); if so, move its contents to the third variable.
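
A minimal sketch of that idea, assuming the layout in the posted sample: several observations per line, every name present, and subnames that never contain digits. The dataset name want and the helper variable token are made up for illustration, and the two sample lines are read inline, so for the real file INFILE would point at the file instead of DATALINES. Because several observations share one line, the step reads only two tokens first and fetches a third only when the second one is not the number:

```sas
data want;
  /* Treat blanks and both parentheses as delimiters, so the brackets vanish */
  infile datalines dlm=' ()';
  length name subname token $32;

  /* Read the name and the token after it; @@ holds the line for the next pass */
  input name $ token $ @@;

  if anydigit(token) then do;
    /* Second token contains a digit: it is the number, subname is absent */
    number  = input(token, 32.);
    subname = ' ';
  end;
  else do;
    /* Second token is the subname, so the number is the next token */
    subname = token;
    input number @@;
  end;

  drop token;
datalines;
John (tech) (43) Johnson(econ) (32) Julian (24) Justin (34) Jo (math) (32)
Julia(econ) (33) June (93)
;

proc print data=want;
run;
```

This does not cope with a missing name or a subname such as (math2); either case would need a stricter test than ANYDIGIT.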

Data never sleeps
Respected Advisor
Posts: 3,887

Re: data with optional variables

Hi Jun

Although the problem is generally not too difficult to solve, I can think of a few challenges that might arise depending on what your raw data looks like.
Is the record structure really the way you show it to us (several 'observations' in one line)? Could it be that the name is sometimes missing (quite possible if there are millions of records), so that you could have 2 to 4 consecutive values in brackets belonging to 2 different 'observations'?

Please let me know, as the concrete solution will depend on what the data looks like.

The easiest way I can think of right now is to use regular expressions (the PRX functions in SAS) to decide which substring makes up an 'observation', but regular expressions also take some practice to use and understand.
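
As a rough sketch only, and assuming the sample you posted is representative (a name made of word characters, an optional bracketed subname that does not start with a digit, then a bracketed integer), something along these lines could work. The dataset name want, the variable names, and the 200-character line buffer are my own choices for illustration and would need adjusting for the real file:

```sas
data want;
  infile datalines truncover;
  length line $200 name subname $32;
  retain re;

  /* Compile the pattern once: name, optional (subname), then (number) */
  if _n_ = 1 then
    re = prxparse('/(\w+)\s*(?:\(([^()\d][^()]*)\))?\s*\((\d+)\)/');

  input line $char200.;

  /* Walk along the line, writing one observation per match */
  start = 1;
  stop  = length(line);
  call prxnext(re, start, stop, line, pos, len);
  do while (pos > 0);
    name    = prxposn(re, 1, line);
    subname = prxposn(re, 2, line);   /* blank when the subname is absent */
    number  = input(prxposn(re, 3, line), 32.);
    output;
    call prxnext(re, start, stop, line, pos, len);
  end;

  keep name subname number;
datalines;
John (tech) (43) Johnson(econ) (32) Julian (24) Justin (34) Jo (math) (32)
Julia(econ) (33) June (93)
;
```

Bracketed values that have no name in front of them are simply skipped by this pattern, so it would have to be extended if that case really occurs in your data.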

Cheers, Patrick
Posts: 0

Re: data with optional variables

A big thank you to linux and Patrick.
With your recommendations, I found my problem. It lay in the structure of the raw data, because the raw data is too rough.
It is solved now. Thank you for your time!
