deleted_user
Not applicable
Hi, everyone.

I have a text file with millions of observations. Each observation has three variables: name, subname, and number.
However, subname is optional, and nothing in the file marks when it is missing.

part of the data:

John (tech) (43) Johnson(econ) (32) Julian (24) Justin (34) Jo (math) (32)
Julia(econ) (33) June (93)
....

How can I manipulate this data?

Thank you for your time.

Jun
3 REPLIES
LinusH
Tourmaline | Level 20
This can be done in a number of ways. One is to read your data into three variables, then check whether the second variable contains any digits (using the ANYDIGIT function); if it does, move its contents to the third variable.
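
As a minimal sketch of that idea, here is the logic in Python rather than SAS, purely for illustration. The helper name `parse_tokens` is hypothetical, `any(ch.isdigit() ...)` stands in for ANYDIGIT, and the sketch assumes every observation has a name and a number, with only the subname optional.

```python
def parse_tokens(line):
    """Split one raw line into (name, subname, number) observations."""
    # Put a space before "(" so "Johnson(econ)" splits like "John (tech)".
    tokens = line.replace("(", " (").split()
    records = []
    i = 0
    while i + 1 < len(tokens):
        name = tokens[i]
        second = tokens[i + 1].strip("()")
        if any(ch.isdigit() for ch in second):
            # Second field holds digits: it is the number, subname is missing.
            records.append((name, None, int(second)))
            i += 2
        else:
            if i + 2 >= len(tokens):
                break  # incomplete trailing observation; stop rather than guess
            # Second field is the subname; the third field is the number.
            third = tokens[i + 2].strip("()")
            records.append((name, second, int(third)))
            i += 3
    return records


# Sample taken from the data shown in the question.
sample = "John (tech) (43) Johnson(econ) (32) Julian (24) Justin (34) Jo (math) (32)"
for record in parse_tokens(sample):
    print(record)
```

This only works if a name is never missing; a line that starts with a bracketed value would be misread.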

/Linus
Data never sleeps
Patrick
Opal | Level 21
Hi Jun

Although the problem is in general not too difficult to solve, I can think of a few challenges that might arise depending on what your raw data look like.
Is the record structure really the way you show it to us (several 'observations' in one line)? Could it be that the name is sometimes missing (quite possible if there are millions of records), so that you could have 2 to 4 consecutive values in brackets belonging to 2 different 'observations'?

Please let me know, as the concrete solution will depend on what the data look like.

The easiest way I can think of right now is to use regular expressions (the PRX functions in SAS) to decide which substring makes up an 'observation', but regular expressions also take some practice to use and understand.
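
To give a rough idea of what such a pattern could look like, here is a small sketch in Python rather than SAS, for illustration only. The names `OBS` and `parse_regex` are hypothetical, and the pattern assumes that names and subnames consist of letters only and that numbers are plain digits; the real data may need a looser pattern.

```python
import re

# One observation: a name, an optional "(subname)", then "(number)".
OBS = re.compile(r"([A-Za-z]+)\s*(?:\(([A-Za-z]+)\)\s*)?\((\d+)\)")


def parse_regex(line):
    """Return every (name, subname, number) observation found in a line."""
    # findall yields '' for the optional group when there is no subname.
    return [(name, sub or None, int(num)) for name, sub, num in OBS.findall(line)]


# Sample taken from the data shown in the question.
for record in parse_regex("Julia(econ) (33) June (93)"):
    print(record)
```

Because the pattern requires a name, a bracketed value with no name in front of it is simply skipped, so comparing the number of matches with the expected record count is one way to spot such cases.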

Cheers, Patrick
deleted_user
Not applicable
A big thank you to Linus and Patrick.
With your recommendations, I found my problem. It lies in the structure of the raw data, because the raw data is too rough.
It is solved now. Thank you for your time!

Jun
