BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jwinkle
Fluorite | Level 6

Hi all,

 

I am having an issue reading raw data into Enterprise Guide due to the nature of the data.  For reasons that are too long to explain in this forum, I cannot use the import wizard.  

 

Essentially, I have multiple variables per line of data, but the placement of delimiters (and even amount of variables per line) is inconsistent.  What I would just like to do at this point is import the data so that each line of data is its own observation for ONE variable.  

 

I have gotten to the point where I have a variable that contains the beginning of each line, but once it reaches the first delimiter, it skips the rest of that line and begins again on the next line.  

 

Here is a sample of the data (2 consecutive dashes represent tabs for the purpose of demonstration):

 

SPAIN -- -- -- -- -- --

  Catalunya -- -- -- -- -- --

      Mercedes Benz -- 4992 -- Barcelona, ES -- CR -- -- -- --

           123 Tapas Street -- Paraque -- 21441 -- 12 -- 9842

           456 Toro Avenue -- NoSe -- 78443 -- 97 -- 821

Z     789 Del Mar Road -- Refor -- 12743 -- 50 -- 6612

           TOTALS FOR ENTITY 4992 -- 3 Office(s) -- 17,275

 

 

This would be one sample section of data, but some other chunks may have only 1 address or 10 addresses per entity (ie there is very little consistency).  Also, there is sometimes additional data in between each of these country sections (like seen above) that is very different.  The point is, I would like to just read it as one line=one observation with a total of one variable.  So, the data above would look something like this:

 

Obs        VAR1

 

1             SPAIN

2                Catalunya

3                     Mercedes Benz  4992  Barcelona, ES CD

.

.

.

7                     TOTALS FOR ENTITY  3 Office(s)  17,275

------------------------------------------------------------------------------

 

THANKS IN ADVANCE FOR THE HELP!!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Without more complete examples it will be hard to address specific items.

It is very likely that you want to start with

 

input @;

To hold the input line at the current location.

The parse out the value

if index(_infile_,'      ') then region = input(_infile_,$50.);

Which gets a bit messier for the the other rows.

 

Then test for the presence of specific. Also if your data is irregular in the number of lines you will need to read for a specific record you may end up wanting to RETAIN variables (such as what appears to be Country and Region) and have an OUTPUT statement only after you have read the detail line of the data..

 

Note: messy data often equals messy code to deal with it.

 

And no need to apologize for not using a wizard. The wizards are set up for fairly regular data, which this isnt'.

View solution in original post

5 REPLIES 5
LinusH
Tourmaline | Level 20

After an INPUT statement, the automatic variable _infile_ is created for you, which you could assign to var1.

Be sure to specify a length to var1 that can hod the entire line.

Data never sleeps
jwinkle
Fluorite | Level 6

Thanks for the quick reply!

 

The only problem I am having with this solution is that I'm trying to write it so that if a line contains a certain string, it will print that into a specific variable, but when it finds the string I specify, it prints the NEXT line to that variable.....

 

EXAMPLE :

 

if index(_infile_,'      ') then
input Region $CHAR50.;

 

So in this example, it does not print Catalunya as the region, but rather the line starting with "mercedes benz ..."

ballardw
Super User

Without more complete examples it will be hard to address specific items.

It is very likely that you want to start with

 

input @;

To hold the input line at the current location.

The parse out the value

if index(_infile_,'      ') then region = input(_infile_,$50.);

Which gets a bit messier for the the other rows.

 

Then test for the presence of specific. Also if your data is irregular in the number of lines you will need to read for a specific record you may end up wanting to RETAIN variables (such as what appears to be Country and Region) and have an OUTPUT statement only after you have read the detail line of the data..

 

Note: messy data often equals messy code to deal with it.

 

And no need to apologize for not using a wizard. The wizards are set up for fairly regular data, which this isnt'.

Cynthia_sas
SAS Super FREQ
I am curious about the note in an earlier post about the use of TAB characters. Are there TAB characters in the data? If the TAB characters are regularly spaced, then you may be able to use the TAB as a delimiter to parse out the value you want.

If you are not using the TAB, or they are just random in the input file, then that idea will not be useful to you.

cynthia
Ksharp
Super User
You'd better attach a TEXT file to let us test it . and of course, not forget to post the output you want to see.


sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1770 views
  • 0 likes
  • 5 in conversation