BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I have a large file that has all of the data in one column with each variable(approximately 20 different variables) listed over and over again.

Using a simple example, if this was a file of names, addresses and phone numbers, all the data would be in a single column. Each name, address and phone for each person would listed in the one column. So if there 50 people in this 'address book', it would still be one column with the set of three variables (name, address, phone) repeated 50 times and grouped together by "observation."

Any suggestions as to how I import this type of data file into SAS EG?
6 REPLIES 6
Cynthia_sas
Diamond | Level 26
There are a lot of different ways to read data with SAS, both inside and outside of EG. It always helps to see a sample of the data. So, for example, does it look like this:

[pre]
Fred Mertz 123-4567 1935 Sunset Blvd Apt 1 Ethel Mertz 123-4567 1935 Sunset Blvd Apt 1
[/pre]

Or does it look like this:
[pre]
"Lucy Ricardo","333-4444","1935 Sunset Blvd Apt 2"
"Ricky Ricardo","333-4444","1935 Sunset Blvd Apt 2"
[/pre]

Or like this:
[pre]
name, phone, address
"Lucy Ricardo","333-4444","1935 Sunset Blvd Apt 2"
"Ricky Ricardo","333-4444","1935 Sunset Blvd Apt 2"

name, phone, address
"Fred Mertz", "123-4567" , "1935 Sunset Blvd Apt 1"
"Ethel Mertz", "123-4567" , "1935 Sunset Blvd Apt 1"
[/pre]


When you say that the data is "all in one column", it makes me think you are looking at in in Excel?? Is this perhaps an Excel file? Is it a comma separated file? Is this a file that has been exported from another type of data base structure??? If you open the file with Notepad, what does it look like? This may be a question for Tech Support, but it'll be easier to determine that if you can sketch out a little sample or give more information of what your input data looks like.
cynthia
deleted_user
Not applicable
It actually came to me as a *.dat file which I had never seen before.

To be exact, the data is organized as follows:

NM1*QC*1*LastName*FirstName****MR*00123456
REF*CE*MA
REF*1L*A
REF*1A*428.0
REF*1B*362.50
DTM*232*20060516
DTM*233*20060517
SVC*HC:X5476*23.18*0**1

The asterisks separate the data on the rows, referring, for example, name (NM1), or Member Number (MR), or dates (DTM), etc. You can see it is separated by asterisks but then it is repeated over and over for eadh observation.

I call it one column because it is simply a listing and there are no column breaks (unless we consider asterisks a column break, but then the data varies depending on the row it is in. Thanks for asking!
Cynthia_sas
Diamond | Level 26
Doug:
I am teaching class this week and did not have a chance to answer yesterday, sorry 'bout that. There are a few things that not quite right with your data sample.

First, let me say that I was able to read it with EG. So that's the good news.

But, there are a few problematic things about your data file. It is just an ASCII text file. .DAT is just how some folks send "data" files.

Let's look at your header: [pre]
NM1*QC*1*LastName*FirstName****MR*00123456
[/pre]

A few issues here...EG was not happy about reading the .dat file that I made, with those 4 asterisks in a row after FirstName. Also, it bothered me to have a variable name that started with a number (after QC, you have the number 1 and after MR, the number 00123456. I don't know what your data folks meant, but I had to put dummy var names up on the first row so EG had column names...like this:
[pre]
NM1*QC*x1*LastName*FirstName*x2*x3*x4*MR*x00123456
or maybe they meant that to be QC1 -- I can't answer that --

NM1*QC1*LastName*FirstName*x2*x3*x4*MR*x00123456
[/pre]

OK...so let's assume that the first column should have been NM1*QC1*LastName....for the first 3 columns...the problem is that when I parse out the data visually and line it up (manually), LastName looks like a number in some of the rows and then there's the poor last row, where FirstName looks like 0:
[pre]
NM1 *QC1 *LastName *FirstName *x2 *x3 *x4 *MR *00123456
REF *CE *MA
REF *1L *A
REF *1A *428.0
REF *1B *362.50
DTM *232 *20060516
DTM *233 *20060517
SVC *HC:X5476 *23.18 *0 * *1

[/pre]

So, I don't think you would want a LastName for some of these observations of 428.0 or 362.50....but, it -is- possible to read this with EG...first, fix up row 1 so it makes a bit more sense, then look on the File menu for Import Data and then point yourself to the fixed .DAT file. At some point in the import process, you have a chance to tell EG that the file is delimited by "other" -- so you would type in an asterisk (*) in the other delimiter field.

The problem is that I don't think this is good data to begin with and there might not be any point to reading it with EG until you go back to the person who gave it to you and find out where the missing data is. EG will assume that LastName is numeric (because it's just going to find numbers) -- so you'd have to change that in the Import Data dialog boxes. But, the bottom line is that I think even though you CAN read this data with EG, you may not WANT to read this data with EG until you find out a bit more about the data and/or get a corrected file. When you eyeball a file like this, it ought to hang together logically -- naturally, you expect the first row to be column names (even if the delimiter is an *) and so if you manually line things up in your head or in a temp version of the file, what lines up under LastName, for example, should make sense before you go forward.

Good luck with this and if you get better data, I'm sure you will be able to walk through the EG Import Data dialog boxes with no problems.
cynthia
Cynthia_sas
Diamond | Level 26
Doug...my bad. I left off the rest of a paragraph in the above reply...

in the middle...

but, it -is- possible to read this with EG...first, fix up row 1 so it makes sense. Then go to the EG File menu and select Import Data. From there, you point yourself to the fixed .DAT file. Then EG will give you a chance to say that the file is delimited and you will need to look for an "other" choice so you can type in an * as the delimiter. Finally, EG will show you a window where you get to change column attributes (like changing LastName to character, etc). Then you select RUN to actually try to let EG read the data file. If the import works, you will see the data in the table viewer.

(now go to the above reply and start reading with the para that starts ... "The problem is..."

cynthia
deleted_user
Not applicable
Thank you for taking another look at this. What I may do is try to resolve only for the rows of data that I need. For example, the NM rows are names and customer numbers. I am thinking of taking just those rows into a new file and then parsing it out. We'll see how it goes. Again, thank you!
Cynthia_sas
Diamond | Level 26
Aha!!! I think I understand...I think you have hierarchical data and you showed me only 1 obs...does it look like this:
[pre]
NM1*QC*1*Mertz*Ethel****MR*00123456
REF*CE*MA
REF*1L*A
REF*1A*428.0
REF*1B*362.50
DTM*232*20060516
DTM*233*20060517
SVC*HC:X5476*23.18*0**1
NM1*QC*1*Mertz*Fred****MR*9999999
REF*CE*MA
REF*1L*A
REF*1A*428.0
REF*1B*362.50
DTM*232*20060516
DTM*233*20060517
SVC*HC:X5476*23.18*0**1
[/pre]
You probably do NOT want to try to read this with EG if your data looks like the above. (where every NM1 record is a name record. You probably need a custom data step progam in a code node to read the file and only get the NM1 records.

Tech Support can probably help you with that program, if you can send them the WHOLE file that you are dealing with.

cynthia

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1975 views
  • 0 likes
  • 2 in conversation