BookmarkSubscribeRSS Feed
Hari2
Quartz | Level 8

Hi All,

 

Please help to create a SAS dataset when we have data as below which has more than 1 delimiter. Here , delimiter is | and actual data is saved between double quotes. 

 

we have data like below, which is coming in a .csv file and data is saved in first cell of each row.

1.  Please note that here we need to consider '|' as a delimiter. 

2. we need to read values with out quotes or even if quotes come , it's fine we can compress later.  

3. values for 4th column is null. 

4. so finally, we need to have a dataset with 6 columns and 2 rows. 

 

Reqnr|"S Id"|"abc"|"abc Date"|"Name"|"cont"

FCQ00001|"FCQ00001000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"
FCQ00005|"FCQ00005000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"

 

Please help on this. Thanks in advance.

 

11 REPLIES 11
Oligolas
Barite | Level 11

Hi,

use the import assistant if you're not familiar with SAS code.

This is how you can achieve imports by yourself:

 

1. Open "Import Data" from the file menu - Click Next

2. Choose "Delimited File (*.*)" from the drop down box - Click Next

3. Choose your File's location - Click "Options..."

4. Choose "Character" from the checkbox and enter the pipe sign "|". If your file contains the column names in the first row, then choose "first row of data=2" - Click 'OK' then Click Next

5. Enter the dataset name to create

6. Enter the path where the PROC IMPORT statements should be saved for later consultation- Click Finish

 

 

________________________

- Cheers -

Tom
Super User Tom
Super User

So if you know what the data is then just write a data step to read it. The file looks mainly like a normal delimited file, although there appear to be some unneeded extra quotes.

 

First I will create a text file from your example so I have something to code with:

options parmcards=example;
filename example temp;

parmcards4;
Reqnr|"S Id"|"abc"|"abc Date"|"Name"|"cont"
FCQ00001|"FCQ00001000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"
FCQ00005|"FCQ00005000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"
;;;;

Now I can write a data step to read from EXAMPLE fileref.  Since you didn't provide any details on what type of data is in the fields, especially the empty ABC_DATE field, I just guessed some lengths to use for those variables.

The DSD option is what tells the INPUT statement to remove the quotes from around the values.

data want;
  length regnr $10 id $20 abc abc_date name $8 cont $30 ;
  infile example dsd truncover dlm='|' firstobs=2;
  input regnr id abc abc_date name cont;
run;

Note since none of your values actually include the pipe delimiter the quotes in your file are not needed.  Your text file could be represented as just:

Reqnr|S Id|abc|abc Date|Name|cont
FCQ00001|FCQ00001000|No||FLOW|EDTA K2 4mL (violet)
FCQ00005|FCQ00005000|No||FLOW|EDTA K2 4mL (violet)
Peter_C
Rhodochrosite | Level 12
the DSD option on the INFILE statement, expects and strips "bounding" quotes, so it looks to a data step that your raww data has only one delimiter, the pipe(|)
Hari2
Quartz | Level 8
Hi Tom,
thanks for your response. as you mentioned , you are creating a text file using Cards. However please suggest how to make it when we have data in .csv file - that too, in a single cell of each row.
So, in first Row at cell A1 - we have column names around 50 roughly.
2nd row at cell B1 - it has values for all columns .
3rd row at cell C1 - it has values again for all col's.
......
......
etc.....
you can refer example data which I have given in Previous post.
Tom
Super User Tom
Super User

@Hari2 wrote:
Hi Tom,
thanks for your response. as you mentioned , you are creating a text file using Cards. However please suggest how to make it when we have data in .csv file - that too, in a single cell of each row.
So, in first Row at cell A1 - we have column names around 50 roughly.
2nd row at cell B1 - it has values for all columns .
3rd row at cell C1 - it has values again for all col's.
......
......
etc.....
you can refer example data which I have given in Previous post.

I don't understand the question. What I showed you IS how to read a delimited text file.   I just included your example data to be able to have an actual file to work with.  The INFILE statement is what points to the file to be read.  In my code it is using the fileref EXAMPLE that I created for your example file.  You could define the fileref EXAMPLE to point to your existing file.  Or use a different fileref.  Or not use a fileref at all and instead just include the quoted physical name of the file in the INPUT statement directly.

 

A CSV file does not have cells, that is a spreadsheet concept.  A CSV file has lines of text with multiple values on each line.  A CSV file frequently will have a header row with names (or descriptions) for each column of data.  But a CSV file has no other information to guide you in writing the program to read it.  Nothing to say that a particular column should be numeric or character or if character what the length should be reserved for the values.  If you don't have a description of the contents you will still have to make decision about what each column in the file represents and how each variable should be defined.

 

What is it that you do not understand?

 

If you want SAS to guess how to name the variables and how to define each variable (by looking at the examples in the file you are currently reading) then try PROC IMPORT.

proc import datafile='myfile.csv' dbms=dlm out=want replace;
  delimiter='|';
run;

But remember that it can only use the information in this one example file you are reading.  So if later you get some other data that you need to combine with the first file you might find that they variables have been defined differently because the specific values in those fields where a different subset of the possible values.

Hari2
Quartz | Level 8
HI Tom,

Thanks for your detailed explanation. actually it was helpful and you gave some insights to get the solution. Thanks very much.

please see steps below, How I got the solution.
/*Since we have unwanted extra Double quotes in actual values, we are not able to fetch into sas dataset from .csv file directly.
- Step 1: we are fetching it into a dataset as it is (means how we have actual data in ,csv file into a single column).
- Step 2: Removing unwanted double quotes from actual string (String contains, values for all variables separated with | delimiter)
- Step 3: Exporting again to a .txt file
- Step 4: From text file we are converting it into SAS dataset which is working fine.
*/
Tom
Super User Tom
Super User

Are you saying that instead of having lines like you originally posted.

Reqnr|"S Id"|"abc"|"abc Date"|"Name"|"cont"
FCQ00001|"FCQ00001000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"
FCQ00005|"FCQ00005000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"

You have lines where all of the data is one value on the line?  Like

"Reqnr|""S Id""|""abc""|""abc Date""|""Name""|""cont"""
"FCQ00001|""FCQ00001000""|""No""|""""|""FLOW""|""EDTA K2 4mL (violet)"""
"FCQ00005|""FCQ00005000""|""No""|""""|""FLOW""|""EDTA K2 4mL (violet)"""

So if the lines are shorter than 32,767 bytes each you could read the line into a character variable and write it back out to a new file.

filename new temp;
data _null_;
  infile example dsd truncover ;
  file new;
  input line :$32767.;
  put line;
run;

You could even modify the _INFILE_ automatic variable to make it possible to parse the lines in a single data step without first making and updated text file.

data want;
  length regnr $10 id $20 abc abc_date name $8 cont $30 ;
  infile example dsd truncover dlm='|' firstobs=2;
  input @;
  _infile_=dequote(_infile_);
  input regnr id abc abc_date name cont;
run;
Peter_C
Rhodochrosite | Level 12
If there is an organised way that double quotes are present in your data, then there is a way that SAS can read the underlying data in a single pass, divided into the columns you seem to expect.
Traditionally, a database table export would quote strings values (to permit delimiters within data) . The DSD option of INFILE statement is engineered to cope with that tradition. Additionally, when DSD is not appropriate, SAS has the $quote informat.
Please provide a sample of your data (anonymised if necessary)
Kurt_Bremser
Super User

The "first cell in each row" lets me suspect that you inspected this file with MS Excel, which is a very poor choice for doing this. Always inspect text files with a suitable text editor (like Notepad++) which can deal with files of arbitrary size and also informs you of encodings, type of line separators (Windows/UNIX/Mac), and allows hex view for special characters.

 

Anyway, using your data as posted, this works:

data want;
infile datalines firstobs=2 dsd dlm="|";
input reqnr $ s_id :$11. abc :$3. abc_date :yymmdd10. name $ cont :$30.;
datalines;
Reqnr|"S Id"|"abc"|"abc Date"|"Name"|"cont"
FCQ00001|"FCQ00001000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"
FCQ00005|"FCQ00005000"|"No"|""|"FLOW"|"EDTA K2 4mL (violet)"
FCQ00005|"FCQ00005000"|"No"|""|"FLOW"|"EDTA ""K2 4mL (violet)"
FCQ00005|"FCQ00005000"|"No"|""|"FLOW"|"EDTA "K2 4mL (violet)"
;

As you can see, I added extra rows to illustrate how the DSD option deals with additional quotes contained within delimiters.

 

You should be able to use the code by removing the datalines block and replacing DATALINES in the INFILE statement with the path to your text file. If something unexpected happens, please copy/paste the whole log of the step into a window opened with this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

Hari2
Quartz | Level 8

Hi All,

 

Thanks very much for your responses. 

 

- As I mentioned already, entire row data (for all variables separated with | symbol and each value is enclosed in "") , is stored in first cell of each row of  a CSV File.  

- so when I read it into dataset, I am getting complete data into a single column (with no. of rows that are there in .csv)

- So I am reading it first and then writing back to a text file and compressing double quotes .

- then reading from text file which is working fine and I am able to read into dataset as required. 

 

- Thanks to Tom from whom I got this idea to convert it to text file first and then read it which is working fine. 

Kurt_Bremser
Super User

A csv file DOES NOT HAVE CELLS. PERIOD.

A csv file has rows, and in these rows you find delimiters. These delimiters separate columns.

I repeat: there is nothing like a "first cell" in a csv file. That's only what you see if you use the execrable, useless (in this context) Excel to open a csv file that does not have semicolons as delimiters. Use a text editor, nothing else.

 

My code proves that a file with the layout you posted is easy to read into a dataset directly. If it does not work for you, post a sample of your file as is (not what you see in Excel!!), if necessary, attach it to the post.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2380 views
  • 1 like
  • 5 in conversation