sandeep12
Fluorite | Level 6

Can you explain to me the difference between reading data from an external file and reading data from an existing data set?

andreas_lds
Jade | Level 19

External text files (*.txt, *.csv, ...) are read with the INFILE and INPUT statements in a DATA step, or with PROC IMPORT, if you don't mind the extra work of fixing whatever PROC IMPORT guessed wrong. External binary files are read with PROC IMPORT or through a LIBNAME statement.
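A minimal sketch of both routes for a text file, assuming a small comma-separated file. To keep it self-contained, the first step writes the hypothetical CSV to a temporary file (the fileref `rawcsv` and the data values are made up for this example); normally INFILE and DATAFILE= would point at your own file.

```sas
/* Hypothetical data: write a tiny CSV to a temporary file so the
   example runs without any external file. */
filename rawcsv temp;

data _null_;
  file rawcsv;
  put 'name,age,height';
  put 'Alfred,14,69';
  put 'Alice,13,56.5';
  put 'Barbara,13,65.3';
run;

/* Route 1: DATA step. You describe the layout yourself.
   DSD = comma-delimited with quote handling, FIRSTOBS=2 skips the header row. */
data work.class_raw;
  infile rawcsv dsd firstobs=2 truncover;
  length name $ 8;
  input name $ age height;
run;

/* Route 2: PROC IMPORT. SAS guesses names, types and lengths
   from the header and the first rows of the file. */
proc import datafile=rawcsv out=work.class_imp dbms=csv replace;
  getnames=yes;
run;
```

PROC IMPORT bases its guesses on the first rows it scans (controlled by its GUESSINGROWS statement), which is why its results sometimes need fixing afterwards.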

 

Datasets are read with the SET statement in a DATA step, or by any procedure, usually through the DATA=<dataset> option. If datasets are stored permanently, you need to assign a library (LIBNAME statement) before you can use them.
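A sketch of using existing data sets as described above. It relies on SASHELP.CLASS, a sample data set shipped with SAS, and assigns the hypothetical libref `mylib` to the WORK folder only so that it runs anywhere; in practice the LIBNAME would point at the folder where your permanent data sets live.

```sas
/* Hypothetical library: reuse the WORK folder so the sketch runs anywhere.
   Normally: libname mylib '<folder holding your data sets>'; */
libname mylib "%sysfunc(pathname(work))";

/* Put a copy of a shipped sample data set into the library. */
data mylib.class;
  set sashelp.class;
run;

/* DATA step: the SET statement reads the existing data set.
   No INFILE, INPUT or LENGTH statements are needed. */
data work.teens;
  set mylib.class;
  where age >= 13;
run;

/* Procedures: the DATA= option names the input data set. */
proc means data=mylib.class n mean;
  var height weight;
run;
```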

 

This answer is not complete; you will need to search the docs for details about the various statements mentioned. You should also look at the free training offered by SAS, see https://www.sas.com/en_us/training/offers/free-training.html.

cosmid
Lapis Lazuli | Level 10

I believe there is another difference: reading data from an existing data set is faster than reading data from an external file, assuming that your existing data set is a SAS data set.

Tom
Super User

@cosmid wrote:

I believe there is another difference, reading data from an existing data set is faster than reading data from an external file. Assuming that your existing data set is a SAS data set.


It might not be faster, but it is definitely easier.

I would use the verb USING for an existing SAS dataset instead of READING.

If you are reading from a "file", you need to tell SAS how to convert it into a dataset, or force it to guess by using PROC IMPORT.

When using a dataset SAS already knows what variables there are and all of the metadata about the dataset and the variables.
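To see the metadata mentioned here, PROC CONTENTS prints the descriptor information that is stored with every SAS data set. The sketch uses the shipped SASHELP.CLASS as a stand-in; the output data set `work.class_ratio` and the variable `weight_per_inch` are made up for illustration.

```sas
/* Variable names, types, lengths, formats and labels are stored
   with the data set itself; VARNUM lists them in column order. */
proc contents data=sashelp.class varnum;
run;

/* Because that metadata travels with the data set, a DATA step can
   use its variables immediately, without declaring them first. */
data work.class_ratio;
  set sashelp.class;
  weight_per_inch = weight / height;
run;
```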

rinugour
Fluorite | Level 6

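The main difference is that when you read an existing data set with the SET statement, SAS retains the values of the variables read from it from one iteration of the DATA step to the next (they are not reset to missing; the next observation simply overwrites them). When you read data from an external file, SAS only sees records of raw data: the variables have to be declared in the INPUT statement, and the variables created by INPUT are reset to missing at the start of every iteration.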
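A small sketch that makes the difference visible, using SASHELP.CLASS and a few made-up raw records. In both steps the variable `age` is read only on the first iteration: the value read with SET is carried into every later observation, while the value read with INPUT is reset to missing at the top of each iteration.

```sas
/* SET: variables read from a data set are retained between iterations. */
data work.set_demo;
  set sashelp.class(obs=3 keep=name);
  if _n_ = 1 then set sashelp.class(obs=1 keep=age);
run;
/* AGE is 14 on all three rows. */

/* INPUT: variables read from raw data are reset to missing
   at the start of every iteration. */
data work.input_demo;
  input name $ @;              /* read NAME and hold the record */
  if _n_ = 1 then input age;   /* read AGE only on the first record */
datalines;
Alfred 14
Alice 13
Barbara 13
;
run;
/* AGE is 14 for Alfred and missing for Alice and Barbara. */
```

If you want a raw-data variable to keep its value across iterations, you have to ask for it explicitly with a RETAIN statement.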

Nikit7
Calcite | Level 5

When SAS reads a raw data file, an input buffer is created at the beginning of the compilation phase. The input buffer holds each record of raw data as it is read, before INPUT converts the values into variables.

However, if you read a SAS data set instead of a raw data file, no input buffer is created; the values are read directly into the program data vector (PDV).
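A sketch that peeks at the input buffer, using two made-up records. When a DATA step reads raw data, the automatic variable `_INFILE_` holds the current record as it sits in the buffer. The second step reads the resulting data set with SET, where there is no raw record to show, so PUT _ALL_ prints the program data vector instead.

```sas
/* Raw data: each record is first copied into the input buffer,
   then INPUT converts it into variables in the PDV. */
data work.from_raw;
  infile datalines;
  input name $ age;
  put 'Input buffer: ' _infile_;
datalines;
Alfred 14
Alice 13
;
run;

/* SAS data set: observations are read straight into the PDV,
   so there is no input buffer to inspect. */
data work.from_ds;
  set work.from_raw;
  put 'PDV: ' _all_;
run;
```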

Nikit7
Calcite | Level 5

When reading from an external file:
An input buffer is created at the beginning of the compilation phase to hold the raw data.

For flat files read with the INFILE statement, the data is read record by record, and you have to explicitly declare the variables to be read in the INPUT statement.

Other methods are PROC IMPORT, the Excel engine, or piping.

When reading from SAS data sets:

No input buffer is created.

Values of variables read from the data set are retained.

The SET statement in a DATA step and the various procedures can read from data sets, and both can manipulate or create data sets in SAS.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @sandeep12 

 

A file is a named area in computer storage. The physical content of a file is just a stream of bits, but different programs can "map" the bits to logical structures that make sense in the given program, so the file can be seen as information containing e.g. a photo, a spreadsheet, lines of text or a two-dimensional table with rows and columns.

A SAS data set is also just a file, but its content is something SAS sees as a two-dimensional table. The ability to see a file as a two-dimensional table is built into a LIBNAME engine, so a file accessed through a LIBNAME is a SAS data set. SAS also has the ability to bridge to tables in a DBMS or (with limitations) to xlsx and JSON files, so they can be seen as SAS data sets.
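As an example of such a bridge, here is a sketch using the JSON LIBNAME engine (available from SAS 9.4M4). It writes a tiny made-up JSON file to a temporary location (fileref `rawjson`, libref `jsonlib`, both hypothetical); the engine then presents the file as tables, and a top-level array of objects should show up as the table ROOT.

```sas
/* Hypothetical JSON content written to a temporary file. */
filename rawjson temp;

data _null_;
  file rawjson;
  put '[{"name":"Alfred","age":14},{"name":"Alice","age":13}]';
run;

/* The JSON engine maps the file to tables; no INPUT statement is needed. */
libname jsonlib json fileref=rawjson;

/* List the tables the engine created, then use one like any data set. */
proc contents data=jsonlib._all_ nods;
run;

proc print data=jsonlib.root;
run;
```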

 

The LIBNAME is essential in SAS. All SAS procedures for statistics and reporting require SAS data sets as input, and data in other logical formats must be converted to SAS data sets before they can be used in statistics and reporting.

The SAS tool for reading files and converting them to SAS data sets is the DATA step, and I think that was the main purpose of the DATA step way back, when all input was created as 80-character text lines on punched cards. SAS procedures are controlled by options, but the DATA step can execute programs of any complexity written in the Base SAS language.

SAS data sets are accessed through a LIBNAME, but a DATA step can also access other types of files declared in a FILENAME. A FILENAME has no engine to supply a logical structure to the file, so a program written in the SAS language is needed to interpret the data, e.g. splitting a comma-separated text line into variables.
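A sketch of that last point. The fileref (the hypothetical `rawtxt`, pointing at a temporary file with made-up records) carries no structure, so the DATA step reads each line as one string and the SCAN and INPUT functions split it into typed variables.

```sas
/* Hypothetical raw file: comma-separated lines, no header. */
filename rawtxt temp;

data _null_;
  file rawtxt;
  put 'Alfred,14,69';
  put 'Alice,13,56.5';
run;

/* The FILENAME gives no structure: read the whole line,
   then interpret it with SAS code. */
data work.split_demo;
  infile rawtxt truncover;
  length line $ 200 name $ 8;
  input line $char200.;
  name   = scan(line, 1, ',');
  age    = input(scan(line, 2, ','), best32.);
  height = input(scan(line, 3, ','), best32.);
  drop line;
run;
```

Note that SCAN treats consecutive commas as a single delimiter, so empty fields would shift the columns; the `'m'` modifier (`scan(line, 2, ',', 'm')`) keeps them. In practice the INPUT statement with the DSD option on INFILE does this splitting for you.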
