BookmarkSubscribeRSS Feed
ajuhack
Fluorite | Level 6

Like a datalines having multiple varables

How i will Assign Auto length for the var3 as the 9 obs having 19 alphabetical  and its a big data of multiple variables ..

var1 var2 var3

1     2     akjdh

2     3     sjfkdsfjfdsg

4     5     kokdslfmf

6      4     kvkdjfgkl 

9     10     mkdmsklnjvnvdhbvdth

8 REPLIES 8
bnarang
Calcite | Level 5

Data temp;

input var1 var2 var3 :$25.;

datalines;

1     2     akjdh

2     3     sjfkdsfjfdsg

4     5     kokdslfmf

6      4     kvkdjfgkl 

9     10     mkdmsklnjvnvdhbvdth

;

ajuhack
Fluorite | Level 6

here we see this data but when you get a txt files of lakhs of obs .

I am trying to do something if i get large no obs and var ,

that i don,t need to look,  just use the code and then auto assign the length to respective var as you mention above .

well thx for reply .

Smiley Happy

art297
Opal | Level 21

As long as they are delimited text files, just use a high enough value (to cover the number of rows you might have) in a guessingrows= option within proc import.

bnarang
Calcite | Level 5

Yes, As Aurthur and Cynthia mentioned, Some information is required. You use a high enough value. or Just open some good editor like ultra editor and observe the maximum value that particular variable can have but again this is one time solution. Again, if you append your files in future, You might need to change.

Cynthia_sas
SAS Super FREQ

Hi:

  When you write your DATA step program to read the data, you should have also received a record layout that explains the attributes of the input file, because it is this record layout that will tell you, for example, what to name the variables, what they represent, how "big" the variables are, what the possible/valid values are, etc. Whoever gives you the raw data file should also tell you the maximum possible value of VAR3.

  Otherwise, when you read the data, you can use the LENGTH statement for each obs to test the length of VAR3. Later (in another DATA step program, you can adjust the length if your need to.) A program with INFILE and INPUT (for reading "raw" data) will create the descriptor portion of the new SAS dataset based on the information in your "compile time" statements, like LENGTH, INFORMAT, FORMAT and, based on information in your INPUT statement. For example, any numeric variables will get a default length of 8, unless you specify differently. Any character variable will get a default length of $8, unless you specify differently.

  In the situation where you do NOT have a record layout, then you will need to make at least 1 pass through the data yourself to determine the maximum length of the VAR3 variable. Then, after that first pass, you can make adjustments accordingly, probably the easiest way would be in a second DATA step program, with an explicit LENGTH statement before the SET statement.

cynthia

TimArm
Obsidian | Level 7

Hello ajuhack,

I have attached 3 macros that you might like to try. The main macro is called %read_csv_data.sas  -  it calls the other 2 macros.

I have learned by experience not to trust proc import for the very reason that it uses guessingrows. If you are guessing, you will always run the risk of data truncation.

My solution is to perform 2 passes (in the way Cynthia suggested).

Actually, because of my environment, it also reads the top 100 lines first, before doing the full reads, to determine if lines are CRLF or LF terminated!

It searches the top 100 lines for a header record, in order to work out column names.

It searches for a delimiter, or you can specify it if you know it.

It reads all of the data into character variables - so you will need to convert numerics and dates afterwards.

In my test file of 95,000 obs it takes about 1 minute to run on my AIX server.

My other test file of 410MB and over 10 million obs takes 8 or 9 minutes. SAS is good like this - the first full read of the text file takes 7.5 minutes, and the second read takes 40 seconds.

I hope you find this useful.

Tim.

ajuhack
Fluorite | Level 6

Many Thank Tim and Every one for there valuable suggestion for this

art297
Opal | Level 21

: It is a bad name for the feature, but a term that appears to be used throughout the industry.  If one sets guessing rows to be a higher value than the number of records they have, and their system (e.g., with Excel) isn't set to limit how high that number can be, all of the guesswork has been eliminated and SAS is actually computing the largest width needed.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1181 views
  • 2 likes
  • 5 in conversation