BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
VinnyR
Calcite | Level 5

I know this questions has been discussed before but I am not sure what is the difference between these 2 approaches ...  1st approach is correct working as desired and 2nd approach is putting everything in 1st variable.

 

Approach 1:

data want;
INFILE 'filepath/filename.csv'
LRECL=256 ENCODING="LATIN1" DLM='2c'x MISSOVER DSD firstobs=2;
INPUT
var1 : $CHAR100.
var2 : $CHAR50.
var3 : ??BEST8.
var4 : $CHAR100.
var5 : $CHAR10.
var6 : $CHAR150.;
run;

 

Approach 2:

data want;
infile "filepath/filename.csv/"
lrecl=256  dlm="2c"x  missover  dsd  firstobs=2  encoding="LATIN1";
input var1 $ 1-100 var2 $ 101-175 var3 176-190 var4 $ 191-215 var5 $ 216-225 var6 $ 226-325;
run;

 

Also, I want to start reading the file from row2, can i still use PROC IMPORT to import this CSV file and get the desired dataset?

 

Regards,

Vinni

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The right answer depends on how the file you are reading is structured. If it really is a CSV file then using the list mode style INPUT statement like in the first example is correct.  The INPUT statement in the second example if what you would do if the input file was using fixed column locations to place the data for each field instead of using delimiters on the line to marge where the fields start and stop.  But if your fields are fixed length then you would not want to include the DSD option on the INFILE statement. And you would definitely want to set the LRECL on the INFILE statement longer than the number of columns you were trying to read in the INPUT statement.

 

Personally I would use a data step like this.  I find it helps a lot if you explicitly define the variables instead of forcing SAS to guess how you wanted them defined based on how you first use the variable. And SAS does not need to be told what INFORMAT to use for most variables.

 

data want;
  INFILE 'filepath/filename.csv' dsd firstobs=2 truncover ;
  length var1 $100 var2 $50 var3 8 var4 $100 var5 $10 var6 $150;
  input var1-var6 ;
run;

If you really wanted to suppress notes in the log about invalid numbers for `VAR3` then you could use this for the INPUT statement instead.

  input var1 var2 var3 ?? var4-var6 ;

Note there is no informat named BEST.  If you ask for it SAS will just use the normal numeric informat. In some ways the COMMA informat is a "best" informat for numbers as it will ignore commas and dollar signs that might appear.

Note that CSV files normally have leading spaces stripped already.  Do you really want to preserve any leading spaces that might appear in your input file by using the $CHAR informat instead of just reading the character fields normally?

 

View solution in original post

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20

You can use the GETNAMES=NO statement in proc import to start at the second row.

VinnyR
Calcite | Level 5

Hi Draycut, thanks for your reply.

 

How would you solve the issue for multiple lengths in the variable as in say first observation is 50 characters and second observation is 60 characters.Is there any way to change the length of the variable in PROC Import code?

Tom
Super User Tom
Super User

The right answer depends on how the file you are reading is structured. If it really is a CSV file then using the list mode style INPUT statement like in the first example is correct.  The INPUT statement in the second example if what you would do if the input file was using fixed column locations to place the data for each field instead of using delimiters on the line to marge where the fields start and stop.  But if your fields are fixed length then you would not want to include the DSD option on the INFILE statement. And you would definitely want to set the LRECL on the INFILE statement longer than the number of columns you were trying to read in the INPUT statement.

 

Personally I would use a data step like this.  I find it helps a lot if you explicitly define the variables instead of forcing SAS to guess how you wanted them defined based on how you first use the variable. And SAS does not need to be told what INFORMAT to use for most variables.

 

data want;
  INFILE 'filepath/filename.csv' dsd firstobs=2 truncover ;
  length var1 $100 var2 $50 var3 8 var4 $100 var5 $10 var6 $150;
  input var1-var6 ;
run;

If you really wanted to suppress notes in the log about invalid numbers for `VAR3` then you could use this for the INPUT statement instead.

  input var1 var2 var3 ?? var4-var6 ;

Note there is no informat named BEST.  If you ask for it SAS will just use the normal numeric informat. In some ways the COMMA informat is a "best" informat for numbers as it will ignore commas and dollar signs that might appear.

Note that CSV files normally have leading spaces stripped already.  Do you really want to preserve any leading spaces that might appear in your input file by using the $CHAR informat instead of just reading the character fields normally?

 

VinnyR
Calcite | Level 5

Thanks a ton Tom, 

Your contribution is really appreciated!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2035 views
  • 0 likes
  • 3 in conversation