Solved: Re: CSV inline format change

Citrine10 · Posted 05-19-2022 05:39 AM

Hi there, I am trying to change the format of my field called Passport to $15 and keep it that way. How do I do that?

I initially used proc import and copied and pasted the informat code. Not sure if there is a more sleek way as with various csv files coming in the format won't always be the same, where there will be various columns of different data types. I need to ensure that on every csv import, the column called Passport will change to character type. In some instances the passport number is only numeric however I need this column to be character as other files might have this field as alpha-numeric.

Is there perhaps a better way to use infile and ensure the Passport field remains character throughout?

data work.test;
    infile "C:/mycsv.csv"
	delimiter = ","
	missover 
	dsd
	firstobs=2
;

    informat Code best32. ;
    informat Name $14. ;
    informat Firstname $35. ;
    informat Passport best32. ;

    format Code best12. ;
    format Name $14. ;
    format Firstname $35. ;
                                            
    format Passport $15. ;

    input
             Code 
             Name $
             Firstname $
             Passport ;
run;

Tom · Posted 05-30-2022 10:38 AM

Not sure I follow your logic here. What is the actual problem you are trying to solve?

If your CSV files have a know structure then create a template code (or perhaps even a macro) for reading the file and just change the input filename when you need to read a new version.

Perhaps you meant to say you have universe of possible variables but each individual CSV file only contains some of them? In that case making a template DATASET might help.

Say you have a empty dataset defined with 20 variables.

data template;
   length var1 8 var2 $20 .... var20 8;
   format var3 date9.;
   stop;
run;

And you want to read a CSV file that has just three of those variables.

var3,var1,var2
01JAN2020,125,Sam

If you can somehow get the list of names to read from the first line. Say into a macro variable. Perhaps like this:

data _null_;
  infile csv ;
  input;
  call symputx('varlist',translate(_infile_,' ',','));
  stop;
run;

Then you could use that list of variables to read from the CSV file.

data want;
   if 0 then set template;
  infile csv dsd truncover firstobs=2;
  input (&varlist) (+0);
run;

View solution in original post

andreas_lds · Posted 05-19-2022 06:15 AM

You don't need to define an informat or format for normal values, only for dates, times, datetimes and numbers formatted with special formats like having a comma as decimal separator. You should length statement to define type and length for each variable.

JosvanderVelden · Posted 05-19-2022 06:20 AM

I have some difficulties understanding your code since you use a numeric informat for Passport but a character format. Can you send us the first 2 lines of the csv file?
NOTE: if you don't want to send the actual first dataline just make something up and post that.

Best regards, Jos

Citrine10 · Posted 05-19-2022 06:44 AM

Hi
Here is an example:
Code Name Firstname Passport
1234 Dress1 Jessi 46876
5678 Coat45 Alex 90743

So Passport is identified as best32. How can I convert it to character string? Is there a way to do it with my inline statements?

Lots more datasets like this will be uploaded whereby in some cases the Passport might be like e.g. 1234UT799X2

So I need to accommodate for this and would initially like all passport fields to be character no matter what csv file is uploaded

andreas_lds · Posted 05-19-2022 06:53 AM

It seems that you should start to learn how to read data with a data step, see https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lepg/n02z8w5et43i1wn1eto968a6zrpi.htm for details.

Tom · Posted 05-26-2022 07:38 PM

@Citrine10 wrote:
Hi
Here is an example:
Code Name Firstname Passport
1234 Dress1 Jessi 46876
5678 Coat45 Alex 90743

So Passport is identified as best32. How can I convert it to character string? Is there a way to do it with my inline statements?

Lots more datasets like this will be uploaded whereby in some cases the Passport might be like e.g. 1234UT799X2

So I need to accommodate for this and would initially like all passport fields to be character no matter what csv file is uploaded

Why did you post sample data with space as the delimiter when your sample code was using comma as the delimiter? What delimiter do your actual text files use? Note that using a space as a delimiter for data like names that could include spaces will potentially cause trouble. Although commas in a CSV will also potentially cause trouble if the file is not generated properly.

So to read a CSV and have full control over how the variables are define just write your own data step. Take into account the full set of data you expect you might need to read when defining the variables.

The easiest way is to DEFINE the variables before using them in other statements, such as INFORMAT or FORMAT or INPUT. Use a LENGTH statement (or possibly an ATTRIB statement with the LENGTH= option) to define the variables.

So if your example file looks like a CSV file such as

Code,Name,Firstname,Passport
1234,Dress1,Jessi,46876
5678,Coat45,Alex,90743

Then your data step might look like:

data want;
  infile 'myfile.csv' dsd truncover firstobs=2;
  length code $10 name $20 firstname $20 passport $15 ;
  input code -- passport;
run;

Note that all of those variable names look like they should be character variables. There is no need to attach formats or informats to character variables. Use your knowledge of the type of data you are reading to assign a reasonable length to the variables. Long enough that no values are truncated, but not so long as to waste space and slow down processing.

Kurt_Bremser · Posted 05-19-2022 07:44 AM

Define the variables in a LENGTH statement, and omit formats and informats, as these are not needed for character variables.

I also changed code to character, as you will never use it for calculations.

data work.test;
infile "C:/mycsv.csv"
  delimiter = ","
  missover 
  dsd
  firstobs=2
;
length
  Code $10 /* set as needed for the longest value */
  Name $14
  Firstname $35
  Passport $15
;
input
  Code 
  Name
  Firstname
  Passport
;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Citrine10 · Posted 05-26-2022 05:56 AM

Thank you, this really helps.
I have excluded the input piece of your code thus leaving me with an empty table.
In some cases the CSV files will be all have a different amount of columns and not all columns will be populated. How do I ensure that where the column name from the clients table e.g. is called Passport , populates the Passport field?

Tom · Posted 05-26-2022 08:28 AM

@Citrine10 wrote:
Thank you, this really helps.
I have excluded the input piece of your code thus leaving me with an empty table.
In some cases the CSV files will be all have a different amount of columns and not all columns will be populated. How do I ensure that where the column name from the clients table e.g. is called Passport , populates the Passport field?

If you do not execute an INPUT statement then there is no way the variables will ever be populated.

If you are using a properly formatted delimited file then the delimiters (commas in a CSV file) are what determine which value is for which variable. So just make sure there are enough commas. There should be a comma between each value. If the value is missing then there is no need to have anything at all between the commas. Make sure to include the TRUNCOVER option on the INFILE statement and any missing commas at the end of the line will not matter. (Note you can get away with using the MISSOVER option instead of TRUNCOVER, but only if you make sure that your INPUT statement is using LIST MODE and not FORMATTED mode. Otherwise MISSOVER will throw away values at the end of the lines that are too short for the informat being used to read them.)

Citrine10 · Posted 05-30-2022 04:08 AM

hi Tom, Thank you for your assistance.
The reason I commented out the input statement is because I don't want the variables populated as yet 🙂 I want to create a structure of how the datasets should look when they are imported such that strings import as strings into the relevant columns and numeric as numeric etc. With a variable amount of columns, I am stuck with how to code this to ensure that if a file with only 3 columns populates in the correct 3 columns and if a file with 10 columns, populates in its correct 10 columns.
E.g. if I have columnA, columnB, columnH in my initial CSV file, upon import, it needs to populate in those fields called columnA, columnB and columnH in my dataset.

if I have columnG,columnY, columnT in my initial CSV file, upon import, it needs to populate in those fields called columnG,columnY, columnT in my dataset.

Tom · Posted 05-30-2022 10:38 AM

Not sure I follow your logic here. What is the actual problem you are trying to solve?

If your CSV files have a know structure then create a template code (or perhaps even a macro) for reading the file and just change the input filename when you need to read a new version.

Perhaps you meant to say you have universe of possible variables but each individual CSV file only contains some of them? In that case making a template DATASET might help.

Say you have a empty dataset defined with 20 variables.

data template;
   length var1 8 var2 $20 .... var20 8;
   format var3 date9.;
   stop;
run;

And you want to read a CSV file that has just three of those variables.

var3,var1,var2
01JAN2020,125,Sam

If you can somehow get the list of names to read from the first line. Say into a macro variable. Perhaps like this:

data _null_;
  infile csv ;
  input;
  call symputx('varlist',translate(_infile_,' ',','));
  stop;
run;

Then you could use that list of variables to read from the CSV file.

data want;
   if 0 then set template;
  infile csv dsd truncover firstobs=2;
  input (&varlist) (+0);
run;

Citrine10 · Posted 06-03-2022 05:01 AM

thank you, this really helped alot.

Register Today!

SAS Training: Just a Click Away