Hi there, I am trying to change the format of my field called Passport to $15 and keep it that way. How do I do that?
I initially used proc import and copied and pasted the informat code. Not sure if there is a more sleek way as with various csv files coming in the format won't always be the same, where there will be various columns of different data types. I need to ensure that on every csv import, the column called Passport will change to character type. In some instances the passport number is only numeric however I need this column to be character as other files might have this field as alpha-numeric.
Is there perhaps a better way to use infile and ensure the Passport field remains character throughout?
data work.test; infile "C:/mycsv.csv" delimiter = "," missover dsd firstobs=2 ; informat Code best32. ; informat Name $14. ; informat Firstname $35. ; informat Passport best32. ; format Code best12. ; format Name $14. ; format Firstname $35. ; format Passport $15. ; input Code Name $ Firstname $ Passport ; run;
Not sure I follow your logic here. What is the actual problem you are trying to solve?
If your CSV files have a know structure then create a template code (or perhaps even a macro) for reading the file and just change the input filename when you need to read a new version.
Perhaps you meant to say you have universe of possible variables but each individual CSV file only contains some of them? In that case making a template DATASET might help.
Say you have a empty dataset defined with 20 variables.
data template;
length var1 8 var2 $20 .... var20 8;
format var3 date9.;
stop;
run;
And you want to read a CSV file that has just three of those variables.
var3,var1,var2 01JAN2020,125,Sam
If you can somehow get the list of names to read from the first line. Say into a macro variable. Perhaps like this:
data _null_;
infile csv ;
input;
call symputx('varlist',translate(_infile_,' ',','));
stop;
run;
Then you could use that list of variables to read from the CSV file.
data want;
if 0 then set template;
infile csv dsd truncover firstobs=2;
input (&varlist) (+0);
run;
You don't need to define an informat or format for normal values, only for dates, times, datetimes and numbers formatted with special formats like having a comma as decimal separator. You should length statement to define type and length for each variable.
It seems that you should start to learn how to read data with a data step, see https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lepg/n02z8w5et43i1wn1eto968a6zrpi.htm for details.
@Citrine10 wrote:
Hi
Here is an example:
Code Name Firstname Passport
1234 Dress1 Jessi 46876
5678 Coat45 Alex 90743
So Passport is identified as best32. How can I convert it to character string? Is there a way to do it with my inline statements?
Lots more datasets like this will be uploaded whereby in some cases the Passport might be like e.g. 1234UT799X2
So I need to accommodate for this and would initially like all passport fields to be character no matter what csv file is uploaded
Why did you post sample data with space as the delimiter when your sample code was using comma as the delimiter? What delimiter do your actual text files use? Note that using a space as a delimiter for data like names that could include spaces will potentially cause trouble. Although commas in a CSV will also potentially cause trouble if the file is not generated properly.
So to read a CSV and have full control over how the variables are define just write your own data step. Take into account the full set of data you expect you might need to read when defining the variables.
The easiest way is to DEFINE the variables before using them in other statements, such as INFORMAT or FORMAT or INPUT. Use a LENGTH statement (or possibly an ATTRIB statement with the LENGTH= option) to define the variables.
So if your example file looks like a CSV file such as
Code,Name,Firstname,Passport 1234,Dress1,Jessi,46876 5678,Coat45,Alex,90743
Then your data step might look like:
data want;
infile 'myfile.csv' dsd truncover firstobs=2;
length code $10 name $20 firstname $20 passport $15 ;
input code -- passport;
run;
Note that all of those variable names look like they should be character variables. There is no need to attach formats or informats to character variables. Use your knowledge of the type of data you are reading to assign a reasonable length to the variables. Long enough that no values are truncated, but not so long as to waste space and slow down processing.
Define the variables in a LENGTH statement, and omit formats and informats, as these are not needed for character variables.
I also changed code to character, as you will never use it for calculations.
data work.test;
infile "C:/mycsv.csv"
delimiter = ","
missover
dsd
firstobs=2
;
length
Code $10 /* set as needed for the longest value */
Name $14
Firstname $35
Passport $15
;
input
Code
Name
Firstname
Passport
;
run;
@Citrine10 wrote:
Thank you, this really helps.
I have excluded the input piece of your code thus leaving me with an empty table.
In some cases the CSV files will be all have a different amount of columns and not all columns will be populated. How do I ensure that where the column name from the clients table e.g. is called Passport , populates the Passport field?
If you do not execute an INPUT statement then there is no way the variables will ever be populated.
If you are using a properly formatted delimited file then the delimiters (commas in a CSV file) are what determine which value is for which variable. So just make sure there are enough commas. There should be a comma between each value. If the value is missing then there is no need to have anything at all between the commas. Make sure to include the TRUNCOVER option on the INFILE statement and any missing commas at the end of the line will not matter. (Note you can get away with using the MISSOVER option instead of TRUNCOVER, but only if you make sure that your INPUT statement is using LIST MODE and not FORMATTED mode. Otherwise MISSOVER will throw away values at the end of the lines that are too short for the informat being used to read them.)
Not sure I follow your logic here. What is the actual problem you are trying to solve?
If your CSV files have a know structure then create a template code (or perhaps even a macro) for reading the file and just change the input filename when you need to read a new version.
Perhaps you meant to say you have universe of possible variables but each individual CSV file only contains some of them? In that case making a template DATASET might help.
Say you have a empty dataset defined with 20 variables.
data template;
length var1 8 var2 $20 .... var20 8;
format var3 date9.;
stop;
run;
And you want to read a CSV file that has just three of those variables.
var3,var1,var2 01JAN2020,125,Sam
If you can somehow get the list of names to read from the first line. Say into a macro variable. Perhaps like this:
data _null_;
infile csv ;
input;
call symputx('varlist',translate(_infile_,' ',','));
stop;
run;
Then you could use that list of variables to read from the CSV file.
data want;
if 0 then set template;
infile csv dsd truncover firstobs=2;
input (&varlist) (+0);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.