Dear experts,
I need to import some files from a Csv. I am using " infile -> informat -> format"
But I am struggling with the field address that contains almost everything in term of delimiters, kind of information, number of rows.
Any good suggestion to cope with it efficently, i.e. getting all the data in a string for the same variable.
Here two examples of the cells that are generating problems:
1.
65, Zenedine Zidane
Brügge 7100
BELGIUM
2,
Diego Costa Private Limited | z.H. Cesc Fabregas | Dr. BRUG Wenger, Loftus Road
Bistocchi 642009
Bangladesh
How to import the data in order to be sure to get the content as one record each? I am using the following delimiter "¦" to avoid confusion but still due to the several rows in the address the csv is not correctly imported.
Thank in advance, SH.
A true CSV file you shall assign FILENMAE like:
filename my_file CSV '...file path and name ...";
As your file is not a true CSV format then assign:
filename my_file "...file path and name ...";
when your input data is given in more than one row you need treat each row seperately:
you need to identify what kind of data is in the current row and assign it to the appropriate variables.
you need to gather (by retain) all variables of same identity (person) from it first row to its last row
and write output only after assigning all its variables.
I saked for an inpu of full 10 rows and example of output, showing how that data should parse into variables,
then I can post a code how to do it.
This comes from the files not being "CSV". CSV means Comma Separated Variable files, and consist of one row with headers, then one row per data item, with each field separated by a comma, possibly enclosed in quotes if the data contains commas.
What you have there is some sort of user defined format. It would be best idea to go back to whoever came up with that and get them to fix/or describe. However not always possible. So you will need to do some work on the import yourself if that is the case.
This article shows reading multiple lines from a file:
https://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a002001051.htm
Basically you read the first row, then parse it out into the variables you want by using scan(). Then you read the next row and set that to variable, then the next row. Of course if you user defined file does not have all three in each time, then you will also need some logic to check this etc. Far simpler to use standard file formats.
Are there in yout input more lines and other data than adresses ?
Is this the input file format:
1.
65, Zenedine Zidane
Brügge 7100
BELGIUM
2,
Diego Costa Private Limited | z.H. Cesc Fabregas | Dr. BRUG Wenger, Loftus Road
Bistocchi 642009
Bangladesh
including: address serial number followed always by 3 lines ?
It will be better if you post part of your input as is, using notepad++ and copy/paste of 6-10 lines.
Anyhow, @RW9 is right, that is not a CSV file format.
And one more question - what variables do you want to have on output.
post an example of one or two addresses.
Dear experts,
I understand that "probably" the file is not properly structured to be a csv but this is what I have, let's put in this way: the file's extensions is .csv
@ShmuelI have several variables, e.g. name, surname and addresse is just one of them. In most of the cases the cell where the address is contained is of 3 rows but this is not always he case, sometimes. Would you have a suggestion in case I would have always 3 rows? Maybe it could be already step forward.
Output example:
if the address as a variable is before the last column which is gender. I would like to get:
as variable address:
Diego Costa Private Limited | z.H. Cesc Fabregas | Dr. BRUG Wenger, Loftus Road Bistocchi 642009 Bangladesh
Gender: "Male" (an not Fabregas)
Best regards,
SH
Totally agree with @RW9.
There must be some kind of layout/rules for that file.
Without that it's pure guessing which can get very complex.
Provide us with the user defined format for that file and an example of the desired output.
Daniel Santos @ www.cgd.pt
Dear @Daniel-Santos,
nice to know that you agree but I posted what I can. Of course it is a guess, if you do not have a best guess, the problem is for you too complex and you cannot support, feel free to ignore the topic and avoid redundancies. But thanks for the effort.
Dear Sir_Highbury.
I'm sorry if I hurted your feelings in someway, but I do think I haven't been rude in anyway.
We are all trying to help here, but I'm sorry if guessing or prophetizing are not one of my virtues.
Having said that, have a good day Sir, and good luck with your guessing.
Daniel Santos @ www.cgd.pt
A true CSV file you shall assign FILENMAE like:
filename my_file CSV '...file path and name ...";
As your file is not a true CSV format then assign:
filename my_file "...file path and name ...";
when your input data is given in more than one row you need treat each row seperately:
you need to identify what kind of data is in the current row and assign it to the appropriate variables.
you need to gather (by retain) all variables of same identity (person) from it first row to its last row
and write output only after assigning all its variables.
I saked for an inpu of full 10 rows and example of output, showing how that data should parse into variables,
then I can post a code how to do it.
@Shmuelthanks for the explanation, I am traing to go in the direction you suggested, if you can posta code... even better. BRs, SH
Please run next code to create a sample file for testing:
filename sample '...sample file path and name ...";
filename my_file '... your input file....csv';
data _NULL_;
infile my_file truncover;
file sample;
input _infile_ $;
if _N_ le 10 then put _infile_;
else stop;
run;
upload and popst the sample file as is;
use any text browser (like notepad, notepad++ etc.)
and post a sketch how would you like to see the output dataset,
what shall have each variable taken from first address in the sample.
"let's put in this way: the file's extensions is .csv" - https://en.wikipedia.org/wiki/Filename_extension
Incorrect use of of an extension can lead to problems opening, and misinterpretation - try renaming a .docx to .gif for instance and opening it (and yes, you can name the extension anything you like).
As your datafile is in a particular user format, this means you wil have to write custom code to handle that particular file format. The way you do that is via reading in a text file (infile), reading in the data elements (input/format/informat), and tehn parsing the data returned if necessary.
So, if this is the input data:
Diego Costa Private Limited | z.H. Cesc Fabregas | Dr. BRUG Wenger, Loftus Road
Bistocchi 642009
Bangladesh
Then what we need to do is to read each line as a block of text (if there was a particular end character we could of course keep the pointer running, but I do not see this in this case, so I am going this way):
data want (drop=str: i);
infile "s:\temp\rob\a.txt" dlm="¬";
length str1 str2 str2 addr1 addr2 addr3 zip city $200;
input str1 $;
array addr{3};
do i=1 to countw(str1,"|");
addr{i}=scan(str1,i,"|");
end;
input zip $;
input city $;
run;
Note, I have put the above text in a text file in a.txt as given in the code, and it parses the data correctly. If your data is different, then you need to code respectively.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.