- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Asking for help:
I have a csv-file with 14 variable names in the first line, and the set of values for variables in the next few lines (green).
In SAS on Demand for Academcis I am aiming for a sas table with each value set in one line.
proc import datafile='/home/u59608141/1_Skills/occupations_en.csv'
out=WORK.OCCUPATIONS dbms=CSV replace;
delimiter=',';
run;
but I get only 8 variables and some of the variables distributed over several lines:
How can I change my import to get all 14 Columns and each value into one cell?
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you show an example of your actual CSV file (you posted a photograph of what some spreadsheet tool such as EXCEL thought was in the file instead). A CSV file is just plain old TEXT so you should be able to copy the lines for three of four of the observations and paste them into this forum in the pop-up window you get when you click on the Insert Code icon (looks like < / > on the menu).
Are you saying that your file should have 14 variables on each line but PROC IMPORT is seeing more than one line per observation? Or are you saying your data file is intended to have one line with multiple values and then a series of followup lines with one value per line?
If the CSV file supposed to have one line per file then perhaps it is malformed and cannot be parsed? There are two things can cause a CSV file to be be impossible to parse.
1) Having the delimiter character in one the fields, and not having that field value quoted in the CSV file.
2) Having the end of line characters in one of the fields, whether or not the field value is quoted.
To access both of these we would need to see an example of the actual file.
The second one might be fixable if the actual lines end with CRLF (normal on a DOS/Windows machine) and the embedded line breaks are only single CR or LF characters. Try using the TERMSTR=CRLF option on the INFILE (or FILENAME) statement to test that.
The first one is going to be much harder to fix. Normally you can only handle that if the extra delimiter only appear in one of the fields. Then you can read the fields before that one using normal INPUT and then parse the field after it using SCAN() on the automatic _INFILE_ variable. And whatever is left is what goes into the field that could have commas.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you show an example of your actual CSV file (you posted a photograph of what some spreadsheet tool such as EXCEL thought was in the file instead). A CSV file is just plain old TEXT so you should be able to copy the lines for three of four of the observations and paste them into this forum in the pop-up window you get when you click on the Insert Code icon (looks like < / > on the menu).
Are you saying that your file should have 14 variables on each line but PROC IMPORT is seeing more than one line per observation? Or are you saying your data file is intended to have one line with multiple values and then a series of followup lines with one value per line?
If the CSV file supposed to have one line per file then perhaps it is malformed and cannot be parsed? There are two things can cause a CSV file to be be impossible to parse.
1) Having the delimiter character in one the fields, and not having that field value quoted in the CSV file.
2) Having the end of line characters in one of the fields, whether or not the field value is quoted.
To access both of these we would need to see an example of the actual file.
The second one might be fixable if the actual lines end with CRLF (normal on a DOS/Windows machine) and the embedded line breaks are only single CR or LF characters. Try using the TERMSTR=CRLF option on the INFILE (or FILENAME) statement to test that.
The first one is going to be much harder to fix. Normally you can only handle that if the extra delimiter only appear in one of the fields. Then you can read the fields before that one using normal INPUT and then parse the field after it using SCAN() on the automatic _INFILE_ variable. And whatever is left is what goes into the field that could have commas.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Tom ,
First step: I will give an extract on the csv-Text file corresponding to the image above:
conceptType,conceptUri,iscoGroup,preferredLabel,altLabels,hiddenLabels,status,modifiedDate,regulatedProfessionNote,scopeNote,definition,inScheme,description,code
Occupation,http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34,2654,technical director,"technical and operations director
head of technical
director of technical arts
head of technical department
technical supervisor
technical manager",,released,2024-01-25T11:28:50.295Z,http://data.europa.eu/esco/regulated-professions/unregulated,,,"http://data.europa.eu/esco/concept-scheme/occupations,
http://data.europa.eu/esco/concept-scheme/member-occupations","Technical directors realise the artistic visions of the creators within technical constraints. They coordinate the operations of various production units, such as scene, wardrobe, sound and lighting, and make-up. They adapt the prototype and study the feasibility, implementation, operation and technical monitoring of the artistic project. They are also responsible for the stage equipment and technical equipment.",2654.1.7
Occupation,http://data.europa.eu/esco/occupation/000e93a3-d956-4e45-aacb-f12c83fedf84,8121,metal drawing machine operator,"metal drawing machine technician
metal drawing machine operative
wire drawer
draw machine operative
forming machine technician
Second step: I will read your response more closely and think about it.
Thanks
Markus
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Tom,
Thanks!
Using TERMSTR=CRLF option on the INFILE (or FILENAME) statement gave me the result, I was looking for!
Best
Markus
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's good.
If you did have a file with actual end of line characters in the field values, and those values are quoted, then you can count the number of quotes to tell which end of line characters need to be replaced so that the file is readable by the DSD option of the INFILE statement.
See this macro for an example of how to do that.
https://github.com/sasutils/macros/blob/master/replace_crlf.sas