11-02-2016 02:54 PM - last edited on 11-02-2016 04:45 PM by Reeza
Lase week I posted a question on importing CSV file. Some one posted some suggestions. But it didnot help. I did some work and have some questions on it.
In the dataset, why it showing only first two variable names and rest as VAR3 to VAR13. The third variable name is shown under second varianle.
I am attaching the csv file that I am trying to download. Please help how to get the dataset with all variable names and without any errors
The code from log when I downloaded maually:
data WORK.dv ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'C:\Users\rr\Desktop\PEC.csv' delimiter = ',' missover DSD lrecl=32767
informat Deviation__ best32.;
informat Screening $6. ;
informat VAR3 best32. ;
informat VAR4 $171. ;
informat VAR5 $9. ;
informat VAR6 DATE8. ;
informat VAR7 DATE9. ;
informat VAR8 $5. ;
informat VAR9 $32. ;
informat VAR10 $9. ;
informat VAR11 $2. ;
informat VAR12 $2. ;
informat VAR13 $4. ;
format Deviation__ best12. ;
format Screening $6. ;
format VAR3 best12. ;
format VAR4 $171. ;
format VAR5 $9. ;
format VAR6 DATE8. ;
format VAR7 DATE9. ;
format VAR8 $5. ;
format VAR9 $32. ;
format VAR10 $9. ;
format VAR11 $2. ;
format VAR12 $2. ;
format VAR13 $4. ;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
I also tried this code, which resulted in error message "Import procedure not successful."
proc import datafile="C:\Users\nn\Desktop\pp.csv" DBMS=csv out=mm;
guessingrows = 100;
11-03-2016 11:56 AM - edited 11-03-2016 11:57 AM
You don't even need to upload a CSV file. Just copy some sample lines and paste them into the editor. Open the file with WordPad or NotePad or other text editor tool. Copy a the header row and a couple of data rows and paste them into a code block in your message. Like this:
Id,Var1,Var2 1234,3.4,Great 2345,5.6,Terrible
In reality just going through the process to attempt to do that will probably show you the answer to your original question.
11-03-2016 12:04 PM
Number",Subject ID Number,Description of Deviation - Physical Examination,Study Visit Day,Date Deviation Occurred,Date PD Identified by CRA,Classification (Minor or Major),Category,Outcome,Reported to RA,Reported to IRB/IEC,Action required
1,Q-0000,1000,"During the screening on 1JUN11 the alcohol breath test was not performed, however the subject returned to the clinic to perform this assessment on the 10th of June 2006.",Screening,3-Jun-06,23-Jun-06,Minor,alcohol breath test,continued,NA,NA,None
11-03-2016 12:18 PM
Looks like your CSV file has a line break in the middle of the header for the second column. Hence the confusion since it looks like two lines. If you generated this on Windows then there is a chance that you can use the TERMSTR= option to get SAS to find the lines properly.
filename myfile 'myfilename.csv' termstr=crlf ; proc import datafile=myfile ....
11-02-2016 04:03 PM - edited 11-02-2016 04:05 PM
Much easier to read a XLSX file than a CSV (as long as you are using a recent version of SAS).
But your column headers do not make very good variable names.
8 libname x xlsx "&path\PEC1.xlsx"; NOTE: Libref X was successfully assigned as follows: Engine: XLSX Physical Name: C:\Users\XXXXXX\Downloads\PEC1.xlsx 9 proc copy inlib=x outlib=work; run; NOTE: Copying X.PEC to WORK.PEC (memtype=DATA). NOTE: Variable Name Change. Deoooooon # -> Deoooooon__ NOTE: Variable Name Change. Sooooning N00ber -> Sooooning___N00ber NOTE: Variable Name Change. Sooooot ID N -> Sooooot_ID_____________________N NOTE: Variable Name Change. Deoooooooon of Devooooon - Phooo -> Deoooooooon_of_Devooooon___Phooo NOTE: Variable Name Change. Stooy Vioot Doy -> Stooy_Vioot_Doy NOTE: Variable Name Change. Date Devioooon Ocooored -> Date_Devioooon_Ocooored NOTE: Variable Name Change. Dooe PO Ideooooood by COA -> Dooe_PO_Ideooooood_by_COA NOTE: Variable Name Change. Clasooooooooon (Mooor or Mooor) -> Clasooooooooon__Mooor_or_Mooor_ NOTE: Variable Name Change. Rooorted to RA -> Rooorted_to_RA NOTE: Variable Name Change. Rooorted to IRB/IEC -> Rooorted_to_IRB_IEC NOTE: Variable Name Change. Aoooon roooired -> Aoooon_roooired NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used. NOTE: The import data set has 1 observations and 13 variables. NOTE: There were 1 observations read from the data set X.PEC. NOTE: The data set WORK.PEC has 1 observations and 13 variables. NOTE: PROCEDURE COPY used (Total process time): real time 0.03 seconds cpu time 0.03 seconds 10 proc print; run; WARNING: Data too long for column "Deoooooooon_of_Devooooon___Phooo"; truncated to 103 characters to fit. NOTE: There were 1 observations read from the data set WORK.PEC. NOTE: PROCEDURE PRINT used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
11-02-2016 05:24 PM
There are 2 main reasons you would get VARx for variables when using proc import for a csv and all are related to the content of the column heading.
1) Missing if your csv header row has any consecutive commas not enclosed in quotes then there is nothing SAS sees to assign the variable and basically uses VAR plus the column number. I suspect that your header looked like:
if you open the CSV in Wordpad.
2) The text of the column heading is identical to another column and exceeds 32 characters. Something
The underscore after FIRST is at position 33 so the first 32 characters are the same. Since SAS has a current limit of 32 characters for a variable name the first one gets truncated to 32 characters. The second gets VARx named as SAS doesn't attempt to guess how to truncate an supply suffix.
Note that if you have columns with the same name in your CSV such as INCOME you can end up with INCOME, INCOME2, INCOME3 because SAS has room to add the counter.
You may also have some interesting side effects if your column headings have many characters that are not allowed in SAS variable names as they will all be replaced by underscores.
One of the major advantages of writing, or modifying Proc Import generated code, is that YOU can assign meaningful or at least short variable names and add LABEL statements to provide more text for documenting what that variable means.
I routinely deal with exported data that has column headings like "When is this information being collected" and assign a variable name like CollectionDate (if a date variable) or CollectionOccasion if something like Intake or Update values.