SAS Only Returning First Digit of Data in CSV Import

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

SAS Only Returning First Digit of Data in CSV Import

[ Edited ]

Good afternoon, All:

 

I have two different CSV files, both identical from what I can tell in terms of data structure save out of excel, but one is reporting correctly and the other is not. Long story short there was an original CSV file, I reran the process that produces the file once some new values were added (new values that aren't a part of this particular issue from what I can tell), and now the new data set only brings up the first value on certain fields once it imports into SAS. I assume there is an issue with my CSV, but I have yet to be able to identify what it might be. I just wanted to see if anyone had a solution to the problem. Below is one example of the issue.

 

 

Original import:

 

 

proc import datafile="\\this_is_the_location_of_my_data\original_file.csv"
			out=basefile_original
			dbms=csv
			replace;
run;

 

Original log:

 

 /**********************************************************************
 *   PRODUCT:   SAS
 *   VERSION:   9.4
 *   CREATOR:   External File Interface
 *   DATE:      09FEB18
 *   DESC:      Generated SAS Datastep Code
 *   TEMPLATE SOURCE:  (None Specified.)
 ***********************************************************************/
    data WORK.BASEFILE_ORIGINAL    ;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile '\\this_is_where_my_data_is\original.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
       informat EMPLID best32. ;
       informat APPL_NBR best32. ;
       informat LAST_NAME $17. ;
       informat FIRST_NAME $8. ;
       informat MIDDLE_INIT $1. ;
       informat SEX $1. ;
       informat ETHNIC_CD $1. ;
       informat ACAD_CAREER $4. ;
       informat ACAD_PROG $5. ;
       informat ADMIT_TERM best32. ;
       informat ADMIT_TYPE $3. ;
       informat ADMITTED $1. ;
       informat PROG_ACTION $4. ;
       informat PROG_REASON $4. ;
       informat PROG_STATUS $2. ;
       informat CENSUS_DT mmddyy10. ;
       informat MAJOR $9. ;
       informat ACADEM_LEV best32. ;
       informat FICE best32. ;
       informat HS_CODE best32. ;
       informat HS_GPA best32. ;
       informat CLASS_RANK best32. ;
       informat CLASS_SIZE best32. ;
       informat ACT_COMP best32. ;
       informat ACT_READ best32. ;
       informat ACT_MATH best32. ;
       informat ACT_WRIT best32. ;
       informat SAT_VERB best32. ;
       informat SAT_ESSAY $1. ;
       informat SAT_MATH best32. ;
       informat SAT_WRITE best32. ;
       informat SAT_EBRWS best32. ;
       informat SAT_MATHS best32. ;
       informat SAT_TOT02 best32. ;
       informat GRE_VERB $1. ;
       informat GRE_QUAN $1. ;
       informat GRE_ANLY best32. ;
       informat GRE_VERBR best32. ;
       informat GRE_QUANR best32. ;
       informat GMA_QUAN $1. ;
       informat GMA_TOTAL $1. ;
       informat GMA_ANLY $1. ;
       informat UTT_AMIND best32. ;
       informat UTT_ASIAN best32. ;
       informat UTT_BLACK best32. ;
       informat UTT_HAWAIIAN best32. ;
       informat UTT_HISPA best32. ;
       informat UTT_NONRESAL best32. ;
       informat UTT_NSPEC best32. ;
       informat UTT_WHITE best32. ;
       format EMPLID best12. ;
       format APPL_NBR best12. ;
       format LAST_NAME $17. ;
       format FIRST_NAME $8. ;
       format MIDDLE_INIT $1. ;
       format SEX $1. ;
       format ETHNIC_CD $1. ;
       format ACAD_CAREER $4. ;
       format ACAD_PROG $5. ;
       format ADMIT_TERM best12. ;
       format ADMIT_TYPE $3. ;
       format ADMITTED $1. ;
       format PROG_ACTION $4. ;
       format PROG_REASON $4. ;
       format PROG_STATUS $2. ;
       format CENSUS_DT mmddyy10. ;
       format MAJOR $9. ;
       format ACADEM_LEV best12. ;
       format FICE best12. ;
       format HS_CODE best12. ;
       format HS_GPA best12. ;
       format CLASS_RANK best12. ;
       format CLASS_SIZE best12. ;
       format ACT_COMP best12. ;
       format ACT_READ best12. ;
       format ACT_MATH best12. ;
       format ACT_WRIT best12. ;
       format SAT_VERB best12. ;
       format SAT_ESSAY $1. ;
       format SAT_MATH best12. ;
       format SAT_WRITE best12. ;
       format SAT_EBRWS best12. ;
       format SAT_MATHS best12. ;
       format SAT_TOT02 best12. ;
       format GRE_VERB $1. ;
       format GRE_QUAN $1. ;
       format GRE_ANLY best12. ;
       format GRE_VERBR best12. ;
       format GRE_QUANR best12. ;
       format GMA_QUAN $1. ;
       format GMA_TOTAL $1. ;
       format GMA_ANLY $1. ;
       format UTT_AMIND best12. ;
       format UTT_ASIAN best12. ;
       format UTT_BLACK best12. ;
       format UTT_HAWAIIAN best12. ;
       format UTT_HISPA best12. ;
       format UTT_NONRESAL best12. ;
       format UTT_NSPEC best12. ;
       format UTT_WHITE best12. ;
    input
                EMPLID
                APPL_NBR
                LAST_NAME $
                FIRST_NAME $
                MIDDLE_INIT $
                SEX $
                ETHNIC_CD $
                ACAD_CAREER $
                ACAD_PROG $
                ADMIT_TERM
                ADMIT_TYPE $
                ADMITTED $
                PROG_ACTION $
                PROG_REASON $
                PROG_STATUS $
                CENSUS_DT
                MAJOR $
                ACADEM_LEV
                FICE
                HS_CODE
                HS_GPA
                CLASS_RANK
                CLASS_SIZE
                ACT_COMP
                ACT_READ
                ACT_MATH
                ACT_WRIT
                SAT_VERB
                SAT_ESSAY $
                SAT_MATH
                SAT_WRITE
                SAT_EBRWS
                SAT_MATHS
                SAT_TOT02
                GRE_VERB $
                GRE_QUAN $
                GRE_ANLY
                GRE_VERBR
                GRE_QUANR
                GMA_QUAN $
                GMA_TOTAL $
                GMA_ANLY $
                UTT_AMIND
                UTT_ASIAN
                UTT_BLACK
                UTT_HAWAIIAN
                UTT_HISPA
                UTT_NONRESAL
                UTT_NSPEC
                UTT_WHITE
    ;
    if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
    run;

 

Updated import:

 

 

proc import datafile="\\this_is_the_location_of_my_data\updated_file.csv"
			out=basefile
			dbms=csv
			replace;
run;

 

Updated log:

 

  /**********************************************************************
  *   PRODUCT:   SAS
  *   VERSION:   9.4
  *   CREATOR:   External File Interface
  *   DATE:      09FEB18
  *   DESC:      Generated SAS Datastep Code
  *   TEMPLATE SOURCE:  (None Specified.)
  ***********************************************************************/
     data WORK.BASEFILE    ;
     %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
     infile '\\this_is_where_my_data_is\updated_file.csv' delimiter = ',' MISSOVER DSD lrecl=32767
 firstobs=2 ;
        informat EMPLID best32. ;
        informat APPL_NBR best32. ;
        informat LAST_NAME $16. ;
        informat FIRST_NAME $9. ;
        informat MIDDLE_INIT $1. ;
        informat SEX $1. ;
        informat ETHNIC_CD $1. ;
        informat ACAD_CAREER $4. ;
        informat ACAD_PROG $5. ;
        informat ADMIT_TERM best32. ;
        informat ADMIT_TYPE $3. ;
        informat ADMITTED $1. ;
        informat PROG_ACTION $4. ;
        informat PROG_REASON $4. ;
        informat PROG_STATUS $2. ;
        informat CENSUS_DT mmddyy10. ;
        informat MAJOR $10. ;
        informat ACADEM_LEV best32. ;
        informat FICE best32. ;
        informat HS_CODE $1. ;
        informat HS_GPA $1. ;
        informat CLASS_RANK $1. ;
        informat CLASS_SIZE $1. ;
        informat ACT_COMP best32. ;
        informat ACT_READ best32. ;
        informat ACT_MATH best32. ;
        informat ACT_WRIT $1. ;
        informat SAT_VERB best32. ;
        informat SAT_ESSAY best32. ;
        informat SAT_MATH best32. ;
        informat SAT_WRITE best32. ;
        informat SAT_EBRWS $1. ;
        informat SAT_MATHS $1. ;
        informat SAT_TOT02 $1. ;
        informat GRE_VERB $1. ;
        informat GRE_QUAN $1. ;
        informat GRE_ANLY best32. ;
        informat GRE_VERBR best32. ;
        informat GRE_QUANR best32. ;
        informat GMA_QUAN $1. ;
        informat GMA_TOTAL $1. ;
        informat GMA_ANLY $1. ;
        informat UTT_AMIND best32. ;
        informat UTT_ASIAN best32. ;
        informat UTT_BLACK best32. ;
        informat UTT_HAWAIIAN best32. ;
        informat UTT_HISPA best32. ;
        informat UTT_NONRESAL best32. ;
        informat UTT_NSPEC best32. ;
        informat UTT_WHITE best32. ;
        format EMPLID best12. ;
        format APPL_NBR best12. ;
        format LAST_NAME $16. ;
        format FIRST_NAME $9. ;
        format MIDDLE_INIT $1. ;
        format SEX $1. ;
        format ETHNIC_CD $1. ;
        format ACAD_CAREER $4. ;
        format ACAD_PROG $5. ;
        format ADMIT_TERM best12. ;
        format ADMIT_TYPE $3. ;
        format ADMITTED $1. ;
        format PROG_ACTION $4. ;
        format PROG_REASON $4. ;
        format PROG_STATUS $2. ;
        format CENSUS_DT mmddyy10. ;
        format MAJOR $10. ;
        format ACADEM_LEV best12. ;
        format FICE best12. ;
        format HS_CODE $1. ;
        format HS_GPA $1. ;
        format CLASS_RANK $1. ;
        format CLASS_SIZE $1. ;
        format ACT_COMP best12. ;
        format ACT_READ best12. ;
        format ACT_MATH best12. ;
        format ACT_WRIT $1. ;
        format SAT_VERB best12. ;
        format SAT_ESSAY best12. ;
        format SAT_MATH best12. ;
        format SAT_WRITE best12. ;
        format SAT_EBRWS $1. ;
        format SAT_MATHS $1. ;
        format SAT_TOT02 $1. ;
        format GRE_VERB $1. ;
        format GRE_QUAN $1. ;
        format GRE_ANLY best12. ;
        format GRE_VERBR best12. ;
        format GRE_QUANR best12. ;
        format GMA_QUAN $1. ;
        format GMA_TOTAL $1. ;
        format GMA_ANLY $1. ;
        format UTT_AMIND best12. ;
        format UTT_ASIAN best12. ;
        format UTT_BLACK best12. ;
        format UTT_HAWAIIAN best12. ;
        format UTT_HISPA best12. ;
        format UTT_NONRESAL best12. ;
        format UTT_NSPEC best12. ;
        format UTT_WHITE best12. ;
     input
                 EMPLID
                 APPL_NBR
                 LAST_NAME $
                 FIRST_NAME $
                 MIDDLE_INIT $
                 SEX $
                 ETHNIC_CD $
                 ACAD_CAREER $
                 ACAD_PROG $
                 ADMIT_TERM
                 ADMIT_TYPE $
                 ADMITTED $
                 PROG_ACTION $
                 PROG_REASON $
                 PROG_STATUS $
                 CENSUS_DT
                 MAJOR $
                 ACADEM_LEV
                 FICE
                 HS_CODE $
                 HS_GPA $
                 CLASS_RANK $
                 CLASS_SIZE $
                 ACT_COMP
                 ACT_READ
                 ACT_MATH
                 ACT_WRIT $
                 SAT_VERB
                 SAT_ESSAY
                 SAT_MATH
                 SAT_WRITE
                 SAT_EBRWS $
                 SAT_MATHS $
                 SAT_TOT02 $
                 GRE_VERB $
                 GRE_QUAN $
                 GRE_ANLY
                 GRE_VERBR
                 GRE_QUANR
                 GMA_QUAN $
                 GMA_TOTAL $
                 GMA_ANLY $
                 UTT_AMIND
                 UTT_ASIAN
                 UTT_BLACK
                 UTT_HAWAIIAN
                 UTT_HISPA
                 UTT_NONRESAL
                 UTT_NSPEC
                 UTT_WHITE
     ;
     if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
     run;

 

 

When looking at one result from the original import I get this:

 

SAT_MATHSAT_WRITESAT_EBRWSSAT_MATHSSAT_TOT02
5504906505901240

 

All numbers are as desired and match the CSV that I imported. All three fields are informat BEST32., format BEST12.

 

 

When looking at the same result in the updated imported data set, I get this:

 

SAT_MATHSAT_WRITESAT_EBRWSSAT_MATHSSAT_TOT02
550490651

 

 

As you can see it is only producing the first value of the field for the three columns to the right. The two fields to the left are coming in the same informat and being put into the same format as the original file, but the three fields to the right have an informat of $1. This is where the problem lies, but the updated CSV itself is no different from the original from what I can tell (all fields are in 'General' format when viewed in Excel).

 

 

Thank you in advance for your help.

 

 

SAS 9.4 TS1M1

 


Accepted Solutions
Solution
‎02-09-2018 04:38 PM
Super User
Posts: 23,224

Re: SAS Only Returning First Digit of Data in CSV Import

I wasnt't asking for it, I was saying you can solve your problem with it.

 

But anyways, look at the format/informat/type of SAT_EBRWS in each set of code. You'll see they're different. 


PROC IMPORT is a guessing procedure. It can and regularly does guess wrong. In these cases it's best to specify your data types yourself using a data step instead. Rather than writing your own data step from scratch though, you can take one of the imports above and fix it and then use that code on both files, just changing the file reference. 

 

 

View solution in original post


All Replies
Super User
Posts: 23,224

Re: SAS Only Returning First Digit of Data in CSV Import

When you run PROC IMPORT the data step code is generated behind the scenes in the log. Use that code instead and modify as needed. 

 

You can select the code without line numbers by holding down ALT+SHIFT while selecting the code from the log.

Occasional Contributor
Posts: 17

Re: SAS Only Returning First Digit of Data in CSV Import

By request I have added the info from the log. Not sure if I added it to the post in the correct way or not, but that's the information the log put out.

Solution
‎02-09-2018 04:38 PM
Super User
Posts: 23,224

Re: SAS Only Returning First Digit of Data in CSV Import

I wasnt't asking for it, I was saying you can solve your problem with it.

 

But anyways, look at the format/informat/type of SAT_EBRWS in each set of code. You'll see they're different. 


PROC IMPORT is a guessing procedure. It can and regularly does guess wrong. In these cases it's best to specify your data types yourself using a data step instead. Rather than writing your own data step from scratch though, you can take one of the imports above and fix it and then use that code on both files, just changing the file reference. 

 

 

Occasional Contributor
Posts: 17

Re: SAS Only Returning First Digit of Data in CSV Import

Okay. I see what you are saying now. I will do exactly that - specify the formats for the fields as they come in. While I have you though - do you know why PROC IMPORT would guess one way for a particular file and then a completely different way when the structure of the file itself is the same? That just seems rather strange to me since I haven't seen it do that prior to this instance.

Highlighted
Super User
Posts: 23,224

Re: SAS Only Returning First Digit of Data in CSV Import

There's an option GUESSINGROWS that sets how many rows are scanned. If it happens that this particular file only has 1 value for say the first 1000 records then SAS will guess incorrectly. I did forget that you can set GUESSINGROWS=MAX in your PROC IMPORT to force it to scan the whole file before importing. This usually fixes most problems but it doesn't guarantee that your fields will be read correctly as numeric/character. For example IDs are often numeric but are really character type variables. 

 

If you're doing this consistently for a file, you should use a data step. This helps in the long run because you can then know that all your variables are the same format/type which helps if you need to combine the data together. If they're different you can run into errors or 'unexpected behaviour'.

 

Super User
Super User
Posts: 7,929

Re: SAS Only Returning First Digit of Data in CSV Import

CSV files do not contain any metadata about what the fields contain. The only metadata they have is possible variable names if the file contains a header row.  So it must GUESS what data is in the file.  It looks like for your problem file the values for the three new variables were probably empty on the first few lines of the file that PROC IMPORT used to make its guess.  It can make a better guess if you add guessingrows=max statement to the proc.

 

But the real solution is just write your own data step to read the file.  It is not hard, especially if you don't waste time adding informats and formats to variables that don't need them.  You can even force your identifier variables to be read as strings instead of numbers. This is critical if they really are strings, but also useful since then you cannot accidentally generate meaningless statistics on them.

 

data WORK.BASEFILE_ORIGINAL    ;
    infile '\\this_is_where_my_data_is\original.csv' dsd firstobs=2 truncover;
    length
        EMPLID $12
        APPL_NBR $12 
        LAST_NAME $30
        FIRST_NAME $30
        MIDDLE_INIT $1
        SEX $1
        ETHNIC_CD $1
        ACAD_CAREER $4
        ACAD_PROG $5
        ADMIT_TERM 8
        ADMIT_TYPE $3
        ADMITTED $1
        PROG_ACTION $4
        PROG_REASON $4
        PROG_STATUS $2
        CENSUS_DT 8
        MAJOR $20
        ACADEM_LEV 8
        FICE 8
        HS_CODE 8
        HS_GPA 8
        CLASS_RANK 8
        CLASS_SIZE 8
        ACT_COMP 8
        ACT_READ 8
        ACT_MATH 8
        ACT_WRIT 8
        SAT_VERB 8
        SAT_ESSAY $1
        SAT_MATH 8
        SAT_WRITE 8
        SAT_EBRWS 8
        SAT_MATHS 8
        SAT_TOT02 8
        GRE_VERB $1
        GRE_QUAN $1
        GRE_ANLY 8
        GRE_VERBR 8
        GRE_QUANR 8
        GMA_QUAN $1
        GMA_TOTAL $1
        GMA_ANLY $1
        UTT_AMIND 8
        UTT_ASIAN 8
        UTT_BLACK 8
        UTT_HAWAIIAN 8
        UTT_HISPA 8
        UTT_NONRESAL 8
        UTT_NSPEC 8
        UTT_WHITE 8
    ;
    informat CENSUS_DT mmddyy10. ;
    format CENSUS_DT yymmdd10. ;
    input   EMPLID -- UTT_WHITE ;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 257 views
  • 2 likes
  • 3 in conversation