New SAS User

Completely new to SAS or trying something new with SAS? Post here for help getting started.
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dbcrow
Quartz | Level 8

I'm having a devil of a time doing something that should be really easy (and is really easy with other stats programs I've used, including R and Stata):  importing a comma-delimited file into SAS.  

 

Here're the first few lines from the file ("Intercensal_2000-2010_DBInput_csv.txt", a publicly available data file from California's Department of Finance):  

 

"CountyCode","CountyName","Year","RaceCode","RaceName","Gender","Age","Population"
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",0,2701.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",1,2722.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",2,2707.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",3,2710.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",4,2888.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",5,2891.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",6,2962.000000000

 

I can successfully import them with this "proc import" code, but some fields (CountyName and RaceName) are truncated.  

 

proc import datafile="[File Path]\Intercensal_2000-2010_DBInput_csv.txt" out=pop.population replace dbms=csv;
	delimiter = ",";
	getnames=yes;
run;

 

My understanding is that using "infile" in a data statement gives me more flexibility in specifying informats, lengths, inputs, etc., but getting the right combination, dealing with the double quotation marks, and so on is also proving to be a hassle.  So I tried using the Data Import Wizard, and get exactly the right file,  However, I substitute this Wizard-generated code:  

 

 INFILE 'C:[FilePath]\Local\Temp\SEG23860\Intercensal_2000-2010_DBInput_csv-e23e36cbe8d24b5d9fb12eca6ee6c71d.txt'
        LRECL=75
        ENCODING="WLATIN1"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD ;

with this code, containing the local file path:  

 

 

  INFILE "H:\[LocalFilePath]\Population\Intercensal_2000-2010_DBInput_csv.txt"
        LRECL=75
        ENCODING="WLATIN1"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD ;

I get a SAS dataset with the correct number of rows and columns, and correctly-named variable headers, but all missing data.  What gives?  

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
proc import datafile="[File Path]\Intercensal_2000-2010_DBInput_csv.txt" out=pop.population replace dbms=csv;
delimiter = ",";
getnames=yes;
guessingrows=100000;
run;

Try that first.
It'll take longer.

View solution in original post

16 REPLIES 16
Reeza
Super User
proc import datafile="[File Path]\Intercensal_2000-2010_DBInput_csv.txt" out=pop.population replace dbms=csv;
delimiter = ",";
getnames=yes;
guessingrows=100000;
run;

Try that first.
It'll take longer.
dbcrow
Quartz | Level 8

Have no idea why, but worked like a charm!  Thanks, Reeza.  

 

David

Reeza
Super User
GUESSINGROWS option was added. That forces SAS to scan more lines before it determines the length and types for fields. In general, it's never a good idea to use PROC IMPORT because it guesses, and not always well. For text files, it's generally a good idea to write your data step and I'm guessing files from the gov have decent enough docs that would make that easy. But this is also a good way to get around quickly. I use this route only when it's a one time read. If I need to read a data set regularly, I'll write a data step.
Reeza
Super User
And INFILE only creates a connection to a data set, it doesn't read anything in, INPUT would be used to read the data. And usually the option would be TRUNCOVER not MISSOVER.
dbcrow
Quartz | Level 8

Thanks.  I didn't post the complete code, which contains informats, input, statements, etc.  It's exactly the code generated by the import wizard, and just substitutes the correct local file path (infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt) for the temporary file path used by the wizard (INFILE 'C:\Users\DCrow\AppData\Local\Temp\SEG23860\Intercensal_2000-2010_DBInput_csv-e23e36cbe8d24b5d9fb12eca6ee6c71d.txt').  That's literally the only difference between the wizard-generated code, which works, and the code with the local file path, which doesn't work.  

 

Any ideas why my code isn't working?  

 

Here's my complete code:  

 

 

libname pop "H:\EPICenter\Population";
data pop.pop2000_2010_2;
    LENGTH
        CountyCode         8
        CountyName       $ 13
        Year               8
        RaceCode           8
        RaceName         $ 16
        Gender           $ 6
        Age                8
        Population         8 ;
    FORMAT
        CountyCode       BEST2.
        CountyName       $CHAR13.
        Year             DATETIME18.
        RaceCode         BEST1.
        RaceName         $CHAR16.
        Gender           $CHAR6.
        Age              BEST3.
        Population       BEST16. ;
    INFORMAT
        CountyCode       BEST2.
        CountyName       $CHAR13.
        Year             DATETIME18.
        RaceCode         BEST1.
        RaceName         $CHAR16.
        Gender           $CHAR6.
        Age              BEST3.
        Population       BEST16. ;
    INFILE "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt"
        LRECL=75
        ENCODING="WLATIN1"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        CountyCode       : ?? BEST2.
        CountyName       : $CHAR13.
        Year             : ?? ANYDTDTM16.
        RaceCode         : ?? BEST1.
        RaceName         : $CHAR16.
        Gender           : $CHAR6.
        Age              : ?? BEST3.
        Population       : ?? COMMA16. ;
RUN;
*/

 

 

Reeza
Super User
PROC IMPORT code worked? Is SAS on a Server there? If you're running that code, you'd have to be on local mode.
dbcrow
Quartz | Level 8

Yes, SAS is on a server.  Let me look into local mode--although, to be accurate, the file I'm trying to read in is on H:\, a server drive.  

Reeza
Super User
      data WORK.WANT    ;


      infile 'H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt' delimiter = ',' TRUNCOVER DSD lrecl=100 firstobs=2  termstr=CRLF encoding="WLATIN1";
         informat CountyCode $4. ;
         informat CountyName $11. ;
         informat Year anydtdtm40. ;
         informat RaceCode $3. ;
         informat RaceName $18. ;
         informat Gender $8. ;
         informat Age best32. ;
         informat Population best32. ;
         format CountyCode $4. ;
         format CountyName $11. ;
         format Year datetime. ;
         format RaceCode $3. ;
         format RaceName $18. ;
         format Gender $8. ;
         format Age best12. ;
         format Population best12. ;
      input
                  CountyCode  $
                  CountyName  $
                  Year
                  RaceCode  $
                  RaceName  $
                  Gender  $
                  Age
                  Population
      ;

      run;

What happens with this code?

dbcrow
Quartz | Level 8

Totally works.  Was it adding "truncover" instead of "missover"?  Also, what lines in the log pointed out the error?  

 

Thanks for all the hand-holding.  I'm still learning the ropes in SAS, so explicit orientation really helps me understand a lot of the steps that more experienced SAS users take for granted.  

Reeza
Super User
I downloaded the file and looked at the documentation to determine how to read your file. I didn't really look at your code to see it does/doesn't work, but the colons and '07' confused me though I suppose EG generates it that way.
Reeza
Super User
Posting the log may help with telling you where the error is.
dbcrow
Quartz | Level 8

Reeza-

 

I just posted the log in reply to ballardw.  I'd appreciate any further help you could give.  

 

Regards,

David

ballardw
Super User

@dbcrow wrote:

Thanks.  I didn't post the complete code, which contains informats, input, statements, etc.  It's exactly the code generated by the import wizard, and just substitutes the correct local file path (infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt) for the temporary file path used by the wizard (INFILE 'C:\Users\DCrow\AppData\Local\Temp\SEG23860\Intercensal_2000-2010_DBInput_csv-e23e36cbe8d24b5d9fb12eca6ee6c71d.txt').  That's literally the only difference between the wizard-generated code, which works, and the code with the local file path, which doesn't work.  

 

Any ideas why my code isn't working?  

  

 


"Isn't working" is awful vague.

Are there errors in the log?: Post the code and log in a code box opened with the {i} to maintain formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of text pasted into a code box, the actual results and the expected results.

 

If you have an INFORMAT there should be no reason to repeat it on the INPUT statement.

dbcrow
Quartz | Level 8

Thanks for this reply, ballardw.  

 

You'll notice that in my original post, I was more specific about "not working." I get a SAS data file with the correct number of rows and columns, and correctly named variables, but missing data in all the cells.  I thought this description was sufficient to give an understanding of the problem, but here's a picture of the data set produced.  The following million rows are the same, but I trust the first ten are sufficient to see the problem with the outputted data set.  

 

data_set.png

 

And here's the code again.  Note that this exactly the code that the Import Wizard generated, except that the path file refers to a server drive instead of a temporary file on my local C: drive.  

 

libname pop "H:\EPICenter\Population";
data pop.pop2000_2010_2;
    LENGTH
        CountyCode         8
        CountyName       $ 13
        Year               8
        RaceCode           8
        RaceName         $ 16
        Gender           $ 6
        Age                8
        Population         8 ;
    FORMAT
        CountyCode       BEST2.
        CountyName       $CHAR13.
        Year             DATETIME18.
        RaceCode         BEST1.
        RaceName         $CHAR16.
        Gender           $CHAR6.
        Age              BEST3.
        Population       BEST16. ;
    INFORMAT
        CountyCode       BEST2.
        CountyName       $CHAR13.
        Year             DATETIME18.
        RaceCode         BEST1.
        RaceName         $CHAR16.
        Gender           $CHAR6.
        Age              BEST3.
        Population       BEST16. ;
    INFILE "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt"
        LRECL=75
        ENCODING="WLATIN1"
        TERMSTR=CRLF
        DLM='7F'x
        MISSOVER
        DSD ;
    INPUT
        CountyCode       : ?? BEST2.
        CountyName       : $CHAR13.
        Year             : ?? ANYDTDTM16.
        RaceCode         : ?? BEST1.
        RaceName         : $CHAR16.
        Gender           : $CHAR6.
        Age              : ?? BEST3.
        Population       : ?? COMMA16. ;
RUN;

And here's the complete log:  

 

1                                                          The SAS System                            13:59 Tuesday, January 15, 2019

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Program';
4          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5          %LET _CLIENTPROJECTPATH='H:\EPICenter\Population\2000_2009_pop.egp';
6          %LET _CLIENTPROJECTPATHHOST='CDI1542D6018062';
7          %LET _CLIENTPROJECTNAME='2000_2009_pop.egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=PNG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         FILENAME EGSR TEMP;
15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16             STYLE=HTMLBlue
17             STYLESHEET=(URL="file:///C:/Program%20Files%20(x86)/SASHome/x86/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
18             GPATH=&sasworklocation
19             ENCODING=UTF8
20             options(rolap="on")
21         ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
22         
23         GOPTIONS ACCESSIBLE;
24         libname pop "H:\EPICenter\Population";
NOTE: Libref POP was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: H:\EPICenter\Population
25         data pop.pop2000_2010_2;
26             LENGTH
27                 CountyCode         8
28                 CountyName       $ 13
29                 Year               8
30                 RaceCode           8
31                 RaceName         $ 16
32                 Gender           $ 6
33                 Age                8
34                 Population         8 ;
35             FORMAT
36                 CountyCode       BEST2.
37                 CountyName       $CHAR13.
38                 Year             DATETIME18.
39                 RaceCode         BEST1.
40                 RaceName         $CHAR16.
41                 Gender           $CHAR6.
42                 Age              BEST3.
43                 Population       BEST16. ;
44             INFORMAT
45                 CountyCode       BEST2.
46                 CountyName       $CHAR13.
47                 Year             DATETIME18.
48                 RaceCode         BEST1.
49                 RaceName         $CHAR16.
50                 Gender           $CHAR6.
51                 Age              BEST3.
52                 Population       BEST16. ;
53             INFILE "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt"
54                 LRECL=75
2                                                          The SAS System                            13:59 Tuesday, January 15, 2019

55                 ENCODING="WLATIN1"
56                 TERMSTR=CRLF
57                 DLM='7F'x
58                 MISSOVER
59                 DSD ;
60             INPUT
61                 CountyCode       : ?? BEST2.
62                 CountyName       : $CHAR13.
63                 Year             : ?? ANYDTDTM16.
64                 RaceCode         : ?? BEST1.
65                 RaceName         : $CHAR16.
66                 Gender           : $CHAR6.
67                 Age              : ?? BEST3.
68                 Population       : ?? COMMA16. ;
69         RUN;

NOTE: The infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt" is:
      Filename=H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt,
      RECFM=V,LRECL=75,File Size (bytes)=73342782,
      Last Modified=19Mar2013:10:37:42,
      Create Time=19Mar2013:10:37:42

NOTE: 1001109 records were read from the infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt".
      The minimum record length was 61.
      The maximum record length was 75.
      One or more lines were truncated.
NOTE: The data set POP.POP2000_2010_2 has 1001109 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           2.10 seconds
      cpu time            0.60 seconds
      

70         */
71         
72         GOPTIONS NOACCESSIBLE;
73         %LET _CLIENTTASKLABEL=;
74         %LET _CLIENTPROCESSFLOWNAME=;
75         %LET _CLIENTPROJECTPATH=;
76         %LET _CLIENTPROJECTPATHHOST=;
77         %LET _CLIENTPROJECTNAME=;
78         %LET _SASPROGRAMFILE=;
79         %LET _SASPROGRAMFILEHOST=;
80         
81         ;*';*";*/;quit;run;
82         ODS _ALL_ CLOSE;
83         
84         
85         QUIT; RUN;
86         

I don't know how to interpret this log file well enough to know what the problem is.  Maybe it has to do with "one or more lines are truncated," but that doesn't tell me much.  Maybe it tells you more!  🙂  

 

Thanks for the clarification on INFORMATS and INPUTS.  As I say, I'm just using the code generated by the import wizard.  Maybe I'll streamline it once I get it to work.  

 

Again, I'd appreciate any advice you could give.  

David

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 4117 views
  • 6 likes
  • 3 in conversation