- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm having a devil of a time doing something that should be really easy (and is really easy with other stats programs I've used, including R and Stata): importing a comma-delimited file into SAS.
Here're the first few lines from the file ("Intercensal_2000-2010_DBInput_csv.txt", a publicly available data file from California's Department of Finance):
"CountyCode","CountyName","Year","RaceCode","RaceName","Gender","Age","Population"
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",0,2701.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",1,2722.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",2,2707.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",3,2710.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",4,2888.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",5,2891.000000000
"01","Alameda",4/1/2000 0:00:00,"1","White","Female",6,2962.000000000
I can successfully import them with this "proc import" code, but some fields (CountyName and RaceName) are truncated.
proc import datafile="[File Path]\Intercensal_2000-2010_DBInput_csv.txt" out=pop.population replace dbms=csv;
delimiter = ",";
getnames=yes;
run;
My understanding is that using "infile" in a data statement gives me more flexibility in specifying informats, lengths, inputs, etc., but getting the right combination, dealing with the double quotation marks, and so on is also proving to be a hassle. So I tried using the Data Import Wizard, and get exactly the right file, However, I substitute this Wizard-generated code:
INFILE 'C:[FilePath]\Local\Temp\SEG23860\Intercensal_2000-2010_DBInput_csv-e23e36cbe8d24b5d9fb12eca6ee6c71d.txt'
LRECL=75
ENCODING="WLATIN1"
TERMSTR=CRLF
DLM='7F'x
MISSOVER
DSD ;
with this code, containing the local file path:
INFILE "H:\[LocalFilePath]\Population\Intercensal_2000-2010_DBInput_csv.txt"
LRECL=75
ENCODING="WLATIN1"
TERMSTR=CRLF
DLM='7F'x
MISSOVER
DSD ;
I get a SAS dataset with the correct number of rows and columns, and correctly-named variable headers, but all missing data. What gives?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
delimiter = ",";
getnames=yes;
guessingrows=100000;
run;
Try that first.
It'll take longer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
delimiter = ",";
getnames=yes;
guessingrows=100000;
run;
Try that first.
It'll take longer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have no idea why, but worked like a charm! Thanks, Reeza.
David
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. I didn't post the complete code, which contains informats, input, statements, etc. It's exactly the code generated by the import wizard, and just substitutes the correct local file path (infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt) for the temporary file path used by the wizard (INFILE 'C:\Users\DCrow\AppData\Local\Temp\SEG23860\Intercensal_2000-2010_DBInput_csv-e23e36cbe8d24b5d9fb12eca6ee6c71d.txt'). That's literally the only difference between the wizard-generated code, which works, and the code with the local file path, which doesn't work.
Any ideas why my code isn't working?
Here's my complete code:
libname pop "H:\EPICenter\Population"; data pop.pop2000_2010_2; LENGTH CountyCode 8 CountyName $ 13 Year 8 RaceCode 8 RaceName $ 16 Gender $ 6 Age 8 Population 8 ; FORMAT CountyCode BEST2. CountyName $CHAR13. Year DATETIME18. RaceCode BEST1. RaceName $CHAR16. Gender $CHAR6. Age BEST3. Population BEST16. ; INFORMAT CountyCode BEST2. CountyName $CHAR13. Year DATETIME18. RaceCode BEST1. RaceName $CHAR16. Gender $CHAR6. Age BEST3. Population BEST16. ; INFILE "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt" LRECL=75 ENCODING="WLATIN1" TERMSTR=CRLF DLM='7F'x MISSOVER DSD ; INPUT CountyCode : ?? BEST2. CountyName : $CHAR13. Year : ?? ANYDTDTM16. RaceCode : ?? BEST1. RaceName : $CHAR16. Gender : $CHAR6. Age : ?? BEST3. Population : ?? COMMA16. ; RUN; */
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, SAS is on a server. Let me look into local mode--although, to be accurate, the file I'm trying to read in is on H:\, a server drive.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data WORK.WANT ; infile 'H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt' delimiter = ',' TRUNCOVER DSD lrecl=100 firstobs=2 termstr=CRLF encoding="WLATIN1"; informat CountyCode $4. ; informat CountyName $11. ; informat Year anydtdtm40. ; informat RaceCode $3. ; informat RaceName $18. ; informat Gender $8. ; informat Age best32. ; informat Population best32. ; format CountyCode $4. ; format CountyName $11. ; format Year datetime. ; format RaceCode $3. ; format RaceName $18. ; format Gender $8. ; format Age best12. ; format Population best12. ; input CountyCode $ CountyName $ Year RaceCode $ RaceName $ Gender $ Age Population ; run;
What happens with this code?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Totally works. Was it adding "truncover" instead of "missover"? Also, what lines in the log pointed out the error?
Thanks for all the hand-holding. I'm still learning the ropes in SAS, so explicit orientation really helps me understand a lot of the steps that more experienced SAS users take for granted.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Reeza-
I just posted the log in reply to ballardw. I'd appreciate any further help you could give.
Regards,
David
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@dbcrow wrote:
Thanks. I didn't post the complete code, which contains informats, input, statements, etc. It's exactly the code generated by the import wizard, and just substitutes the correct local file path (infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt) for the temporary file path used by the wizard (INFILE 'C:\Users\DCrow\AppData\Local\Temp\SEG23860\Intercensal_2000-2010_DBInput_csv-e23e36cbe8d24b5d9fb12eca6ee6c71d.txt'). That's literally the only difference between the wizard-generated code, which works, and the code with the local file path, which doesn't work.
Any ideas why my code isn't working?
"Isn't working" is awful vague.
Are there errors in the log?: Post the code and log in a code box opened with the {i} to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of text pasted into a code box, the actual results and the expected results.
If you have an INFORMAT there should be no reason to repeat it on the INPUT statement.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for this reply, ballardw.
You'll notice that in my original post, I was more specific about "not working." I get a SAS data file with the correct number of rows and columns, and correctly named variables, but missing data in all the cells. I thought this description was sufficient to give an understanding of the problem, but here's a picture of the data set produced. The following million rows are the same, but I trust the first ten are sufficient to see the problem with the outputted data set.
And here's the code again. Note that this exactly the code that the Import Wizard generated, except that the path file refers to a server drive instead of a temporary file on my local C: drive.
libname pop "H:\EPICenter\Population"; data pop.pop2000_2010_2; LENGTH CountyCode 8 CountyName $ 13 Year 8 RaceCode 8 RaceName $ 16 Gender $ 6 Age 8 Population 8 ; FORMAT CountyCode BEST2. CountyName $CHAR13. Year DATETIME18. RaceCode BEST1. RaceName $CHAR16. Gender $CHAR6. Age BEST3. Population BEST16. ; INFORMAT CountyCode BEST2. CountyName $CHAR13. Year DATETIME18. RaceCode BEST1. RaceName $CHAR16. Gender $CHAR6. Age BEST3. Population BEST16. ; INFILE "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt" LRECL=75 ENCODING="WLATIN1" TERMSTR=CRLF DLM='7F'x MISSOVER DSD ; INPUT CountyCode : ?? BEST2. CountyName : $CHAR13. Year : ?? ANYDTDTM16. RaceCode : ?? BEST1. RaceName : $CHAR16. Gender : $CHAR6. Age : ?? BEST3. Population : ?? COMMA16. ; RUN;
And here's the complete log:
1 The SAS System 13:59 Tuesday, January 15, 2019 1 ;*';*";*/;quit;run; 2 OPTIONS PAGENO=MIN; 3 %LET _CLIENTTASKLABEL='Program'; 4 %LET _CLIENTPROCESSFLOWNAME='Process Flow'; 5 %LET _CLIENTPROJECTPATH='H:\EPICenter\Population\2000_2009_pop.egp'; 6 %LET _CLIENTPROJECTPATHHOST='CDI1542D6018062'; 7 %LET _CLIENTPROJECTNAME='2000_2009_pop.egp'; 8 %LET _SASPROGRAMFILE=''; 9 %LET _SASPROGRAMFILEHOST=''; 10 11 ODS _ALL_ CLOSE; 12 OPTIONS DEV=PNG; 13 GOPTIONS XPIXELS=0 YPIXELS=0; 14 FILENAME EGSR TEMP; 15 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR 16 STYLE=HTMLBlue 17 STYLESHEET=(URL="file:///C:/Program%20Files%20(x86)/SASHome/x86/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css") 18 GPATH=&sasworklocation 19 ENCODING=UTF8 20 options(rolap="on") 21 ; NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR 22 23 GOPTIONS ACCESSIBLE; 24 libname pop "H:\EPICenter\Population"; NOTE: Libref POP was successfully assigned as follows: Engine: V9 Physical Name: H:\EPICenter\Population 25 data pop.pop2000_2010_2; 26 LENGTH 27 CountyCode 8 28 CountyName $ 13 29 Year 8 30 RaceCode 8 31 RaceName $ 16 32 Gender $ 6 33 Age 8 34 Population 8 ; 35 FORMAT 36 CountyCode BEST2. 37 CountyName $CHAR13. 38 Year DATETIME18. 39 RaceCode BEST1. 40 RaceName $CHAR16. 41 Gender $CHAR6. 42 Age BEST3. 43 Population BEST16. ; 44 INFORMAT 45 CountyCode BEST2. 46 CountyName $CHAR13. 47 Year DATETIME18. 48 RaceCode BEST1. 49 RaceName $CHAR16. 50 Gender $CHAR6. 51 Age BEST3. 52 Population BEST16. ; 53 INFILE "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt" 54 LRECL=75 2 The SAS System 13:59 Tuesday, January 15, 2019 55 ENCODING="WLATIN1" 56 TERMSTR=CRLF 57 DLM='7F'x 58 MISSOVER 59 DSD ; 60 INPUT 61 CountyCode : ?? BEST2. 62 CountyName : $CHAR13. 63 Year : ?? ANYDTDTM16. 64 RaceCode : ?? BEST1. 65 RaceName : $CHAR16. 66 Gender : $CHAR6. 67 Age : ?? BEST3. 68 Population : ?? COMMA16. ; 69 RUN; NOTE: The infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt" is: Filename=H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt, RECFM=V,LRECL=75,File Size (bytes)=73342782, Last Modified=19Mar2013:10:37:42, Create Time=19Mar2013:10:37:42 NOTE: 1001109 records were read from the infile "H:\EPICenter\Population\Intercensal_2000-2010_DBInput_csv.txt". The minimum record length was 61. The maximum record length was 75. One or more lines were truncated. NOTE: The data set POP.POP2000_2010_2 has 1001109 observations and 8 variables. NOTE: DATA statement used (Total process time): real time 2.10 seconds cpu time 0.60 seconds 70 */ 71 72 GOPTIONS NOACCESSIBLE; 73 %LET _CLIENTTASKLABEL=; 74 %LET _CLIENTPROCESSFLOWNAME=; 75 %LET _CLIENTPROJECTPATH=; 76 %LET _CLIENTPROJECTPATHHOST=; 77 %LET _CLIENTPROJECTNAME=; 78 %LET _SASPROGRAMFILE=; 79 %LET _SASPROGRAMFILEHOST=; 80 81 ;*';*";*/;quit;run; 82 ODS _ALL_ CLOSE; 83 84 85 QUIT; RUN; 86
I don't know how to interpret this log file well enough to know what the problem is. Maybe it has to do with "one or more lines are truncated," but that doesn't tell me much. Maybe it tells you more! 🙂
Thanks for the clarification on INFORMATS and INPUTS. As I say, I'm just using the code generated by the import wizard. Maybe I'll streamline it once I get it to work.
Again, I'd appreciate any advice you could give.
David