help with import a .csv using infile statement and @

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

help with import a .csv using infile statement and @

Hey everyone!

I am trying to import a dataset downloaded from US Census website, in csv format. The file has about 1700 rows and 169 columns. Each row represents a zip code. I want to import the file in a way that each zip code will be repeated 166 times (first three columns are variations of the same zip code) with only one variable that has data values. So the target dataset will have only 5 columns (zip, zip1, zip2, count var, value) and 297,804 rows (~166x1700). Here is the code i am using:

filename eddata "C:\Users\path\ACS_12_5YR_B15001_with_ann.csv";

data test;

infile eddata dlm="," missover firstobs=3;

input zip $ zip1 $ zip2$ @;

   do var = 1 to 166;

       input values $ @;

       output;

   end;

run;


I get the correct number of total rows and columns, the values that are read in are alle correct, but almost half of the rows are blank (mostly they have missing values after 83rd iteration of the do loop). The original dataset has no missing values (dataset is attached).


I am trying to understand why this is happening but with no luck so far.


I would appreciate any advice!!


Thanks,

Maria

Attachment

Accepted Solutions
Solution
‎01-17-2014 09:17 PM
Super Contributor
Posts: 307

Re: help with import a .csv using infile statement and @

Just to confirm what I am and Patrick are saying, you probably need to adjust LRECL. Try this:

data test;

  infile eddata dlm="," missover firstobs=3 LRECL=32767;

  input zip :$14. zip1 $ zip2 :$11. @;

  do var = 1 to 166;

    input values $ @;

    output;

  end;

run;

View solution in original post


All Replies
Super Contributor
Posts: 644

Re: help with import a .csv using infile statement and @

The only 'odd' thing about the data is that the third postcode variant has a space in the middle.  It probably should be wrapped in double quotes, though some sources are sloppy about that.  I suspect SAS is reading the second part as a value.  I can't test but it would be worth a try using the & modifier:

filename eddata "C:\Users\path\ACS_12_5YR_B15001_with_ann.csv";

data test;

informat zip $14. zip1 $5. zip2 $Char11. ;

infile eddata dlm="," missover firstobs=3;

input zip $ zip1 $ zip2$ & @;

   do var = 1 to 166;

       input values $ @;

       output;

   end;

run;

Incidentally, why are you importing the data in transposed form?  And why are you importing numerical values as character strings?

Richard

Occasional Contributor
Posts: 5

Re: help with import a .csv using infile statement and @

thanks!

Richard, i dont need to import it as character, my mistake... still, with the corrections to the code that you suggested, all the values of 'value' are missing..

Patrick, i just ran the code (adjusting for truncation) and still getting missing values (for example for the first chunk of values all are missing after row 87).

I am testing it in SAS9.2 and SAS9.3 on WindowsVista/XP..

Respected Advisor
Posts: 3,892

Re: help with import a .csv using infile statement and @

That's weird. If I run the code I see this in the data:

Capture.PNG

Opening your csv with Notepad++ I'm told it's encoded in UTF-8 without BOM. Can you please re-test with below amendment:

infile eddata dlm="," missover firstobs=3 encoding='utf-8';

Occasional Contributor
Posts: 5

Re: help with import a .csv using infile statement and @

Patrixk, i tested it out with adjusted "encoding", didnt help, though LRECL solved the problem Smiley Happy

Solution
‎01-17-2014 09:17 PM
Super Contributor
Posts: 307

Re: help with import a .csv using infile statement and @

Just to confirm what I am and Patrick are saying, you probably need to adjust LRECL. Try this:

data test;

  infile eddata dlm="," missover firstobs=3 LRECL=32767;

  input zip :$14. zip1 $ zip2 :$11. @;

  do var = 1 to 166;

    input values $ @;

    output;

  end;

run;

Occasional Contributor
Posts: 5

Re: help with import a .csv using infile statement and @

Fugue and Patrick, you were absolutely right, LRECL solved the problem!  Smiley Happy Thanks a lot!!!!! i guess I will try now to set LRECL to a default to 32767..

Respected Advisor
Posts: 3,892

Re: help with import a .csv using infile statement and @

The code and data you've posted works fine in my environment (Win7 SAS9.4) and I don't get rows with missing values.

The only thing I saw is that you're truncating your zip data and need longer informats as in below code.

data test;
  infile eddata dlm="," missover firstobs=3;
  input zip :$14. zip1 $ zip2 :$11. @;

  do var = 1 to 166;
    input values $ @;
    output;
  end;
run;

Super Contributor
Posts: 307

Re: help with import a .csv using infile statement and @

Using Patrick's code, I also get the same result. The only complaint in the log window is that one or more lines were truncated (probably need to increase the LRECL option):

test.png

Respected Advisor
Posts: 3,892

Re: help with import a .csv using infile statement and @

That sounds very much like the explanation. My default is set to "32767" so I wouldn't have encountered such an issue.


Super User
Super User
Posts: 6,500

Re: help with import a .csv using infile statement and @

Also note that use of MISSOVER should really be avoided. TRUNCOVER is what you want.  With missover option there is the possibility that data at the end of the line will be discarded when it is less than the number of characters that the input operation requested.  See example below:

NOTE: Remote submit to SAS94 commencing.

2    options generic;

3    filename tmp temp;

4    data _null_;

5      file tmp;

6      x='HELLO';

7      do i=1 to length(x);

8        put x $varying. i ;

9      end;

10   run;

NOTE: The file TMP is:

      (system-specific pathname),

      (system-specific file attributes)

NOTE: 5 records were written to the file (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

11   data check;

12     infile tmp ;

13     input x $5.;

14     put _n_=1 x=  '_infile_=' _infile_;

15   run;

NOTE: Compression was disabled for data set WORK.CHECK because compression overhead would increase the size of the data set.

NOTE: The infile TMP is:

      (system-specific pathname),

      (system-specific file attributes)

_N_=1 x=HE _infile_=HE

_N_=2 x=HELL _infile_=HELL

_N_=3 x=HELLO _infile_=HELLO

NOTE: 5 records were read from the infile (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

NOTE: The data set WORK.CHECK has 3 observations and 1 variables.

16   data check;

17     infile tmp missover;

18     input x $5.;

19     put _n_=1 x=  '_infile_=' _infile_;

20   run;

NOTE: Compression was disabled for data set WORK.CHECK because compression overhead would increase the size of the data set.

NOTE: The infile TMP is:

      (system-specific pathname),

      (system-specific file attributes)

_N_=1 x=  _infile_=H

_N_=2 x=  _infile_=HE

_N_=3 x=  _infile_=HEL

_N_=4 x=  _infile_=HELL

_N_=5 x=HELLO _infile_=HELLO

NOTE: 5 records were read from the infile (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

NOTE: The data set WORK.CHECK has 5 observations and 1 variables.

21   data check;

22     infile tmp truncover;

23     input x $5.;

24     put _n_=1 x=  '_infile_=' _infile_;

25   run;

NOTE: Compression was disabled for data set WORK.CHECK because compression overhead would increase the size of the data set.

NOTE: The infile TMP is:

      (system-specific pathname),

      (system-specific file attributes)

_N_=1 x=H _infile_=H

_N_=2 x=HE _infile_=HE

_N_=3 x=HEL _infile_=HEL

_N_=4 x=HELL _infile_=HELL

_N_=5 x=HELLO _infile_=HELLO

NOTE: 5 records were read from the infile (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

NOTE: The data set WORK.CHECK has 5 observations and 1 variables.

NOTE: Remote submit to SAS94 complete.

Occasional Contributor
Posts: 5

Re: help with import a .csv using infile statement and @

Thanks, Tom!!! your example is excellent!!! TRUNCOVER it is Smiley Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 1045 views
  • 9 likes
  • 5 in conversation