BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
maria_sssss
Calcite | Level 5

Hey everyone!

I am trying to import a dataset downloaded from US Census website, in csv format. The file has about 1700 rows and 169 columns. Each row represents a zip code. I want to import the file in a way that each zip code will be repeated 166 times (first three columns are variations of the same zip code) with only one variable that has data values. So the target dataset will have only 5 columns (zip, zip1, zip2, count var, value) and 297,804 rows (~166x1700). Here is the code i am using:

filename eddata "C:\Users\path\ACS_12_5YR_B15001_with_ann.csv";

data test;

infile eddata dlm="," missover firstobs=3;

input zip $ zip1 $ zip2$ @;

   do var = 1 to 166;

       input values $ @;

       output;

   end;

run;


I get the correct number of total rows and columns, the values that are read in are alle correct, but almost half of the rows are blank (mostly they have missing values after 83rd iteration of the do loop). The original dataset has no missing values (dataset is attached).


I am trying to understand why this is happening but with no luck so far.


I would appreciate any advice!!


Thanks,

Maria

1 ACCEPTED SOLUTION

Accepted Solutions
Fugue
Quartz | Level 8

Just to confirm what I am and Patrick are saying, you probably need to adjust LRECL. Try this:

data test;

  infile eddata dlm="," missover firstobs=3 LRECL=32767;

  input zip :$14. zip1 $ zip2 :$11. @;

  do var = 1 to 166;

    input values $ @;

    output;

  end;

run;

View solution in original post

11 REPLIES 11
RichardinOz
Quartz | Level 8

The only 'odd' thing about the data is that the third postcode variant has a space in the middle.  It probably should be wrapped in double quotes, though some sources are sloppy about that.  I suspect SAS is reading the second part as a value.  I can't test but it would be worth a try using the & modifier:

filename eddata "C:\Users\path\ACS_12_5YR_B15001_with_ann.csv";

data test;

informat zip $14. zip1 $5. zip2 $Char11. ;

infile eddata dlm="," missover firstobs=3;

input zip $ zip1 $ zip2$ & @;

   do var = 1 to 166;

       input values $ @;

       output;

   end;

run;

Incidentally, why are you importing the data in transposed form?  And why are you importing numerical values as character strings?

Richard

maria_sssss
Calcite | Level 5

thanks!

Richard, i dont need to import it as character, my mistake... still, with the corrections to the code that you suggested, all the values of 'value' are missing..

Patrick, i just ran the code (adjusting for truncation) and still getting missing values (for example for the first chunk of values all are missing after row 87).

I am testing it in SAS9.2 and SAS9.3 on WindowsVista/XP..

Patrick
Opal | Level 21

That's weird. If I run the code I see this in the data:

Capture.PNG

Opening your csv with Notepad++ I'm told it's encoded in UTF-8 without BOM. Can you please re-test with below amendment:

infile eddata dlm="," missover firstobs=3 encoding='utf-8';

maria_sssss
Calcite | Level 5

Patrixk, i tested it out with adjusted "encoding", didnt help, though LRECL solved the problem Smiley Happy

Fugue
Quartz | Level 8

Just to confirm what I am and Patrick are saying, you probably need to adjust LRECL. Try this:

data test;

  infile eddata dlm="," missover firstobs=3 LRECL=32767;

  input zip :$14. zip1 $ zip2 :$11. @;

  do var = 1 to 166;

    input values $ @;

    output;

  end;

run;

maria_sssss
Calcite | Level 5

Fugue and Patrick, you were absolutely right, LRECL solved the problem!  Smiley Happy Thanks a lot!!!!! i guess I will try now to set LRECL to a default to 32767..

Patrick
Opal | Level 21

The code and data you've posted works fine in my environment (Win7 SAS9.4) and I don't get rows with missing values.

The only thing I saw is that you're truncating your zip data and need longer informats as in below code.

data test;
  infile eddata dlm="," missover firstobs=3;
  input zip :$14. zip1 $ zip2 :$11. @;

  do var = 1 to 166;
    input values $ @;
    output;
  end;
run;

Fugue
Quartz | Level 8

Using Patrick's code, I also get the same result. The only complaint in the log window is that one or more lines were truncated (probably need to increase the LRECL option):

test.png

Patrick
Opal | Level 21

That sounds very much like the explanation. My default is set to "32767" so I wouldn't have encountered such an issue.


Tom
Super User Tom
Super User

Also note that use of MISSOVER should really be avoided. TRUNCOVER is what you want.  With missover option there is the possibility that data at the end of the line will be discarded when it is less than the number of characters that the input operation requested.  See example below:

NOTE: Remote submit to SAS94 commencing.

2    options generic;

3    filename tmp temp;

4    data _null_;

5      file tmp;

6      x='HELLO';

7      do i=1 to length(x);

8        put x $varying. i ;

9      end;

10   run;

NOTE: The file TMP is:

      (system-specific pathname),

      (system-specific file attributes)

NOTE: 5 records were written to the file (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

11   data check;

12     infile tmp ;

13     input x $5.;

14     put _n_=1 x=  '_infile_=' _infile_;

15   run;

NOTE: Compression was disabled for data set WORK.CHECK because compression overhead would increase the size of the data set.

NOTE: The infile TMP is:

      (system-specific pathname),

      (system-specific file attributes)

_N_=1 x=HE _infile_=HE

_N_=2 x=HELL _infile_=HELL

_N_=3 x=HELLO _infile_=HELLO

NOTE: 5 records were read from the infile (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

NOTE: The data set WORK.CHECK has 3 observations and 1 variables.

16   data check;

17     infile tmp missover;

18     input x $5.;

19     put _n_=1 x=  '_infile_=' _infile_;

20   run;

NOTE: Compression was disabled for data set WORK.CHECK because compression overhead would increase the size of the data set.

NOTE: The infile TMP is:

      (system-specific pathname),

      (system-specific file attributes)

_N_=1 x=  _infile_=H

_N_=2 x=  _infile_=HE

_N_=3 x=  _infile_=HEL

_N_=4 x=  _infile_=HELL

_N_=5 x=HELLO _infile_=HELLO

NOTE: 5 records were read from the infile (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

NOTE: The data set WORK.CHECK has 5 observations and 1 variables.

21   data check;

22     infile tmp truncover;

23     input x $5.;

24     put _n_=1 x=  '_infile_=' _infile_;

25   run;

NOTE: Compression was disabled for data set WORK.CHECK because compression overhead would increase the size of the data set.

NOTE: The infile TMP is:

      (system-specific pathname),

      (system-specific file attributes)

_N_=1 x=H _infile_=H

_N_=2 x=HE _infile_=HE

_N_=3 x=HEL _infile_=HEL

_N_=4 x=HELL _infile_=HELL

_N_=5 x=HELLO _infile_=HELLO

NOTE: 5 records were read from the infile (system-specific pathname).

      The minimum record length was 1.

      The maximum record length was 5.

NOTE: The data set WORK.CHECK has 5 observations and 1 variables.

NOTE: Remote submit to SAS94 complete.

maria_sssss
Calcite | Level 5

Thanks, Tom!!! your example is excellent!!! TRUNCOVER it is Smiley Happy

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4364 views
  • 9 likes
  • 5 in conversation