BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cleant
Calcite | Level 5

Hi,

 

The code below sets up data for measuring the driving distances between two addresses. It currently works using datalines but filling in those datalines can be onerous. How would I go about updating this code so that I can import a csv that creates the same result? Is there any special format I need to have my csv in?

 

I plan to have it laid out similar to the code below where each row has the ID, the starting address, and then the destination address. I tried using an infile statement but the output that resulted from using it looks different from the output of the code below. Any help is much appreciated.

data addresses;
length addr1 addr2 $50;
input
@01 id $9.
@11 addr1 $40.
@60 addr2 $40.
;
zip = input(scan(addr1,-1),5.);
set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;
addr1 = catx(' ',catx(',+',scan(addr1,1,','),poname,statecode),zip);
zip = input(scan(addr2,-1),5.);
set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;
addr2 = catx(' ',catx(',+',scan(addr2,1,','),poname,statecode),zip);
keep id addr: ;

datalines;
199580996 726 5th St 95901 2615 Chester Ave 93301
199580996 726 5th St 95901 700 River Drive 95437
199580996 726 5th St 95901 869 North Cherry St 93274
199580996 726 5th St 95901 1141 Rose Avenue 93662
;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If the file is CSV then you wouldn't need the column indicators as the comma (or other character) used to separate the values tells SAS where to read:

data addresses;
   infile "<path and filename go here" dlm=',' dsd;  /* if there is a header row in the file 
                                                     add Firstobs=2 (or 3 if two header rows etc*/
   length addr1 addr2 $50;
   input
    id : $9.
    addr1 : $40.
    addr2 : $40.
   ;
   zip = input(scan(addr1,-1),5.);
/* likely better to MERGE in the PONAME after you have read the CSV file*/ /*set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;*/ addr1 = catx(' ',catx(',+',scan(addr1,1,','),poname,statecode),zip); zip = input(scan(addr2,-1),5.); set sashelp.zipcode (keep=zip poname statecode) key=zip / unique; addr2 = catx(' ',catx(',+',scan(addr2,1,','),poname,statecode),zip); keep id addr: ; run;

This assumes by CSV you mean Comma separated values. If you use a different character use it in the DLM= option.

 

When you have column indicators line the @11 it forces SAS to start reading in that column, basically overriding the character serparating values. When you specify an informat on the Input statement without the : modifier it forces SAS to read all the characters. So if the addr1 only has 16 characters like "1234 Main Street" then it would read past the comma and get part of addr2. Use the : informat modifier or a separate INFORMAT or ATTRIB statement to associate the variable name with desired Informat.

 

Caution: If part of your addr1 or addr2 have a comma (or other delimiter character used with DLM= ) then the value needs to be inside quotes "123 Main St, Apt 5", "Anytown, SomeState  zip". The DSD option, delimiter sensitive data, will then not break the values at the delimiter in the middle of the value.

 

SET statement coupled with INPUT for external data can get pretty tricky. I would merge the values after I'm sure that the values are read correcly. Note that if you ZIP is correct then the ZIPSTATE function will return the the two character state abbreviation.

View solution in original post

3 REPLIES 3
ballardw
Super User

If the file is CSV then you wouldn't need the column indicators as the comma (or other character) used to separate the values tells SAS where to read:

data addresses;
   infile "<path and filename go here" dlm=',' dsd;  /* if there is a header row in the file 
                                                     add Firstobs=2 (or 3 if two header rows etc*/
   length addr1 addr2 $50;
   input
    id : $9.
    addr1 : $40.
    addr2 : $40.
   ;
   zip = input(scan(addr1,-1),5.);
/* likely better to MERGE in the PONAME after you have read the CSV file*/ /*set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;*/ addr1 = catx(' ',catx(',+',scan(addr1,1,','),poname,statecode),zip); zip = input(scan(addr2,-1),5.); set sashelp.zipcode (keep=zip poname statecode) key=zip / unique; addr2 = catx(' ',catx(',+',scan(addr2,1,','),poname,statecode),zip); keep id addr: ; run;

This assumes by CSV you mean Comma separated values. If you use a different character use it in the DLM= option.

 

When you have column indicators line the @11 it forces SAS to start reading in that column, basically overriding the character serparating values. When you specify an informat on the Input statement without the : modifier it forces SAS to read all the characters. So if the addr1 only has 16 characters like "1234 Main Street" then it would read past the comma and get part of addr2. Use the : informat modifier or a separate INFORMAT or ATTRIB statement to associate the variable name with desired Informat.

 

Caution: If part of your addr1 or addr2 have a comma (or other delimiter character used with DLM= ) then the value needs to be inside quotes "123 Main St, Apt 5", "Anytown, SomeState  zip". The DSD option, delimiter sensitive data, will then not break the values at the delimiter in the middle of the value.

 

SET statement coupled with INPUT for external data can get pretty tricky. I would merge the values after I'm sure that the values are read correcly. Note that if you ZIP is correct then the ZIPSTATE function will return the the two character state abbreviation.

cleant
Calcite | Level 5

Thank you! Did not know that the ":" inside the input statements had that effect. That really helped me out!

Tom
Super User Tom
Super User

First thing is replace the block of in-line data with a RUN statement to make the end of the data step.

 

Do you have the same fixed position file?  If so then you need to make sure to include the TRUNCOVER option on the INFILE statement or else SAS will go to the next line when you read past the end of a line.

data addresses;
  infile 'myfile.txt' truncover;
  length addr1 addr2 $50;
  input
   @01 id $9.
  @@11 addr1 $40.
  @60 addr2 $40.
  ;
....
run

Or do you actually have a CSV file?  That is a file with commas between the values?  If so then you need to change the INPUT statement also so that it uses LIST MODE input style instead of FORMATTED MODE.  And you will probably need to add the DSD option so that missing values are read properly and quotes added around values with commas in them are properly read.

To keep the LIST MODE informat even with in-line informat specifications in the INPUT statement you can add the colon modifier.

data addresses;
  infile 'myfile.csv' dsd truncover;
  input id :$9. addr1 :$50. addr2 :$50.;
....
run

If the delimiter is some other character add the DLM= option to the INFILE statement. If there is a header row then add the FIRSTOBS=2 option 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 208 views
  • 2 likes
  • 3 in conversation