Hi,
The code below sets up data for measuring the driving distances between two addresses. It currently works using datalines but filling in those datalines can be onerous. How would I go about updating this code so that I can import a csv that creates the same result? Is there any special format I need to have my csv in?
I plan to have it laid out similar to the code below where each row has the ID, the starting address, and then the destination address. I tried using an infile statement but the output that resulted from using it looks different from the output of the code below. Any help is much appreciated.
data addresses;
length addr1 addr2 $50;
input
@01 id $9.
@11 addr1 $40.
@60 addr2 $40.
;
zip = input(scan(addr1,-1),5.);
set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;
addr1 = catx(' ',catx(',+',scan(addr1,1,','),poname,statecode),zip);
zip = input(scan(addr2,-1),5.);
set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;
addr2 = catx(' ',catx(',+',scan(addr2,1,','),poname,statecode),zip);
keep id addr: ;
datalines;
199580996 726 5th St 95901 2615 Chester Ave 93301
199580996 726 5th St 95901 700 River Drive 95437
199580996 726 5th St 95901 869 North Cherry St 93274
199580996 726 5th St 95901 1141 Rose Avenue 93662
;
If the file is CSV then you wouldn't need the column indicators as the comma (or other character) used to separate the values tells SAS where to read:
data addresses; infile "<path and filename go here" dlm=',' dsd; /* if there is a header row in the file add Firstobs=2 (or 3 if two header rows etc*/ length addr1 addr2 $50; input id : $9. addr1 : $40. addr2 : $40. ; zip = input(scan(addr1,-1),5.);
/* likely better to MERGE in the PONAME after you have read the CSV file*/ /*set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;*/ addr1 = catx(' ',catx(',+',scan(addr1,1,','),poname,statecode),zip); zip = input(scan(addr2,-1),5.); set sashelp.zipcode (keep=zip poname statecode) key=zip / unique; addr2 = catx(' ',catx(',+',scan(addr2,1,','),poname,statecode),zip); keep id addr: ; run;
This assumes by CSV you mean Comma separated values. If you use a different character use it in the DLM= option.
When you have column indicators line the @11 it forces SAS to start reading in that column, basically overriding the character serparating values. When you specify an informat on the Input statement without the : modifier it forces SAS to read all the characters. So if the addr1 only has 16 characters like "1234 Main Street" then it would read past the comma and get part of addr2. Use the : informat modifier or a separate INFORMAT or ATTRIB statement to associate the variable name with desired Informat.
Caution: If part of your addr1 or addr2 have a comma (or other delimiter character used with DLM= ) then the value needs to be inside quotes "123 Main St, Apt 5", "Anytown, SomeState zip". The DSD option, delimiter sensitive data, will then not break the values at the delimiter in the middle of the value.
SET statement coupled with INPUT for external data can get pretty tricky. I would merge the values after I'm sure that the values are read correcly. Note that if you ZIP is correct then the ZIPSTATE function will return the the two character state abbreviation.
If the file is CSV then you wouldn't need the column indicators as the comma (or other character) used to separate the values tells SAS where to read:
data addresses; infile "<path and filename go here" dlm=',' dsd; /* if there is a header row in the file add Firstobs=2 (or 3 if two header rows etc*/ length addr1 addr2 $50; input id : $9. addr1 : $40. addr2 : $40. ; zip = input(scan(addr1,-1),5.);
/* likely better to MERGE in the PONAME after you have read the CSV file*/ /*set sashelp.zipcode (keep=zip poname statecode) key=zip / unique;*/ addr1 = catx(' ',catx(',+',scan(addr1,1,','),poname,statecode),zip); zip = input(scan(addr2,-1),5.); set sashelp.zipcode (keep=zip poname statecode) key=zip / unique; addr2 = catx(' ',catx(',+',scan(addr2,1,','),poname,statecode),zip); keep id addr: ; run;
This assumes by CSV you mean Comma separated values. If you use a different character use it in the DLM= option.
When you have column indicators line the @11 it forces SAS to start reading in that column, basically overriding the character serparating values. When you specify an informat on the Input statement without the : modifier it forces SAS to read all the characters. So if the addr1 only has 16 characters like "1234 Main Street" then it would read past the comma and get part of addr2. Use the : informat modifier or a separate INFORMAT or ATTRIB statement to associate the variable name with desired Informat.
Caution: If part of your addr1 or addr2 have a comma (or other delimiter character used with DLM= ) then the value needs to be inside quotes "123 Main St, Apt 5", "Anytown, SomeState zip". The DSD option, delimiter sensitive data, will then not break the values at the delimiter in the middle of the value.
SET statement coupled with INPUT for external data can get pretty tricky. I would merge the values after I'm sure that the values are read correcly. Note that if you ZIP is correct then the ZIPSTATE function will return the the two character state abbreviation.
Thank you! Did not know that the ":" inside the input statements had that effect. That really helped me out!
First thing is replace the block of in-line data with a RUN statement to make the end of the data step.
Do you have the same fixed position file? If so then you need to make sure to include the TRUNCOVER option on the INFILE statement or else SAS will go to the next line when you read past the end of a line.
data addresses;
infile 'myfile.txt' truncover;
length addr1 addr2 $50;
input
@01 id $9.
@@11 addr1 $40.
@60 addr2 $40.
;
....
run
Or do you actually have a CSV file? That is a file with commas between the values? If so then you need to change the INPUT statement also so that it uses LIST MODE input style instead of FORMATTED MODE. And you will probably need to add the DSD option so that missing values are read properly and quotes added around values with commas in them are properly read.
To keep the LIST MODE informat even with in-line informat specifications in the INPUT statement you can add the colon modifier.
data addresses;
infile 'myfile.csv' dsd truncover;
input id :$9. addr1 :$50. addr2 :$50.;
....
run
If the delimiter is some other character add the DLM= option to the INFILE statement. If there is a header row then add the FIRSTOBS=2 option
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.