Beginner: How to write input variables for these datalines?

Reply
Contributor
Posts: 32

Beginner: How to write input variables for these datalines?

[ Edited ]

Hello, 

This table below is part of a bigger csv file that I am working with. Some of the observations are in the wrong variable because I believe they lacked proper commas in the original file. How would I manually write input lines for this table? In my previous posts I thought it was interchangeable with an excel file but now I understand the importance of it being a csv file since I guess it gives you more freedom to work with formatting. Below is the table and some of the code I have started off with. Thank you!

 

 

Data test;
infile "/folders/myshortcuts/myfolder/list.csv" dsd delimiter=none;
input Street City State_ZIP Country;
run;
proc print data=test;
run;

 

Super User
Posts: 24,018

Re: Beginner: How to write input variables for these datalines?

Please some some lines directly from the CSV file. Open it in NotePad and then pasted it into the browser using the 6/7th icon in the Rich Text Editor.  This ensures the data is passed as characters. Excel has already interpreted your data and messed it up so any code provided based on the shown data will likely be incorrect.

 

 

 

 

Contributor
Posts: 32

Re: Beginner: How to write input variables for these datalines?

I think reading input from right to left will avoid the issue of length. So the last 3 would be country, last 5 numbers and 2 characters would be state and zip and so on... I'm not exactly sure how to read that though.

Super User
Posts: 13,941

Re: Beginner: How to write input variables for these datalines?


@Ramsha wrote:

Hello, 

This table below is part of a bigger csv file that I am working with. Some of the observations are in the wrong variable because I believe they lacked proper commas in the original file. How would I manually write input lines for this table? In my previous posts I thought it was interchangeable with an excel file but now I understand the importance of it being a csv file since I guess it gives you more freedom to work with formatting. Below is the table and some of the code I have started off with. Thank you!

 

 

Street City State/ZIP Country
15 York Street New Haven CT 06511 USA
301-399 South Boulevard Drive Bainbridge GA 39819 USA
150-151 Tremont Street Boston MA 02111 USA
2395 Ingleside Avenue Macon GA 31204 USA
1007 Merchant Street Ambridge PA 15003 USA
859 Washington Avenue Miami Beach FL 33139 USA
974 Great Plain Avenue Needham MA 02492 USA    
139 Lynnfield Street Peabody MA 01960 USA
180 Nassau Street Princeton NJ 08542 USA  
563 Carlsbad Village Drive Carlsbad CA 92008 USA

 

 

Data test;
infile "/folders/myshortcuts/myfolder/list.csv" dsd delimiter=none;
input Street City State_ZIP Country;
run;
proc print data=test;
run;

 


If values to the right of the country field are in the correct positions then there is nothing wrong with the CSV. It means the data was originally collected or entered incorrectly into the wrong field. Go ahead, ask me how I know Smiley Sad

How many records are we talking about? If the number is "low" it may just be easier to put the commas in the CSV then to try to write code.

 

This type of work I typically charge $100 per hour since it involves fixing someone's data entry mistakes. Code is not going to be easy. As evidenced by not having a clean solution posted already to the previous messages related to this question.

Contributor
Posts: 32

Re: Beginner: How to write input variables for these datalines?

[ Edited ]
 
Super User
Posts: 24,018

Re: Beginner: How to write input variables for these datalines?

Reading backwards, the 4th record from the bottom would still be wrong. 

You have bad data and need to clean it up manually. Changing to text or from PROC IMPORT won’t fix it.

 

I think you should read it all in and see how many records are badly formatted and if it’s worth the time to fix it. Often losing a few records is not problematic.  If it is, you should read it twice, once for the full line as a variable and second reading it correctly. This will give you the ability to parse it out if needed from the full record. 

 

Unfortunately because of the data structure it’s hard to assume anything and you’ll likely have other variations of errors as well. 

 


@Ramsha wrote:
Street,City,State/ZIP,Country
288 York Street,New Haven,CT 06511,USA
301-399 South Boulevard Drive,Bainbridge,GA 39819,USA
150-151 Tremont Street,Boston,MA 02111,USA
2395 Ingleside Avenue,Macon,GA 31204,USA
1007 Merchant Street,Ambridge,PA 15003,USA
859 Washington Avenue,Miami Beach,FL 33139,USA
974 Great Plain Avenue Needham MA 02492,USA, ,
139 Lynnfield Street,Peabody,MA 01960,USA
180 Nassau Street Princeton,NJ 08542,USA,
563 Carlsbad Village Drive,Carlsbad,CA 92008,USA

Isn't this what the data looks like originally from an imported csv file? I thought if you read from right to left and ignore the commas it might be easily configured. All I really need to read correctly is the country and state/zip 


 

Super User
Posts: 13,941

Re: Beginner: How to write input variables for these datalines?


@Ramsha wrote:
Street,City,State/ZIP,Country
288 York Street,New Haven,CT 06511,USA
301-399 South Boulevard Drive,Bainbridge,GA 39819,USA
150-151 Tremont Street,Boston,MA 02111,USA
2395 Ingleside Avenue,Macon,GA 31204,USA
1007 Merchant Street,Ambridge,PA 15003,USA
859 Washington Avenue,Miami Beach,FL 33139,USA
974 Great Plain Avenue Needham MA 02492,USA, ,
139 Lynnfield Street,Peabody,MA 01960,USA
180 Nassau Street Princeton,NJ 08542,USA,
563 Carlsbad Village Drive,Carlsbad,CA 92008,USA

Isn't this what the data looks like originally from an imported csv file? I thought if you read from right to left and ignore the commas it might be easily configured. All I really need to read correctly is the country and state/zip 


Previously you have not indicated that you only needed those 3 elements but wanted to correct the whole thing. Note that state/zip is not a valid SAS variable name so I'm guessing at a variable name for it. This assumes that your state is only ever 2 characters, zip is at most 10, that the country, zip and state are single "words": no embedded spaces such as might occur in New Zealand. The length of country is going to be inherited from the import/ read step. If state might be longer you need to increase the length in the length statement. If the sum of lengths of your current 4 variables exceeds 200 characters make sure that the length for STR is that plus at least 3 more to account for inserted spaces in the catx.

data want;
   set have;
   length str $ 200 state $ 2 zip $ 10;
   str= compbl ( catx(' ', street, city, statezip,country) );
   country=scan(str,-1,' ,');
   zip = scan(str,-2,' ,');
   state= scan(str,-3),' ,');
run;

if you want a combined state zip field (bad idea in general) then statezip= catx(' ', state, zip);

 

 

I am not even going to attempt to right any INPUT statements when to select a some values is easier than to fix garbage. Note that this does leave the entire "street" as is.

Contributor
Posts: 32

Re: Beginner: How to write input variables for these datalines?

This actually helps alot thank you! The code is not running but I think I have more of a general idea for direction! 

Super User
Posts: 13,941

Re: Beginner: How to write input variables for these datalines?


@Ramsha wrote:

This actually helps alot thank you! The code is not running but I think I have more of a general idea for direction! 


"Not running" is awful vague.

Are there errors in the log?: Post the code and log in a code box opened with the {i} to maintain formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Contributor
Posts: 32

Re: Beginner: How to write input variables for these datalines?

Also maybe using $varying informat?

Ask a Question
Discussion stats
  • 9 replies
  • 85 views
  • 3 likes
  • 3 in conversation