06-22-2016 06:11 PM
I have data from a national survey database that I received as a txt file. The data is from written surveys that were electronically scanned in, so it is a bit messy. The variables are separated by the delimiter |. Unfortunately, several string/free-text variables throughout the data file contain a | where a similar-looking character such as / or a lower-case L should be, so when I read the data in, SAS breaks these variables into multiple variables. Besides using DSD, which doesn't work in this case, how can I tell SAS to ignore these erroneous delimiters when reading in my data? The preceding and following values are not consistent across cases, so there is nothing to anchor an @character pointer or a find-and-replace command to. In case it helps: the string variables that pose a problem vary in length, and the data breaks across cases, so I must use FLOWOVER.
Here is an example of what my data looks like:
Var 1 Var 2 Var 3 Var 4
12345|123 App|e Tree Lane|20051231|E1
67981|5th grade|6th grade|20091231|F2
06-22-2016 06:20 PM
Go back to your tool, and see if you have the option of specifying a different delimiter and/or creating quotes around variables that are text.
There may be ways but it definitely won't be clean or easy.
06-22-2016 06:23 PM
Thank you for your response! Do you mean the tool that scanned in the surveys originally? If so, I do not have that option. I received the txt files from another organization and have no way of re-requesting the data in a different format. Any other suggestions are appreciated!
06-23-2016 04:37 AM
If you cannot go back and fix the problem, then everything you do from there on in will be guesswork and hence at risk of being wrong. Sure, we can post code with complicated formulas that try to calculate the position of the text and where to read from, but unless these cover every possible scenario you still run the risk of being wrong. Simply put, you cannot have a delimited file whose data contains delimiters and is not quoted. If you export from Excel to CSV, for example, and a field contains the delimiter, then the data element has " " surrounding it to mark where the value starts and ends. The application that created this data must have something like that.
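To illustrate the quoting point, here is a minimal sketch of how SAS would read the two sample records if the free-text field had been quoted by the exporting application. The quotes are an assumption for illustration only (the actual file does not have them), and the variable names and lengths are made up.

```sas
/* Sketch only: assumes the text field arrives wrapped in double quotes. */
/* With DSD, a delimiter inside a quoted value is treated as data and    */
/* the surrounding quotes are removed.                                   */
data quoted;
    infile datalines dsd dlm='|' truncover;
    length var1 $5 var2 $40 var3 $8 var4 $2;   /* illustrative lengths */
    input var1 var2 var3 var4;
    datalines;
12345|"123 App|e Tree Lane"|20051231|E1
67981|"5th grade|6th grade"|20091231|F2
;
run;
```

Without the quotes there is no marker telling SAS where the text value starts and ends, which is why DSD alone does not help with the file as received.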
06-22-2016 10:33 PM
If the stray delimiter only appears in VAR2, try this:
data have;
  input x $80.;
  var1 = scan(x, 1, '|');
  call scan(x, 1, p1, l1, '|');
  call scan(x, -2, p2, l2, '|');
  var2 = substr(x, p1+l1+1, p2-p1-l1-2);
  var3 = scan(x, -2, '|');
  var4 = scan(x, -1, '|');
  drop p1 p2 l1 l2;
cards;
12345|123 App|e Tree Lane|20051231|E1
67981|5th grade|6th grade|20091231|F2
;
run;
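Since this approach relies on the stray delimiter being confined to VAR2, it may be worth checking how many records actually carry extra delimiters before trusting the result. The following is a rough sketch, assuming four fields per record (so three real delimiters); the third data line is an invented clean record added only for contrast, and the names n_delim and has_stray are hypothetical.

```sas
/* Sketch only: count the | characters on each line and flag any line */
/* that has more than the 3 expected for a 4-field record.            */
data check;
    input x $80.;
    n_delim   = countc(x, '|');   /* number of | characters in the line */
    has_stray = (n_delim > 3);    /* 1 if there are extra delimiters    */
cards;
12345|123 App|e Tree Lane|20051231|E1
67981|5th grade|6th grade|20091231|F2
12346|456 Oak Street|20061231|G3
;
run;

proc freq data=check;
    tables n_delim;
run;
```

The count only shows how many extra delimiters a line has, not which field they sit in, so it is a sanity check rather than a fix. It also assumes one record per line, which may not hold if records break across lines.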