Hi Guys,
I have a CSV which is pipe delimited.and doesn't include any text identifiers (the individual strings are not surrounded by double inverted commas or anything else for that matter). Unfortunately some of the text strings contain embedded carriage returns that I obviously need to remove in order to read the file in effectively.
I have searched the internet high and low for a solution and attempted to use what I have located without success. The solution with the most potential seemed to be a datastep with both infile and file statements using the sharebuffers option. For some unexplained reason this method simply kills my E.G session (which I know sounds crazy, but that is what happens). The following code is a reasonable representation (although an over simplification) of what how the data would look if I imported the CSV file.
DATA STUFF;
LENGTH TEXT $1000;
INPUT TEXT $;
DATALINES;
VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12
1|2|3|4
|5|6|7|8|
9|10|11|12 ;
RUN;
To get around this I thought that I could read the first line of the text file (containing the variable names which never have embedded carriage returns) into a single variable and identify the number of pipes (11). I would then do the same for the remainder of the file and where the number of pipes are less than 11 then append the next record (by lagging the first occurance) to the second occurance and then delete any records that don't have 11 pipes. This works perfectly until you come across an instance where there are multiple carriage returns for a single record.
Can anyone offer a suggestion on how I could continue this process until the number of pipes equals 11?
I am open to other solutions, but this is the best I could come up with.
Any help would be greatly appreciated.
Regards,
Scott
You don't mention the OS you are using. That would be helpful.
I think the TERMSTR INFILE statement option will solve this problem. Try TERMSTR=CRLF.
If that doesn't work be more specific about the offending character (is is '0A'x?) and your OS.
Depending on how messy your data is you might also try the FLOWOVER option on the INFILE statement. This option attempts to continue to read data from the next line when an unexpected end of line occurs.
Hi BallardW,
I did try the Flowover option, but was getting some unexpected results.
Thank you for your suggestion though..
Regards,
Scott
Perhaps you could show an actual copy of the infile and input statements you are using.
My comment about messy data basically meant that if you have empty fields there's likely to be problems keeping the fields synched.
Instead of LAG you might be better off looking at the pointer control input and @ and @@ and parsing data.
It sounds like you have multiple files with different variables from your comment about reading the first line. Will that give you enough information to know whether the variables are text or numeric? Or will you assume all are text and covert any numeric later?
Hi data_null_,
I have attempted to use TERMSTR but with no joy. From the documentation I have read this is only useful when processing a file created in Unix using Windows (and vice versa) as it specifies the end-of-line character for the specified file, which is a CR ('0D'x in this example.
We are using Windows Server 2008 R2 if that helps.
Regards,
Scott
Attache the file or a few records.
Sorry, I can't provide a sample as the data is confidential.
I cannot reproduce your problem with imbeded CR '0D'x in the records. Using SAS 9.2 for winders. I can produce the problem with LF '0A'x and it can be fixed with TERMSTR. If you are using SAS 9.1 I recall that TERMSTR did not work.
1981 filename FT44F001 temp;
1982 data _null_;
1983 file FT44F001;
1984 put 'VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12';
1985 put '1|2|3|4' '0a'x '|5|6|7|8|' '0a'x '9|10|11|12';
1986 put '1|2|3|4|5|6|7|8|9|10|11|12';
1987 put '1|2|3|4' '0a'x '|5|6|7|8|' '0a'x '9|10|11|12';
1988 run;
NOTE: The file FT44F001 is:
(system-specific pathname),
(system-specific file attributes)
NOTE: 4 records were written to the file (system-specific pathname).
The minimum record length was 26.
The maximum record length was 62.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.01 seconds
1989 data _null_;
1990 infile FT44F001;
1991 input;
1992 list;
1993 run;
NOTE: The infile FT44F001 is:
(system-specific pathname),
(system-specific file attributes)
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12 62
2 1|2|3|4 7
3 |5|6|7|8| 9
4 9|10|11|12 10
5 1|2|3|4|5|6|7|8|9|10|11|12 26
6 1|2|3|4 7
7 |5|6|7|8| 9
8 9|10|11|12 10
NOTE: 8 records were read from the infile (system-specific pathname).
The minimum record length was 7.
The maximum record length was 62.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
1994
1995 data _null_;
1996 infile FT44F001 termstr=crlf;
1997 input;
1998 list;
1999 run;
NOTE: The infile FT44F001 is:
(system-specific pathname),
(system-specific file attributes)
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12 62
2 CHAR 1|2|3|4.|5|6|7|8|.9|10|11|12 28
ZONE 3737373073737373703733733733
NUMR 1C2C3C4AC5C6C7C8CA9C10C11C12
3 1|2|3|4|5|6|7|8|9|10|11|12 26
4 CHAR 1|2|3|4.|5|6|7|8|.9|10|11|12 28
ZONE 3737373073737373703733733733
NUMR 1C2C3C4AC5C6C7C8CA9C10C11C12
NOTE: 4 records were read from the infile (system-specific pathname).
The minimum record length was 26.
The maximum record length was 62.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
Or Can you make some dummy data to appear your problem again.
And Best is posting the output you want too.
This may be helpful:
https:/communities.sas.com/thread/34015
Oleg
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.