Solved: Re: Exporting Delimited file specifications

jbscholten · Posted 02-10-2017 04:57 PM

Hello,

I have a dataset in SAS that I am exporting to a delmited file (not a CSV). Is it possible to specify which lines to delimit? There are some lines which I do not want to delimit, as the file is being read incorrectly by another program and the delimiter (a space) may cause the files to be read in wrong by the other program.

Here's an example of the dataset:

Indicator	Var1	Var2	Var3	Var4	Var5	Var6	Var7	Var8	Var9
0001
1116	1	2	3	4	5	6	7	8	9
1116	1	2	3	4	5	6	7	8	9
1216	1	2	3	4	5	6	7	8	9
0117	1	2	3	4	5	6	7	8	9
0217	1	2	3	4	5	6	7	8	9
0317	1	2	3	4	5	6	7	8	9
END
0002
1116	1	2	3	4	5	6	7	8	9
1216	1	2	3	4	5	6	7	8	9
0117	1	2	3	4	5	6	7	8	9
0217	1	2	3	4	5	6	7	8	9
0317	1	2	3	4	5	6	7	8	9
END
0003
1116	1	2	3	4	5	6	7	8	9
1216	1	2	3	4	5	6	7	8	9
0117	1	2	3	4	5	6	7	8	9
0217	1	2	3	4	5	6	7	8	9
0317	1	2	3	4	5	6	7	8	9

The 'END' and '0001, 0002, 0003, etc' specify the end of a grouping and the start of the next grouping. '0001, 0002, etc' ranges from 1-1001, but lets pretend these increase infinitely. the "1116, 1116, 1216, 0117, 0217, 0317" are dates (mmyy) and run up to 1166, but this could also change. When exporting to a txt file, I need the spaces that are being inserted as delimiters between var1-var9 to NOT be placed in the observations that have "end" or grouping number present.

Any help would be greatly appreciated.

Thanks

-Jbscholten

ballardw · Posted 02-10-2017 05:23 PM

<Rant>

First, having done a fair amount of work around Y2K issues once upon a time I would suggest to whoever is using that file structure that they have a significant problem with 2 digit years. Coupled with your 0001 = Jan 2000 possibly confusion this is heading for a significant failure at some point.

</Rant>

Because of the aforementioned 0001 issue you really cannot rely on the value.

You also did not specify what the delimiter should be. And do you need INDENTS or is that an artifact or what ever you are copying from?

Add a FILE statement to data _null_ below, set the length for the STR variable, indicate the delimiter and this may work if indents are needed.

data have;
   infile datalines truncover;
   input Indicator $ Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 ;
datalines;
0001                   
1116 1 2 3 4 5 6 7 8 9 
1116 1 2 3 4 5 6 7 8 9 
1216 1 2 3 4 5 6 7 8 9 
0117 1 2 3 4 5 6 7 8 9 
0217 1 2 3 4 5 6 7 8 9 
0317 1 2 3 4 5 6 7 8 9 
END                   
0002                   
1116 1 2 3 4 5 6 7 8 9 
1216 1 2 3 4 5 6 7 8 9 
0117 1 2 3 4 5 6 7 8 9 
0217 1 2 3 4 5 6 7 8 9 
0317 1 2 3 4 5 6 7 8 9 
END                   
0003                   
1116 1 2 3 4 5 6 7 8 9 
1216 1 2 3 4 5 6 7 8 9 
0117 1 2 3 4 5 6 7 8 9 
0217 1 2 3 4 5 6 7 8 9 
0317 1 2 3 4 5 6 7 8 9 
END
;
run;

data _null_;
   set have;
   length str $100; /* long enough to hold all of the variables plus the delimiters*/
   if _n_=1 or Indicator='END' or lag(indicator)='END' then put indicator;
   else do;
      str= catx('|',indicator,of var:);
      put str;
   end;
run;

View solution in original post

Reeza · Posted 02-10-2017 05:12 PM

You need to use a manual data step, and then you have full control of the file with the PUT statements.

BY groups will allow you to identify the groups you need.

Post sample input data, I'm assuming this is output, and sample output that align. You can include only 3 or 4 variables to make it easier. Then we can mock up some code to show you what htis may look like.

Untested sketch of what this may look like:

data _null_;
file 'path to output file';

set dataset;
by group;

if _n_ = 1 then do;
put 'indicator' 'var1' 'var2' 'var3' 'var4' ... 'var9';
end;

if first.group then do;
put group;
end;

put var1-var9;

if last.group then do;
put 'END';
end;

run;

ballardw · Posted 02-10-2017 05:23 PM

<Rant>

First, having done a fair amount of work around Y2K issues once upon a time I would suggest to whoever is using that file structure that they have a significant problem with 2 digit years. Coupled with your 0001 = Jan 2000 possibly confusion this is heading for a significant failure at some point.

</Rant>

Because of the aforementioned 0001 issue you really cannot rely on the value.

You also did not specify what the delimiter should be. And do you need INDENTS or is that an artifact or what ever you are copying from?

Add a FILE statement to data _null_ below, set the length for the STR variable, indicate the delimiter and this may work if indents are needed.

data have;
   infile datalines truncover;
   input Indicator $ Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 ;
datalines;
0001                   
1116 1 2 3 4 5 6 7 8 9 
1116 1 2 3 4 5 6 7 8 9 
1216 1 2 3 4 5 6 7 8 9 
0117 1 2 3 4 5 6 7 8 9 
0217 1 2 3 4 5 6 7 8 9 
0317 1 2 3 4 5 6 7 8 9 
END                   
0002                   
1116 1 2 3 4 5 6 7 8 9 
1216 1 2 3 4 5 6 7 8 9 
0117 1 2 3 4 5 6 7 8 9 
0217 1 2 3 4 5 6 7 8 9 
0317 1 2 3 4 5 6 7 8 9 
END                   
0003                   
1116 1 2 3 4 5 6 7 8 9 
1216 1 2 3 4 5 6 7 8 9 
0117 1 2 3 4 5 6 7 8 9 
0217 1 2 3 4 5 6 7 8 9 
0317 1 2 3 4 5 6 7 8 9 
END
;
run;

data _null_;
   set have;
   length str $100; /* long enough to hold all of the variables plus the delimiters*/
   if _n_=1 or Indicator='END' or lag(indicator)='END' then put indicator;
   else do;
      str= catx('|',indicator,of var:);
      put str;
   end;
run;

jbscholten · Posted 02-13-2017 08:27 AM

This seems to work; however, in my data set all of the values in var1-9 are in the format 12.34. In my previous code (with the delimiter column), I specified var1-9 as z5.2 to keep leading and trailing zeros, so that 1.2 is written as 01.20 in the export file.

How can I keep this when using the data _null_ statement?

jbscholten · Posted 02-13-2017 08:50 AM

Also, thanks for the note about y2k. The groupings (0001 ...) are being read in by another program which identifies these as sets. I do see the problem though.

jbscholten · Posted 02-16-2017 10:04 AM

Used this code to keep 0's and only delimit appropriate lines:

Set work.clearbreak;
Length str $100; /* long enough to hold all of the variables plus the delimiters*/
If _n_=1 or Indicator='END' or lag(indicator)='END' then put indicator;
Else do;
str = catx(' ',indicator, vvalue(VAR4), vvalue(VAR5), vvalue(VAR6), vvalue(VAR7), vvalue(VAR8), vvalue(VAR9), vvalue(VAR10), vvalue(VAR11), vvalue(VAR12));
Put str;
End;
Run;

Ksharp · Posted 02-12-2017 08:58 AM

ODS CSVALL is good for this scenario.


ods csvall file='/folders/myfolders/xx.csv';
options nobyline;
title '#byval1';
footnote 'END';
proc report data=sashelp.class noheader;
by sex notsorted;
run;
ods csvall close;

Registration is open