Happy New Year, Folks!
Say we have the following 'various text lines' in a single column:
a
b
c
d
e
f
g
h
I need to get the long column (10,000 such lines as above) into the following form:
a b c d
e f g h
Always four lines. Lines 2-4 need to shift up to go beside line 1.
I may end up trying to get this done in some text-based program. Perhaps somehow using RegEx. But wondering if SAS can do it.
Thanks much!
Nicholas Kormanik
It is IML things, if you have it .
data have; input have $ ; cards; a b c d e f g h ; proc iml; use have; read all var {have}; close; want=shape(have,0,4); print want; create want from want; append from want; close; quit;
It is IML things, if you have it .
data have; input have $ ; cards; a b c d e f g h ; proc iml; use have; read all var {have}; close; want=shape(have,0,4); print want; create want from want; append from want; close; quit;
Well, I didn't need to go past the very first suggestion, that of @Ksharp . As always, thank YOU so much. And thanks to all who so quickly contributed... on New Year's Day!
For posterity, here's my code. Any improvements, please advise:
data sas_1.parameters_long;
infile "C:\2\parameters (long).txt" truncover ;
input Text_Lines $200.;
run;
proc iml;
use sas_1.parameters_long;
read all var {Text_Lines};
close;
want=shape(Text_Lines,0,4);
* print sas_1.parameters_long;
create sas_1.parameters_long_4_across from want;
append from want;
close;
quit;
Seems like an enormous task, so easily accomplished. Unreal.
Well, I'm back already.
I tried with another file, and received an error message. I thought I made all the appropriate changes. "long" --> "short". Everything else remains exactly the same. What am I missing??
10 proc iml;
NOTE: IML Ready
11 use sas_1.parameters_short;
12 read all var {Text_Lines};
13 close;
NOTE: Closing SAS_1.PARAMETERS_SHORT
14
15 want=shape(Text_Lines,0,4);
ERROR: (execution) Invalid operand to operation.
operation : SHAPE at line 15 column 11
operands : Text_Lines, *LIT1002, *LIT1003
Text_Lines 1.84E7 rows 1 col (character, size 200)
*LIT1002 1 row 1 col (numeric)
0
*LIT1003 1 row 1 col (numeric)
4
statement : ASSIGN at line 15 column 1
16
17 * print sas_1.parameters_short;
18
19 create sas_1.parameters_short_4_across from want;
ERROR: Matrix want has not been set to a value.
statement : CREATE at line 19 column 1
20 append from want;
ERROR: No data set is currently open for output.
statement : APPEND at line 20 column 1
21 close;
22 quit;
NOTE: Exiting IML.
NOTE: The SAS System stopped processing this step because of errors.
Try this
data have;
input have $ ;
cards;
a
b
c
d
e
f
g
h
;
data want;
set have;
array v{*} $ v1 - v4;
m = mod(_N_ - 1, 4) + 1;
v[m] = have;
if m = 4;
retain v:;
keep v:;
run;
data one two three four;
set have;
if mod(_n_,4)=1 then output one;
else if mod(_n_,4)=2 then output two;
else if mod(_n_,4)=3 then output three;
else if mod(_n_,4)=0 then output four;
run;
data want;
merge one(rename=(text=text1)) two(rename=(text=text2))
three(rename=(text=text3)) four(rename=(text=text4));
run;
As I always do with others, @NKormanik, I feel compelled to question why a wide data set is better than a long data set. I guess it depends on what the next analysis is, which you don't say, but for many situations this re-arranging of your data set only makes the next steps harder.
@NKormanik wrote:
Paige, I'll answer you first, even though I haven't yet tried any of the offered solutions.
After getting the 'alpha-numeric text lines' into the proper arrangement, I'll have to edit out various extraneous 'strings' (such as "or Missing", "*", etc.
Then I'll import the whole thing into a SAS dataset, each then-existing string a variable, delimited by spaces.
My question when I see such specific words described as "extraneous" is why they were read into the data set in the first place? A custom informat can set specific values when read as missing. I have several of those because of people that create data with such text in it.
And I am really unsure why grouping the values by 4 first is any improvement before attempting such. Especially since the "Then I'll import" seems to imply a "write this mangled data out to a text file and then read it back into SAS". To run any approach to manipulate the data into that shape from SAS means you have a SAS data set. The out and in approach sounds like added work for little actual gain.
Perhaps a more complete example of your data set with sensitive values replaced with random values and what you would expect from it.
Thanks @ballardw Paige. I appreciate your comments. All well warranted, in my opinion. (I could see your eyes rolling, justifiably.)
What I'm importing is actually output from another SAS procedure. That output includes the 'extraneous' characters/strings. There may be a more direct way of capturing said output, possibly directly to a SAS dataset. But I didn't surmise a way, as of yet.
I first wanted to get the lot into the four-across format to see if my idea was on track. But, yes, as you point out, if the idea works, then prior to the four-across conversion would be the best time to edit out extraneous strings.
@NKormanik wrote:
Thanks @ballardw Paige. I appreciate your comments. All well warranted, in my opinion. (I could see your eyes rolling, justifiably.)
What I'm importing is actually output from another SAS procedure. That output includes the 'extraneous' characters/strings. There may be a more direct way of capturing said output, possibly directly to a SAS dataset. But I didn't surmise a way, as of yet.
I first wanted to get the lot into the four-across format to see if my idea was on track. But, yes, as you point out, if the idea works, then prior to the four-across conversion would be the best time to edit out extraneous strings.
It might be a good idea to share the code for the procedure generating the output. There is almost always a way to get the output into a data set directly once we know what it is. Even Procs Report and Tabulate generate data sets though typically a tad cumbersome to work with.
data have;
input col $ ;
cards;
a
b
c
d
e
f
g
h
;
run;
data want;
array var[4] $ ;
do x = 1 by 1 until( done | x > 3);
set have end=done;
var[x] = col;
end;
output;
drop x col;
run;
As others have shown, read them in as 4 per row. I use the / to explicitly indicate that you're reading from a new line. You can implicitly use the MISSOVER option as well.
data have;
input var1 $
/var2 $
/var3 $
/var4 $;
cards;
a
b
c
d
e
f
g
h
;;;;
run;
I think @Astounding @r_behata have already given you good solution .
If you really want IML code. try this :
data have;
input have $ ;
cards;
a
b
c
d
e
f
g
h
i
;
proc iml;
use have;
read all var {have};
close;
want=shape(have,0,4,blankstr(nleng(have)));
print want;
create want from want;
append from want;
close;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.