Following suggestions from yesterday's question, we have converted a single long column of text into four columns, with one text string in each column, across 1000 rows. In the image below, 'a' is a text string, and so on.
The actual context is more the following:
The next step is to separate each text string into its individual components, which are made up of numbers and characters. Every string has the same components, and the components are separated by a single space.
After the conversion, based on the above image, there will be 3x4=12 variables. 'a' and 'm' will make up the first two observations in variable, say, COL1.1. 'b' and 'n' --> COL1.2. . . . 'l' and 'x' --> COL4.3.
So, the question to the community is how to code this transition?
Before the four-across conversion, it was a simple matter to import the single column of text strings, from a text file:
data sas_1.importance_long;
infile "C:\2\Importance, Long.txt";
input
Indicator $10. Relative Importance Count NSurrog;
run;
In the present case I'm hoping not to have to send the entire lot out to a text file, just to have to then import it back into SAS. Although I'm completely amenable to doing that.
Any thoughts appreciated!!
Nicholas Kormanik
Is this what you want?
data have;
infile cards dlm=',';
input (col1-col4) ($);
cards;
a b c,d e f,g h i,j k l
m n o,p q r,s t u,v w x
;
proc iml;
use have;
read all var _char_ into have;
close;
want=cshape(compress(have),0,12,1,' ');
print want;
create want from want;
append from want;
close;
quit;
Let me paraphrase a classic: "Use the array, Nicholas":
data have;
col1 = "A B C";
run;
%let size = 3;
%let length = 12;
data want;
set have;
array col1_[&size.] $ &length.;
do _N_ = 1 to &size.;
col1_[_N_] = scan(col1, _N_, " ");
end;
run;
proc print;
run;
Bart
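The array approach above handles a single column. A minimal sketch of how it might extend to all four columns follows, assuming each column holds exactly three space-separated tokens. The output names col1_1 through col4_3 are illustrative stand-ins for COL1.1 through COL4.3 (a period is not valid in a standard SAS variable name), and the sample data and the length of 12 are assumptions:

```sas
/* Hypothetical sample data: four columns, three tokens per column */
data have;
  infile cards dlm=',';
  input (col1-col4) (:$20.);
cards;
a b c,d e f,g h i,j k l
m n o,p q r,s t u,v w x
;

data want;
  set have;
  /* Source columns, one array element per column */
  array src[4] $ col1-col4;
  /* Target variables; a two-dimensional array fills row by row,
     so parts[1,1]=col1_1, parts[1,2]=col1_2, ..., parts[4,3]=col4_3 */
  array parts[4,3] $ 12 col1_1-col1_3 col2_1-col2_3
                        col3_1-col3_3 col4_1-col4_3;
  do i = 1 to 4;
    do j = 1 to 3;
      /* j-th space-delimited token of the i-th column */
      parts[i,j] = scan(src[i], j, ' ');
    end;
  end;
  drop i j;
run;

proc print data=want;
run;
```

This works on the dataset already in SAS, so there is no need to write it out to a text file and read it back in. If the components are numeric, they could be converted afterwards with the INPUT function.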
Why did you convert the original into 4 instead of 12?
The original 4 I didn't convert, but shifted over and up.
After the shifting, I needed to convert the 12 to variables.
Original SAS output from HPSplit (more to the far right):
ID  Path                              Count  Success   Fail
 0  Root Node                          3900  0.8244 *  0.1756
 1  Root Node                          3900  0.8244    0.1756
    i_22304_Z < 0.621804 or Missing    2948  0.8457 *  0.1543
 2  Root Node                          3900  0.8244    0.1756
    i_22304_Z >= 0.621804               952  0.7584 *  0.2416
 3  Root Node                          3900  0.8244    0.1756
    i_22304_Z < 0.621804 or Missing    2948  0.8457    0.1543
    i_21603_Z < -1.27055                496  0.7601 *  0.2399
 4  Root Node                          3900  0.8244    0.1756
    i_22304_Z < 0.621804 or Missing    2948  0.8457    0.1543
    i_21603_Z >= -1.27055 or Missing   2452  0.8630 *  0.1370
 5  Root Node                          3900  0.8244    0.1756
    i_22304_Z >= 0.621804               952  0.7584    0.2416
    i_21104_Z < 0.385132 or Missing     522  0.8218 *  0.1782
 6  Root Node                          3900  0.8244    0.1756
    i_22304_Z >= 0.621804               952  0.7584    0.2416
    i_21104_Z >= 0.385132               430  0.6814 *  0.3186
After successfully editing out extra unwanted 'strings', and shifting:
Then four-across:
At this point, while feeling I'm on track, I need to interpret what it actually means.
Here is an example of a good split (graph produced by HPSplit):
On the right the number 0.8563 represents 'Success', based on variable i_22801, parameter being >= -2.379. The success rate can be further increased by additionally using variable i_21501a, with parameter value >= 0.566. Percentage success in that branch rises to 89.19%.
The challenge is to interpret, wrap it all up, and come to some conclusion as to the best course of action.
Are you saying you are trying to parse the listing output of the proc? Doesn't it produce any actual datasets you can use? Did you try running with ODS TRACE on to at least see if there might be an ODS output you could capture as a dataset?
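A rough sketch of the ODS TRACE idea follows. The dataset work.mydata, the target variable, and the inputs x1-x3 are all hypothetical, and the CLASS/MODEL syntax shown is an assumption that may differ between HPSPLIT releases. SomeTable is only a placeholder; the real table names have to be read from the log:

```sas
/* Hypothetical names throughout: work.mydata, target, x1-x3 */

/* With tracing on, the log lists the name of every output object
   the procedure produces */
ods trace on;

proc hpsplit data=work.mydata;
  class target;
  model target = x1 x2 x3;
run;

ods trace off;

/* Once the log shows a table of interest, rerun the procedure with an
   ODS OUTPUT statement placed before it. SomeTable is a placeholder,
   not a real table name:

   ods output SomeTable=work.captured;
*/
```

Capturing a table this way yields a regular SAS dataset, which avoids parsing the listing output by hand.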
Thanks @Tom for that suggestion. Haven't yet run ODS Trace. Good idea.
Well, for the moment, I went about it the long way, taking the Proc HPSplit listing output, and converting it to a seemingly usable dataset.
The key now is to try to logically consider what exactly I have. Not really a programming conundrum. A tricky puzzle.
Thanks!