NKormanik
Barite | Level 11

Following the suggestions on yesterday's question, we have converted a single long column of text into four text strings across: one text string in each of four columns, for 1000 rows. In the image below, 'a' is a text string, and so on.

[Image: 4 across 1.png]

The actual context is more the following:

[Image: 4 across 2.png]

The next step is to separate out the individual strings, which are composed of numbers and characters. Every string has the same components, and the components are separated by a single space.

 

After the conversion, based on the above image, there will be 3x4=12 variables. 'a' and 'm' will make up the first two observations in a variable called, say, COL1.1; 'b' and 'n' go to COL1.2; and so on through 'l' and 'x' in COL4.3.

 

So the question to the community is: how do I code this transformation?

 

Before the four-across conversion, it was a simple matter to import the single column of text strings, from a text file:

data sas_1.importance_long;
  infile "C:\2\Importance, Long.txt";
  input Indicator $10. Relative Importance Count NSurrog;
run;

In the present case I'm hoping not to have to write the entire lot out to a text file only to import it back into SAS, although I'm completely amenable to doing that.

 

Any thoughts appreciated!!

 

Nicholas Kormanik

 

1 ACCEPTED SOLUTION
Ksharp
Super User

Is this what you want?

 

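/* Sample data: four comma-delimited columns, each holding three space-separated components */
data have;
infile cards dlm=',';
input (col1-col4) ($);
cards;
a b c,d e f,g h i,j k l
m n o,p q r,s t u,v w x
;


proc iml;
/* Read every character variable into one character matrix */
use have;
read all var _char_ into have;
close;

/* COMPRESS removes the blanks; CSHAPE then reshapes the characters into
   12 columns of 1 character each, so this split assumes that every
   component is a single character, as in the sample data */
want=cshape(compress(have),0,12,1,' ');
print want;

/* Write the matrix to data set WANT (variables COL1-COL12) */
create want from want;
append from want;
close;
quit;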


7 REPLIES
yabwon
Onyx | Level 15

Let me paraphrase a classic: "Use the array, Nicholas":

data have;
  col1 = "A B C";
run;

%let size = 3;
%let length = 12;

data want;
  set have;

  array col1_[&size.] $ &length.;

  do _N_ = 1 to &size.;
    col1_[_N_]  = scan(col1, _N_, " ");
  end;
run;
proc print;
run;

Bart
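
A rough sketch of how the same array idea might be extended to all four columns at once, producing the twelve variables described in the question. Since a period is not allowed in a standard SAS variable name, the sketch uses col1_1 through col4_3 in place of COL1.1 through COL4.3; those names, the component length of 12, and the source length of 40 are assumptions made for illustration.

```sas
/* Hypothetical sample data in the layout from the question:
   four character columns, each holding three space-separated components */
data have;
  infile cards dlm=',';
  length col1-col4 $ 40;   /* assumed maximum length of each source string */
  input col1-col4;
  cards;
a b c,d e f,g h i,j k l
m n o,p q r,s t u,v w x
;

data want;
  set have;

  /* The four source columns */
  array src[4] $ col1-col4;

  /* Two-dimensional target array, filled row by row:
     parts[i, j] is component j of source column i, stored in col<i>_<j> */
  array parts[4, 3] $ 12
        col1_1-col1_3 col2_1-col2_3 col3_1-col3_3 col4_1-col4_3;

  do i = 1 to 4;
    do j = 1 to 3;
      /* SCAN returns the j-th blank-delimited component */
      parts[i, j] = scan(src[i], j, ' ');
    end;
  end;

  drop i j col1-col4;
run;

proc print data=want;
run;
```

With the two sample rows, col1_1 holds 'a' and 'm', col1_2 holds 'b' and 'n', and col4_3 holds 'l' and 'x'. Because SCAN splits on the blank delimiter rather than character by character, components longer than one character stay intact.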




NKormanik
Barite | Level 11

@Ksharp, @yabwon, you both seem to be from a different planet. Certainly NOT the Earth that I'm aware of.

 

Thanks a million for your help!

 

Tom
Super User

Why did you convert the original into 4 instead of 12?

NKormanik
Barite | Level 11

I didn't convert the original 4; I shifted them over and up.

 

After the shifting, I needed to convert the 12 to variables.

 

Original SAS output from HPSplit (more to the far right):

 

ID       Path                                                    Count   Success         Fail

0        Root Node                                                3900    0.8244    *    0.1756
1        Root Node                                                3900    0.8244         0.1756
         i_22304_Z < 0.621804 or Missing                          2948    0.8457    *    0.1543
2        Root Node                                                3900    0.8244         0.1756
         i_22304_Z >= 0.621804                                     952    0.7584    *    0.2416
3        Root Node                                                3900    0.8244         0.1756
         i_22304_Z < 0.621804 or Missing                          2948    0.8457         0.1543
         i_21603_Z < -1.27055                                      496    0.7601    *    0.2399
4        Root Node                                                3900    0.8244         0.1756
         i_22304_Z < 0.621804 or Missing                          2948    0.8457         0.1543
         i_21603_Z >= -1.27055 or Missing                         2452    0.8630    *    0.1370
5        Root Node                                                3900    0.8244         0.1756
         i_22304_Z >= 0.621804                                     952    0.7584         0.2416
         i_21104_Z < 0.385132 or Missing                           522    0.8218    *    0.1782
6        Root Node                                                3900    0.8244         0.1756
         i_22304_Z >= 0.621804                                     952    0.7584         0.2416
         i_21104_Z >= 0.385132                                     430    0.6814    *    0.3186

After successfully editing out extra unwanted 'strings', and shifting:

[Screenshot from SAS Universal Viewer: parameters_long_merge_4_across]

Then four-across:

[Screenshot from SAS Universal Viewer: parameters_long_four_across.sas]

At this point, while feeling I'm on track, I need to interpret what it actually means.

 

Here is an example of a good split (graph produced by HPSplit):

[Image: HPSplit decision tree graph, First Cut Long_00002.png]

On the right, the number 0.8563 is the 'Success' rate for the branch where variable i_22801 is >= -2.379. The success rate can be increased further by also using variable i_21501a with a parameter value >= 0.566; the success rate in that branch rises to 89.19%.

 

The challenge is to interpret, wrap it all up, and come to some conclusion as to the best course of action.

 

 

  

Tom
Super User

Are you saying you are trying to parse the listing output of the proc? Doesn't it produce any actual datasets you can use? Did you try running with ODS TRACE ON to at least see whether there is an ODS output you could capture as a dataset?
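
As a rough sketch of that suggestion: ODS TRACE ON writes the name of every output object a procedure produces to the log, and any of those names can then be captured as a data set with an ODS OUTPUT statement. PROC HPSPLIT also has its own OUTPUT statement for writing node statistics and variable importance to data sets. The example below assumes the SAMPSIO.HMEQ sample data set from the HPSPLIT documentation is available; the model and the output data set names (work.node_stats, work.var_importance) are illustrative and are not the setup used in this thread.

```sas
/* List every ODS output object in the log while the procedure runs */
ods trace on;

proc hpsplit data=sampsio.hmeq maxdepth=3;
  class bad reason job;
  model bad(event='1') = loan mortdue value reason job yoj clage debtinc;
  /* Write per-node statistics and variable importance to data sets
     (data set names are hypothetical) */
  output nodestats=work.node_stats importance=work.var_importance;
run;

ods trace off;

/* Inspect the node-level data set instead of parsing the listing */
proc print data=work.node_stats(obs=10);
run;
```

The object names that ODS TRACE reports in the log are the ones to use in an ODS OUTPUT statement placed before the procedure, which avoids reading the printed listing back in as text.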

NKormanik
Barite | Level 11

Thanks @Tom for that suggestion.  Haven't yet run ODS Trace.  Good idea.

 

Well, for the moment, I went about it the long way, taking the Proc HPSplit listing output, and converting it to a seemingly usable dataset.

 

The key now is to try to logically consider what exactly I have.  Not really a programming conundrum.  A tricky puzzle.

 

Thanks!

 
