gsk
Obsidian | Level 7

Let's say I have a variable named text whose value for each observation is a long piece of text.

I want to tell SAS this: create a variable named var1 if the length of text is at least 1, and put characters 1 to 100 of text into var1; create a variable named var2 if the length of text is greater than 100, and put characters 101 to 200 of text into var2; and so forth. Could we achieve this using an array?

So the resulting dataset would be:

text         var1         var2      ...   var10
blah blah    blah,.,a     b,..b     ...   (may not exist if text is not long enough)

Accepted Solution
Tom
Super User

If you just want to split the string into fixed-size pieces by byte length, it is trivial. Assume you have an existing dataset named HAVE with a long character variable named SOURCE. Here is code to generate 9 variables, SPLIT1 to SPLIT9, each 200 bytes long.

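data want ;
  set have ;
  /* SPLIT1-SPLIT9: nine character variables, 200 bytes each */
  array split [9] $200 ;
  /* _N_ is reused as the loop index so no extra variable needs dropping */
  do _n_=1 to 9;
    /* piece i starts at byte 1+(i-1)*200 and is 200 bytes long */
    split[_n_]=substr(source,1+(_n_-1)*200,200);
  end;
run;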

Note that if your string uses UTF-8 (or another multi-byte encoding), you might want to add more logic to avoid splitting a multi-byte character between two strings.
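If the text is UTF-8, one way to avoid cutting a character in half is to split by characters instead of bytes, using the character-aware KSUBSTR and KLENGTH functions. This is a swapped-in technique, not the approach in the reply above, and it is only a minimal sketch assuming a UTF-8 session encoding; the demo HAVE step and the names WANT_CHARS, NCHARS and START are hypothetical.

```sas
/* Demo input only (hypothetical): one 550-character ASCII string.
   Replace this step with your own HAVE/SOURCE. */
data have;
  length source $1000;
  source = repeat('abcdefghij', 54);  /* REPEAT adds 54 copies: 55 x 10 = 550 characters */
run;

/* Split SOURCE into at most 9 pieces of 200 characters each.
   Assumes a UTF-8 session encoding: one character can take up to 4 bytes,
   so each piece is declared $800 to hold 200 characters without truncation. */
data want_chars;
  set have;
  array split [9] $800;
  nchars = klength(source);          /* length in characters, not bytes */
  do _n_ = 1 to 9;
    start = 1 + (_n_ - 1) * 200;     /* first character of piece _n_ */
    if start <= nchars then
      split[_n_] = ksubstr(source, start, min(200, nchars - start + 1));
  end;
  drop nchars start;
run;
```

With the demo input, SPLIT1 and SPLIT2 hold 200 characters, SPLIT3 holds the last 150, and SPLIT4 to SPLIT9 stay blank. Because each piece is capped by character count, the pieces are no longer a fixed number of bytes.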

4 Replies
Tom
Super User

This does not make any sense. Perhaps you could provide an example that matches the description?

Note that you cannot change the number of variables while a step is running. You can either create as many variables as you could possibly need, or use a more vertical structure in which the number of observations varies with the input value.
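A minimal sketch of the vertical layout, assuming the same HAVE/SOURCE names as the accepted solution and 200-byte pieces; the demo input and the names WANT_LONG, SEQ, PIECE and NPIECES are hypothetical.

```sas
/* Demo input only (hypothetical): one 550-character string. */
data have;
  length source $1000;
  source = repeat('abcdefghij', 54);   /* 550 characters */
run;

/* One observation per 200-byte piece; SEQ numbers the pieces. */
data want_long;
  set have;
  length seq 8 piece $200;
  npieces = ceil(lengthn(source) / 200);   /* LENGTHN returns 0 for a blank string */
  do seq = 1 to npieces;
    /* SUBSTRN is used instead of SUBSTR because it tolerates a
       start or length that runs past the end of SOURCE */
    piece = substrn(source, 1 + (seq - 1) * 200, 200);
    output;
  end;
  drop source npieces;
run;
```

With the demo input, WANT_LONG has three observations (SEQ = 1, 2, 3), and the last PIECE holds 150 bytes. An observation whose SOURCE is blank produces no rows here; wrap the count in MAX(1, ...) if you need to keep it.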

gsk
Obsidian | Level 7

Sorry for not making my question clear. I have a variable whose values are long passages of text. For example, one observation might contain:

"SAS was re-designed in SAS 76 with an open architecture that allowed for compilers and procedures. The INPUT and INFILE statements were improved so they could read most data formats used by IBM mainframes. Generating reports was also added through the PUT and FILE statements. The ability to analyze general linear models was also added[27] as was the FORMAT procedure, which allowed developers to customize the appearance of data.[23] In 1979, SAS 79 added support for the CMS operating system and introduced the DATASETS procedure. Three years later, SAS 82 introduced an early macro language and the APPEND procedure.[23]

SAS version 4 had limited features, but made SAS more accessible. Version 5 introduced a complete macro language, array subscripts, and a full-screen interactive user interface called Display Manager.[23] In 1985, SAS was rewritten in the C programming language. This allowed for the SAS' Multivendor Architecture that allows the software to run on UNIX, MS-DOS, and Windows. It was previously written in PL/I, Fortran, and assembly language.[19][23]

In the 1980s and 1990s, SAS released a number of components to complement Base SAS. SAS/GRAPH, which produces graphics, was released in 1980, as well as the SAS/ETS component, which supports econometric and time series analysis. A component intended for pharmaceutical users, SAS/PH-Clinical, was released in the 1990s. The Food and Drug Administration standardized on SAS/PH-Clinical for new drug applications in 2002.[19] Vertical products like SAS Financial Management and SAS Human Capital Management (then called CFO Vision and HR Vision respectively) were also introduced.[28] JMP was developed by SAS co-founder John Sall and a team of developers to take advantage of the graphical user interface introduced in the 1984 Apple Macintosh[29] and shipped for the first time in 1989.[29]Updated versions of JMP were released continuously after 2002 with the most recent release being from 2016.[30][31][32][33][34][35][36]

SAS version 6 was used throughout the 1990s and was available on a wider range of operating systems, including Macintosh, OS/2, Silicon Graphics, and Primos. SAS introduced new features through dot-releases. From 6.06 to 6.09, a user interface based on the windows paradigm was introduced and support for SQL [37] was added.[23] Version 7 introduced the Output Delivery System (ODS) and an improved text editor. ODS was improved upon in successive releases. For example, more output options were added in version 8. The number of operating systems that were supported was reduced to UNIX, Windows and z/OS, and Linux was added.[23][38] SAS version 8 and SAS Enterprise Miner were released in 1999.[19]"

I want to create variables from this variable. The first derived variable would be: "SAS was re-designed in SAS 76 with an open architecture that allowed for compilers and procedures. The INPUT and INFILE statements were improved so they could read most data formats used by IBM mainfr" because those are characters 1 to 200. The second derived variable would be: "ames. Generating reports was also added through the PUT and FILE statements. The ability to analyze general linear models was also added[27] as was the FORMAT procedure, which allowed developers to cu" because those are characters 201 to 400. I want to create 9 variables from this variable, even though 9 variables might not be enough to hold all of the text.

PaigeMiller
Diamond | Level 26

I can't follow the logic here. How do you get from the text string "blah blah", which is 9 characters, to var1 = "blah,.,a" and var2 = "b,..b"? (I thought that if text was shorter than 100 characters, var2 should be empty.)

As @Tom requested, we need a realistic example (with more than one record) that illustrates the desired results.

--
Paige Miller
