SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Trying to break up variable string within a cell

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

Trying to break up variable string within a cell

Hello all, 

 

I have a dataset in SAS, which was imported from excel (unfortunately a SAS download from the master database is not an option). Also, I would prefer to manipulate this data in SAS as opposed to excel. Anyhow... 

 

I have cells with multiple codes within them... usually like so 384849495959.... They are usually in groups of 4, with atleast 2 separate groups in a cell, and potentially up to 10 or more groups. 

 

When I break them up using the substring command. This seems to work well... However, SAS seems confused about the number groups beyond the first one. For example. 

 


data adt.all2;
set adt.all1;
if ap =2 then psic1 = substrn(var3,1,4);
if ap =2 then psic2 = substrn(var3,5,6);

run; 

 

 

Cell 1                    after code      newcell1                    newcell2

35554555             -------->             3555                           4555 

 

 

I have to specific the psic2 length at 6 even though it is really 4 or it unecessarily cuts off the number i want in the cell.. 

 

 

From this, I can reformat newcell1 as a numeric cell, while newcell2 will give me problems. Ive tried the to multiply, as well as compress, left, put and input functions to try and get newcell2 into a 'numeric variable.' However, I get errors everytime. From my reading it seems as though this has something to do with the hexadecimal representation. I get that it likely has to do with that, but that doesnt really help me in resolving the problem so that I have a numeric variable in newcell2 to work with. And re-inputting data is not really an option. 

 

 

Anyhow, thanks for your help - advice is certainly appreciated. 


Accepted Solutions
Solution
‎04-29-2016 08:37 PM
Trusted Advisor
Posts: 1,115

Re: Trying to break up variable string within a cell

[ Edited ]

Wonderful, thank you!

 

So, at least these five examples all have hex characters 0D0A -- carriage return, line feed -- as "delimiters" between the first two 4-digit codes. So, to get rid of these (and only these) unwanted characters, you can use:

var3_new=compress(var3,'0D0A'x);

Edit: Or even better: var3=...  

View solution in original post


All Replies
Trusted Advisor
Posts: 1,115

Re: Trying to break up variable string within a cell

[ Edited ]

Hello @r4321,

 

If your codes consist of a known set of characters, e.g., only digits 0, ..., 9, you could apply the COMPRESS function with the K ("keep") modifier.

 

Example:

var3c=compress(var3,,'kd'); /* removes all characters but digits */

 

 

If you're interested in the "offending" non-numeric characters that are contained in VAR3, you could post the output (5 lines) of the following PROC PRINT step:

proc print data=adt.all1(obs=5);
where verify(var3, ' 0123456789');
format var3 $hex44.;
var var3;
run;

This will show the hex codes of the first (up to) 22 characters of the first 5 VAR3 values in dataset ADT.ALL1 which contain characters other than blanks or digits. Possibly, these first 5 values are representative for the dataset.

 

Edit: As soon as we know the offending characters, we can specify them in the second argument of the COMPRESS function (then without the K modifier) in order to remove them. But the "keep" approach might be more elegant in your case.

Contributor
Posts: 50

Re: Trying to break up variable string within a cell

Freelance, thanks for your help --- Ive tried a couple things based on your recommendations. Still stuck, however. 

 

 


Screenshot (16).png
Contributor
Posts: 50

Re: Trying to break up variable string within a cell


Screenshot (17).png
Contributor
Posts: 50

Re: Trying to break up variable string within a cell


Screenshot (18).png
Contributor
Posts: 50

Re: Trying to break up variable string within a cell

One more 


Screenshot (19).png
Trusted Advisor
Posts: 1,115

Re: Trying to break up variable string within a cell


r4321 wrote:

One more 

[Screenshot (19).png]


Thanks for posting the screenshots.

 

This last one is easy: PSIC2 contains 4-digit codes such as 3534. In this case, the result of input(psic2,hex.) is the integer represented by the hexadecimal number 3534, which is 13620. BYTE(13620) requests the 13621th character of the ASCII collating sequence, which contains only 256 characters, hence the "Invalid argument" notes.

 

Are your 4-digit codes hexadecimal values? This could at least explain why not all of them can be converted to numeric values using standard numeric informats.

Contributor
Posts: 50

Re: Trying to break up variable string within a cell

It would seem that the first 4 digit code of a group (e.g., psic1) can break off into a column quite nicely and be switched into numeric format. But it would seem as though when I try to 'substring' other codes from the main variable string, they must get turned into hexadecimal values... even though all it shows me are the numbers in the individual cells...

The question is how to rectify this?
Thanks for your attention / help.
Trusted Advisor
Posts: 1,115

Re: Trying to break up variable string within a cell

I think, the best way to find out what's going on is to look (with a $HEXw. format) at the original long character strings (before they are cut into pieces), especially those which cause problems.

 

This is why I had suggested the PROC PRINT step applying $HEX44. format to (a few values of) variable VAR3 in dataset ADT.ALL3.

 

Your screenshot no. 18 shows that no observations in ADT.ALL3 have VAR3C values which contain non-blank, non-numeric characters. I am surprised to see that your permanent dataset ADT.ALL3 contains a variable named VAR3C. I had used this name in my example code

var3c=compress(var3,,'kd');

Therefore, I suspect that you newly created VAR3C in ADT.ALL3 using the above line of code. In this case, the result "No observations ..." is only logical, because this VAR3C, by definition, contains only digits if anything.

 

So, to obtain a non-trivial result from the abovementioned PROC PRINT step, please replace "var3c" by the name of one of the variables containing those raw "multiple codes" (variable length $70 or so).

Contributor
Posts: 50

Re: Trying to break up variable string within a cell

Thanks for your comments.. heres a couple screenshots based on them 


Screenshot (20).png
Contributor
Posts: 50

Re: Trying to break up variable string within a cell

Heres another...which shows what the digits actually are for the first 5 observations in the dataset. Thanks. 


Screenshot (21).png
Solution
‎04-29-2016 08:37 PM
Trusted Advisor
Posts: 1,115

Re: Trying to break up variable string within a cell

[ Edited ]

Wonderful, thank you!

 

So, at least these five examples all have hex characters 0D0A -- carriage return, line feed -- as "delimiters" between the first two 4-digit codes. So, to get rid of these (and only these) unwanted characters, you can use:

var3_new=compress(var3,'0D0A'x);

Edit: Or even better: var3=...  

Contributor
Posts: 50

Re: Trying to break up variable string within a cell

Freelance, 

 

Not quite sure how the you figured that out! but it works! 

Thanks a lot, I unfortunately spent most of the day trying to get this figured out...So I really appreciate your patience and help!!

 

This command works great

 

 

data adt.all2;
set adt.all1;
var3_new =compress(var3,'0D0A'x);
run;

 

But did you mean try this instead?


data adt.all2;
set adt.all1;
var3=...;
run;

 

 

Or... 

 

 

data adt.all2;
set adt.all1;
var3_new =compress(var3=...);
run;

 

 

The first one worked, but not sure what you mean by the var3=... 

 

 

 

Anyhow, youre great, thanks! 

 

Trusted Advisor
Posts: 1,115

Re: Trying to break up variable string within a cell


r4321 wrote:
But did you mean try this instead?


data adt.all2;
set adt.all1;
var3=...;
run; 



You're welcome. Sorry for being too lazy (in the middle of the night, CET) to write out the alternative code:

var3=compress(var3,'0D0A'x);

i.e., the suggesition to overwrite VAR3 rather than create a new variable VAR3_NEW (so as to "save" the original value with the "precious" '0D0A'x characters). The three periods "..." were meant to denote an ellipsis! :-)

Super User
Posts: 10,538

Re: Trying to break up variable string within a cell

Please provide some additional details about your variables. I am unable to duplicate your issue with numeric or a string variable that that starts with the characters shown.

Since you mention up to 10 groups I have to assume that var3 is character of at least 40 characters. Do you explicitly assign a length to the psic variables? a format?

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 568 views
  • 6 likes
  • 3 in conversation