BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
r4321
Pyrite | Level 9

Hello all, 

 

I have a dataset in SAS, which was imported from excel (unfortunately a SAS download from the master database is not an option). Also, I would prefer to manipulate this data in SAS as opposed to excel. Anyhow... 

 

I have cells with multiple codes within them... usually like so 384849495959.... They are usually in groups of 4, with atleast 2 separate groups in a cell, and potentially up to 10 or more groups. 

 

When I break them up using the substring command. This seems to work well... However, SAS seems confused about the number groups beyond the first one. For example. 

 


data adt.all2;
set adt.all1;
if ap =2 then psic1 = substrn(var3,1,4);
if ap =2 then psic2 = substrn(var3,5,6);

run; 

 

 

Cell 1                    after code      newcell1                    newcell2

35554555             -------->             3555                           4555 

 

 

I have to specific the psic2 length at 6 even though it is really 4 or it unecessarily cuts off the number i want in the cell.. 

 

 

From this, I can reformat newcell1 as a numeric cell, while newcell2 will give me problems. Ive tried the to multiply, as well as compress, left, put and input functions to try and get newcell2 into a 'numeric variable.' However, I get errors everytime. From my reading it seems as though this has something to do with the hexadecimal representation. I get that it likely has to do with that, but that doesnt really help me in resolving the problem so that I have a numeric variable in newcell2 to work with. And re-inputting data is not really an option. 

 

 

Anyhow, thanks for your help - advice is certainly appreciated. 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Wonderful, thank you!

 

So, at least these five examples all have hex characters 0D0A -- carriage return, line feed -- as "delimiters" between the first two 4-digit codes. So, to get rid of these (and only these) unwanted characters, you can use:

var3_new=compress(var3,'0D0A'x);

Edit: Or even better: var3=...  

View solution in original post

16 REPLIES 16
FreelanceReinh
Jade | Level 19

Hello @r4321,

 

If your codes consist of a known set of characters, e.g., only digits 0, ..., 9, you could apply the COMPRESS function with the K ("keep") modifier.

 

Example:

var3c=compress(var3,,'kd'); /* removes all characters but digits */

 

 

If you're interested in the "offending" non-numeric characters that are contained in VAR3, you could post the output (5 lines) of the following PROC PRINT step:

proc print data=adt.all1(obs=5);
where verify(var3, ' 0123456789');
format var3 $hex44.;
var var3;
run;

This will show the hex codes of the first (up to) 22 characters of the first 5 VAR3 values in dataset ADT.ALL1 which contain characters other than blanks or digits. Possibly, these first 5 values are representative for the dataset.

 

Edit: As soon as we know the offending characters, we can specify them in the second argument of the COMPRESS function (then without the K modifier) in order to remove them. But the "keep" approach might be more elegant in your case.

r4321
Pyrite | Level 9

Freelance, thanks for your help --- Ive tried a couple things based on your recommendations. Still stuck, however. 

 

 


Screenshot (16).png
r4321
Pyrite | Level 9

One more 


Screenshot (19).png
FreelanceReinh
Jade | Level 19

@r4321 wrote:

One more 

[Screenshot (19).png]


Thanks for posting the screenshots.

 

This last one is easy: PSIC2 contains 4-digit codes such as 3534. In this case, the result of input(psic2,hex.) is the integer represented by the hexadecimal number 3534, which is 13620. BYTE(13620) requests the 13621th character of the ASCII collating sequence, which contains only 256 characters, hence the "Invalid argument" notes.

 

Are your 4-digit codes hexadecimal values? This could at least explain why not all of them can be converted to numeric values using standard numeric informats.

r4321
Pyrite | Level 9
It would seem that the first 4 digit code of a group (e.g., psic1) can break off into a column quite nicely and be switched into numeric format. But it would seem as though when I try to 'substring' other codes from the main variable string, they must get turned into hexadecimal values... even though all it shows me are the numbers in the individual cells...

The question is how to rectify this?
Thanks for your attention / help.
FreelanceReinh
Jade | Level 19

I think, the best way to find out what's going on is to look (with a $HEXw. format) at the original long character strings (before they are cut into pieces), especially those which cause problems.

 

This is why I had suggested the PROC PRINT step applying $HEX44. format to (a few values of) variable VAR3 in dataset ADT.ALL3.

 

Your screenshot no. 18 shows that no observations in ADT.ALL3 have VAR3C values which contain non-blank, non-numeric characters. I am surprised to see that your permanent dataset ADT.ALL3 contains a variable named VAR3C. I had used this name in my example code

var3c=compress(var3,,'kd');

Therefore, I suspect that you newly created VAR3C in ADT.ALL3 using the above line of code. In this case, the result "No observations ..." is only logical, because this VAR3C, by definition, contains only digits if anything.

 

So, to obtain a non-trivial result from the abovementioned PROC PRINT step, please replace "var3c" by the name of one of the variables containing those raw "multiple codes" (variable length $70 or so).

r4321
Pyrite | Level 9

Thanks for your comments.. heres a couple screenshots based on them 


Screenshot (20).png
r4321
Pyrite | Level 9

Heres another...which shows what the digits actually are for the first 5 observations in the dataset. Thanks. 


Screenshot (21).png
FreelanceReinh
Jade | Level 19

Wonderful, thank you!

 

So, at least these five examples all have hex characters 0D0A -- carriage return, line feed -- as "delimiters" between the first two 4-digit codes. So, to get rid of these (and only these) unwanted characters, you can use:

var3_new=compress(var3,'0D0A'x);

Edit: Or even better: var3=...  

r4321
Pyrite | Level 9

Freelance, 

 

Not quite sure how the you figured that out! but it works! 

Thanks a lot, I unfortunately spent most of the day trying to get this figured out...So I really appreciate your patience and help!!

 

This command works great

 

 

data adt.all2;
set adt.all1;
var3_new =compress(var3,'0D0A'x);
run;

 

But did you mean try this instead?


data adt.all2;
set adt.all1;
var3=...;
run;

 

 

Or... 

 

 

data adt.all2;
set adt.all1;
var3_new =compress(var3=...);
run;

 

 

The first one worked, but not sure what you mean by the var3=... 

 

 

 

Anyhow, youre great, thanks! 

 

FreelanceReinh
Jade | Level 19

@r4321 wrote:
But did you mean try this instead?


data adt.all2;
set adt.all1;
var3=...;
run; 



You're welcome. Sorry for being too lazy (in the middle of the night, CET) to write out the alternative code:

var3=compress(var3,'0D0A'x);

i.e., the suggesition to overwrite VAR3 rather than create a new variable VAR3_NEW (so as to "save" the original value with the "precious" '0D0A'x characters). The three periods "..." were meant to denote an ellipsis! 🙂

ballardw
Super User

Please provide some additional details about your variables. I am unable to duplicate your issue with numeric or a string variable that that starts with the characters shown.

Since you mention up to 10 groups I have to assume that var3 is character of at least 40 characters. Do you explicitly assign a length to the psic variables? a format?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 4103 views
  • 6 likes
  • 3 in conversation