Re: How to remove brackets and quotes from observations in a dataset

cfun · Posted 04-09-2021 01:20 PM

Hello all,

I have a list of artists that is formatted like so:

['Justin Bieber']

['Brockhampton']

etc

and I want to make it so these variables no longer have quotes or brackets and instead look like:

Justin Bieber

Brockhampton

How would I do this?

cfun · Posted 04-09-2021 01:25 PM

Additionally, this dataset contains many other variables, but I am fine with how they are formatted.

Reeza · Posted 04-09-2021 01:25 PM

COMPRESS()

x = compress(originalVar, "[]'");

Note that the double quotes contain both brackets and single quote.

@cfun wrote:

Hello all,

I have a list of artists that is formatted like so:

['Justin Bieber']

['Brockhampton']

etc

and I want to make it so these variables no longer have quotes or brackets and instead look like:

Justin Bieber

Brockhampton

How would I do this?

Sri4_ · Posted 09-29-2021 01:58 AM

How to remove brackets and quotes from observations in a dataset.

in my data having values 22.9() and 1.9(mg/dl) like this but i want to remove () without removing this brackets 1.9mg/dl

Patrick · Posted 09-29-2021 02:42 AM

@Sri4_ Please as this as a NEW question and don't just post into the middle of an "old" discussion that unfortunately never got closed.

You can always reference another discussion in your NEW question.

LeonidBatkhan · Posted 04-09-2021 03:03 PM

Using compress() function as suggested by Reeza may have unintended consequences. For example, if you have legitimate single quotes in a name, such as ['O'Connor'], compress will remove ' after O as well. A cleaner way would be

NAME = transtrn(NAME,"['",trimn(''));
NAME = transtrn(NAME,"']",trimn(''));

This will selectively remove substring

['

and substring

']

For more information on this technique, see Deleting a substring from a SAS string.

Hope this helps.

➤ Leonid's SAS blog

Ksharp · Posted 04-10-2021 06:23 AM

Couldl be simpler.

data have;
input str $20.;
cards;
['Justin Bieber']
['Brockhampton']
['O'Connor']
;

data want;
 set have;
 want=prxchange("s/^\['|'\]$//",-1,strip(str));
 run;

Tom · Posted 04-09-2021 03:14 PM

Depends on whether those characters every appear in the actual data or not.

If not then it is pretty simple. So if your existing datasets is named HAVE and the existing variable is named VAR then you could use COMPRESS() to remove those three characters.

data want;
  set have;
  var = compress(var,"[']");
run;

Or you could use SCAN() .

  var = scan(var,1,"[']");

If they appear in the data and the quotes are properly escaped then you might want to use SCAN() and DEQUOTE().

  var = dequote(scan(var,1,"[]",'q'));

If the embedded single quotes are "escaped" with \ instead of being doubled up then use:

  var = dequote(scan(tranwrd(var,"\'","''"),1,"[]",'q'));

If they are not protected in anyway then perhaps you just want to remove the first 2 and last 2 characters.

 var = substrn(var,3,length(var)-4);

Patrick · Posted 04-11-2021 02:49 AM

As a variant to the RegEx @Ksharp posted below RegEx will only remove ['....'] if it encloses the string and both parts exist.

data have;
  input str $20.;
  cards;
['Justin Bieber']
['Brockhampton']
['O'Connor']
['O'['Con']nor']
['O'['Connor']
['O'Con']nor']
['O'Connor
O'Connor']
O'['Con']nor
;

data want;
  set have;
  length new_str $20;
  new_str=prxchange("s/^\['(.*)'\]/$1/oi",1,strip(str));
run;

andreas_lds · Posted 04-12-2021 02:50 AM

I am regex-fanboy, but in this case i don't see the need to use a regex.

The first and last two chars should be removed, so why not just substr:

clean = substr(str, 3, lengthn(str) - 4);

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away