Ticket #39094

quoting issues with [[ string =~ re ]]

Open Date: 2019-03-30 20:53 Last Update: 2020-12-05 21:56

Reporter:
Owner:
Type:
Status:
Closed
Component:
Milestone:
(None)
Priority:
5 - Medium
Severity:
5 - Medium
Resolution:
Fixed
File:
None

Details

2.48 has introduced a Korn-style [[...]] construct. For the =~ operator, I see that the bash32+ approach, as opposed to the bash31/zsh one, was chosen with regard to quoting.

[ a =~ '.' ] does match, as it does in zsh.

But:

[[ a =~ '.' ]] doesn't match, because quotes remove the special meaning of regex operators.
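
As an illustration of the bash32+ quoting rule described above, here is a minimal sketch. It assumes bash 3.2 or later (without the compat31 option) and is not taken from the yash sources; the variable names are made up for this example.

```shell
# Assumes bash 3.2 or later; variable names are illustrative only.
[[ a =~ . ]];   unquoted=$?   # unquoted . is the "any character" operator
[[ a =~ '.' ]]; quoted=$?     # quoted . is a literal dot, so "a" does not match
[[ . =~ '.' ]]; literal=$?    # a literal dot does match a literal dot
echo "$unquoted $quoted $literal"
```

With the bash31/zsh approach, the second test would return 0 as well, since quotes there do not change the meaning of the regex.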

Now, a problem with that is that (, ), and | are regex operators but cannot appear in a normal shell word. At the moment in yash:

yash -c '[[ a =~ a||a =~ b ]]'

works (like in zsh), with || being the "OR" token inside [[...]].

But you can't use the | ERE operator:

$ ./yash -c '[[ a =~ a|b ]]'
yash -c:1: syntax error: invalid word `|' between `[[' and `]]'

Same as in zsh, but in zsh, like in bash3.1, you'd write [[ a =~ 'a|b' ]]. That doesn't work in yash, because those quotes remove the special meaning of |.

In zsh, [[ a =~ (a|b) ]] works because (a|b) is the same syntax as the (a|b) glob operator (specific to zsh, ksh has @(a|b) instead).

There's a similar problem with ( and ):

$ ./yash -c '[[ x =~ (aa)* ]]'
yash -c:1: syntax error: `(' is not a valid operand in the conditional expression

yash also has the same bug (actually worse) as bash originally had in that, to remove the special meaning of re operators, it escapes them with \ before calling regcomp.

But it inserts that backslash even when it should not, like inside bracket expressions (as bash originally did), but also before characters that are not regexp operators (bash didn't have that bug).

That means that [[ '\' =~ ["."] ]] matches (like in old bash versions), but so does [[ x =~ "<" ]] on systems where \< is the word boundary operator, for instance.

yash should insert that \ only where needed (where [...] is a special case, also beware of [^]")"]).
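
To make "only where needed" concrete, here is a rough sketch of escaping a literal string for use as an ERE. It is not how yash or bash implement it: the helper name ere_escape is hypothetical, it relies on sed, and it deliberately ignores the bracket-expression special case mentioned above. It only escapes the characters POSIX lists as special in an ERE outside brackets.

```shell
# Hypothetical helper (not from yash): backslash-escape only the characters
# that are ERE operators outside bracket expressions: \ . [ | $ ( ) { ? + * ^
# Ordinary characters such as < are left untouched.
ere_escape() {
  printf '%s\n' "$1" | sed 's/[[\.|$(){?+*^]/\\&/g'
}

re="^$(ere_escape 'a.b')\$"      # the dot is escaped, so it only matches a dot
[[ a.b =~ $re ]]; dot_status=$?
[[ axb =~ $re ]]; x_status=$?
echo "$re $dot_status $x_status"
ere_escape '<'                   # left as-is, so it can never become \<
```

Because < is not in the escaped set, a quoted < can never be turned into the \< word boundary operator by this approach.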

There's also the question of whether [[ b =~ [a"-"c] ]] should work the same as [[ b = [a"-"c] ]]

Ticket History (3/12 Histories)

2019-03-30 20:53 Updated by: stephane-c
  • New Ticket "quoting issues with [[ string =~ re ]]" created
2019-03-31 04:50 Updated by: stephane-c
  • Details Updated
2019-03-31 05:04 Updated by: stephane-c
Comment

Obviously, the easiest (and I'd argue cleanest) resolution is to adopt the bash31/zsh approach instead.

You may also want to consider adding support for PCREs instead of EREs in the future (as zsh does with the rematchpcre option; PCREs are the new de-facto regex standard these days). And with the bash32+ approach, doing correct escaping could become tricky.

ksh93 behaves a bit like bash32+, but there quoting with quotes works differently from quoting with backslashes, and quotes only disable some RE operators ([[ a =~ ".+" ]] matches there, but neither [[ a =~ \.\+ ]] nor [[ a = "a*" ]] does).

2019-04-29 17:56 Updated by: magicant
Comment

Since yash introduced the double-bracket command only for compatibility reasons, I'm not willing to intentionally diverge from the original ksh behaviors. To support ksh-like handling of | and parentheses, however, I need to implement the quirky syntax parser that treats them as normal word characters. *sigh*

2019-04-29 23:24 Updated by: stephane-c
Comment

Reply To magicant

> Since yash introduced the double-bracket command only for compatibility reasons, I'm not willing to intentionally diverge from the original ksh behaviors.

Note that [[ =~ ]] comes from bash, not ksh. ksh93 added it later, but it's unfinished and pretty bogus there as mentioned above. ksh88, pdksh and all its derivatives don't have it.

2020-09-24 12:40 Updated by: magicant
Comment

I still need more time to learn what bash's and ksh's parsers are doing to handle special characters after the =~ token.

bash5.0 ksh2020 zsh5.8  yash2.50
0       0       SE      SE      [[ a =~ a|b ]]
0       0       SE      SE      [[ a =~ |a|b ]]
0       0       SE      SE      [[ a =~ a|| ]]
0       0       SE      SE      [[ a =~ ||a ]]

0       0       0       SE      [[ a =~ (a) ]]
0       0       0       SE      [[ a =~ (((a))) ]]
1       1       1       0       [[ a =~ "<" ]]

1       1       0       1       [[ a =~ "a|b" ]]
0       0       SE      0       [[ \\ =~ \\ ]]
1       1       0       1       [[ \\ =~ \\\\ ]]
1       1       0       1       [[ a =~ \(a\) ]]
1       1       0       1       [[ a =~ \.\+ ]]
1       1       0       1       [[ a =~ "a*" ]]

0       0       0       0       [[ a =~ a$ ]]
0       0       0       0       [[ z =~ [[:alpha:]] ]]
0       0       0       0       [[ \(\) =~ \(\) ]]
0       0       0       0       [[ \| =~ \| ]]
0       0       0       0       [[ aaa =~ a{3} ]]

1       0       0       1       [[ a =~ ".+" ]]
(SE stands for syntax error)
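
A table like the one above could be reproduced with a small harness along these lines. This is only a sketch: the helper name run_case, the list of shell names, and the rule that any diagnostic on stderr counts as a syntax error are all assumptions, not necessarily how the results above were collected.

```shell
# Hypothetical harness: print 0, 1, or SE for one expression in one shell.
run_case() {
  # $1 = shell to run, $2 = expression; stderr is captured, stdout discarded
  err=$("$1" -c "$2" 2>&1 >/dev/null)
  status=$?
  if [ -n "$err" ]; then
    echo SE            # assumption: any diagnostic means a syntax error
  else
    echo "$status"
  fi
}

test_expr='[[ a =~ "a|b" ]]'
for sh in bash ksh zsh yash; do
  # skip shells that are not installed on this system
  command -v "$sh" >/dev/null 2>&1 || continue
  printf '%s\t%s\n' "$sh" "$(run_case "$sh" "$test_expr")"
done
```

Checking stderr rather than the exit status is a design choice here, since shells do not agree on which exit status a syntax error produces.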
2020-11-17 22:39 Updated by: magicant
Comment

Bash seems to treat | and parentheses specially when parsing the token after =~.

Parentheses are so special that they can contain spaces inside and can be nested:

$ bash -c '[[ a =~ ( |(a| ).*) ]]; echo $?'
0

Next: how are unquoted backquotes and dollars handled in the regex token?

2020-11-30 21:57 Updated by: magicant
Comment
bash5.0 ksh2020 zsh5.8  yash2.50
0       0       0       0       [[ 2 =~ $((1+1)) ]]
0       0       0       0       [[ a =~ `echo a` ]]
0       0       0       0       [[ a =~ `echo "a|b"` ]]

0       0       0       0       v=a; [[ a =~ ${v} ]]
0       0       0       0       s=\*; [[ abc =~ ab${s}c ]]
0       0       0       0       s=\|; [[ a =~ a${s}b ]]
1       1       1       1       s="\|"; [[ a =~ a${s}b ]]
0       0       0       0       e=\(a\|b\); [[ a =~ ${e} ]]

0       0       0       0       v=a; [[ a =~ "${v}" ]]
1       1       0       1       s=\*; [[ abc =~ "ab${s}c" ]]
1       1       0       1       s=\|; [[ abc =~ "a${s}b" ]]
1       1       1       1       s="\|"; [[ abc =~ "a${s}b" ]]
1       1       0       1       e=\(a\|b\); [[ a =~ "${e}" ]]
2020-12-01 03:24 Updated by: stephane-c
Comment

Reply To magicant

> bash5.0 ksh2020 zsh5.8  yash2.50

Note that ksh2020 (based on ksh93v-) development has been abandoned (and it was very buggy). For a version of ksh93 that is still maintained and in the open, you can have a look at https://github.com/ksh93/ksh (based on ksh93u+). It's unlikely to make a difference for your test cases, though.

In any case, yes, for a [[ =~ ]] portable to all shells that have it, at the moment, the only viable option is to store the regexp in a variable and use [[ $subject =~ $regexp ]] (with $regexp unquoted).
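
A minimal sketch of that portable form follows. The pattern and subject are made up for illustration, and the contrast with the quoted expansion reflects bash 3.2+ behaviour (per the tables above, zsh treats both forms as a regex).

```shell
# Sketch: keep the ERE in a variable and expand it unquoted inside [[ ]].
regexp='^(a|b)+$'      # example pattern, chosen for illustration
subject=abba           # example subject

if [[ $subject =~ $regexp ]]; then
  portable=match
else
  portable='no match'
fi

# For contrast: in bash 3.2+, quoting the expansion makes every character
# literal, so the same pattern no longer matches.
if [[ $subject =~ "$regexp" ]]; then
  quoted=match
else
  quoted='no match'
fi

echo "unquoted: $portable, quoted: $quoted"
```

Keeping the regexp in a variable also sidesteps the parsing differences around |, ( and ), since those characters never appear as shell tokens inside [[ ]].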

2020-12-02 00:29 Updated by: magicant
Comment

Fixed the ( ) and | issue in r4151, but the escaping issue still remains.

2020-12-02 23:27 Updated by: magicant
Comment
bash5.0 ksh2020 zsh5.8  yash2.50
0       0       0       0       [[ b =~ [a"-"c] ]]
1       0       1       1       [[ b =  [a"-"c] ]]
1       1       1       1       [[ - =~ [a"-"c] ]]
0       1       0       0       [[ - =  [a"-"c] ]]
1       1       1       0       [[ \\ =~ ["."] ]]
1       1       1       1       [[ \\ =  ["."] ]]
0       1       0       1       [[ \\ =~ [a[.\\.]c] ]]
1       1       1       0       [[ \\ =  [a[.\\.]c] ]]
0       0       0       0       [[ a] =~ ^[a"]"]$ ]]
1       1       1       1       [[ a] =   [a"]"]  ]]
0       0       0       0       [[ [a] =~ "["a] ]]
0       0       0       0       [[ [a] =  "["a] ]]
(Edited, 2020-12-05 21:39 Updated by: magicant)
2020-12-05 21:56 Updated by: magicant
  • Resolution Update from None to Fixed
  • Status Update from Open to Closed
Comment

Dealt with the escaping issue in r4155 and r4156.

I believe I made a reasonable effort to make yash behave much like bash, but I don't intend to make it exactly the same. The detail of escaping in the double-bracket command is an undocumented, non-portable part of the shell after all.

Attachment File List

No attachments