JavaScript's RegExp | 黯羽轻扬

I. The Problem

Where’s Wally

Description:

Write a function that returns the index of the first occurrence of the word “Wally”. “Wally” must not be part of another word, but it can be directly followed by a punctuation mark. If no such “Wally” exists, return -1.

Examples:
"Wally" => 0

"Where's Wally" => 8

"Where's Waldo" => -1

"DWally Wallyd .Wally" => -1

"Hi Wally." => 3

"It's Wally's." => 5

"Wally Wally" => 0

"'Wally Wally" => 7

Taken from codewars

So the task is to find the position of the dummy (Wally) in a string, which is a pattern matching problem

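II. Solutions

Two options: parse the string by hand, or use a regular expression

Parsing by hand certainly works, so it is not covered here; the discussion below is about the regular expression approach

1. First version (buggy)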

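function wheresWally(string) {
  // "Wally" at the start or after a space, then end of string or one of . ' space
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;

  // test is a method of the regex, not of the string
  if (regex.test(string)) {
    // bug: this finds the first "Wally" substring, not where regex matched
    pos = string.indexOf('Wally');
  }

  return pos;
}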

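This implementation looks fine at first glance, but it has a bug:

'aWally Wally' => 1

The expected result is 7. The cause is that string.indexOf returns the position of the first occurrence of the substring 'Wally', whereas what we want is the position where regex first matches. string.lastIndexOf does not help either: for 'Wally Wally' it returns 6 where 0 is expected

regex.test simply returns true/false and gives us no way to learn the index, so regex.test is not suited to this problem

2. Revised version (buggy)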

function wheresWally(string) {
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;

  var res = regex.exec(string);
  if (res !== null) {
    pos = res.index;
  }

  return pos;
}

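This time regex.exec does the work. exec is the most informative of the regular expression methods, so it should provide what we need. MDN describes it as follows: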

// Match "quick brown" followed by "jumps", ignoring characters in between
// Remember "brown" and "jumps"
// Ignore case
var re = /quick\s(brown).+?(jumps)/ig;
var result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog');

| Object | Property/Index | Description | Example |
| --- | --- | --- | --- |
| `result` | `[0]` | The full string of characters matched | `Quick Brown Fox Jumps` |
| | `[1], ...[n]` | The parenthesized substring matches, if any. The number of possible parenthesized substrings is unlimited. | `[1] = Brown [2] = Jumps` |
| | `index` | The 0-based index of the match in the string. | `4` |
| | `input` | The original string. | `The Quick Brown Fox Jumps Over The Lazy Dog` |
| `re` | `lastIndex` | The index at which to start the next match. When “g” is absent, this will remain as 0. | `25` |
| | `ignoreCase` | Indicates if the “`i`” flag was used to ignore case. | `true` |
| | `global` | Indicates if the “`g`” flag was used for a global match. | `true` |
| | `multiline` | Indicates if the “`m`” flag was used to search in strings across multiple lines. | `false` |
| | `source` | The text of the pattern. | `quick\s(brown).+?(jumps)` |

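Note: index is a property of the value returned by exec, whereas lastIndex is a property of the regex; the two are easy to mix up

We only care about index, which carries the position of the successful match. But the implementation above still has a bug:

'aWally Wally' => 6

Why 6 and not 7? Look closely at our regular expression (^| )Wally($|[.' ]): the successful match starts at the space, and the space is indeed at index 6. That is easy to deal with; a simple fix is enough: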

if (string.charAt(pos) === ' ') {
  pos++;
}
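
Putting version 2 and the adjustment above together gives roughly the following. This is a minimal sketch rather than a reference solution; note that the character class only accepts a period, an apostrophe, or a space after "Wally", so other punctuation marks are an open assumption here.

```javascript
// Sketch: version 2 plus the leading-space adjustment.
function wheresWally(string) {
  // "Wally" at the start or after a space, then end of string or one of . ' space
  const regex = /(^| )Wally($|[.' ])/;
  const res = regex.exec(string);
  if (res === null) {
    return -1;
  }
  let pos = res.index;
  // When the match began at the separating space, step over it.
  if (string.charAt(pos) === ' ') {
    pos++;
  }
  return pos;
}

console.log(wheresWally('aWally Wally'));         // 7
console.log(wheresWally("It's Wally's."));        // 5
console.log(wheresWally('DWally Wallyd .Wally')); // -1
```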

The same task can also be done with string.replace(regex, func(match, p1, p2..., offset, string)); offset carries the same information as index
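
A sketch of the replace-based variant might look like this. The function name wheresWallyViaReplace is made up for illustration; the callback is used only to read offset, and the string that replace returns is thrown away.

```javascript
// Hypothetical helper: reads the match position from replace()'s callback.
function wheresWallyViaReplace(string) {
  const regex = /(^| )Wally($|[.' ])/;
  let pos = -1;
  // Without the g flag, replace() calls the callback for the first match only.
  string.replace(regex, function (match, p1, p2, offset) {
    // p1 is '' (start of string) or ' ', so its length skips the leading space.
    pos = offset + p1.length;
    return match; // keep the text unchanged; the result is discarded anyway
  });
  return pos;
}

console.log(wheresWallyViaReplace('aWally Wally'));  // 7
console.log(wheresWallyViaReplace("Where's Waldo")); // -1
```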

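Pay special attention to the effect of the global flag on exec: without the g flag, regex.lastIndex stays at 0; with the g flag, regex.lastIndex is updated on every exec call, and it can also be set by hand to change where the next exec starts. For example: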

var regex = /^abc/;
undefined
var regex_g = /^abc/g;
undefined
var str = 'abc abcd';
undefined
regex.lastIndex;
0
regex_g.lastIndex;
0
regex.exec(str);
["abc"]
regex.lastIndex;
0
regex_g.exec(str);
["abc"]
regex_g.lastIndex;
3
regex_g.lastIndex = 1;
1
regex_g.exec(str);
null
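
The lastIndex behaviour in the session above is what makes it possible to walk through every match with repeated exec calls. The helper below is a sketch with a made-up name (allMatchIndexes); it assumes the caller passes a regex that has the g flag.

```javascript
// Hypothetical helper: collects the index of every match using exec + lastIndex.
function allMatchIndexes(regex, str) {
  if (!regex.global) {
    // Without g, lastIndex never advances and the loop would not terminate.
    throw new TypeError('allMatchIndexes expects a regex with the g flag');
  }
  regex.lastIndex = 0; // start from the beginning regardless of earlier use
  const indexes = [];
  let m;
  while ((m = regex.exec(str)) !== null) {
    indexes.push(m.index);
    if (m[0] === '') {
      regex.lastIndex++; // step past empty matches to avoid looping forever
    }
  }
  return indexes; // exec returned null, which also reset lastIndex to 0
}

console.log(allMatchIndexes(/Wally/g, 'Wally, aWally and Wally')); // [ 0, 8, 18 ]
```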

3. Solutions from other users

Solution 1

function wheresWally(string){
  return (" "+string).search(/ Wally\b/) 
}

Modify the original string first, then match. Very clever

Solution 2

function wheresWally(string) {
  var match = /(^|[ ])Wally\b/.exec(string)
  return match ? match.index + match[1].length : -1
}

The same idea as the author's, but much shorter: match[1].length neatly solves the space problem, far prettier than if...charAt

Solution 3

function wheresWally(string){
  var mtch = " ".concat(string).match(/\sWally\b/)
  var idx = mtch ? " ".concat(string).indexOf(" Wally") : -1;
  return idx;
}

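The with-space and without-space cases are handled by padding the string with a leading space, a general way to simplify a complex problem by working case by case. One caveat: indexOf(" Wally") returns the first " Wally" whether or not it is a whole word, so 'Wallyd Wally' gives 0 where 7 is expected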
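
One thing to watch for in solution 3 is that the regex decides whether a match exists, while indexOf decides where, and the two can disagree. The sketch below reproduces solution 3 under the name solution3 and compares it with a hypothetical variant (solution3ByIndex, not part of the original submission) that reads the position from the match itself.

```javascript
// Solution 3, reproduced for comparison.
function solution3(string) {
  const mtch = ' '.concat(string).match(/\sWally\b/);
  return mtch ? ' '.concat(string).indexOf(' Wally') : -1;
}

// Hypothetical variant: take the position from the regex match.
// The padded string is one character longer, so the index of the
// whitespace in it equals the index of "Wally" in the original.
function solution3ByIndex(string) {
  const mtch = ' '.concat(string).match(/\sWally\b/);
  return mtch ? mtch.index : -1;
}

console.log(solution3('Wallyd Wally'));        // 0
console.log(solution3ByIndex('Wallyd Wally')); // 7
```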

Solution 4

function wheresWally(string) {
  var match = string.match(/(^|\s)Wally($|[^\w])/); 
  return match ? match.index + match[0].indexOf('Wally') : -1;
}

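Combining the match with indexOf to solve the space problem is also a decent idea

III. Reflections

The veterans are right: the community is an important way to learn

1. String methods related to RegExp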

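str.match(regexp)

Returns an array of match results, or null

Without the g flag only the first match is made; with the g flag every matched substring is placed in the result array

str.replace(regexp, func)

The arguments of func are, in order, match, p1, p2..., offset, string: the matched text, capture 1, capture 2…, the match position, and the whole string

str.search(regexp)

Returns the match position, or -1

Note: with or without the g flag, only the position of the first match is returned, so setting g here is pure waste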
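
A short sketch of how these methods differ on the same input; the variable names are made up for illustration.

```javascript
const sample = 'Wally and Wally';

// Without g: the first match plus its captures, with an index property.
const firstOnly = sample.match(/Wal(ly)/);
console.log(firstOnly[0], firstOnly[1], firstOnly.index); // Wally ly 0

// With g: every matched substring, but no captures and no index.
const everyMatch = sample.match(/Wal(ly)/g);
console.log(everyMatch); // [ 'Wally', 'Wally' ]

// search returns the first position whether or not g is set.
const searchPos = sample.search(/Wally/g);
console.log(searchPos); // 0
```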

2.RegExp

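1. Properties

regex.lastIndex

The position at which the next match will start; the initial value is 0

regex.global/regex.ignoreCase/regex.multiline

Correspond to the g/i/m flags (global / ignore case / multiline); each returns true/false

regex.source

Returns the pattern text itself (the part between the two slashes of a literal, or, for a regex built with new, the part between the slashes once it is converted to a literal)

2. Methods

regex.exec(str)

Returns an array of match results, or null

The first element of the array is the matched text; the following elements are the captures, in order

Note: the result array also has two properties
- index
  
  The match position
- input
  
  The whole string

regex.test(str)

Returns true/false. Without the g flag, repeated calls give the same result; with the g flag, test starts from regex.lastIndex and updates it just as exec does, so successive calls on the same string can return different results

~~regex.compile()~~

Deprecated; not recommended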
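
One detail of test that is easy to trip over: with the g flag it is stateful, because it reads and updates lastIndex. A minimal sketch, with variable names chosen only for illustration:

```javascript
const plainRe = /Wally/;
const globalRe = /Wally/g;
const word = 'Wally';

// Without g, every call starts from index 0.
const plainResults = [plainRe.test(word), plainRe.test(word)];
console.log(plainResults); // [ true, true ]

// With g, the second call starts at lastIndex 5, finds nothing,
// and resets lastIndex to 0, so the third call succeeds again.
const globalResults = [globalRe.test(word), globalRe.test(word), globalRe.test(word)];
console.log(globalResults); // [ true, false, true ]
```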

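3. Relationships

For a regex without the g flag, regex.test(str) is equivalent to str.search(regex) !== -1
regex.exec(str) is comparable to str.replace(regex, func): both expose the matched text, the captures, and the match position, while exec offers more control (through regex.lastIndex)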