How to remove similar entries from array php

Keywords: remove, array, php | Updated: 2023-09-27

So I have an array like this, and sometimes it contains very similar entries:

Array
(
    [0] => greys anatomy
    [1] => element 3d
    [2] => interstellar
    [3] => monster ball
    [4] => scorpion
    [5] => taken 3
    [6] => the flash
    [7] => wild card
    [8] => big bang theory
    [9] => the big bang theory
    [10] => fredrik kempe vincero
    [11] => fredrik kempe vicero
)

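I want to remove the longer one of each pair of similar entries. For example, in this array the entries [9] => the big bang theory and [10] => fredrik kempe vincero should be removed, because they are similar to entries [8] and [11] but longer.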

Edit: In case anyone needs it, here is a working solution that I put together from the two answers below:

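// Returns true when the two strings are at least 80% similar.
function check_similar($first, $second)
{
    similar_text($first, $second, $percent);
    return $percent >= 80; // required similarity percentage
}

// Null out every later entry that is similar to an earlier one.
// Note: this keeps the first entry of each similar pair, which is not
// necessarily the shorter one.
$count = count($array);
for ($i = 0; $i < $count; $i++) {
    if ($array[$i] === null) {
        continue; // already removed
    }
    for ($j = $i + 1; $j < $count; $j++) {
        if ($array[$j] !== null && check_similar($array[$i], $array[$j])) {
            $array[$j] = null;
        }
    }
}

// Remove the null values and reindex. The explicit callback keeps
// falsy strings such as "0", which a bare array_filter() would drop.
$array = array_values(array_filter($array, function ($value) {
    return $value !== null;
}));
print_r($array);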
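
The question asks for the longer entry of each pair to be removed, whereas a loop that always nulls `$array[$j]` removes whichever entry comes later. A minimal sketch of a variant that compares lengths is shown here; the function name `remove_longer_similar` and the `$threshold` parameter are assumptions made for illustration, not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): drops the longer entry
// of every pair whose similar_text() percentage reaches $threshold.
function remove_longer_similar(array $items, float $threshold = 80.0): array
{
    $items = array_values($items);            // work on a 0-based copy
    $count = count($items);
    $removed = array_fill(0, $count, false);  // removal flags per index

    for ($i = 0; $i < $count; $i++) {
        // Stop scanning for $i as soon as $i itself has been removed.
        for ($j = $i + 1; $j < $count && !$removed[$i]; $j++) {
            if ($removed[$j]) {
                continue;
            }
            similar_text($items[$i], $items[$j], $percent);
            if ($percent >= $threshold) {
                // Remove the longer string; on a tie, remove the later one.
                $longer = strlen($items[$i]) > strlen($items[$j]) ? $i : $j;
                $removed[$longer] = true;
            }
        }
    }

    // Collect the surviving entries in their original order.
    $result = [];
    foreach ($items as $index => $item) {
        if (!$removed[$index]) {
            $result[] = $item;
        }
    }
    return $result;
}

$titles = [
    'big bang theory',
    'the big bang theory',
    'fredrik kempe vincero',
    'fredrik kempe vicero',
    'scorpion',
];
print_r(remove_longer_similar($titles));
```

With this sample input the sketch should keep `big bang theory`, `fredrik kempe vicero` and `scorpion`. It still compares every pair, so it is only suitable for small arrays.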

String similarity is a hard problem to solve. There are several sophisticated approaches, but none of them works as well as a human would.

Take a look at PHP's soundex and levenshtein functions; they may be a simple solution for your specific case.
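
As a rough illustration of how `levenshtein()` could back such a check, here is a minimal sketch. The helper name `is_similar_lev` and the 25% ratio are assumptions chosen for this example and would need tuning for real data.

```php
<?php
// Hypothetical helper: treats two strings as similar when their edit
// distance is at most $maxRatio of the longer string's length.
function is_similar_lev(string $a, string $b, float $maxRatio = 0.25): bool
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return true; // two empty strings are identical
    }
    return levenshtein($a, $b) / $longest <= $maxRatio;
}

// levenshtein() counts single-character insertions, deletions and replacements.
echo levenshtein('fredrik kempe vincero', 'fredrik kempe vicero'), "\n"; // 1
echo levenshtein('the big bang theory', 'big bang theory'), "\n";        // 4

var_dump(is_similar_lev('the big bang theory', 'big bang theory'));      // bool(true)
var_dump(is_similar_lev('scorpion', 'the flash'));                       // bool(false)

// soundex() returns a key of one letter plus three digits, so it mostly
// reflects how a string starts and is better suited to single words.
echo soundex('fredrik kempe vincero'), ' ', soundex('fredrik kempe vicero'), "\n";
```

A relative threshold is used rather than a fixed distance so that short titles such as `taken 3` are not matched too eagerly.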

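In any case, given a custom function that decides whether one string is similar to another, you have to do something like the following to make your array unique: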

// Set every later string that is similar to an earlier one to null.
// similar() stands for your custom comparison function.
$count = count($array);
for ($i = 0; $i < $count; $i++) {
    for ($j = $i + 1; $j < $count; $j++) {
        if (similar($array[$i], $array[$j])) {
            $array[$j] = null;
        }
    }
}
// Filter the array to remove the null values (the original keys are preserved).
$array = array_filter($array);

Take a look at the similar_text function.

similar_text('the big bang theory', 'big bang theory', $percent);
echo $percent; // 88.235294117647, i.e. about 88%

This is obviously harder than it looks, but you could perform this check while building the array.
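
One way to run the check while the array is being built might look like the sketch below. The function name `add_unless_similar` and the 80% default are assumptions for illustration only.

```php
<?php
// Hypothetical helper: appends $candidate unless a similar entry is already
// stored; if the stored entry is longer, the shorter candidate replaces it.
function add_unless_similar(array $items, string $candidate, float $threshold = 80.0): array
{
    foreach ($items as $index => $existing) {
        similar_text($existing, $candidate, $percent);
        if ($percent >= $threshold) {
            if (strlen($candidate) < strlen($existing)) {
                $items[$index] = $candidate; // keep the shorter variant
            }
            return $items; // a similar entry exists, so nothing is appended
        }
    }
    $items[] = $candidate; // no similar entry found
    return $items;
}

$titles = [];
$incoming = [
    'big bang theory',
    'the big bang theory',
    'fredrik kempe vincero',
    'fredrik kempe vicero',
];
foreach ($incoming as $title) {
    $titles = add_unless_similar($titles, $title);
}
print_r($titles);
```

For this input the sketch should end up with `big bang theory` and `fredrik kempe vicero`. It only compares each new string against the first similar entry it finds, so it is a simplification rather than a full deduplication.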

See this link for an alternative implementation.