解释使用位向量来确定是否所有字符都是唯一的

H

Hussain

我有一个偷偷摸摸的怀疑你从我正在阅读的同一本书中得到这个代码......这里的代码本身并不像运算符那么神秘 - |=，&，和 << 通常不使用我们外行 - 作者没有费心花额外的时间来解释这个过程，也没有解释这里涉及的实际机制是什么。一开始我对这个线程上的先前答案感到满意，但只是在抽象层面上。我回到它是因为我觉得需要一个更具体的解释——缺少一个总是让我感到不安。

此运算符 << 是一个左位移位器，它采用该数字或操作数的二进制表示并将其移动到由右侧的操作数或数字指定的许多位置，例如仅在二进制中的十进制数。我们乘以 2 为底——当我们向上移动时，很多地方不是以 10 为底——所以右边的数字是指数，左边的数字是 2 的基数倍数。

此运算符 |=（称为按位或赋值）将左侧的操作数与右侧的操作数进行或运算并将结果分配给左侧的操作数（x |= y 等价于 x = x | y）。同样，运算符（'&'）将'和'左侧的运算符与右侧的运算符。这也有一个按位与赋值（x &= y 等价于 x = x & y）。

所以我们这里有一个哈希表，每次检查器获取或'd（checker |= (1 << val)）时，它都存储在一个32位二进制数中，其中一个字母的指定二进制值将其对应的位设置为true。字符的值与检查器 (checker & (1 << val)) > 0) 是与 - 如果它大于 0，我们知道我们有一个欺骗 - 因为两个相同的位设置为 true 和'd 一起将返回 true 或 '1''。

有 26 个二进制位置，每个位置对应一个小写字母——作者确实说过假设字符串只包含小写字母——这是因为我们只剩下 6 个（32 位整数）位置可供使用——而且比我们发生碰撞

00000000000000000000000000000001 a 2^0

00000000000000000000000000000010 b 2^1

00000000000000000000000000000100 c 2^2

00000000000000000000000000001000 d 2^3

00000000000000000000000000010000 e 2^4

00000000000000000000000000100000 f 2^5

00000000000000000000000001000000 g 2^6

00000000000000000000000010000000 h 2^7

00000000000000000000000100000000 i 2^8

00000000000000000000001000000000 j 2^9

00000000000000000000010000000000 k 2^10

00000000000000000000100000000000 l 2^11

00000000000000000001000000000000 m 2^12

00000000000000000010000000000000 n 2^13

00000000000000000100000000000000 o 2^14

00000000000000001000000000000000 p 2^15

00000000000000010000000000000000 q 2^16

00000000000000100000000000000000 r 2^17

00000000000001000000000000000000 s 2^18

00000000000010000000000000000000 t 2^19

00000000000100000000000000000000 u 2^20

00000000001000000000000000000000 v 2^21

00000000010000000000000000000000 w 2^22

00000000100000000000000000000000 x 2^23

00000001000000000000000000000000 y 2^24

00000010000000000000000000000000 z 2^25

因此，对于输入字符串 'azya'，随着我们一步一步地移动

字符串'a'

a      =00000000000000000000000000000001
checker=00000000000000000000000000000000

checker='a' or checker;
// checker now becomes = 00000000000000000000000000000001
checker=00000000000000000000000000000001

a and checker=0 no dupes condition

字符串'az'

checker=00000000000000000000000000000001
z      =00000010000000000000000000000000

z and checker=0 no dupes 

checker=z or checker;
// checker now becomes 00000010000000000000000000000001

字符串 'azy'

checker= 00000010000000000000000000000001    
y      = 00000001000000000000000000000000 

checker and y=0 no dupes condition 

checker= checker or y;
// checker now becomes = 00000011000000000000000000000001

字符串'azya'

checker= 00000011000000000000000000000001
a      = 00000000000000000000000000000001

a and checker=1 we have a dupe

现在，它声明了一个重复

@ivan-tichy 你测试过这个吗？似乎它将无法检测到重复的“a”字符，因为它被设置为 0 并且左移它仍会将其保持在 0。

@Riz不，它总是从'1'开始，算法根据字母移动1。因此，如果字母“a”出现一次，它将是 1，即 (....000001)。

@Ivan Man，我也在想同样的事情。即使选择的答案也没有解释运营商。感谢您提供详细信息。

我应该假设上述唯一检查仅适用于排序字符集 (abcd...z) 吗？不与（bcad ...）

“我有一个偷偷摸摸的怀疑你从我正在阅读的同一本书中得到了这个代码”同样在这里 :) 让我发笑

S

Snowbear

int checker 在这里用作位存储。整数值中的每一位都可以视为一个标志，因此 int 最终是一个位数组（标志）。代码中的每个位都说明是否在字符串中找到了具有位索引的字符。出于同样的原因，您可以使用位向量而不是 int。它们之间有两个区别：

尺寸。 int 具有固定大小，通常为 4 个字节，这意味着 8*4=32 位（标志）。位向量通常可以有不同的大小，或者您应该在构造函数中指定大小。

API。使用位向量，您将更容易阅读代码，可能是这样的：vector.SetFlag(4, true); // 将索引 4 处的标志设置为 true，对于 int，您将拥有较低级别的位逻辑代码：checker |= (1 << 5); // 将索引 5 处的标志设置为 true

也可能 int 可能会快一点，因为位操作非常低级，可以由 CPU 按原样执行。 BitVector 允许编写更少的神秘代码，而且它可以存储更多的标志。

供将来参考：位向量也称为 bitSet 或 bitArray。以下是针对不同语言/平台的此数据结构的一些链接：

CPP：比特集

Java：位集

C#：BitVector32 和 BitArray

java有BitVector类吗？我找不到任何文档！

大小具有固定大小，即 32 位。这是否意味着它只能测试 32 个字符的唯一性？我已经测试过，这个函数可以测试“abcdefgZZ”是否为假，但“abcdefg@@”返回真。

谷歌把我带到了这里。 @Dejel 这是您可以使用的 java 数据结构：docs.oracle.com/javase/7/docs/api/java/util/BitSet.html。希望这可以帮助人们通过 intertubes。

@nattyddubbs，谢谢，我已经添加了这个和其他几个链接到答案

M

Miguel Durazo

我认为所有这些答案确实解释了它是如何工作的，但是我想通过重命名一些变量、添加一些其他变量并添加评论来更好地了解我的看法：

public static boolean isUniqueChars(String str) {

    /*
    checker is the bit array, it will have a 1 on the character index that
    has appeared before and a 0 if the character has not appeared, you
    can see this number initialized as 32 0 bits:
    00000000 00000000 00000000 00000000
     */
    int checker = 0;

    //loop through each String character
    for (int i = 0; i < str.length(); ++i) {
        /*
        a through z in ASCII are charactets numbered 97 through 122, 26 characters total
        with this, you get a number between 0 and 25 to represent each character index
        0 for 'a' and 25 for 'z'

        renamed 'val' as 'characterIndex' to be more descriptive
         */
        int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 26

        /*
        created a new variable to make things clearer 'singleBitOnPosition'

        It is used to calculate a number that represents the bit value of having that 
        character index as a 1 and the rest as a 0, this is achieved
        by getting the single digit 1 and shifting it to the left as many
        times as the character index requires
        e.g. character 'd'
        00000000 00000000 00000000 00000001
        Shift 3 spaces to the left (<<) because 'd' index is number 3
        1 shift: 00000000 00000000 00000000 00000010
        2 shift: 00000000 00000000 00000000 00000100
        3 shift: 00000000 00000000 00000000 00001000

        Therefore the number representing 'd' is
        00000000 00000000 00000000 00001000

         */
        int singleBitOnPosition = 1 << characterIndex;

        /*
        This peforms an AND between the checker, which is the bit array
        containing everything that has been found before and the number
        representing the bit that will be turned on for this particular
        character. e.g.
        if we have already seen 'a', 'b' and 'd', checker will have:
        checker = 00000000 00000000 00000000 00001011
        And if we see 'b' again:
        'b' = 00000000 00000000 00000000 00000010

        it will do the following:
        00000000 00000000 00000000 00001011
        & (AND)
        00000000 00000000 00000000 00000010
        -----------------------------------
        00000000 00000000 00000000 00000010

        Since this number is different than '0' it means that the character
        was seen before, because on that character index we already have a 
        1 bit value
         */
        if ((checker & singleBitOnPosition) > 0) {
            return false;
        }

        /* 
        Remember that 
        checker |= singleBitOnPosition is the same as  
        checker = checker | singleBitOnPosition
        Sometimes it is easier to see it expanded like that.

        What this achieves is that it builds the checker to have the new 
        value it hasnt seen, by doing an OR between checker and the value 
        representing this character index as a 1. e.g.
        If the character is 'f' and the checker has seen 'g' and 'a', the 
        following will happen

        'f' = 00000000 00000000 00000000 00100000
        checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001

        00000000 00000000 00000000 00100000
        | (OR)
        00000000 00000000 00000000 01000001
        -----------------------------------
        00000000 00000000 00000000 01100001

        Therefore getting a new checker as 00000000 00000000 00000000 01100001

         */
        checker |= singleBitOnPosition;
    }
    return true;
}

很好的解释。谢谢你！

清楚的解释..谢谢

很好的解释。容易理解。谢谢

那个是最好的

这就是为什么要发明评论的原因。

A

Alex B

我还假设您的示例来自本书 Cracking The Code Interview，而我的回答与此上下文相关。

为了使用这个算法来解决这个问题，我们不得不承认我们只会将字符从 a 传递到 z（小写）。

由于只有 26 个字母，并且它们在我们使用的编码表中正确排序，这保证了所有潜在差异 str.charAt(i) - 'a' 将低于 32（int 变量 checker 的大小）。

正如 Snowbear 所解释的，我们即将使用 checker 变量作为位数组。让我们举个例子：

假设str equals "test"

第一遍 (i = t)

检查器 == 0 (00000000000000000000000000000000)

In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19
What about 1 << val ?
1          == 00000000000000000000000000000001
1 << 19    == 00000000000010000000000000000000
checker |= (1 << val) means checker = checker | (1 << val)
so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000
checker == 524288 (00000000000010000000000000000000)

第二遍（i = e）

检查器 == 524288 (00000000000010000000000000000000)

val = 101 - 97 = 4
1          == 00000000000000000000000000000001
1 << 4     == 00000000000000000000000000010000
checker |= (1 << val) 
so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000
checker == 524304 (00000000000010000000000000010000)

依此类推..直到我们通过条件在检查器中找到特定字符的已设置位

(checker & (1 << val)) > 0

希望能帮助到你

比其他 IMO 更好的解释，但我仍然没有得到的一件事是 checker = 00000000000010000000000000000000 | 00000000000000000000000000010000 不是按位 |= OR 运算符。从那以后不会选择一个值或另一个值吗？为什么它使用和设置两个位？

@CodeCrack 你说它是按位或。它在位级别而不是位阵列级别进行比较。注意：int 是位数组

M

Matthew Hinea

阅读上面伊万的回答确实对我有所帮助，尽管我会有所不同。

(1 << val) 中的 << 是位移运算符。它需要 1（在二进制中表示为 000000001，前面有任意数量的零 / 由内存分配）并将其向左移动 val 个空格。由于我们只假设 az 并且每次减去 a，每个字母的值将是 0-25，这将是该字母在 checker 整数的布尔表示中从右边开始的索引，因为我们将移动 { 3} 向左 checker val 次。

在每次检查结束时，我们都会看到 |= 运算符。这将合并两个二进制数，如果在该索引的任一操作数中存在 1，则将所有 0 替换为 1。在这里，这意味着只要 (1 << val) 中存在 1，该 1 将被复制到 checker，而 checker 的所有现有 1 都将被保留。

正如您可能猜到的那样，1 在这里用作布尔标志，表示 true。当我们检查一个字符是否已经在字符串中表示时，我们比较 checker，此时它本质上是一个布尔标志（1 值）数组，位于已经表示的字符的索引处，与本质上是一个布尔值数组，在当前字符的索引处带有 1 标志。

& 运算符完成此检查。与 |= 类似，如果两个操作数在该索引处都有 1，则 & 运算符将复制 1 only。因此，本质上，只有 checker 中已经存在且在 (1 << val) 中表示的标志才会被复制。在这种情况下，这意味着只有当当前字符已经被表示时，才会在 checker & (1 << val) 的结果中的任何位置出现 1。如果 1 出现在该操作的结果中，则返回的布尔值是 > 0，并且该方法返回 false。

我猜这就是为什么位向量也称为位数组的原因。因为，即使它们不是数组数据类型，也可以像使用数组一样使用它们来存储布尔标志。

非常有帮助，感谢您提供的 Java 信息。

P

Prabhash Rathore

上面已经提供了几个很好的答案。所以我不想重复已经说过的一切。但是确实想添加一些东西来帮助上述程序，因为我刚刚完成了同一个程序并且有几个问题但是花了一些时间之后，我对这个程序有了更清晰的了解。

首先，“检查器”用于跟踪字符串中已经遍历的字符，以查看是否有任何字符重复。

现在“checker”是一个 int 数据类型，所以它只能有 32 位或 4 个字节（取决于平台），所以这个程序只能在 32 个字符范围内的字符集上正常工作。这就是原因，该程序从每个字符中减去“a”，以使该程序仅针对小写字符运行。但是，如果您混合使用小写和大写字符，则它将不起作用。

顺便说一句，如果您不从每个字符中减去“a”（请参见下面的语句），那么该程序将仅适用于带有大写字符的字符串或仅带有小写字符的字符串。所以上述程序的范围也从小写字符增加到大写字符，但它们不能混合在一起。

int val = str.charAt(i) - 'a';

但是我想使用按位运算编写一个通用程序，它应该适用于任何 ASCII 字符，而不必担心大写、小写、数字或任何特殊字符。为了做到这一点，我们的“检查器”应该足够大以存储 256 个字符（ASCII 字符集大小）。但是 Java 中的 int 不能工作，因为它只能存储 32 位。因此，在下面的程序中，我使用了 JDK 中可用的 BitSet 类，它可以在实例化 BitSet 对象时传递任何用户定义的大小。

这是一个与上面使用位运算符编写的程序执行相同操作的程序，但该程序适用于具有 ASCII 字符集中任何字符的字符串。

public static boolean isUniqueStringUsingBitVectorClass(String s) {

    final int ASCII_CHARACTER_SET_SIZE = 256;

    final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE);

    // if more than  256 ASCII characters then there can't be unique characters
    if(s.length() > 256) {
        return false;
    }

    //this will be used to keep the location of each character in String
    final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE);

    for(int i = 0; i < s.length(); i++) {

        int charVal = s.charAt(i);
        charBitLocation.set(charVal); //set the char location in BitSet

        //check if tracker has already bit set with the bit present in charBitLocation
        if(tracker.intersects(charBitLocation)) {
            return false;
        }

        //set the tracker with new bit from charBitLocation
        tracker.or(charBitLocation);

        charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop

    }

    return true;

}

我一直在寻找这个解决方案，但是不需要两个 BitSet 变量。只是跟踪器就足够了。更新循环代码：

for(int i = 0; i < s.length(); i++) { int charVal = s.charAt(i); if(tracker.get(charVal)) { return false; } tracker.set(charVal); }

C

Community

简单说明（下面有JS代码）

每个机器码的整数变量是一个 32 位数组

所有按位操作都是 32 位的

它们与 OS/CPU 架构或语言的所选数字系统无关，例如 JS 的 DEC64。

这种重复查找方法类似于将字符存储在大小为 32 的数组中，如果我们在字符串中找到 a，我们设置第 0 个索引，为 b 设置第 1 个等。

字符串中的重复字符将占用相应的位，或者在这种情况下设置为 1。

伊万已经解释过：这个指数计算是如何在这个先前的答案中工作的。

操作总结：

在字符的检查器和索引之间执行 AND 操作

内部都是 Int-32-Arrays

这是这两者之间的逐位运算。

检查操作的输出是否为 1

if output == 1 checker 变量在两个数组中都设置了特定的索引位因此它是重复的。

checker 变量在两个数组中都设置了特定的索引位

因此它是重复的。

if output == 0 这个字符目前还没有找到在 checker 和字符的 index 之间执行 OR 操作从而将 index-th 位更新为 1 将输出分配给 checker

暂时没有找到这个角色

在字符的检查器和索引之间执行 OR 操作

因此，将第 index 位更新为 1

将输出分配给检查器

假设：

我们假设我们会得到所有小写字符

而且，32号就够了

因此，考虑到 a 的 ascii 代码是 97，我们从 96 作为参考点开始计算索引

下面给出的是 JavaScript 源代码。

function checkIfUniqueChars (str) {

    var checker = 0; // 32 or 64 bit integer variable 

    for (var i = 0; i< str.length; i++) {
        var index = str[i].charCodeAt(0) - 96;
        var bitRepresentationOfIndex = 1 << index;

        if ( (checker & bitRepresentationOfIndex) > 1) {
            console.log(str, false);
            return false;
        } else {
            checker = (checker | bitRepresentationOfIndex);
        }
    }
    console.log(str, true);
    return true;
}

checkIfUniqueChars("abcdefghi");  // true
checkIfUniqueChars("aabcdefghi"); // false
checkIfUniqueChars("abbcdefghi"); // false
checkIfUniqueChars("abcdefghii"); // false
checkIfUniqueChars("abcdefghii"); // false

请注意，在 JS 中，尽管整数是 64 位的，但按位运算总是在 32 位上完成。

示例： 如果字符串为 aa，则：

// checker is intialized to 32-bit-Int(0)
// therefore, checker is
checker= 00000000000000000000000000000000

我 = 0

str[0] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000000
Boolean(0) == false

// So, we go for the '`OR`' operation.

checker = checker OR 32-bit-Int(1)
checker = 00000000000000000000000000000001

我 = 1

str[1] is 'a'
str[i].charCodeAt(0) - 96 = 1

checker= 00000000000000000000000000000001
a      = 00000000000000000000000000000001

checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000001
Boolean(1) == true
// We've our duplicate now

g

garg10may

让我们逐行分解代码。

整数检查器 = 0;我们正在启动一个检查器，它将帮助我们找到重复值。

int val = str.charAt(i) - 'a';我们在字符串的第 i 个位置获取字符的 ASCII 值，然后用 'a' 的 ASCII 值减去它。由于假设字符串仅为低位字符，因此字符数限制为 26。因此，'val' 的值将始终 >= 0。

if ((checker & (1 << val)) > 0) 返回 false；

检查器 |= (1 << val);

现在这是棘手的部分。让我们考虑一个带有字符串“abcda”的示例。理想情况下，这应该返回 false。

对于循环迭代 1：

检查器：000000000000000000000000000000000

值：97-97 = 0

1 << 0: 00000000000000000000000000000001

检查器 & (1 << val): 00000000000000000000000000000000 不是 > 0

因此检查器：000000000000000000000000000000001

对于循环迭代 2：

检查器：00000000000000000000000000000001

值：98-97 = 1

1 << 1: 00000000000000000000000000000010

检查器 & (1 << val): 00000000000000000000000000000000 不是 > 0

因此检查器：00000000000000000000000000000011

对于循环迭代 3：

检查器：00000000000000000000000000000011

值：99-97 = 2

1 << 2: 00000000000000000000000000000100

检查器 & (1 << val): 00000000000000000000000000000000 不是 > 0

因此检查器：00000000000000000000000000000111

对于循环迭代 4：

检查器：00000000000000000000000000000111

值：100-97 = 3

1 << 3: 00000000000000000000000000001000

检查器 & (1 << val): 00000000000000000000000000000000 不是 > 0

因此检查器：00000000000000000000000000001111

对于循环迭代 5：

检查器：000000000000000000000000000001111

值：97-97 = 0

1 << 0: 00000000000000000000000000000001

检查器 & (1 << val): 00000000000000000000000000000001 > 0

因此返回false。

val: 99-97 = 0 应该是 val: 99-97 = 2 和 val: 100-97 = 0 应该是 3

a

asdf1234

public static void main (String[] args)
{
    //In order to understand this algorithm, it is necessary to understand the following:

    //int checker = 0;
    //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0
    //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with

    //int val = str.charAt(i) - 'a';
    //In order to understand what is going on here, we must realize that all characters have a numeric value
    for (int i = 0; i < 256; i++)
    {
        char val = (char)i;
        System.out.print(val);
    }

    //The output is something like:
    //             !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
    //There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead

    //To only print the characters from 'a' on forward:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        //char val2 = val + 'a'; //incompatible types. required: char found: int
        int val2 = val + 'a';  //shift to the 'a', we must use an int here otherwise the compiler will complain
        char val3 = (char)val2;  //convert back to char. there should be a more elegant way of doing this.
        System.out.print(val3);
    }

    //Notice how the following does not work:
    System.out.println();
    System.out.println();

    for (int i=0; i < 256; i++)
    {
        char val = (char)i;
        int val2 = val - 'a';
        char val3 = (char)val2;
        System.out.print(val3);
    }
    //I'm not sure why this spills out into 2 lines:
    //EDIT I cant seem to copy this into stackoverflow!

    System.out.println();
    System.out.println();

    //So back to our original algorithm:
    //int val = str.charAt(i) - 'a';
    //We convert the i'th character of the String to a character, and shift it to the right, since adding shifts to the right and subtracting shifts to the left it seems

    //if ((checker & (1 << val)) > 0) return false;
    //This line is quite a mouthful, lets break it down:
    System.out.println(0<<0);
    //00000000000000000000000000000000
    System.out.println(0<<1);
    //00000000000000000000000000000000
    System.out.println(0<<2);
    //00000000000000000000000000000000
    System.out.println(0<<3);
    //00000000000000000000000000000000
    System.out.println(1<<0);
    //00000000000000000000000000000001
    System.out.println(1<<1);
    //00000000000000000000000000000010 == 2
    System.out.println(1<<2);
    //00000000000000000000000000000100 == 4
    System.out.println(1<<3);
    //00000000000000000000000000001000 == 8
    System.out.println(2<<0);
    //00000000000000000000000000000010 == 2
    System.out.println(2<<1);
    //00000000000000000000000000000100 == 4
    System.out.println(2<<2);
    // == 8
    System.out.println(2<<3);
    // == 16
    System.out.println("3<<0 == "+(3<<0));
    // != 4 why 3???
    System.out.println(3<<1);
    //00000000000000000000000000000011 == 3
    //shift left by 1
    //00000000000000000000000000000110 == 6
    System.out.println(3<<2);
    //00000000000000000000000000000011 == 3
    //shift left by 2
    //00000000000000000000000000001100 == 12
    System.out.println(3<<3);
    // 24

    //It seems that the -  'a' is not necessary
    //Back to if ((checker & (1 << val)) > 0) return false;
    //(1 << val means we simply shift 1 by the numeric representation of the current character
    //the bitwise & works as such:
    System.out.println();
    System.out.println();
    System.out.println(0&0);    //0
    System.out.println(0&1);       //0
    System.out.println(0&2);          //0
    System.out.println();
    System.out.println();
    System.out.println(1&0);    //0
    System.out.println(1&1);       //1
    System.out.println(1&2);          //0
    System.out.println(1&3);             //1
    System.out.println();
    System.out.println();
    System.out.println(2&0);    //0
    System.out.println(2&1);       //0   0010 & 0001 == 0000 = 0
    System.out.println(2&2);          //2  0010 & 0010 == 2
    System.out.println(2&3);             //2  0010 & 0011 = 0010 == 2
    System.out.println();
    System.out.println();
    System.out.println(3&0);    //0    0011 & 0000 == 0
    System.out.println(3&1);       //1  0011 & 0001 == 0001 == 1
    System.out.println(3&2);          //2  0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1
    System.out.println(3&3);             //3 why?? 3 == 0011 & 0011 == 3???
    System.out.println(9&11);   // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay!

    //so when we do (1 << val), we take 0001 and shift it by say, 97 for 'a', since any 'a' is also 97

    //why is it that the result of bitwise & is > 0 means its a dupe?
    //lets see..

    //0011 & 0011 is 0011 means its a dupe
    //0000 & 0011 is 0000 means no dupe
    //0010 & 0001 is 0011 means its no dupe
    //hmm
    //only when it is all 0000 means its no dupe

    //so moving on:
    //checker |= (1 << val)
    //the |= needs exploring:

    int x = 0;
    int y = 1;
    int z = 2;
    int a = 3;
    int b = 4;
    System.out.println("x|=1 "+(x|=1));  //1
    System.out.println(x|=1);     //1
    System.out.println(x|=1);      //1
    System.out.println(x|=1);       //1
    System.out.println(x|=1);       //1
    System.out.println(y|=1); // 0001 |= 0001 == ?? 1????
    System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm
    System.out.println(y);  //should be 3?? 
    System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3?
    System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup!
    System.out.println(y|=3); //0011 |= 0011, still 3
    System.out.println(y|=4);  //0011 |= 0100.. should be... 0111? so... 11? no its 7
    System.out.println(y|=5);  //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7
    System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY!

    //so the |= is just a bitwise OR!
}

public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';  //the - 'a' is just smoke and mirrors! not necessary!
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

public static boolean is_unique(String input)
{
    int using_int_as_32_flags = 0;
    for (int i=0; i < input.length(); i++)
    {
        int numeric_representation_of_char_at_i = input.charAt(i);
        int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character
        int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation;
        boolean already_bit_flagged = result_of_bitwise_and > 0;              //needs clarification why is it that the result of bitwise & is > 0 means its a dupe?
        if (already_bit_flagged)
            return false;
        using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation;
    }
    return true;
}

B

Bachiri Taoufiq Abderrahman

以前的帖子很好地解释了代码块的作用，我想使用 BitSet java 数据结构添加我的简单解决方案：

private static String isUniqueCharsUsingBitSet(String string) {
  BitSet bitSet =new BitSet();
    for (int i = 0; i < string.length(); ++i) {
        int val = string.charAt(i);
        if(bitSet.get(val)) return "NO";
        bitSet.set(val);
    }
  return "YES";
}

a

abdul rashid

Line 1:   public static boolean isUniqueChars(String str) {
Line 2:      int checker = 0;
Line 3:      for (int i = 0; i < str.length(); ++i) {
Line 4:          int val = str.charAt(i) - 'a';
Line 5:          if ((checker & (1 << val)) > 0) return false;
Line 6:         checker |= (1 << val);
Line 7:      }
Line 8:      return true;
Line 9:   }

我理解使用 Javascript 的方式。假设输入 var inputChar = "abca"; //find if inputChar has all unique characters

开始吧

Line 4: int val = str.charAt(i) - 'a';

上面的行找到 inputChar 中第一个字符的二进制值是 a，ascii 中的 a = 97，然后将 97 转换为二进制变为 1100001。

在 Javascript 例如："a".charCodeAt().toString(2) 返回 1100001

checker = 0 // 二进制 32 位表示 = 0000000000000000000000000

checker = 1100001 | checker; //checker 变为 1100001（在 32 位表示中，它变为 000000000.....00001100001）

但我希望我的位掩码（int checker）只设置一位，但检查器是 1100001

Line 4:          int val = str.charAt(i) - 'a';

现在上面的代码就派上用场了。我总是减去 97（a 的 ASCII 值）

val = 0; // 97 - 97  Which is  a - a
val = 1; // 98 - 97 Which is b - a
val = 1;  // 99 - 97 Which is c - a

让我们使用已重置的 val

第 5 行和第 6 行很好解释 @Ivan 答案

i

ik024

以防万一有人正在使用位向量在字符串中寻找与唯一字符等效的 kotlin

fun isUnique(str: String): Boolean {
    var checker = 0
    for (i in str.indices) {
        val bit = str.get(i) - 'a'
        if (checker.and(1 shl bit) > 0) return false
        checker = checker.or(1 shl bit)
    }
    return true
}

参考：https://www.programiz.com/kotlin-programming/bitwise

解释使用位向量来确定是否所有字符都是唯一的

关注公众号

想领先一步获取最新的外包任务吗？

相似问题

平台

支持

联系我们