×

Search anything:

VPOPCNT: Number of set bits

Binary Tree book by OpenGenus

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

In this article at OpenGenus, you must have the complete idea of VPOPCNT assembly instruction that is used to get the number of set bits.

Table of contents:

  1. VPOPCNT
  2. Assembly code with VPOPCNT
  3. C++ Implementation using VPOPCNT with intrinsic

VPOPCNT

VPOPCNT is a vectorized assembly instruction to count the number of set bits in a given register/ data. A data consist of 0 and 1. A set bit refer to 1.

It is supported in AVX512 and AVX256 as well. If you have a system that supports AVX512, it means VPOPCNT can be used to get the number of set bits in 512 bits using one instruction.

VPOPCNT is executed in 3 clock cycles.

Assembly code with VPOPCNT

The code in this section has a data in eax register. It uses vpopcnt to find the number of set bits and stores the result in eax (same register).

Following is the assembly code with VPOPCNT instruction:

section .text
global _start

_start:
  ; input integer in eax
  ; use vpopcnt instruction to count number of set bits
  vpopcnt eax, eax 
  ; result in eax

  ; exit program
  mov eax, 1
  xor ebx, ebx
  int 0x80

C++ Implementation using VPOPCNT with intrinsic

It use 2 main intrinsics:

  • _mm256_popcnt_epi32 intrinsic to count number of set bits in a data of 256 bits
  • _mm256_add_epi32 to add the number of set bits from _mm256_popcnt_epi32 across all data

Following is the complete C++ code using VPOPCNT as an intrinsic:

#include <immintrin.h>

// Compute the number of set bits in the given array of integers
int popcnt(const uint32_t* data, size_t n) {
  size_t i = 0;
  __m256i sum = _mm256_setzero_si256();
  // Process the data in chunks of 8 integers
  for (; i + 8 <= n; i += 8) {
    __m256i chunk = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
    sum = _mm256_add_epi32(sum, _mm256_popcnt_epi32(chunk));
  }
  // Process any remaining integers
  uint32_t remaining[8] = {0};
  for (; i < n; ++i) {
    remaining[i % 8] = data[i];
  }
  __m256i chunk = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(remaining));
  sum = _mm256_add_epi32(sum, _mm256_popcnt_epi32(chunk));
  // Compute the total popcount by summing the individual counts
  uint32_t count[8];
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(count), sum);
  return count[0] + count[1] + count[2] + count[3] + count[4] + count[5] + count[6] + count[7];
}

With this article at OpenGenus, you must have the complete idea of VPOPCNT vectorized instruction.

VPOPCNT: Number of set bits
Share this