link#

Why programmers group bits in 4
Real kernel example(13번 참고하면됨.
- align up vs align down vs check aligned
15 Legendary Bit Hacks
(외부링크) Bit Twiddling Hacks
Boolean Logic
- https://introcs.cs.princeton.edu/java/71boolean/
Operator Truth Table 기초
- NOT
- AND
- OR
- XOR
만들면서 익히기
- Base64기초
- SWAR popcount algorithm (the famous 5-mask method)
  - Why this is called SWAR
  - Why this was famous(SWAR)

2진수 이쁘게 출력하기|🔝|#

fn format_bin8(x: u8) -> String {
    let s = format!("{:08b}", x);
    format!("0b{} {}", &s[0..4], &s[4..8])
}

fn main() {
    let x: u8 = 4;
    println!("{}", format_bin8(x));
}

result

0b0000 0100

Boolean Logic|🔝|#

https://introcs.cs.princeton.edu/java/71boolean/

	logic	java boolean	java bitwise	circuit design
NOT	$\neg x$	`!x`	`~x`	x’
AND	$x\land y$	`x && y`	`x & y`	xy
OR	$x\lor y$	`x \|\| y`	`x \| y`	x+y
XOR	$x \oplus y$	`x ^ y`	`x ^ y`	$x \oplus y$

NOT operator truth table|🔝|#

x	x’ $\neg x$ `~x` `!x`
0	1
1	0

AND operator truth table|🔝|#

x	y	x AND y xy $x\land y$ `x && y`
0	0	0
0	1	0
1	0	0
1	1	1

OR operator truth table|🔝|#

x	y	x OR y x+y $x\lor y$ `x \|\| y`
0	0	0
0	1	1
1	0	1
1	1	1

XOR operator truth table|🔝|#

x	y	x XOR y $x \oplus y$ `x ^ y`
0	0	0
0	1	1
1	0	1
1	1	0

Real kernel example(13번 참고하면됨. | 13. Align to power-of-two boundary)|🔝|#

13. Align to power-of-two boundary

Linux uses similar macros:

// macro_c_lang.c
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

Example:

ALIGN(13,8) = 16
ALIGN(16,8) = 16
ALIGN(17,8) = 24

Rust version used in allocators

// main.rs
fn align_up(x: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (x + align - 1) & !(align - 1)
}

The debug_assert! ensures correctness.

Purpose	Expression
align up	`(x + a - 1) & !(a - 1)`
align down	`x & !(a - 1)`
check aligned	`x & (a - 1) == 0`

15 Legendary Bit Hacks|🔝|#

#	Expression	Purpose	쓰이는 곳
1	`x & (x - 1)`	clear lowest set bit	Used for bit counting loops.
2	`x & -x`	extract lowest set bit
3	`(x & (x - 1)) == 0`	check power of two	kernel page sizes memory alignment ring buffers
4	`x \| (x - 1)`	set bits below lowest set bit
5	`x & (x + 1)`	clear trailing ones
6	`~x & (x + 1)`	isolate lowest zero bit
7	`x ^ (x >> 1)`	binary → Gray code	rotary encoders hardware counters error correction
8	`x ^ (x >> 1) ^ (x >> 2) ...`	Gray → binary	Used in hardware decoding.
9	`x >> n & 1`	test nth bit	CPU flags protocol parsers
10	`x ^ (1 << n)`	toggle bit
11	`x & ~(1 << n)`	clear bit
12	`x \| (1 << n)`	set bit
13	`(x + (1 << n)) & ~((1 << n) - 1)`	align to power-of-two boundary	memory allocators page alignment kernel slabs
14	`(x + y) ^ ((x ^ y) & -(x < y))`	branchless min/max	Advantages: avoids branch misprediction useful in SIMD
15	`x -= ((x >> 1) & 0x55555555)`	fast bit count (popcount step)	Part of the SWAR popcount algorithm. Counts bits in parallel inside a register.

Why programmers group bits in 4|🔝|#

Because 1 hex digit = 4 bits
Example:

0000 0100  = 0x04 = 4(decimal)
0010 1100  = 0x2C = 44(decimal)
1111 1111  = 0xFF = 255(decimal)

So this format makes debugging:
- CPU registers
- kernel bitmasks
- network packets
- SIMD values
much easier.

1. Clear lowest set bit|🔝|#

x & (x - 1)

Example

x      = 101100    # 0x2C    44(10dec)
x - 1  = 101011    # 0x2B    43(10dec)
result = 101000    # 0x14    20(10dec)

Used for bit counting loops.

2. Extract lowest set bit|🔝|#

x & -x

Example

101100
000100

Used in
- Fenwick trees
- memory allocators
- bitboards
Rust연습

// main.rs

fn main() {
    let x = 44;        // 0010 1100    // 0x2C
                      //  1101 0100  -x = -44 
    let res = x & -x;
    let res02 = x & !x;
    let res03 = x | !x;
    let minus_x = -x;  // 1101 0100    // -44(10dec)
    let nagative_x = !x;
    println!("res : {res}"); // 0000 0100
    println!("res02 : {res02}");
    println!("res03 : {res03}");
    println!("minus_x : {minus_x}");
    println!("minus_x : {:08b}", minus_x as u8);
    println!("nagative_x : {nagative_x}");
    println!("4(dec) : 0b{:08b}", res as u8); // 0000 0100
    println!("-x(44dec) binary : 0b{:08b}", minus_x as u8);
}

result

res : 4
res02 : 0
res03 : -1
minus_x : -44
minus_x : 11010100
nagative_x : -45
4(dec) : 0b00000100
-x(44dec) binary : 0b11010100

3. Check power of two|🔝|#

(x & (x - 1)) == 0

Example

Used everywhere:
- kernel page sizes
- memory alignment
- ring buffers
rust코드로 연습

// main.rs

fn main() {
    let x = 47; // 0010 1111
    //&(x-1) dec 46  // 0010 1110
    //       dec 46  // 0010 1110
    let res = (x & (x - 1)) == 0;
    println!("{res}");
}

result

false

4. Set bits below lowest set bit|🔝|#

x | (x - 1)

Example

101100
→
101111

Rust code로 연습

// main.rs

fn main() {
    let x = 44; // 0010 1100
// (x-1) dec 43 // 0010 1011
//              // 0010 1111
    let res = x | (x - 1);
    println!("{res}");
}

result

47   // 0010 1111

5. Clear trailing ones |🔝|#

x & (x + 1)

Example

101111
→
100000

rust로 연습

// main.rs

fn main() {
    let x = 47; //      0010 1111
    //&(x+1) dec 48  // 0011 0000
    //       dec 32  // 0010 0000
    let res = x & (x + 1);
    println!("{res}");
}

result

32 # 0010 0000

6. Find lowest zero bit |🔝|#

//main.c
~x & (x + 1)

Useful for bit allocation algorithms.
rust코드로 연습

// main.rs

fn main() {
    let x = 47; // 0010 1111
    // !x            // 1101 0000
    //&(x+1) dec 48  // 0011 0000
    //       dec 16  // 0001 0000
    let res = !x & (x + 1);
    println!("{res}");
}

result

16  # 0001 0000

7. Binary → Gray code |🔝|#

// main.c
gray = x ^ (x >> 1)

Example

binary:  1011
gray:    1110

rust code로 연습

// main.rs

fn bin4(x: u8) -> String {
    format!("{:04b}", x)
}

fn main() {
    let binary = 11; // 1011
    //        x >> 1 // 0101
    //   x ^ (x >>1) // 1110
    let gray = binary ^ (binary >> 1);

    println!("x    : {}", bin4(binary));
    println!("gray : {}", bin4(gray));
}

result

binary:  1011
gray:    1110

쓰이는곳#

Property:
Only one bit changes between consecutive numbers, which is why Gray code is used in:
- rotary encoders
- Karnaugh maps
- digital circuits
- error-resistant counters

8. Gray → Binary|🔝|#

// main.c
b = g
b ^= b >> 1
b ^= b >> 2
b ^= b >> 4
b ^= b >> 8
b ^= b >> 16

Used in hardware decoding.

Rust Code로 연습

// main.rs

fn gray_to_binary(mut b: u32) -> u32 {
    b ^= b >> 1;
    b ^= b >> 2;
    b ^= b >> 4;
    b ^= b >> 8;
    b ^= b >> 16;
    b
}

fn main() {
    let gray: u32 = 0b1110; // Gray code
    let binary = gray_to_binary(gray);

    println!("gray   : {:04b}", gray);
    println!("binary : {:04b}", binary);
}

result

gray   : 1110
binary : 1011

똑같은 기능(이게 더 눈에 들어온다.ㅋ)

// main.rs

fn gray_to_binary(mut x: u32) -> u32 {
    let mut shift = 1;
    while shift < 32 {
        x ^= x >> shift;
        shift <<= 1;
    }
    x
}

9. Test bit |🔝|#

// main.c
(x >> n) & 1

Example

x = 101101
n = 2
→ 1

Used in:
- CPU flags
- protocol parsers
Rust code로 연습

// main.rs

fn main() {
    let n = 2; //  0000 0010
    let x = 45; // 0010 1101
    // 1             // 0000 0001
    //&(x >> 2)dec11 // 0000 1011
    //       dec 1   // 0000 0001
    let res = (x >> n) & 1;
    println!("{res}");
}

result

10. Toggle bit|🔝|#

// main.c
x ^ (1 << n)

101100
toggle bit2
→ 101000

// main.rs

fn main() {
    let n = 2; //  0000 0010
    let x = 44; // 0010 1100
//   let x: u8 = 0b101100; 44dec
    // 1             // 0000 0001
    // XOR ^
    //^(1 >> 2)dec11 // 0000 0100
    //       dec 1   // 0000 0001
    // res      40   // 0010 1000
    let res = x ^ (1 << n);
    println!("{res}");
}

result

11. Clear bit|🔝|#

// main.c
x & ~(1 << n)

Rust Code로 연습

// main.rs

fn main() {
    let negative_x = !(1 << 2);
    let n = 2; //   0000 0010
    let x = 45; //  0010 1101
    //  1            //  0000 0001
    // (1 << 2) 4dec //  0000 0100
    //\!(1 << 2)-5dec//  1111 1101
    //&!(1 << 2)     //  0000 1011
    //  res  41 dec  //  0010 1001
    let res = x & !(1 << n);
    println!("{negative_x}");
    println!("negative x binariy : {:8b}", negative_x as u8);
    println!("{res}");
}

result

-5
negative x binari : 11111011
41

12. Set bit |🔝|#

// main.c
x | (1 << n)

Example

101000
set bit2
→ 101100

// main.rs

fn main() {
    //         1dec //   0000 0001
    let n = 2; //   0000 0010
    let x = 45; //  0010 1101
    // (1 << 2) 4dec //  0000 0100
    //| (1 << 2)     //
    //  res  45 dec  //  0010 1101
    let res = x | (1 << n);
    println!("{res}");
}

result

13. Align to power-of-two boundary|🔝|#

// main.c 
(x + (a - 1)) & ~(a - 1)

is a very famous systems-programming trick used to align a number x up to the nearest multiple of a, where a is typically a power of two.
Example align to 16

x = 37
→ 48

Used in:
- memory allocators
- page alignment
- kernel slabs
This appears in:
- OS kernels
- memory allocators
- page alignment
- SIMD buffers
Rust Code로 연습하기

// (a - 1) // 7  , a = 8
// 8   0000 1000
// 7   0000 0111
// 13  0000 1101
// +   0001 0100  (20dec)

//                       0001 0100  (20dec)
// !(a - 1) // 248u8   0b1111 1000
//                       0001 0000  (16dec)  align 16

fn align_up(x: usize, a: usize) -> usize {
    (x + (a - 1)) & !(a - 1)
}

fn main() {
    let x = 13;
    let a = 8;

    let aligned = align_up(x, a);

    let negative_a_minus = !(a - 1);

    println!("x       : {}", x);
    println!("aligned : {}", aligned);

    println!(" ~~~~");
    println!("negative!(8 -1) : {}", negative_a_minus as u8);
    println!("negative!(8 -1) : {:08b}", negative_a_minus as u8);
}

result

x       : 13
aligned : 16
 ~~~~
negative!(8 -1) : 248
negative!(8 -1) : 11111000

14. Branchless min/max|🔝|#

// main.c

min = y ^ ((x ^ y) & -(x < y))

Advantages:
- avoids branch misprediction
- useful in SIMD
Rust code로 연습하기
MIN

fn branchless_min(x: i32, y: i32) -> i32 {
    y ^ ((x ^ y) & -((x < y) as i32))
}

fn branchless_max(x: i32, y: i32) -> i32 {
    x ^ ((x ^ y) & -((x < y) as i32))
}

MIN#

// 10   // 0000 1010
// 6    // 0000 0110
// x ^ y // 0000 1100    // 10 dec

fn branchless_min(x: i32, y: i32) -> i32 {
    y ^ ((x ^ y) & -((x < y) as i32))
}

fn main() {
    let x = 10;
    let y = 6;

    let min = branchless_min(x, y);

    println!("x   : {}", x);
    println!("y   : {}", y);
    println!("min : {}", min);

    let test_eval = x < y as i32;
    println!("10 < 6 as i32 : {}", test_eval);
    println!("-(10 < 6 as i32) : {}", -(test_eval as i32));
    // println!("10 < 6 as i32 : {:08b}", test_eval);
}

result(MIN)

x   : 10
y   : 6
min : 6
10 < 6 as i32 : false
-(10 < 6 as i32) : 0

MAX#

// 10   // 0000 1010
// 6    // 0000 0110
// x ^ y // 0000 1100    // 10 dec

fn branchless_max(x: i32, y: i32) -> i32 {
    x ^ ((x ^ y) & -((x < y) as i32))
}

fn main() {
    let x = 10;
    let y = 6;

    let max = branchless_max(x, y);

    println!("x   : {}", x);
    println!("y   : {}", y);
    println!("max : {}", max);

    let test_eval = x < y as i32;
    println!("10 < 6 as i32 : {}", test_eval);
    println!("-(10 < 6 as i32) : {}", -(test_eval as i32));
    // println!("10 < 6 as i32 : {:08b}", test_eval);
}

- result(MAX)

```bash
x   : 10
y   : 6
max : 10
10 < 6 as i32 : false
-(10 < 6 as i32) : 0

Why this trick exists#

This appears in:
- CPU micro-optimizations
- vectorized math libraries
- GPU shaders
- compilers
- cryptography
because branches can cause pipeline stalls.
Modern compilers often convert

// max
x.max(y);

// min
x.min(y);

into branchless instructions automatically.

15. Fast popcount step |🔝|#

// main.c

x -= (x >> 1) & 0x55555555

is the first step of the SWAR popcount algorithm
Counts bits in parallel inside a register.
- (SIMD Within A Register) used to count bits in parallel.

What 0x55555555 means#

0x55555555
= 01010101 01010101 01010101 01010101

It selects alternating bits.

// 214_dec                                    1101 0110
// 0x5555_5555  (1,431,655,765_dec)
// 0x5555_5555  0101 0101 0101 0101 0101 0101 0101 0101
// (x >> 1)                                   0110 1011
// &         65_dec                           0100 0001
//
// result (149_dec)
//              0000_0000_0000_0000_0000_0000_1001_0101
fn main() {
    let mut x: u32 = 0b1101_0110;

    x -= (x >> 1) & 0x5555_5555;

    println!("result: {:032b}", x);
}

result

result: 0000 0000 0000 0000 0000 0000 1001 0101

Base64기초|🔝|#

Rust Code로 해보기

fn main() {
    let input = "test";
    let bytes = input.as_bytes();

    let i: usize = 0;

    let b0 = bytes[i];
    let b1 = if i + 1 < bytes.len() { bytes[i + 1] } else { 0 };
    let b2 = if i + 2 < bytes.len() { bytes[i + 2] } else { 0 };

    // Combine into 24 bits
    let triple: u32 =
        ((b0 as u32) << 16) |
        ((b1 as u32) << 8)  |
        (b2 as u32);

    println!("{input}");
    println!("{b0}");
    println!("{b1}");
    println!("{triple}");
    println!("{:b}", triple);
}

Why input[i] fails
- Rust strings are UTF-8, not byte arrays.
So you must do:

input.as_bytes()[i]

You are building a 24-bit integer from 3 bytes using bit operations.
The operator | is bitwise OR,
- and << is left shift.
Let’s go step-by-step with your example:

input = "test"
bytes = [116, 101, 115, 116]

ASCII values:

t = 116 = 0b0111 0100
e = 101 = 0b0110 0101
s = 115 = 0b0111 0011

b0 = 116
b1 = 101
b2 = 115

Step 1 — Shift b0 left 16 bits

(b0 as u32) << 16

01110100 << 16

= 01110100 00000000 00000000

Binary (24 bits):

01110100 00000000 00000000

Decimal:

116 << 16 = 7602176

Step 2 — Shift b1 left 8 bits

(b1 as u32) << 8

01100101 << 8

= 00000000 01100101 00000000

Binary:

00000000 01100101 00000000

Decimal:

101 << 8 = 25856

Step 3 — b2 stays in lowest 8 bits

(b2 as u32)

00000000 00000000 01110011

Step 4 — Combine using | (bitwise OR)

((b0 << 16) | (b1 << 8) | b2)

Bitwise OR combines bits:

01110100 00000000 00000000
00000000 01100101 00000000
00000000 00000000 01110011
--------------------------------
01110100 01100101 01110011

Final result:

0111 0100 0110 0101 0111 0011

Decimal:

7602176
+ 25856
+   115
---------
7628147

Step 5 — Why this is done
This packs 3 bytes into one 24-bit number.
Layout:

[ b0 ][ b1 ][ b2 ]
 8bit 8bit 8bit

b0 -> high byte
b1 -> middle byte
b2 -> low byte

This is used in:
- Base64 encoding
- Network protocols
- Image formats
- Compression
- Crypto
- Binary parsers

Step 6 — Why | works
Bitwise OR:

0 | 0 = 0
1 | 0 = 1
0 | 1 = 1
1 | 1 = 1

Since each byte is shifted to a different region, OR just merges them.

AAAA0000
0000BBBB
000000CC
---------
AAAABBBBCC

Step 7 — This is exactly Base64 step
Base64 takes:

3 bytes → 24 bits → split into 4 × 6 bits

You just finished the first step of Base64 encoder.

SWAR popcount algorithm (the famous 5-mask method)|🔝|#

one of the coolest tricks in low-level programming.
The SWAR popcount (5-mask method) counts bits in parallel inside one register.
- It is called SIMD Within A Register because each step sums bits in groups:

1-bit → 2-bit → 4-bit → 8-bit → 16-bit → 32-bit

This is the famous algorithm used in old compilers, kernels, and graphics code before POPCNT.

Full SWAR popcount code (reference)#

// main.c

x -= (x >> 1) & 0x55555555;
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x + (x >> 4)) & 0x0F0F0F0F;
x += x >> 8;
x += x >> 16;
x &= 0x3F;

rust

// main.rs

fn popcount(mut x: u32) -> u32 {
    x -= (x >> 1) & 0x5555_5555;
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333);
    x = (x + (x >> 4)) & 0x0F0F_0F0F;
    x += x >> 8;
    x += x >> 16;
    x & 0x3F
}

Example value#

x = 11010110  (5 bits set)

We will track how the register changes.

Step 1 — count per 2 bits#

mask = 0x55555555
= 01010101 01010101 01010101 01010101

x        11010110
x>>1     01101011
mask     01010101
&        01000001
----------------
result   10010101

Interpret as pairs:

10 01 01 01

Each pair = number of 1s in original pair.

2  1  1  1

Step 2 — count per 4 bits#

mask = 0x33333333
= 00110011 00110011

x = 10010101

x & mask         00010001
(x>>2) & mask    00100001
-------------------------
sum              00110010

Now per 4 bits:

0011 0010

Meaning:

3   2

Step 3 — count per 8 bits#

mask = 0x0F0F0F0F
= 00001111 00001111

x = 00110010

x + (x>>4)

00110010
00000011
---------
00110101

& mask

00000101

Now:

00000101

Meaning:

We already have the popcount in 8 bits.

Step 4 — sum 8-bit groups#

x += x >> 8

For 32-bit numbers this merges bytes.
Visual:

00000000 00000000 00000000 00000101

No change here because small number.

Step 5 — sum 16-bit groups#

x += x >> 16

Still:

00000101

Final mask#

x & 0x3F

Why?
- Max popcount of 32-bit = 32

32 = 100000

Needs 6 bits → mask 0x3F

00111111

Final result#

11010110 → 5

Why this is called SWAR|🔝|#

Because operations happen in parallel:

Step	Group size
1	2 bits
2	4 bits
3	8 bits
4	16 bits
5	32 bits

This is like SIMD but inside one register.

Why this was famous(SWAR)|🔝|#

Used in:
- old GCC / Clang
- Linux kernel
- Quake engine
- graphics pipelines
- cryptography
- chess engines
- Before CPUs had:

POPCNT

instruction.

Modern Rust#

Today you just write:

x.count_ones()

But internally the compiler may still use tricks like this on old CPUs.

link#

2진수 이쁘게 출력하기|🔝|#

Boolean Logic|🔝|#

NOT operator truth table|🔝|#

AND operator truth table|🔝|#

OR operator truth table|🔝|#

XOR operator truth table|🔝|#

Real kernel example(13번 참고하면됨. | 13. Align to power-of-two boundary)|🔝|#

Related famous alignment tricks#

15 Legendary Bit Hacks|🔝|#

Why programmers group bits in 4|🔝|#

1. Clear lowest set bit|🔝|#

2. Extract lowest set bit|🔝|#

3. Check power of two|🔝|#

4. Set bits below lowest set bit|🔝|#

5. Clear trailing ones |🔝|#

6. Find lowest zero bit |🔝|#

7. Binary → Gray code |🔝|#

쓰이는곳#

8. Gray → Binary|🔝|#

9. Test bit |🔝|#

10. Toggle bit|🔝|#

11. Clear bit|🔝|#

12. Set bit |🔝|#

13. Align to power-of-two boundary|🔝|#

14. Branchless min/max|🔝|#

MIN#

MAX#

Why this trick exists#

15. Fast popcount step |🔝|#

What 0x55555555 means#

Base64기초|🔝|#

SWAR popcount algorithm (the famous 5-mask method)|🔝|#

Full SWAR popcount code (reference)#

Example value#

Step 1 — count per 2 bits#

Step 2 — count per 4 bits#

Step 3 — count per 8 bits#

Step 4 — sum 8-bit groups#

Step 5 — sum 16-bit groups#

Final mask#

Final result#

Why this is called SWAR|🔝|#

Why this was famous(SWAR)|🔝|#

Modern Rust#