260130Deep_Dive_into_HashMap002_Try_making
Learning by building my own HashMap (with Assembly code added), the hard way: the Hash part is swapped out for assembly code

[Let's use cargo bench to judge the performance objectively]

Understanding the MRU & LRU concepts
Build your own HashMap using std::collections::HashMap|🔝|
- Below is a detailed, line-by-line explanation of this std::collections::HashMap-based LRU cache: what each line does, why it exists, and how it participates in LRU semantics.
- The code itself is not changed; it is only explained and annotated conceptually.
1️⃣ Imports and constants|🔝|
```rust
use std::collections::HashMap;
use std::hash::Hash;
```

- HashMap → used to map key → entry index in O(1) average time.
- Hash → trait required so keys can be hashed by the HashMap.
```rust
pub const CACHE_SIZE: usize = 128;
```

- A default capacity constant
- Not used directly inside the struct
- Intended as a convenient default for users
2️⃣ Entry structure (node in the LRU list)|🔝|
```rust
struct Entry<K, V> {
```

- Defines one cache node, storing both:
  - the key/value
  - links for the LRU doubly-linked list
```rust
    key: K,
```

- Stores the key inside the entry
- Required so we can:
  - remove the key from the HashMap when evicting
  - avoid reverse lookups
```rust
    val: Option<V>,
```

- Stores the cached value
- Wrapped in Option so we can:
  - invalidate entries (None) without removing the slot
  - reuse indices safely
```rust
    prev: Option<usize>,
    next: Option<usize>,
}
```

- Indices into entries: Vec<Entry<K, V>>
- Together these links form a doubly-linked list
- usize instead of pointers → safe and cache-friendly
3️⃣ LRUCache structure
```rust
pub struct LRUCache<K, V> {
```

- Defines the main LRU cache container.
```rust
    capacity: usize,
```

- Maximum number of live cache entries
- When exceeded → evict the least-recently-used entry
```rust
    head: Option<usize>, // MRU (Most Recently Used)
```

- Index of the most recently used entry
- Front of the linked list
```rust
    tail: Option<usize>, // LRU (Least Recently Used)
```

- Index of the least recently used entry
- Back of the linked list
- Evicted when capacity is exceeded
```rust
    map: HashMap<K, usize>,
```

- Maps key → index into entries
- Enables O(1) lookup
- The HashMap stores indices, not values
```rust
    entries: Vec<Entry<K, V>>,
}
```

- Stores all entries contiguously
- Acts as node storage and as indexable linked-list memory
- Never shrinks
4️⃣ Trait bounds
```rust
impl<K, V> LRUCache<K, V>
where
    K: Eq + Hash + Clone,
```

- Eq → key comparison
- Hash → usable by HashMap
- Clone → needed during eviction (to copy the key out)
5️⃣ Constructor
```rust
pub fn new(capacity: usize) -> Self {
    Self {
        capacity,
        head: None,
        tail: None,
        map: HashMap::new(),
        entries: Vec::with_capacity(capacity),
    }
}
```

- Creates an empty LRU cache and stores the max capacity
- head/tail are None → empty linked list, no MRU or LRU yet
- map starts as an empty key → index mapping
- Vec::with_capacity pre-allocates space, avoiding reallocations for the first capacity inserts
6️⃣ Internal list helpers
🔹 detach — remove node from LRU list
```rust
fn detach(&mut self, i: usize) {
    let (prev, next) = {
        let e = &self.entries[i];
        (e.prev, e.next)
    };
```

- Removes entry i from the linked list without deleting it
- Extract the neighbors first
- Done in a block to avoid borrow conflicts
```rust
    if let Some(p) = prev {
        self.entries[p].next = next;
    } else {
        // i was head
        self.head = next;
    }
```

- If i had a previous node, link previous → next
- If there is no prev, i was the MRU → update head
```rust
    if let Some(n) = next {
        self.entries[n].prev = prev;
    } else {
        // i was tail
        self.tail = prev;
    }
```

- If i had a next node, link next → previous
- If there is no next, i was the LRU → update tail
```rust
    self.entries[i].prev = None;
    self.entries[i].next = None;
}
```

- Fully detach the node
- Prevent dangling links
🔹 push_front — move node to MRU
```rust
fn push_front(&mut self, i: usize) {
    self.entries[i].prev = None;
    self.entries[i].next = self.head;
```

- Adds entry i to the front (MRU position)
- The new head has no previous and points to the old head
```rust
    if let Some(old_head) = self.head {
        self.entries[old_head].prev = Some(i);
    } else {
        // list was empty
        self.tail = Some(i);
    }
    self.head = Some(i);
}
```

- The old head now points back to i
- If the list was empty, i is both head and tail (first element)
- Finally, update the MRU pointer
🔹 access — mark key as recently used
```rust
fn access(&mut self, key: &K) {
    let i = self.map[key];
    if Some(i) == self.head {
        return; // already MRU
    }
    self.detach(i);
    self.push_front(i);
}
```

- Called on every successful get or insert
- The index lookup is safe because the caller checked existence
- No work is needed if the entry is already MRU
- Moving the entry to the front is the core LRU behavior
🔹 remove_tail — evict LRU entry
```rust
fn remove_tail(&mut self) {
    if let Some(i) = self.tail {
        let key = self.entries[i].key.clone();
        self.detach(i);
        self.map.remove(&key);
        self.entries[i].val = None;
    }
}
```

- Evicts the least recently used entry (only if the cache is non-empty)
- The key is cloned so it can be removed from the HashMap
- Remove the node from the linked list, then mark the entry as empty
- The slot itself stays allocated
7️⃣ Public API
🔹 insert
```rust
pub fn insert(&mut self, key: K, value: V) -> Option<V> {
    if let Some(&i) = self.map.get(&key) {
        let old = self.entries[i].val.replace(value);
        self.detach(i);
        self.push_front(i);
        return old;
    }
    if self.len() == self.capacity {
        self.remove_tail();
    }
    let i = self.entries.len();
    self.entries.push(Entry {
        key: key.clone(),
        val: Some(value),
        prev: None,
        next: None,
    });
    self.push_front(i);
    self.map.insert(key, i);
    None
}
```

- Insert or update a value
- If the key already exists: replace the value (keeping the old one for the return), then update recency
- If the cache is full: evict the LRU entry first
- Otherwise push a new, initially detached entry node, make it the MRU, and register key → index
- Returns the old value, or None if nothing was replaced
🔹 get
```rust
pub fn get(&mut self, key: &K) -> Option<&V> {
    if self.map.contains_key(key) {
        self.access(key);
    }
    self.map
        .get(key)
        .and_then(|&i| self.entries[i].val.as_ref())
}
```

- Lookup + recency update: if the key exists, it is marked MRU first
- Returns a reference to the value
🔹 get_mut
- Same as get, but returns a mutable reference.
🔹 remove
```rust
pub fn remove(&mut self, key: &K) -> Option<V> {
    self.map.remove(key).map(|i| {
        self.detach(i);
        self.entries[i].val.take().unwrap()
    })
}
```

- Remove the entry entirely: remove from the map and retrieve the index
- Detach from the LRU list
- Take ownership of the value and return it
🔹 Utility methods
```rust
pub fn contains(&self, key: &K) -> bool
pub fn len(&self) -> usize
pub fn is_full(&self) -> bool
pub fn is_empty(&self) -> bool
```

- Simple wrappers around the HashMap
- All O(1) operations
🧠 Final mental model
HashMap<K, usize>
↓
entries Vec<Entry>
↓
index-based doubly linked list

- This design gives:
- ✅ O(1) get / insert / remove
- ✅ No unsafe code
- ✅ No heap allocation per node
- ✅ Cache-friendly memory layout
Drop std::Hash and write the Hash part in Assembly: extreme optimization (hard mode)|🔝|
- I'll do this in a systems-programming style explanation, not a beginner gloss.
WARNING: the Hash part is not cryptographic, so it is weak from a security standpoint. Treat it as a learning exercise.
🧪 Mental model summary (one line)
- The whole idea is this single line. It is very simple:
"Take the first 2 bytes, take the last 2 bytes, glue them together."

Assembly (x86_64) is used to speed up the Hash part. Here is how it looks embedded in Rust code:
```rust
#[cfg(target_os = "linux")]
#[inline(always)]
pub unsafe fn look_up_identifier(data: *const u8, len: usize) -> u32 {
    unsafe {
        let result: u32;
        std::arch::asm!(
            "cmp rsi, 4",
            "jb 2f",
            "movzx eax, WORD PTR [rdi]",
            "movzx edx, WORD PTR [rdi + rsi - 2]",
            "shl eax, 16",
            "or eax, edx",
            "jmp 3f",
            "2:",
            "mov eax, 80",
            "3:",
            in("rdi") data,
            in("rsi") len,
            lateout("eax") result,
            clobber_abi("system")
        );
        result
    }
}
```

main.s
```asm
/* main.s — x86_64 Linux
 *
 * uint32_t look_up_identifier(const uint8_t *data, size_t len)
 *
 * rdi = data pointer
 * rsi = len
 * return value in eax
 */
.section .text
.global look_up_identifier
.type look_up_identifier, @function

look_up_identifier:
    /* if (len < 4) goto fail */
    cmp rsi, 4
    jb fail
    /* load first 2 bytes */
    movzx eax, WORD PTR [rdi]            # eax = data[0..1]
    /* load last 2 bytes */
    movzx edx, WORD PTR [rdi + rsi - 2]  # edx = data[len-2..len-1]
    /* shift first half */
    shl eax, 16
    /* combine */
    or eax, edx
    ret

fail:
    mov eax, 80
    ret
```

🧠 Register mapping (Linux x86_64 ABI)
rdi → data pointer
rsi → len
eax → return value (u32)
edx → temporary (last 2 bytes)

📦 Bit-level result layout
eax (u32)
31 16 15 0
┌────────────────┬────────────────┐
│ data[0..1] │ data[len-2..] │
└────────────────┴────────────────┘

🧩 One-to-one mapping with your Rust inline asm
| Rust inline asm | main.s |
|---|---|
| cmp rsi, 4 | cmp rsi, 4 |
| jb 2f | jb fail |
| movzx eax, WORD PTR [rdi] | same |
| movzx edx, WORD PTR [rdi + rsi - 2] | same |
| shl eax, 16 | same |
| or eax, edx | same |
| mov eax, 80 | same |
🧠 What the function computes (one-screen view)
look_up_identifier(data: *const u8, len: usize) -> u32

- The basic idea is very simple: take the first two characters and the last two characters of the string and use them as the hash value.
Memory (bytes)
┌───────────────────────────────────────────┐
│ data[0] data[1] ... data[len-2] │ data[len-1]
└───┬────────┬───────────────────┬─────────┘
│ │ │
│ │ └─ last 2 bytes
│ └─ first 2 bytes
│
▼
If len >= 4:
────────────────────────────────────────────────────────
first WORD (16 bits) last WORD (16 bits)
┌───────────────┐ ┌───────────────┐
│ data[0..1] │ │ data[len-2..] │
└───────┬───────┘ └───────┬───────┘
│ │
│ │
▼ ▼
eax = first_word << 16
edx = last_word
eax = eax | edx
result (u32):
┌─────────────────────────────────────────┐
│ first_word (16 bits) | last_word (16) │
└─────────────────────────────────────────┘
If len < 4:
────────────────────────────────────────────────────────
result = 80 (constant fallback)

🧩 Instruction-by-instruction flow (diagram)
Registers on entry:
────────────────────────────────────────
rdi → data pointer
rsi → len
Assembly flow:
────────────────────────────────────────
cmp rsi, 4
│
├─ len < 4 ───────────────► eax = 80
│ │
│ └──► return eax
│
└─ len >= 4
│
├─ movzx eax, WORD PTR [rdi]
│ ↑
│ │
│ └─ load first 2 bytes
│
├─ movzx edx, WORD PTR [rdi + rsi - 2]
│ ↑
│ │
│ └─ load last 2 bytes
│
├─ shl eax, 16
│
├─ or eax, edx
│
└──► return eax
📦 Bit-level view (hash shape)
32-bit result layout
────────────────────────────────────────
31 16 15 0
┌────────────────┬────────────────┐
│ first 2 bytes │ last 2 bytes │
│ data[0..1] │ data[len-2..] │
└────────────────┴────────────────┘

🎯 What this means conceptually
HASH(data) =
if len >= 4:
(first_two_bytes << 16) | last_two_bytes
else:
80

Properties (important for intuition)
✔ O(1) time
✔ Zero loops
✔ Zero branches after length check
✔ Extremely cache-friendly
✘ Very weak hash
✘ Collisions are easy
✘ NOT cryptographically safe

🧠 Why this is fast (visual intuition)
Standard hash (SipHash):
┌─────────────────────────────┐
│ mix → rotate → xor → repeat │ (many rounds)
└─────────────────────────────┘
This hash:
┌──────────┐ ┌──────────┐
│ load 16b │ │ load 16b │
└────┬─────┘ └────┬─────┘
│ │
└─── combine ─┘

Now for the ARM version:
```rust
#[cfg(target_os = "macos")]
#[inline(always)]
pub unsafe fn look_up_identifier(data: *const u8, len: usize) -> u32 {
    unsafe {
        let result: u32;
        std::arch::asm!(
            "cmp {len}, #4",
            "b.lo 2f",
            "ldrh w0, [{data}]",
            "sub x1, {len}, #2",
            "ldrh w2, [{data}, x1]",
            "lsl w0, w0, #16",
            "orr w0, w0, w2",
            "b 3f",
            "2:",
            "mov w0, #80",
            "3:",
            data = in(reg) data,
            len = in(reg) len,
            lateout("w0") result,
            clobber_abi("system")
        );
        result
    }
}
```

✅ main.s (macOS / ARM64)
```asm
/* main.s — macOS ARM64 (AArch64)
 *
 * uint32_t look_up_identifier(const uint8_t *data, size_t len)
 *
 * x0 = data pointer
 * x1 = len
 * return value in w0
 */
.section __TEXT,__text
.global _look_up_identifier
.align 2

_look_up_identifier:
    /* if (len < 4) goto fail */
    cmp x1, #4
    b.lo fail
    /* load first 2 bytes */
    ldrh w0, [x0]       // w0 = data[0..1]
    /* compute (len - 2) */
    sub x2, x1, #2
    /* load last 2 bytes */
    ldrh w3, [x0, x2]   // w3 = data[len-2..len-1]
    /* shift first half */
    lsl w0, w0, #16
    /* combine */
    orr w0, w0, w3
    ret

fail:
    mov w0, #80
    ret
```

🧠 Register usage (quick reference)
x0 → data pointer
x1 → len
x2 → len - 2
w0 → result (return register)
w3 → last 2 bytes

📦 Bit layout of the return value
w0 (u32)
31 16 15 0
┌────────────────┬────────────────┐
│ data[0..1] │ data[len-2..] │
└────────────────┴────────────────┘
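To sanity-check either asm variant (for example from a unit test, or as a baseline in a cargo bench run), a plain safe-Rust mirror of the same computation is handy. This is a sketch; look_up_identifier_ref is a hypothetical name introduced here, not part of the original code:

```rust
/// Safe pure-Rust mirror of the asm versions above (a reference sketch):
/// (first 2 bytes << 16) | (last 2 bytes), or the constant 80 when len < 4.
pub fn look_up_identifier_ref(data: &[u8]) -> u32 {
    if data.len() < 4 {
        return 80; // fallback, mirroring `mov eax, 80`
    }
    // Little-endian 16-bit loads, like `movzx eax, WORD PTR [rdi]`
    // and `movzx edx, WORD PTR [rdi + rsi - 2]` on x86_64.
    let first = u16::from_le_bytes([data[0], data[1]]) as u32;
    let last = u16::from_le_bytes([data[data.len() - 2], data[data.len() - 1]]) as u32;
    (first << 16) | last
}

fn main() {
    assert_eq!(look_up_identifier_ref(b"ab"), 80); // too short → fallback
    // "hello": first word 0x6568 ("he", LE), last word 0x6f6c ("lo", LE)
    assert_eq!(look_up_identifier_ref(b"hello"), 0x6568_6f6c);
    // Collisions are trivial: only the first and last 2 bytes matter.
    assert_eq!(
        look_up_identifier_ref(b"hello"),
        look_up_identifier_ref(b"heXXXXlo")
    );
    println!("reference hash behaves as described");
}
```

The last assertion makes the "collisions are easy" property above concrete: any two inputs sharing their first and last 2 bytes hash identically.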
https://younghakim7.github.io/blog/posts/260130deep_dive_into_hashmap002_try_making/