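Cache locality. If the rectangle is stored row by row (row-major order, which is how C lays out two-dimensional arrays), then scanning horizontally reads neighbouring addresses one after another. Each cache line you load gets fully used before you move on, so you get fewer cache misses and the loop runs faster. For a rectangle small enough to fit entirely in cache, this won't matter.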
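Here is a minimal sketch of the two traversal orders. It assumes the grid is a flat `int` array in row-major order, where element (r, c) sits at index `r * cols + c`. The function names `sum_row_major` and `sum_col_major` are hypothetical and exist only for illustration. Both functions compute the same sum. The only difference is the order in which memory is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the grid is a flat array stored row by row (row-major),
   so element (r, c) is at index r * cols + c. */

/* Horizontal scan: the inner loop walks consecutive addresses,
   so each cache line is fully used before the next one is needed. */
long sum_row_major(const int *grid, size_t rows, size_t cols) {
    assert(grid != NULL || rows * cols == 0);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r * cols + c];
    return total;
}

/* Vertical scan: the inner loop jumps `cols` elements per step,
   so on a wide grid nearly every access lands on a different cache line. */
long sum_col_major(const int *grid, size_t rows, size_t cols) {
    assert(grid != NULL || rows * cols == 0);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += grid[r * cols + c];
    return total;
}
```

On a large grid (thousands of elements per side), timing the two functions will typically show the row-major version ahead. The size of the gap depends on the cache sizes, the hardware prefetcher, and the compiler, which may interchange the loops on its own at higher optimization levels. On a small grid that fits in cache, both run at about the same speed.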