When we have to read a NxN matrix stored in memory with a programming language, we normally use two loops, one inside other. First to read rows, and second to read columns of such row. Why we do that?
The reason is in the memory management of the operating system of our computer.
I’m going to talk about general methods and simplified situations of the problem in that way we can understand easily the question.
When our computer works executing process, he needs data. This data at first sight comes from the main memory, which is far far away (in computational words) from the Central Processing Unit, that means these data spend a lot of time coming from the memory to the CPU. To improve this, CPU usually contains quick-little memory circuits called registers, which CPU uses directly. These registers are too quick, but they can’t store a lot of data. A circuit between registers and main memory is called cache.
When we have to read a 3×3 matrix inside a C program, for a example, we coded:
This is the more efficient way to read a NxN matrix, a first loop for rows and a second for columns, last inside first. That is because when CPU needs the first value of the matrix (v) , that value comes from main memory to cache, but not only him, he comes with v and v. That is because they are stored together in memory, in consecutive directions:
When CPU needs data to work with, he first see if the data is in the cache, if it is in cache CPU uses it, but if not, data must come to cache, operating system bring values near him in memory too.
This works good because of the data storing in consecutive directions of memory, the cache structure and the functions of the operating system.
When the CPU needs data that isn’t in cache, a cache fault appears. When a cache fault appears, CPU spends time bringing data in memory to cache. This is a value that we can use to calculate the efficiency of this method of data access. If we suppose a 3×3 matrix and the two possible methods, access by columns (the studied here) and access by rows (the other method):
Method Cache faults
Access by columns 3
Access by rows 9
We can see that the number of cache faults with the first method is 3, one for access to each row of the matrix , while the second method has 9 cache faults, a cache fault appears with each element we access to.
So we can determinate that this method of matrix accessing is the more efficient.