The other day I was looking at an open-source engine. Basically, it loaded the image from a file into a RAM buffer, decoding it while streaming (I assume this is done "in-cache", so I can't count the decoding as an extra copy), and then fed the buffer to glTexImage2D.
According to the GL specification, glTexImage2D is free to keep an in-RAM copy of the image and wait for the best moment to upload it to the GPU.
So basically we have:
- From disk to RAM
- From RAM to RAM
- From RAM to GPU
That is roughly 4-5 bus trips times the texture size (in megabytes) of bandwidth, and 3 copies of the same data.
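One possible way to read the 4-5 figure, as an assumption rather than a measurement: the disk-to-RAM step writes the texture into RAM once, the RAM-to-RAM copy reads it and writes it again, and the RAM-to-GPU upload reads it once more and may push it across the GPU bus. The sketch below only does that arithmetic; `texture_bytes` and `bus_traffic` are hypothetical helper names, and the tally in the comment is an assumed model.

```c
#include <stddef.h>

/* Size in bytes of one uncompressed texture level (no mipmaps). */
size_t texture_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/*
 * Total bytes moved when the same texture crosses a bus `trips` times.
 * Assumed tally (not measured):
 *   disk -> RAM : 1 write into RAM
 *   RAM  -> RAM : 1 read + 1 write
 *   RAM  -> GPU : 1 read from RAM, plus possibly 1 transfer to the GPU
 */
size_t bus_traffic(size_t tex_bytes, unsigned trips)
{
    return tex_bytes * trips;
}
```

For example, a 2048 x 2048 texture at 4 bytes per pixel is 16 MiB, so 4 to 5 trips would mean 64 to 80 MiB of traffic for that single texture.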
Isn't there any way to reduce the copies to
- From disk to RAM
- From RAM to GPU
Or even
- From disk to GPU
?
(Assuming the data on the hard disk is already in the correct memory layout, so no decoding is necessary.)