Asset Storage (shared)

Last Updated: Sep 19, 2022
documentation for the dotCMS Content Management System

Assets and files saved in dotCMS are stored on an underlying shared file system, with pointers and metadata stored in the database. Generally, the dotCMS nodes in a cluster share an NFS or ReadWriteMany volume, so every node can save and retrieve assets as they are requested.

Binary Assets

All assets in dotCMS are stored using binary fields. A binary field is a content type field that can be added to any content type to store binary files along with the content. The benefit of binary fields is that binary assets, such as images, files and videos, are included, versioned and updated together with the content object that references them.

dotCMS stores these binary files on an underlying shared file system. The files are saved in the /assets folder in a b-tree-like folder structure that uses {inode} + {fieldVariable} as the key, with the first two characters of the inode forming the first two directory levels. For example, take a PDF file named a-big-document.pdf referenced in a content type field called associatedPdf and having a content version inode of 71b8a1ca-37b6-4b6e-a43b-c7482f28db6c. In this case, the underlying file would be stored at the following location:

/assets/7/1/71b8a1ca-37b6-4b6e-a43b-c7482f28db6c/associatedPdf/a-big-document.pdf
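
The layout above can be expressed as a small helper. This is only a sketch inferred from the example path, not dotCMS's own code: binary_asset_path and its parameter names are hypothetical, and it assumes the first two characters of the inode always form the first two directory levels.

```python
from pathlib import PurePosixPath


def binary_asset_path(inode: str, field_variable: str, file_name: str,
                      assets_root: str = "/assets") -> str:
    """Build the storage path for a binary field, following the layout shown above."""
    if len(inode) < 2:
        raise ValueError("inode must have at least two characters")
    # The first two characters of the inode become two nested directory levels,
    # followed by the full inode, the field variable, and the file name.
    return str(PurePosixPath(assets_root, inode[0], inode[1],
                             inode, field_variable, file_name))


print(binary_asset_path("71b8a1ca-37b6-4b6e-a43b-c7482f28db6c",
                        "associatedPdf", "a-big-document.pdf"))
```

Called with the values from the example, the helper reproduces the path shown above.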

Shared Assets Location

In the dotCMS container, the /data/shared directory is used to share the dotCMS assets and should be mapped in as a volume or persistent volume claim for use by the running containers. In our docker compose examples, we map this directory to a Docker-managed volume, cms-shared, e.g.

    volumes:
      - cms-shared:/data/shared

In multi-node production environments, you will want to use a ReadWriteMany file system share or a network shared volume using NFS.

Changing the Path to the Assets Folder (Deprecated)

While not generally recommended, you may change the location where the asset files are stored by setting the DOT_ASSET_REAL_PATH environment variable.

export DOT_ASSET_REAL_PATH=/var/data/dotcms/assets

HardLinks & Storage Space

In order to minimize storage requirements, dotCMS uses hardlinks when storing versions of the same asset. In essence, this means that an uploaded file, image or video is only stored once. If further edits are made to the metadata surrounding that asset, creating new versions of the content in dotCMS, the file is unmodified and is still stored only once across all versions that reference it. As a demonstration:

In our starter site, we have the file /images/404.jpg. This image has a version inode of 249eeb5c-7002-48e8-9ef3-ea6cd8ea9043. Looking at the stored image on dotCMS’s /assets file system, you can see it under the /assets directory with a size of 47K and a file system inode of 62669676. Note that the file system inode is unrelated to the dotCMS version inode that appears in the path.

$ ls -lih ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
62669676 -rw-r--r--  5 will  staff    47K Jul 30 17:14 ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
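
If you prefer to script this check, the values that ls -li reports can also be read with Python's os.stat. This is a minimal sketch: describe_file is a hypothetical helper, and the demonstration runs against a throwaway temporary file rather than a real dotCMS asset.

```python
import os
import tempfile


def describe_file(path: str) -> dict:
    """Return the file system inode, hard link count, and size in bytes for path."""
    info = os.stat(path)
    return {
        "fs_inode": info.st_ino,      # the number printed by `ls -i`
        "link_count": info.st_nlink,  # directory entries pointing at this inode
        "size_bytes": info.st_size,   # apparent size, not allocated blocks
    }


if __name__ == "__main__":
    # Demonstrate on a temporary file; point describe_file at an asset path instead
    # to inspect a real file.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "404.jpg")
        with open(sample, "wb") as handle:
            handle.write(b"x" * 1024)
        print(describe_file(sample))
```

A link count greater than 1 indicates that other paths share the same stored data.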

Now if edits are made to the 404 content, for example changing the title of the image or setting show on menu=true, dotCMS will create new versions of the content. Under the covers, however, the 404 image for each new version is stored as a hardlink to the original image. This is where the magic happens: hardlinks are just additional pointers to the same data and take up almost no disk space. You can test this by editing the content a few times, then running find on the file system and reporting the file system inode of each match.

$ find ./assets -name 404.jpg -exec ls -i {} \;
62669676 ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
62669676 ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg
62669676 ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg
62669676 ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg

You can see they all have the same inode, 62669676, which means they are all hardlinks to the same data on disk, which is stored only once. You can confirm this by running du on all the 404.jpg files found:

$ du -shc \
> ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg \
> ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg \
> ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg \
> ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg
 48K    ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
 48K    total

du lists only one path because it counts each file system inode once. The original image was 47K. Storing 4 versions of the image using hardlinks takes up only 48K rather than the expected 188K (47K*4).

Now if I edit the /images/404.jpg content again, and this time choose to upload a new image instead, things look different. Let’s say I replace 404.jpg with another jpg that is 100K and save my content, creating a new version. If I run my find again, I get:

$ find ./assets -name 404.jpg -exec ls -i {} \;
62700210 ./assets/0/c/0cef7994-2bc4-4fdc-82f7-f74ac57270f9/fileAsset/404.jpg
62669676 ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
62669676 ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg
62669676 ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg
62669676 ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg

You can see that the inode list now contains two unique inodes, 62700210 and 62669676. To check how much disk space these 5 versions of the content now take up, we can run du again, which returns the space taken by these 5 files:

$ du -shc \
> ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg \
> ./assets/4/a/4a352130-523d-44bc-934a-f63e7af4779a/fileAsset/404.jpg \
> ./assets/9/c/9c6b1880-c78e-42e4-94d9-725a50a99235/fileAsset/404.jpg \
> ./assets/3/0/305d7840-7b1d-45e3-8be1-e6bf8aeb697e/fileAsset/404.jpg \
> ./assets/0/c/0cef7994-2bc4-4fdc-82f7-f74ac57270f9/fileAsset/404.jpg
100K    ./assets/0/c/0cef7994-2bc4-4fdc-82f7-f74ac57270f9/fileAsset/404.jpg
 48K    ./assets/2/4/249eeb5c-7002-48e8-9ef3-ea6cd8ea9043/fileAsset/404.jpg
148K    total
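
The find and du checks above can be combined into one script that groups files by file system inode and counts each inode's size once. This is a sketch under stated assumptions, not a dotCMS tool: all function names are hypothetical, and build_demo_tree creates made-up version directories (v1 to v5) in a temporary folder to mimic four hard-linked versions plus one newly uploaded file. It sums apparent sizes (st_size) rather than allocated blocks, so its totals will not match du exactly; that difference is likely why ls reports 47K and du reports 48K for the same image.

```python
import os
import tempfile
from collections import defaultdict


def group_by_fs_inode(root, file_name):
    """Map (device, inode) to the sorted paths under root named file_name."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        if file_name in filenames:
            path = os.path.join(dirpath, file_name)
            info = os.stat(path)
            groups[(info.st_dev, info.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}


def unique_bytes(groups):
    """Sum apparent sizes, counting each inode once (as du does for hardlinks)."""
    return sum(os.stat(paths[0]).st_size for paths in groups.values())


def naive_bytes(groups):
    """Sum apparent sizes as if every path were a separate copy."""
    return sum(os.stat(p).st_size for paths in groups.values() for p in paths)


def build_demo_tree(root):
    """Create four hard-linked versions of one file plus one new upload (hypothetical layout)."""
    def version_path(name):
        directory = os.path.join(root, name, "fileAsset")
        os.makedirs(directory)
        return os.path.join(directory, "404.jpg")

    original = version_path("v1")
    with open(original, "wb") as handle:
        handle.write(b"\0" * (47 * 1024))
    for name in ("v2", "v3", "v4"):
        os.link(original, version_path(name))  # new version, same stored data
    with open(version_path("v5"), "wb") as handle:
        handle.write(b"\0" * (100 * 1024))     # replaced image, new inode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        groups = group_by_fs_inode(root, "404.jpg")
        print("unique inodes:", len(groups))
        print("bytes on disk (each inode once):", unique_bytes(groups))
        print("bytes if every version were a copy:", naive_bytes(groups))
```

To inspect a real installation, call group_by_fs_inode with the path to your assets directory and the file name you are interested in.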
