Hey! Have you read one of our dev blogs before? Here's one on our use of Grafana, a tool to generate a bunch of graphs about Hypixel. In these dev posts, we talk about how the technology works on the network. Today, we'll explain how we handle the storage of player islands on Hypixel SkyBlock.

Briefly, for the few of you who might not have played: each player owns their own Minecraft world in the game, in which they can build their base, generate resources using minions and store their valuables in chests. Depending on their rank, players can create a certain number of profiles, each with its own island.

The task of storing Minecraft worlds divides neatly in two:
1. turning blocks into files
2. holding lots of files

But first...

▶ SkyBlock Statistics (so far)

Game released on June 11th 2019
1.73 million player islands
40.2 GB of data on drives
Record of 27 thousand concurrent players
75 big computers running 1500 Minecraft servers with 10 worlds each, just for player islands
3300 saves and 904 island loads per minute at 20k CCU
Started development in January with 3-4 full-time devs, with a prototype first built in an in-house game jam in November

▶ Turning worlds into files

Minecraft world folders

The first challenge was turning the blocks you grow wheat on and the pigs you make swords from into a bunch of 1s and 0s stored on SSDs. This might sound really easy if you've ever run a Minecraft server, considering the game already does that. Here's how it looks in Windows Explorer:

In fact, we leveraged that in 2015 for Hypixel Housing, another game where players own worlds (less crafting, more parkour). What Housing did (in code) was: Select all those files -> Right-Click -> Send to -> Compressed (zipped) folder, then store the zip in a SQL database.
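Here is a minimal Python sketch of that Housing-era approach, assuming nothing beyond what the paragraph describes: zip the whole world folder in memory and store the archive as a single BLOB row. The function names and the `houses` table are hypothetical, and sqlite3 only stands in for whatever SQL database Housing actually used.

```python
import io
import os
import sqlite3
import zipfile


def zip_world_folder(world_dir: str) -> bytes:
    """Zip every file under world_dir into an in-memory archive (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(world_dir):
            for name in files:
                path = os.path.join(root, name)
                # Keep paths relative to the world folder, e.g. "region/r.0.0.mca".
                archive.write(path, arcname=os.path.relpath(path, world_dir))
    return buffer.getvalue()


def store_world(conn: sqlite3.Connection, house_id: str, blob: bytes) -> None:
    """Save (or overwrite) one zipped world; sqlite3 stands in for the real SQL database."""
    conn.execute("CREATE TABLE IF NOT EXISTS houses (id TEXT PRIMARY KEY, world BLOB)")
    conn.execute("INSERT OR REPLACE INTO houses (id, world) VALUES (?, ?)", (house_id, blob))
    conn.commit()


def load_world(conn: sqlite3.Connection, house_id: str) -> bytes:
    """Fetch a zipped world back; extracting it recreates the folder Minecraft expects."""
    row = conn.execute("SELECT world FROM houses WHERE id = ?", (house_id,)).fetchone()
    if row is None:
        raise KeyError(house_id)
    return row[0]
```

It is simple, but every save re-stores the entire folder, including whatever padding and stale data the region files carry.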
This process worked fine until, 2 years later, we ran into scalability issues: the 2.5 million player houses took up 4TB of data, which became a bottleneck both for machine provisioning and on the database. Basically, the Housing worlds were too big. We also knew we wanted to let players own multiple worlds in SkyBlock (currently in the form of profiles), while Housing only offers a single world.

Overall, we still leverage a lot of what Minecraft already does for world storage, but we remixed it into our own custom solution...

Minecraft Region format: Pitfalls

The Region file format is the format of the collection of files that make up a Minecraft world. It is very well documented on the Minecraft Wiki (Region, Level, Chunks), with some extra bits of lore on wiki.vg. We've had a lot of experience with this format at Hypixel because of the number of game maps we've created and handled, and because of the few implementations of loading/saving and NBT we maintain across different in-house tools and Minecraft PE.

Inside a Minecraft world with NBTExplorer

The Region format comes standard with the tools we use (Minecraft server, MCEdit, rendering tools...) and we will keep using it for all of our other minigames like Bed Wars and SkyWars. In fact, the public SkyBlock islands (the Hub, Gold Island, Nether Fortress, etc.) are stored as regular region files.

We identified 4 pitfalls with the Region format for our use case. The main difference from vanilla Minecraft is that minigame worlds are not infinite. Region files are optimized for players walking across infinite worlds (also called world streaming), while SkyWars or SkyBlock worlds usually fit within 16x16 chunks, which can all be loaded in one go and held in memory (while 1.8.9's memory leaks eat all your RAM). Here are the 4 pitfalls (this might get technical):

1. Region files accumulate extra unneeded data over time

The files are divided into sectors of 4KB.
Each chunk takes at least (and usually) 1 sector. Since each chunk's data is prefixed with its length (in bytes), Minecraft doesn't bother clearing a sector before writing on top of it. This means that the longer you play (or build) on a Minecraft world, the more garbage data its region files accumulate and the bigger they get. This usually doesn't matter when playing survival, but it does matter when storing lots and lots of worlds.

2. Region files compress per-chunk and have a lot of padding

To simplify world IO, each chunk is compressed individually, with its own zlib header. Compressing everything together offers a better compression ratio. The padding is mostly negated when zipping the whole world, but can still leak a few (read: thousands of) bytes of extra data here and there.

3. The chunk format has lots of data we don't need

There are two parts to this. First, the NBT content has fields that are useless (for us), like "LastUpdate", "InhabitedTime", "LightPopulated", "TerrainPopulated", "xPos", "zPos"... Second, NBT itself isn't great for size because it stores the field names alongside the values. It can be replaced with a simple versioned binary format.

4. Zlib is 24 years old

Zlib/gzip (DEFLATE...) is the compression library used by the vanilla Minecraft server to save its worlds. A library is basically a bunch of code meant to be re-used, while compression is the magical process (aka maths) of turning big files into small files. Zlib is ubiquitous, but there have been advances in the compression field since its release. Giant tech companies (Google, Facebook...) have invested millions in R&D and open sourced their work (made it publicly available).

Our compression requirements: a high compression ratio, because we store lots of files, and fast compression, which matters more to us than fast decompression because we compress (save) more often than we decompress (load). After looking at a few options (including trying them out on Minecraft worlds and judging who else uses each library), we landed on Zstd (Java bindings).
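To see what pitfall 2 costs, here is a small Python sketch comparing the region-file approach (each chunk zlib-compressed on its own, prefixed with a 4-byte length and a 1-byte compression type, padded to whole 4KB sectors, after an 8KB location/timestamp header) against compressing all chunks as one stream. The chunk data is synthetic (nearly identical noisy chunks), so the sizes are illustrative only; zlib is used on both sides to isolate the effect of grouping, not to stand in for Zstd. Stale garbage from pitfall 1 is not modeled.

```python
import random
import zlib

SECTOR_SIZE = 4096             # region files are split into 4 KiB sectors
HEADER_SIZE = 2 * SECTOR_SIZE  # location table + timestamp table


def region_style_size(chunks: list[bytes]) -> int:
    """Bytes used when every chunk is zlib-compressed alone and padded to whole sectors."""
    total = HEADER_SIZE
    for chunk in chunks:
        payload = zlib.compress(chunk)
        stored = 4 + 1 + len(payload)        # length field + compression type + data
        sectors = -(-stored // SECTOR_SIZE)  # ceiling division
        total += sectors * SECTOR_SIZE
    return total


def grouped_size(chunks: list[bytes]) -> int:
    """Bytes used when all chunks are concatenated and compressed as a single stream."""
    return len(zlib.compress(b"".join(chunks)))


def similar_chunks(count: int, size: int = 4096, seed: int = 42) -> list[bytes]:
    """Synthetic stand-in for chunk data: noisy bytes that are nearly identical across chunks."""
    rng = random.Random(seed)
    base = bytes(rng.getrandbits(8) for _ in range(size))
    chunks = []
    for i in range(count):
        chunk = bytearray(base)
        chunk[i] ^= 0xFF  # each chunk differs from the base by a single byte
        chunks.append(bytes(chunk))
    return chunks


chunks = similar_chunks(8)
print("region-style:", region_style_size(chunks), "bytes")
print("grouped:     ", grouped_size(chunks), "bytes")
```

Per-chunk compression can never see the repetition between neighbouring chunks, while a single stream can reference earlier chunks and skip the per-chunk headers and sector padding.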
Zstd vs. zlib: 5x faster AND 2.5x smaller

Introducing: The Slime world format

It's a new file format, like a .doc or a .html, but instead it's .slime! Slime fixes the points outlined above while sticking to a lot of Minecraft "standards", sometimes to make software compatibility easier, sometimes for performance. Defining a file format might very well be the most boring thing ever, but let's give it a shot (this is also technical):

----------
“Slime” file format
  2 bytes - magic = 0xB10B
  1 byte (ubyte) - version, current = 0x03
  2 bytes (short) - xPos of the chunk with the lowest x & lowest z
  2 bytes (short) - zPos
  2 bytes (ushort) - width
  2 bytes (ushort) - depth
  [depends] - chunk bitmask
    -> each chunk is 1 bit: 0 if all air (missing), 1 if present
    -> chunks are ordered zx, meaning
    -> the last byte has unused bits on the right
    -> size is ceil((width*depth) / 8) bytes
  4 bytes (int) - compressed chunks size
  4 bytes (int) - uncompressed chunks size
    <array of chunks> (size determined from bitmask)
    compressed using zstd
  4 bytes (int) - compressed tile entities size
  4 bytes (int) - uncompressed tile entities size
    <array of tile entity nbt compounds>
      same format as mc,
      inside an nbt list named “tiles”, in global compound, no gzip anywhere
    compressed using zstd
  1 byte (boolean) - has entities
  [if has entities]
    4 bytes (int) compressed entities size
    4 bytes (int) uncompressed entities size
      <array of entity nbt compounds>
        same format as mc EXCEPT optional “CustomId”
        inside an nbt list named “entities”, in global compound
      compressed using zstd
  4 bytes (int) - compressed “extra” size
  4 bytes (int) - uncompressed “extra” size
  [depends] - compound tag compressed using zstd

Custom chunk format
  256 ints - heightmap
  256 bytes - biomes
  2 bytes - sections bitmask (bottom to top)
  For each section:
    2048 bytes - block light
    4096 bytes - blocks
    2048 bytes - data
    2048 bytes - skylight
    2 bytes (ushort) - HypixelBlocks3 size (0 if absent)
    [depends] - HypixelBlocks3
----------

If you don't see indentation:
https://pastebin.com/raw/EVCNAmkw

Spoiler: Extra information

Note HypixelBlocks3, which is a cache for 1.9+ storage. The entities' CustomIds let us serialize custom entity classes.

There could be extra savings by:
1) Ordering blocks in XZY columns & using the heightmap for the y height
2) Storing blocks as 12 bits (8 bits id + 4 bits type), because we only run 1.8.9
3) Packing the skylight nibble into each block's 4 remaining bits (even good for compression, because skylight usually changes only once per y column)

BUT! It's better for loading speed to use the same chunk format as the in-memory representation (a bunch of nibbles, the 13-bit block palette, HypixelBlocks3). Chances are, the difference in size gets compressed away.

Versions history:
- V1: Initial release
- V2: Added "extra" nbt tag for per-world custom data
- V3: Added entities storage

Here's a video visually explaining the pitfalls of the region format:

Our integration of the Slime format in CraftBukkit is done using "in-memory worlds", which don't use disk storage at all. Most things in our server software are public to avoid reflection. We instantiate the NMS World and pass an overridden IDataManager, whose createChunkLoader (name can vary) returns an overridden ChunkRegionLoader which holds the pre-loaded NMS Chunks in memory.

Another note: we store blocks, entities and tile entities in separate arrays in the hope of better compression ratios.

There is no publicly available plugin for slime saving/loading at the moment, because too much of it ties into custom code which we cannot share.

Analysis of real-world scenario savings

All of the above mumbo-jumbo allows us to save many GB of data. Prior to deploying the Slime format at scale, we did some testing on our existing Hypixel worlds using a little program called slime-tools. If you are a developer and know how to use the command line (or a decompiler), you can download it [HERE].
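To make the header layout from the Slime spec concrete, here is a minimal Python sketch that reads the fixed header and chunk bitmask of a .slime file and stops right before the zstd-compressed chunk array (decompressing that needs a Zstd binding, which is outside the standard library). This is not slime-tools or Hypixel's code, and two details are our assumptions rather than something the spec states: fields are read big-endian (as Java's DataOutputStream writes them), and the bitmask index is z * width + x (our reading of "ordered zx"), with bits filled from the most significant bit so that unused bits end up on the right of the last byte.

```python
import struct

SLIME_MAGIC = 0xB10B
HEADER = struct.Struct(">HBhhHH")  # magic, version, xPos, zPos, width, depth (11 bytes)
SIZES = struct.Struct(">ii")       # compressed size, uncompressed size


def parse_slime_header(data: bytes) -> dict:
    """Read the fixed .slime header and chunk bitmask (sketch; byte and bit order assumed)."""
    magic, version, x_pos, z_pos, width, depth = HEADER.unpack_from(data, 0)
    if magic != SLIME_MAGIC:
        raise ValueError(f"not a slime file (magic {magic:#06x})")
    offset = HEADER.size

    # One bit per chunk: ceil(width * depth / 8) bytes in total.
    mask_len = (width * depth + 7) // 8
    mask = data[offset:offset + mask_len]
    offset += mask_len

    present = []
    for z in range(depth):
        for x in range(width):
            index = z * width + x  # assumption: "ordered zx" means z-major
            # assumption: MSB-first, so unused bits sit on the right of the last byte
            if mask[index // 8] & (0x80 >> (index % 8)):
                present.append((x_pos + x, z_pos + z))

    compressed, uncompressed = SIZES.unpack_from(data, offset)
    offset += SIZES.size
    return {
        "version": version,
        "origin": (x_pos, z_pos),
        "size": (width, depth),
        "present_chunks": present,
        "chunks_compressed_size": compressed,
        "chunks_uncompressed_size": uncompressed,
        "chunks_offset": offset,  # the zstd-compressed chunk array starts here
    }
```

The same size-pair-plus-zstd-payload pattern repeats for tile entities, entities (behind the "has entities" flag) and the "extra" compound, so a full reader would continue from `chunks_offset` in the same way.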
(Note that we do not offer any support for slime-tools in any way.)

Aggregate results for various gametypes: these are the average saving factors of Slime over zipping the Minecraft world.

Code:
hg: 3.179x (31% of zip)
sw: 5.032x (20% of zip)
smash: 8.34x (12% of zip)
mcgo: 4.209x (24% of zip)
bedwars: 10.246x (10% of zip)
mw: 2.905x (34% of zip)
bb: 59.56x (2% of zip)
housing: 5.358x (19% of zip)

Discussion

The most comparable game to SkyBlock in terms of worlds is Bed Wars. Housing might look like the closer match, but its savings are ballooned by island copies (yes, the islands in the distance in Housing are stored on disk). Here's what a Housing world looks like: https://staticassets.hypixel.net/news/5c442d9fd200f.image (7).png

Build Battle has an incredible 60x factor, which we believe (it's only a theory) comes from chunk-aligned copies of chunks that get almost entirely compressed away: Build Battle keeps many copies of the map on disk. Example Build Battle world: https://staticassets.hypixel.net/news/5c442e4e3e4cb.image (8).png

Another point is that disabling HypixelBlocks3 in a mono-version context would yield 10%-35% reductions in .slime file size. It could even be disabled on Hypixel, but that would cost CPU time on world load that we cannot spare. (HypixelBlocks3 is a cache of the 1.9+ block format stored in each chunk section.)

▶ Holding lots of files

SeaweedFS

The second challenge is how to actually store and manage all of the player worlds. This problem is complicated because:
- There are many servers which need fast access to those worlds
- The game cannot go down
- Players care about their worlds, so storage needs to work even if a computer explodes
- Whatever holds those worlds needs to be simple and documented

We looked into a few different options to store all of the .slime files and decided to use SeaweedFS (https://github.com/chrislusf/seaweedfs). It is basically Amazon S3, but self-hosted.
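For readers who haven't used SeaweedFS, here is a minimal Python sketch of its documented HTTP flow: ask the master for a file id (fid) via `/dir/assign`, upload the file to the volume server it names as a multipart "file" field, and later find that server again via `/dir/lookup`. The master address, the function names and the "002" replication value (two extra copies on other servers in the same rack, i.e. 3 copies total) are our assumptions for illustration, not Hypixel's actual code or settings.

```python
import json
import urllib.request

# Hypothetical master address; 9333 is the SeaweedFS master's default port.
MASTER_URL = "http://localhost:9333"


def parse_fid(fid: str) -> tuple[int, int, int]:
    """Split a fid such as "3,01637037d6" into (volume id, file key, cookie).

    The last 8 hex digits after the comma are the cookie; the rest is the file key.
    """
    volume, key_and_cookie = fid.split(",", 1)
    return int(volume), int(key_and_cookie[:-8], 16), int(key_and_cookie[-8:], 16)


def encode_multipart(payload: bytes, filename: str, boundary: str = "slime-boundary") -> tuple[bytes, str]:
    """Build a multipart/form-data body with a single "file" field."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def save_island(slime_bytes: bytes, replication: str = "002") -> str:
    """Upload a .slime blob and return its fid (the value to keep with the player data)."""
    with urllib.request.urlopen(f"{MASTER_URL}/dir/assign?replication={replication}") as resp:
        assignment = json.load(resp)  # contains "fid" and the volume server "url"
    body, content_type = encode_multipart(slime_bytes, "island.slime")
    request = urllib.request.Request(
        f"http://{assignment['url']}/{assignment['fid']}",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as resp:
        resp.read()
    return assignment["fid"]


def load_island(fid: str) -> bytes:
    """Look up which volume server holds the fid, then download the .slime blob."""
    volume_id = parse_fid(fid)[0]
    with urllib.request.urlopen(f"{MASTER_URL}/dir/lookup?volumeId={volume_id}") as resp:
        volume_url = json.load(resp)["locations"][0]["url"]
    with urllib.request.urlopen(f"http://{volume_url}/{fid}") as resp:
        return resp.read()
```

Only the small fid string needs to live in the player database; the bytes themselves stay on the volume servers.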
We've had to develop our own backup script and monitoring that interface with SeaweedFS. Backups of the whole dataset are done twice a day and stored in S3 for a few weeks. We run SeaweedFS across 3 volume servers, which all use very few resources, always replicating volumes on all 3 servers for availability and peace of mind. The SeaweedFS FIDs (file ids) are stored in Mongo, like the rest of our player data.

A view of our SkyBlock world storage monitoring
- Note that the data usage is triple the real size (because of replication)
- The blip in the middle of the storage graph is when we added many volumes

Worlds are saved when the game is closed on a server, which happens a few dozen seconds after the game world has had 0 players. They're also saved every few minutes, more often if players have been interacting with the world a lot. That's why there's a lot more saving going on than loading.

The avg load size (that's the .slime size) is recorded when a world is actually loaded. That figure is about 80% higher than the avg size on disk per world, presumably because active worlds have more stuff going on than inactive ones.

Spoiler: World Creation Timing graph

This graph monitors how long each task takes in the process of loading your world. Note that the only thing on the main thread (instantiating the SkyBlock game) takes less than a tick thanks to in-memory worlds and other modifications, allowing us to freely add/remove worlds on the fly.

--

Hopefully this gives you a good overview of how we store all of the SkyBlock islands! There's a lot more to the SkyBlock tech which we may cover in the future, so keep an eye out.