1. [​IMG]
    Hey!

    This is another one of those in-depth dev blogs in which we explain what happens behind the scenes to maintain a seamless experience for you, the players. Today's post is a little more technical, since we'll be talking about how we were able to scale from 30,000 players to 76,000+ in under 3 months.

    Firstly, I'd like to address the myth of dynamic scaling at Hypixel. When you see us announcing that we're adding new machines to our fleet, it's not us simply pushing a couple of buttons and magically having new servers available for our system to run. A lot goes on behind the scenes that makes scaling up less simple than that, one reason being that our system was not designed for the kind of scaling that some people were led to believe we have. We use a mixture of SaltStack, Python scripts, shell scripts, and other in-house systems to facilitate deploying game servers; however, this process doesn't count as automated deployment or a cloud solution. Keeping that in mind for the rest of the post will shed some more light on how we do things.

    Performance, Optimizations, and Profiling
    One of the main problems we had to face in the last few months of endless growth was that, with the addition of SkyBlock, other games started struggling and fighting for resources. Now you may ask yourself, "wait a minute, surely Minecraft servers don't use that many resources?" and that's where you'd be wrong. We run a highly modified version of Spigot, and even with all the optimizations and changes we've made, it will always struggle under harsh conditions simply because it was not meant for them. Minecraft as a game is not optimized for what we do. We have made many attempts to improve it, but that is not as easy a task as a lot of people think.

    Now, back to the resources part. We have two types of dedicated machines: one is meant for mini game servers (the usual: SkyBlock Dynamic Worlds, Arcade, SkyWars, BedWars, etc.) and the other is meant for mega game servers (UHC, Pit, etc.). Each of our mini boxes runs 20 instances, while each mega runs 5 instances. Like everyone else, we make mistakes and we learn from them. Every now and then we find things that we did wrong, or that we simply didn't have the tools or the need to do differently, and that's what this whole section is about: finding these mistakes and patching them up. In doing so we were able to give fewer resources to certain systems and still have them function properly and without interruption. That's not an easy job, which is why we had quite a few people working on it. Here are a few things that we've noticed:
    • Don't overestimate the JVM's (Java Virtual Machine) GC (Garbage Collection). Here's a nice article on how this works https://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html. It's not a simple concept to grasp, especially once you reach the different types of Memory Models there are and what each one is used for, nonetheless, it's a good thing to know.
      Here's an illustration to show the true complexity of the Memory Models.
      [​IMG]
      We mostly focus on the JVM heap (where objects are stored) and, within it, the Eden space (where new objects are first allocated); however, all of the areas are used.
    • Getting used to performance-oriented algorithms. This can take a long time, and there are a lot of resources out there to help you get started; here's an example: http://www.java-gaming.org/topics/extremely-fast-sine-cosine/36469/view.html. These are usually designed to use as few resources as possible, and that's what we're attempting to achieve.
    • Using primitive collections. This helps you avoid unnecessary autoboxing, and Trove and FastUtil have quite good implementations. Trove: http://trove4j.sourceforge.net/html/overview.html FastUtil: http://fastutil.di.unimi.it/. Autoboxing is the automatic conversion that the Java compiler does between a primitive type and its corresponding wrapper class. When this is done a lot, it can add up to quite a bit of CPU time.
    • Avoiding hard references. This is quite a well-known one; however, it happens to the best of us, and it can go unnoticed, leading to quite obscure memory leaks. Just recently, we noticed that in SkyBlock a few maps were holding hard references to Worlds, which are among the worst things you can keep loaded in memory due to all the data they hold.
    • Avoiding unnecessary synchronization and old ways. For this, you need an understanding of how concurrency works, and Java provides quite a few utilities that let you avoid primitive synchronization.
    • Avoiding unnecessary Streams. These allocate new Objects and Collections which, if called a lot, can add up to a lot of CPU time. Also, sometimes these Streams don't contribute anything more than just "using Java 8 utilities", and that doesn't justify them.
    • Understanding how big O notation works. This is quite simple mathematics and is mentioned in a lot of dev posts, so it's good to have a solid understanding of it.
    • Avoiding defensive copies. Once again, this is quite a small thing; however, every little bit counts. An example of this is Enum#values, which returns a new copy of the array on every call. We suggest storing the result of a single Enum#values call in a static field and reusing it; you'll have the same result without the repeated copies.
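    To make the Enum#values point from the list above concrete, here is a minimal sketch. The `GameType` enum and its helper methods are made up for illustration and are not taken from our code base; the only thing it relies on is that every call to `values()` hands back a fresh copy of the constants array.

```java
import java.util.Arrays;

class EnumValuesDemo {
    public static void main(String[] args) {
        // Two calls to values() return two distinct arrays with equal contents.
        System.out.println(GameType.values() == GameType.values());              // false
        System.out.println(Arrays.equals(GameType.values(), GameType.values())); // true
        // The cached lookup reuses one shared array instead of copying each time.
        System.out.println(GameType.byOrdinal(2));                               // BEDWARS
    }
}

// Hypothetical enum, used only for illustration.
enum GameType {
    SKYBLOCK, SKYWARS, BEDWARS, UHC;

    // values() copies the constants array on every call, so take one copy
    // when the enum class is initialized and reuse it afterwards.
    private static final GameType[] VALUES = values();

    // The shared array is never exposed, so callers cannot modify it.
    static GameType byOrdinal(int ordinal) {
        return VALUES[ordinal];
    }

    static int count() {
        return VALUES.length;
    }
}
```

    The cached array is kept private on purpose: if it were handed out directly, one caller writing to it would affect every other caller, which is exactly what the defensive copy protects against.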
    Writing optimized code can often be a complicated task, so we compiled a small list of Spigot-related notes on best practices that we have pinned in one of our main Slack channels. We hope it can be useful for you too.
    Now, here are a few other things we have been implementing and learning, and as such we recommend them too.
    • Learning to benchmark properly. It's a common misconception that benchmarking is just doing a simple time comparison, when there's more to it. Java has a great tool for it called JMH. This has helped us, for example, to find out that some UUID method calls in Java 8 are a lot slower than they are in Java 10, and as a result we've ported these over to our own utilities. JMH can be used for virtually anything; however, benchmarks do take time. https://openjdk.java.net/projects/code-tools/jmh/.
    • Learning how to understand memory snapshots, CPU snapshots, etc. These are extremely useful when fighting that pesky memory leak or any other weird issue you may have. There are a lot of tools out there for this; what we use is YourKit Java Profiler. We have a few systems in place to help us use it, and while it's quite easy to set up, it's not as easy to understand what you're looking at. We've recently been relying on this more than anything to find most of the things we've mentioned above. Here's a small article regarding Object allocations: https://www.yourkit.com/docs/java/help/allocations.jsp.
    • Using the tools that Java provides, such as jstack and jmap. These are lower level than what YourKit shows you and thus don't have GUIs, but understanding them can be really useful and can even help you spot things that you may miss with other tools. jstack prints thread dumps of a running Java process, and jmap is used to look at its memory. Here's a nice and short tutorial on them: https://linuxhint.com/jmap-and-jstack-tutorial/.
    Monitoring
    We cannot stress enough how crucial monitoring is when done properly. We constantly use it to check the health of all the systems we have, and we have a really fancy dashboard on Grafana to look at it. Below are a couple of screenshots of it.

    [IMG]

    [IMG]

    [IMG]
    As weird as it may seem, these (along with others) have helped us debug issues in the past. For example, we once had network issues and no clue what was causing them software-wise. We started looking at the metrics dashboards until someone pointed out that database calls were much higher than usual, so we began restarting the systems that we had recently rolled out until we found which one was causing the issues, patched it up, and that was that. Even though this can be really stressful, it's nice to have all of these utilities to help us; without them we'd be left in the dark a lot longer, as we'd have to dig deeper manually.

    We published a dev-blog talking about Monitoring last year, which can be seen here https://hypixel.net/threads/dev-blog-4-graphs-are-awesome.1824926/.

    Putting these tools to use
    Another major thing we did to help us scale this much was going over old games and reviewing them with all the tools we've mentioned so far to find their weak points. An example of a really weird thing we found was in some of our Murder Mystery maps (and it reappeared in SkyBlock with Furniture): ArmorStands were not being marked as Invulnerable, so hundreds of entities were taking damage every single tick, which caused our TPS to drop and affected other instances on the same box. We try to limit the CPU usage per mini to 5%, and when issues like these happen the CPU usage spikes, which helps us spot exactly what's happening.

    It's good to know which tools to use in which situations. For example, when we receive a report of something lagging, we usually look at the performance graphs first, where we can see whether it's a memory or a CPU issue. Memory issues are usually easier to find, since you can look at the objects, GC graphs, allocations, etc. CPU issues can be sneakier, since object allocations, threads, stacks, etc. all contribute to them. We've seen CPU issues caused by networking too. For example, with the release of the End in SkyBlock, our profilers showed massive spikes of packets correlating with spikes in CPU usage. This was caused by lots of players in contained areas, light updates, entities moving around, etc., which the server does not handle well.

    We didn't only go through our games; we also went through our critical systems (which handle most of our ecosystem) and managed to get them from graphs like this
    [​IMG]
    to graphs like this
    [​IMG]
    (notice the memory usage, not how clean the graphs look).
    Most of these optimizations were achieved through reducing allocations massively, and it's pretty simple to spot this in YourKit. UHC, MurderMystery, SkyWars, and SkyBlock are some of the biggest examples we can provide where reducing allocations had a great impact.

    Here's an example of something we changed that doesn't have a massive impact but is still useful. This code is what JDK 8 uses for parsing UUIDs:
    Code:
    public static UUID fromString(String name) {
            String[] components = name.split("-");
            if (components.length != 5)
                throw new IllegalArgumentException("Invalid UUID string: "+name);
            for (int i=0; i<5; i++)
                components[i] = "0x"+components[i];
    
            long mostSigBits = Long.decode(components[0]).longValue();
            mostSigBits <<= 16;
            mostSigBits |= Long.decode(components[1]).longValue();
            mostSigBits <<= 16;
            mostSigBits |= Long.decode(components[2]).longValue();
    
            long leastSigBits = Long.decode(components[3]).longValue();
            leastSigBits <<= 48;
            leastSigBits |= Long.decode(components[4]).longValue();
    
            return new UUID(mostSigBits, leastSigBits);
        }
    This code performs 20 allocations plus the UUID object itself. Below is the code in JDK 9, which does far fewer allocations and is faster too.
    Code:
    public static UUID fromString(String name) {
            int len = name.length();
            if (len > 36) {
                throw new IllegalArgumentException("UUID string too large");
            }
    
            int dash1 = name.indexOf('-', 0);
            int dash2 = name.indexOf('-', dash1 + 1);
            int dash3 = name.indexOf('-', dash2 + 1);
            int dash4 = name.indexOf('-', dash3 + 1);
            int dash5 = name.indexOf('-', dash4 + 1);
    
            // For any valid input, dash1 through dash4 will be positive and dash5
            // negative, but it's enough to check dash4 and dash5:
            // - if dash1 is -1, dash4 will be -1
            // - if dash1 is positive but dash2 is -1, dash4 will be -1
            // - if dash1 and dash2 is positive, dash3 will be -1, dash4 will be
            //   positive, but so will dash5
            if (dash4 < 0 || dash5 >= 0) {
                throw new IllegalArgumentException("Invalid UUID string: " + name);
            }
    
            long mostSigBits = Long.parseLong(name, 0, dash1, 16) & 0xffffffffL;
            mostSigBits <<= 16;
            mostSigBits |= Long.parseLong(name, dash1 + 1, dash2, 16) & 0xffffL;
            mostSigBits <<= 16;
            mostSigBits |= Long.parseLong(name, dash2 + 1, dash3, 16) & 0xffffL;
            long leastSigBits = Long.parseLong(name, dash3 + 1, dash4, 16) & 0xffffL;
            leastSigBits <<= 48;
            leastSigBits |= Long.parseLong(name, dash4 + 1, len, 16) & 0xffffffffffffL;
    
            return new UUID(mostSigBits, leastSigBits);
        }
    We used these findings to run a benchmark, which produced the following results:
    Code:
    Benchmark                                    Mode  Cnt  Score   Error  Units
    BenchmarkUuidFromString.fromString           avgt   10  0.342 ± 0.005  us/op
    BenchmarkUuidFromString.fromStringRebuild    avgt   10  0.478 ± 0.007  us/op
    BenchmarkUuidFromString.unsignedLong         avgt   10  0.116 ± 0.001  us/op
    BenchmarkUuidFromString.unsignedLongReplace  avgt   10  0.239 ± 0.003  us/op
    We won't go into explaining what it all means, because we want you to be able to figure it out, but we can say that we ended up porting the JDK 9 method into our utilities. Even though it has a small impact, we thought it would be useful in the long run, and these things matter.

    One of the things we changed within Spigot was how ItemStacks and ItemMeta are compared and accessed. Spigot was comparing NBT data in a way that was not efficient, and with the sudden rise in SkyBlock popularity, all those items you see in chests, inventories, etc. were constantly being compared against each other, especially by things like hoppers. So what did we do to fix it? We added an ID to the NBT compound. When we first compare two items and they are equal, we sync up their IDs, so from then on we know for certain that they're the same. We also track the parent of each NBT tag, so when something changes within a compound we change its ID and the process begins again. This makes it much faster for us to do just as many comparisons.
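    The ID idea can be sketched in plain Java. This is only an illustration of the technique under a few assumptions: `TagCompound` is a made-up stand-in rather than the real Spigot or Hypixel class, it is used from a single thread, each compound has at most one parent, and compounds are never nested inside themselves. The `deepComparisons` counter exists purely to show how often the slow path runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an NBT compound, for illustration only.
final class TagCompound {
    private static final AtomicLong NEXT_ID = new AtomicLong();
    static int deepComparisons = 0; // demo counter: how often the slow path ran

    private final Map<String, Object> entries = new HashMap<>();
    private TagCompound parent;                  // tracked so changes can bubble up
    private long id = NEXT_ID.incrementAndGet(); // two compounds with equal IDs have equal content

    void put(String key, Object value) {
        Objects.requireNonNull(value, "value");
        if (value instanceof TagCompound) {
            ((TagCompound) value).parent = this;
        }
        entries.put(key, value);
        invalidate();
    }

    // Any change gives this compound and all of its ancestors a fresh ID,
    // so stale "known equal" results are never reused.
    private void invalidate() {
        for (TagCompound tag = this; tag != null; tag = tag.parent) {
            tag.id = NEXT_ID.incrementAndGet();
        }
    }

    boolean sameContent(TagCompound other) {
        if (this == other || id == other.id) {
            return true; // fast path: already known to be equal
        }
        deepComparisons++;
        if (!deepEquals(other)) {
            return false;
        }
        other.id = id; // sync IDs so the next comparison takes the fast path
        return true;
    }

    private boolean deepEquals(TagCompound other) {
        if (entries.size() != other.entries.size()) {
            return false;
        }
        for (Map.Entry<String, Object> entry : entries.entrySet()) {
            Object mine = entry.getValue();
            Object theirs = other.entries.get(entry.getKey());
            boolean equal = (mine instanceof TagCompound && theirs instanceof TagCompound)
                    ? ((TagCompound) mine).sameContent((TagCompound) theirs)
                    : mine.equals(theirs);
            if (!equal) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TagCompound a = new TagCompound();
        a.put("id", "minecraft:diamond");
        TagCompound b = new TagCompound();
        b.put("id", "minecraft:diamond");

        System.out.println(a.sameContent(b)); // true, via the slow path
        System.out.println(a.sameContent(b)); // true, via the fast path
        b.put("Count", 2);                    // b changes, so it gets a new ID
        System.out.println(a.sameContent(b)); // false
        System.out.println(deepComparisons);  // 2
    }
}
```

    The trade-off is one extra long per compound and a little bookkeeping on every write, in exchange for turning repeated comparisons of unchanged items into a single number check.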

    Furthermore, there's a very important thing to always keep in mind: not all possible improvements are easy to hunt down. We found this out when we started optimizing UHC. We noticed that CPU usage was spiking, yet it didn't show up in CPU sampling at all (even though we were certain of the time at which it was happening), so we fell back on our own in-house way of finding out what was happening, since it was quite obscure. Every time a tick didn't complete within a specified window, we printed the current thread stack (which shows us a bit of useful information), and thus we were able to hunt down the issue and fix it.
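    A watchdog along these lines can be sketched with nothing but the JDK. This is not our in-house tool; `TickWatchdog`, its method names, and the check interval (a quarter of the window) are assumptions made for the example. The server loop calls `tickStarted()` and `tickFinished()`, and a background thread prints the watched thread's stack once per tick that overruns the window.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; names and intervals are assumptions.
final class TickWatchdog implements AutoCloseable {
    private final Thread watched;
    private final long windowNanos;
    private final AtomicInteger reports = new AtomicInteger();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(task -> {
                Thread thread = new Thread(task, "tick-watchdog");
                thread.setDaemon(true); // never keeps the JVM alive
                return thread;
            });

    // Written only by the watched thread, read by the watchdog thread.
    private volatile long tickStartNanos;
    private volatile long tickNumber;
    private volatile boolean inTick;
    private long lastReportedTick = -1; // touched by the watchdog thread only

    TickWatchdog(Thread watched, long windowMillis) {
        this.watched = watched;
        this.windowNanos = TimeUnit.MILLISECONDS.toNanos(windowMillis);
        long periodMillis = Math.max(1, windowMillis / 4);
        scheduler.scheduleAtFixedRate(this::check, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void tickStarted() {
        tickStartNanos = System.nanoTime();
        tickNumber = tickNumber + 1; // single writer, so no atomic needed
        inTick = true;
    }

    void tickFinished() {
        inTick = false;
    }

    int reportCount() {
        return reports.get();
    }

    private void check() {
        if (!inTick) {
            return;
        }
        long tick = tickNumber;
        long elapsedNanos = System.nanoTime() - tickStartNanos;
        if (elapsedNanos < windowNanos || tick == lastReportedTick) {
            return; // still within the window, or this tick was already reported
        }
        lastReportedTick = tick;
        reports.incrementAndGet();
        StringBuilder dump = new StringBuilder()
                .append("Tick ").append(tick).append(" has been running for ")
                .append(TimeUnit.NANOSECONDS.toMillis(elapsedNanos)).append(" ms\n");
        for (StackTraceElement frame : watched.getStackTrace()) {
            dump.append("\tat ").append(frame).append('\n');
        }
        System.err.print(dump);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (TickWatchdog watchdog = new TickWatchdog(Thread.currentThread(), 50)) {
            for (int tick = 0; tick < 3; tick++) {
                watchdog.tickStarted();
                Thread.sleep(tick == 1 ? 200 : 10); // the second tick is deliberately slow
                watchdog.tickFinished();
            }
            System.out.println("slow ticks reported: " + watchdog.reportCount());
        }
    }
}
```

    Unlike a sampling profiler, this only captures a stack at the moment a tick is known to be late, which is why it can surface problems that average out and disappear in sampled data.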

    Now, a pretty important point is that performance is not always the answer to everything, as optimized code can often be very ugly; manually resizing arrays is a neat example of this. Sometimes the impact on performance is so small that it's better to leave the code readable than to spend a long time rewriting it and not be able to understand it in the end.

    These tools can also be used to prove someone wrong when they really want you to review their code and you know that what they changed is not worth it, so you write up a couple of benchmarks (and because it's a great meme). Here's our little comparison; it's a good example of how wide a range of uses these tools have.

    [​IMG]

    We are constantly making changes to make the network run at its smoothest; however, we felt we should share our findings so far to show how spending some time reviewing code and taking performance into account can help you grow on a massive scale.

    If you've got any questions, feel free to post them here or Tweet at us.

    Thank you for tuning in and for supporting us over all these years.
    See you next time!
     
    #1
    Last edited: Aug 19, 2019
  2. I love all of these graphs =D

    *Even if I admit that this time I haven't understood everything :p*
     
    #2
    Last edited: Aug 19, 2019
  3. Stannya

    Thank you for keeping us posted! It's also awesome to see the 6th Dev Post, I really enjoy seeing these.

    I don't understand the majority of the topics mentioned in this thread, but I do understand some! :)
     
    #3
    Last edited: Aug 19, 2019
  4. Glad to see that you're making improvements to keep this game in good shape.


    I don't understand any of those graphs.
     
    #4
  5. Lukeee

    Awesome job man!
     
    #5
  6. 101DRex

    Gonna try and comprehend this another time, I'm too sleepy for that now.
     
    #6
  7. CosmicJhin

    very fun
    also, I don't understand the big deal, just play a different game while it's in maintenance that's what I'm doing.
     
    #7
  8. Hacker

    epic, gamer
     
    #8
  9. Awesome!
    But... what is the garbage collection scheme? CMS? G1?
     
    #9
  10. Trihard

    Seed
     
    #10
  11. here we can see the struggle of the masses to understand the thread
     
    #11
  12. When is skyblock coming back?
     
    #12
  13. Awesome! Good job!
     
    #13
  14. Keith

    Awesome!

    When handling the allocation of different instances for games like Skyblock, when do you decide to add more of those instances? Does it occur when RAM usage has hit a certain point, or when a certain player threshold has been hit? It's interesting to know that each instance can be allowed to play different games- when do you move an instance from one game type to another?

    Thanks for the server info, quite cool!
     
    #14
  15. Aemmo

    Love reading these!
     
    #16
  16. Skyblock has exploded hypixel...
     
    #17
  17. Support me on the Hypixel Server Discussion forum on my Greek Translation Post!
    I really need support on that for Hypixel staff to notice and maybe add Greek or Greeklish into hypixel.
     
    #18
  18. First page maybe? Awesome information btw! ❤️
     
    #19
  19. Love these developer blogs!! :eek:
     
    #20
