White Paper

Fusion Power for MongoDB


Unleash unstructured data from disk I/O chains.

Relational databases work best with applications that store predictable data types and sizes. But the information age has created new use cases and opportunities for industries such as social media, analytics, security, and compliance to analyze and search data of all types, including text, images, video, and music.

MongoDB and other NoSQL databases meet this challenge by providing fast access to unstructured data. For the organizations that deploy them, however, slow disk performance and DRAM-density constraints have created scale-out and cost challenges.

When deployed in MongoDB NoSQL servers, Fusion ioMemory delivers a flexible, high-performance, low-latency database for big data. This enables the real-time analytics, data mining, and scalability required to handle continually growing data sets without the complexity of traditional disk-based and DRAM-heavy systems. This solution has been proven across a growing set of customers, including Kontera and Aggregate Knowledge.


MongoDB is Ideal for Unstructured Data Sets

MongoDB simplifies database operations for unstructured data, offering fast, scalable lookups without the overhead and size of a traditional query language or the rigidity of a fixed schema. Figure 1 below illustrates how MongoDB reduces database schema complexity.


Figure 1:  MongoDB simplifies relational DB schema.[1]


Fusion ioMemory Offers an Alternative to Expensive DRAM

MongoDB provides a memory-mapped storage engine that uses the server’s DRAM for a performance boost. MongoDB relies on this optimization to service requests quickly from the in-memory pages of the map. MongoDB database servers therefore tend to be provisioned with enough DRAM to keep the indexes and the “working set” of records in memory.
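
To illustrate the mechanism in general terms, the following Python sketch memory-maps a file and reads one record through the mapping. It uses the operating system's mmap facility through Python's standard library and is not MongoDB's own code; the record size, record contents, and function names are hypothetical.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record size, in bytes


def write_records(path, count):
    """Write `count` fixed-size records so there is a data file to map."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(f"record-{i:08d}".encode("ascii").ljust(RECORD_SIZE, b"."))


def read_record(path, index):
    """Read one record through a read-only memory map.

    The OS pages the touched region into DRAM on first access; later
    reads of the same page are served from memory rather than storage.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = index * RECORD_SIZE
            return bytes(mm[start:start + RECORD_SIZE])


def demo():
    """Create a temporary data file, read record 3, then clean up."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, 10)
        return read_record(path, 3)
    finally:
        os.remove(path)
```

The first access to a page that is not yet in DRAM has to go to the underlying storage, which is why the speed of that storage matters once the working set outgrows memory.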

But as the working set grows, DRAM pricing and capacity quickly become obstacles to efficient scaling. Low DRAM chip density limits scaling beyond a few hundred gigabytes per server. Worse yet, pricing increases substantially at higher DRAM capacity points. It is not uncommon to see DRAM priced at $35-$45/GB.

Fusion ioMemory, by comparison, is readily available at 10 times the capacity of DRAM per PCI Express slot, and at 1/10th the cost.  A Fusion ioScale device, for example, can deliver over 3TB of capacity at a list price under $4/GB.  This makes flash an obvious choice for large working sets.  
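
The arithmetic behind this comparison can be sketched as follows. The prices are the figures quoted above; the 1.2TB working set is borrowed from the Figure 2 scenario later in this paper, and treating 1TB as 1,000GB is an assumption made only to keep the numbers round. The function names are hypothetical.

```python
DRAM_PRICE_PER_GB = (35.0, 45.0)  # USD per GB, range quoted in the text
FLASH_PRICE_PER_GB = 4.0          # USD per GB, upper bound ("under $4/GB")
WORKING_SET_GB = 1200             # assumption: 1.2TB taken as 1,200GB


def capacity_cost(capacity_gb, price_per_gb):
    """Cost in USD of holding `capacity_gb` at a given price per GB."""
    return capacity_gb * price_per_gb


def compare(working_set_gb=WORKING_SET_GB):
    """Return DRAM and flash costs for the working set, plus their ratio."""
    dram_low = capacity_cost(working_set_gb, DRAM_PRICE_PER_GB[0])
    dram_high = capacity_cost(working_set_gb, DRAM_PRICE_PER_GB[1])
    flash_max = capacity_cost(working_set_gb, FLASH_PRICE_PER_GB)
    return {
        "dram_usd": (dram_low, dram_high),
        "flash_usd_max": flash_max,
        # How many times more the DRAM costs than the flash upper bound.
        "dram_to_flash_ratio": (dram_low / flash_max, dram_high / flash_max),
    }


if __name__ == "__main__":
    print(compare())
```

Under these assumptions the DRAM needed to hold the working set costs roughly 9 to 11 times as much as the flash, which is in line with the one-tenth figure above.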

While architects can use MongoDB sharding to reduce RAM costs, this carries other downsides. Sharding introduces configuration complexity, limits flexibility, and can preclude the use of MongoDB features that developers find convenient, such as capped collections.


Persistent Flash Memory Drives 10-40x Gains for Entire Databases

When ioMemory is the primary storage for a MongoDB database, requests are served directly from fast flash memory instead of slow disk. Read latencies drop to as low as 2ms for the entire database, not just the portion held in the memory map. Overall performance improvements of 10x–40x are typical, depending on the workload.

MongoDB's memory-mapped storage engine is more efficient when the files it maps are stored on ioMemory, and it can sustain performance with smaller quantities of DRAM. For existing systems, this frees up DRAM resources for other applications. For new systems, it is possible to reduce capital costs by provisioning servers with a balanced mix of flash and smaller amounts of DRAM.


Big Data Can Be Green Data with Flash

Traditional HDD- and SSD-based systems meet performance targets by scaling out: adding more servers, cores, and spindles to aggregate higher levels of performance. This runs counter to a green datacenter approach.

Disk-based systems are particularly wasteful when disks are deployed for IOPS rather than capacity. A single PCIe flash device delivers more IOPS than hundreds of disk drives combined. With power, cooling, and rack space at a premium in big data environments, datacenters have been looking for a solution that scales efficiently. ioMemory meets this challenge, delivering a green deployment to big data environments.

Figure 2 shows the green benefits of flash memory: three servers for a 384GB working set compared with one server for a 1.2TB working set.


Figure 2:  Flash reduces power, cooling, rack space and cost


Consolidating three servers into one saved 1,100 watts in power, with an additional 1,100 watts saved on cooling. Rack space was similarly reduced by 66%, from 6U to 2U.
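
These savings can be checked with a few lines of arithmetic. The sketch below uses only the wattage and rack-unit figures stated above; the function names are hypothetical.

```python
POWER_SAVED_W = 1100    # watts saved on server power, from the text
COOLING_SAVED_W = 1100  # watts saved on cooling, from the text
RACK_BEFORE_U = 6       # rack units used by the three-server setup
RACK_AFTER_U = 2        # rack units used by the single flash-based server


def total_savings_kw(power_w, cooling_w):
    """Combined power and cooling savings, converted from watts to kilowatts."""
    return (power_w + cooling_w) / 1000


def rack_reduction_pct(before_u, after_u):
    """Percentage of rack space freed by the consolidation."""
    return 100 * (before_u - after_u) / before_u


if __name__ == "__main__":
    print(f"{total_savings_kw(POWER_SAVED_W, COOLING_SAVED_W):.1f} kW saved")
    print(f"{rack_reduction_pct(RACK_BEFORE_U, RACK_AFTER_U):.1f}% less rack space")
```

The combined figure works out to 2.2kW, and the rack-space reduction to two-thirds, which the text states as 66%.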

Up to 40x Improvement in MongoDB Response Times

We ran tests comparing a big data workload, the Yahoo! Cloud Serving Benchmark (YCSB), on ioMemory versus ten 7,200 RPM hard disks in a RAID 0 array under the following workloads:

  • Workload A: 50/50% read/write mix
  • Workload B: 95/5% read/write mix
  • Workload F: Updates
  • Workload F: 50/50% read/read+modify+write mix
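
As a rough illustration of what these mixes mean in practice, the sketch below expresses the three mixes whose proportions are stated above as operation weights and draws a sample operation stream from each. The property names follow YCSB's core workload properties (readproportion, updateproportion, readmodifywriteproportion). Treating the write share of Workloads A and B as updates follows YCSB's stock workload files and is an assumption here; the updates workload is left out because its mix is not given. The sampler is a hypothetical stand-in for the YCSB client, not the benchmark harness used for these tests.

```python
import random
from collections import Counter

# Operation mixes keyed by workload name; each set of weights sums to 1.
WORKLOADS = {
    "A": {"readproportion": 0.50, "updateproportion": 0.50},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
    "F": {"readproportion": 0.50, "readmodifywriteproportion": 0.50},
}


def sample_operations(workload, count, seed=0):
    """Draw `count` operation names at random according to the workload's mix."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so the stream is reproducible
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=count)


def observed_mix(workload, count=10_000, seed=0):
    """Fraction of each operation type seen in a sampled stream."""
    counts = Counter(sample_operations(workload, count, seed))
    return {op: counts[op] / count for op in WORKLOADS[workload]}


if __name__ == "__main__":
    for name in WORKLOADS:
        print(name, observed_mix(name))
```

Over a long enough stream the observed fractions converge on the configured proportions, so Workload B is dominated by reads while Workloads A and F split evenly between reads and operations that write.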

Figure 3 shows the YCSB performance benefits with Fusion ioMemory.


Figure 3:  Fusion ioMemory Outperforms HDD Systems by up to 40x


Conclusion

Fusion-io with MongoDB offers substantial cost, power, cooling, and rack space benefits for NoSQL environments with large working sets. Using Fusion ioMemory with MongoDB databases in place of additional DRAM or disks creates a faster, more scalable, more efficient big data system. Specifically, we found the following results:

  • 11-18x write-performance improvement for the entire database
  • 20-40x read-performance improvement
  • Read latencies as low as 2ms for the entire database
  • 2.2kW savings in power and cooling by reducing servers
  • 66% reduction in rack space by reducing servers
  • No need for sharding of the databases

More information: