Blog

MongoDB, MMAPv1, WiredTiger, Locking, and Queues

In December 2014, MongoDB acquired the database company WiredTiger and began integrating the WiredTiger storage engine into the MongoDB architecture. In March 2015, MongoDB released version 3.0, which introduced a pluggable storage engine architecture with the option to store your data using either the original (MMAPv1) storage engine or WiredTiger. This was kind of a Big Deal, and MongoDB developers and DBAs (myself included) rejoiced, for a few main reasons:

  • WiredTiger provided on-disk data compression, reducing both the space and the I/O requirements of your database and generating some nice performance wins (at the expense of some extra CPU usage). The legacy (MMAPv1) storage engine was somewhat notorious for gobbling up tons of disk space (and never giving it back to the operating system).
  • WiredTiger introduced document-level locking to MongoDB (well, not exactly, but more on that later). A big issue with MongoDB historically was the use of a server-level lock for all database writes. While server-level locking makes reasoning about concurrent behavior easy, it meant that MongoDB’s write performance would always trail behind (a long way behind) its read performance.
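
As a rough illustration of the locking trade-off, here is a toy sketch in plain Python. It is not MongoDB code: the `ToyStore` class, its `granularity` argument, and `run_writers` are all made up for this example. It only models the idea that a single server-wide lock forces every writer to wait in one line, while per-document locks make writers wait only when they touch the same document.

```python
import threading


class ToyStore:
    """Hypothetical in-memory store; illustrates lock granularity only."""

    def __init__(self, doc_ids, granularity):
        self.docs = {doc_id: 0 for doc_id in doc_ids}
        self._server_lock = threading.Lock()
        self._doc_locks = {doc_id: threading.Lock() for doc_id in doc_ids}
        self._granularity = granularity  # "server" or "document"

    def _lock_for(self, doc_id):
        if self._granularity == "server":
            # Every write, to any document, contends for the same lock.
            return self._server_lock
        # Writers contend only when they target the same document.
        return self._doc_locks[doc_id]

    def increment(self, doc_id):
        with self._lock_for(doc_id):
            self.docs[doc_id] += 1


def run_writers(store, targets, writes_each):
    """Start one writer thread per entry in `targets` and wait for all."""

    def worker(doc_id):
        for _ in range(writes_each):
            store.increment(doc_id)

    threads = [threading.Thread(target=worker, args=(t,)) for t in targets]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.docs
```

Both granularities produce the same final counts; the difference is in how many writers can be inside `increment` at once. With `"server"` it is always one, while with `"document"` it can be one per distinct document.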

So version 3.0 and WiredTiger seemed to solve all our problems. The truth was a little more complicated…


New Beginnings

After a 3+ year break to build a startup, I’m jumping back into Arborian with both feet. You should expect to see some changes on the site, and more frequent blogging.

For now, I’ll leave you with a link to a talk I did at MongoDB World last year about Chapman, a high performance distributed task queue we use at my startup:

Chapman: Building a High-Performance Distributed Task Service with MongoDB

MongoDB Developers Online Training

As part of our training offerings at Arborian, we’re pleased to announce an upcoming online class: MongoDB Developers Online Training. Read on for class details and pricing, and we’d love to see you there!

MongoDB Developers Online Training

This is an intermediate-level, 4-week class that introduces you to MongoDB. The
format consists of weekly sessions delivered in webcast format. (These
sessions will be available on video if you’re not able to make the live
webcast.) In between the lectures, you’ll have hands-on exercises to drive home
the concepts in the class. Additionally, I’ll have an open “office hour” the day
after each class where you can ask questions about the exercises.

The rough schedule we’ll be using is:

  • Week 1: MongoDB overview, introduction, and installation. This module covers BSON,
    JSON, collections, documents, and deployment models, then moves on to installing
    MongoDB and PyMongo, using the mongo shell and IPython, and basic document
    creation, modification, and queries.
  • Week 2: Queries, Updates, and Indexes. This module elaborates on the MongoDB
    query language, complex atomic updates with MongoDB, and creating indexes with
    MongoDB and evaluating them using the explain() method.
  • Week 3: MongoDB Aggregation. This module includes simple aggregation using
    count, group, and distinct, batch aggregation using MapReduce, and
    ad-hoc reporting and aggregation using the MongoDB aggregation framework.
  • Week 4: Geospatial indexes and GridFS. This module introduces geospatial
    indexing and querying in MongoDB and large object storage using MongoDB’s
    filesystem GridFS.
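
To give a flavor of the Week 3 material, here is a small sketch of the kind of pipeline the aggregation framework accepts. The `orders` data and its field names are invented for this example; the `$match`, `$group`, and `$sort` stages are standard aggregation stages. The plain-Python function next to the pipeline spells out what it computes, so the example runs without a server.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are assumptions.
orders = [
    {"customer": "ann", "status": "shipped", "total": 30},
    {"customer": "bob", "status": "shipped", "total": 20},
    {"customer": "ann", "status": "pending", "total": 15},
    {"customer": "ann", "status": "shipped", "total": 10},
]

# With PyMongo this would be run as db.orders.aggregate(pipeline).
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "spent": {"$sum": "$total"}}},
    {"$sort": {"spent": -1}},
]


def shipped_totals(docs):
    """Pure-Python equivalent of the pipeline above, for illustration."""
    totals = defaultdict(int)
    for doc in docs:
        if doc["status"] == "shipped":               # $match
            totals[doc["customer"]] += doc["total"]  # $group with $sum
    # $sort by total spent, largest first
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [{"_id": customer, "spent": spent} for customer, spent in ranked]
```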

Each webcast session will begin with an explanation of the exercises from the previous
week, followed by an in-depth lecture on the week’s topics with plenty of
examples. I estimate each webcast will last about an hour.

Who is the class for?

If you’re interested in developing applications for MongoDB, this class is for you. We will be working the exercises using the Python programming language, so it’s best to at least be familiar with the concepts presented in the free Python tutorial. If you’re brand-new to MongoDB, or just want to “fill in the gaps”, the class is perfect for you to learn how to quickly become productive.

What do I get to take home?

As part of the class, you will receive access to a downloadable recording of all
the webcasts, all the slides and examples used in lectures, and a copy of my
ebook on MongoDB with Python and Ming.

So when is it, and how do I sign up?

At this point, I’m planning on starting the classes on September 17th, with webcasts
on Mondays and office hours on Thursdays. The price for the 4-week class is
$400. How do you sign up? Just click the “Buy Now” button below, and I’ll get in
touch with you via email. (If you don’t see a “Buy Now” button, just
click on the following link to sign up: Register for MongoDB for Developers.)





Of course, if you have any questions about the class, you can always contact me
at info@arborian.com, or you can always use the comment form below. You
can also sign up for my training mailing list if you’re interested in getting
notified of future offerings.

MongoDB Online Class: MongoDB for Operational Intelligence

One of the services offered by Arborian Consulting is online MongoDB training
classes. In this post, I’ll describe what one of these training classes is like,
using our MongoDB for Operational Intelligence class as an example.

Class Structure

MongoDB for Operational Intelligence is a four-week class that consists of the
following components:

  • Weekly lectures (2 per week, approx. 30-45 mins each)
  • Weekly lab assignments (2 per week, one per lecture)
  • Weekly office hours (2 per week, 1 hour per week)

Below, I’ll summarize exactly what these components will look like:

Weekly lectures

These lectures will be delivered using webcasting software, and will allow for
some back-and-forth between the instructor and the students. The main purpose of
the lectures, however, is to deliver the detailed information needed to complete
the exercises, where we believe students will find the largest benefit. The
lecture schedule for MongoDB for Operational Intelligence, for instance, is the
following:

  • Module 1: High-speed logging component
    • Lecture 1.1: Installation & intro to the class, schema design & basic
      operations for high-speed logging
    • Lecture 1.2: Queries, index design, and explain() for high-speed logging
    • Lecture 1.3: Computing ad-hoc aggregates using the MongoDB aggregation
      framework
    • Lecture 1.4: Design for data retention and sharding concerns
  • Module 2: Incremental aggregation
    • Lecture 2.1: Intro to MongoDB update modifiers and upserting
    • Lecture 2.2: Schema design at scale for incremental aggregation
  • Module 3: Hierarchical aggregation
    • Lecture 3.1: MapReduce introduction & MongoDB implementation and caveats
    • Lecture 3.2: Creating reusable mapreduce
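
As a hedged sketch of the incremental aggregation idea from Module 2, the snippet below builds the query and update documents for counting page hits with `$inc` and an upsert. The `hits` collection, the one-document-per-page-per-hour layout, and the field names are assumptions made for this example, not the schema taught in the class.

```python
from datetime import datetime


def minute_hit_update(page, when):
    """Build the (query, update) pair that records one hit for `page`.

    Assumed layout: one document per page per hour, holding a running
    total plus a counter for each minute of that hour.
    """
    hour = when.replace(minute=0, second=0, microsecond=0)
    query = {"page": page, "hour": hour}
    update = {"$inc": {"total": 1, "minutes.%d" % when.minute: 1}}
    return query, update


query, update = minute_hit_update("/home", datetime(2012, 8, 7, 14, 37, 5))

# With PyMongo, the pair would be applied in a single atomic operation:
#   db.hits.update_one(query, update, upsert=True)
# The upsert creates the hourly document on the first hit, and later
# hits only increment counters inside it.
```

Because each hit becomes one in-place increment, the aggregate is maintained as the data arrives rather than computed afterwards by scanning raw log entries.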

Each of the lectures (except for the first one) starts with an explanation of the
lab exercise assigned at the conclusion of the previous lecture.

Lab assignments and office hours

Lab assignments are handed out at the conclusion of each lecture. The day
after each lecture, the instructor is available for questions online using the
same webcasting software. The intent of the office hours is to answer questions
about the exercises as well as to provide more in-depth help to students who want
to immediately apply the lessons of the class to their particular business
problems.

Follow-up after the class

At the conclusion of the class, if students are still interested in continued
“office hours”-style consulting, this is offered as a follow-on coaching product
where you are guaranteed timely answers to email questions as well as having
access to weekly coaching office hours.

Ready to sign up, or just want more information?

Whether you’re interested in one of our upcoming MongoDB for Operational
Intelligence classes or need something more customized to your needs, we’d love
to hear from you. To start the process, just send an email to
info@arborian.com and we’ll get started today!

MongoDB Consulting and Evaluation

One of the services offered by Arborian Consulting is MongoDB Consulting and Evaluation.
In this post, I’ll tell you what these
engagements typically look like, and what to expect.

Initial discussions

Before any contract is signed or any money changes hands, we will have one or more
email or telephone discussions concerning the scope of the engagement. Typically
these discussions begin with a prospective client sending an email to
info@arborian.com. A typical engagement will be priced according to a set
number of days spent on-site in review meetings, and the number of days is highly
dependent on the size and complexity of your MongoDB
deployment. Once these discussions have concluded, Arborian generates a
statement of work covering the expected results of the engagement and the
expected time frame for completion. It’s not uncommon for Arborian to also sign a
non-disclosure agreement at this stage. If both parties agree with the scope and rate
of the statement of work, we will schedule a series of days for onsite meetings.

Onsite meetings

Once the statement of work and schedule of meetings is agreed upon, the Arborian
Consultant will come to your site to speak with the developers and operations
team responsible for the MongoDB deployment. These meetings usually begin with an
overview of the various deployments of MongoDB within your organization, followed
by in-depth code review with developers and operations review with your
operations team.

Code reviews

In the code review, the consultant will focus on the following topics:

  • MongoDB connection setup, including write safety, write concerns,
    journaling, and read preference
  • MongoDB Schema design in the various applications that use the MongoDB
    database(s)
  • Queries and updates in the code, with particular focus on performance
    and index design
  • Shard key selection, if applicable, for sharded collections
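
To make the first item concrete, here is a minimal sketch of the connection settings such a review would look at, expressed as a MongoDB connection string. The host names, database name, and replica set name are placeholders, and `build_mongo_uri` is a helper invented for this example; `replicaSet`, `w`, `journal`, and `readPreference` are standard connection string options.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, database, replica_set, **options):
    """Assemble a mongodb:// URI from a host list and connection options."""
    query = urlencode({"replicaSet": replica_set, **options})
    return "mongodb://%s/%s?%s" % (",".join(hosts), database, query)


uri = build_mongo_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],  # placeholder hosts
    "app",                                 # placeholder database name
    "rs0",                                 # placeholder replica set name
    w="majority",                          # write concern
    journal="true",                        # wait for the journal commit
    readPreference="secondaryPreferred",   # where reads may be routed
)

# With PyMongo, the result would be passed to pymongo.MongoClient(uri).
```

Keeping these settings in one place makes it easier to see at a glance which durability and read-routing guarantees an application is actually asking for.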

Operational deployment and monitoring reviews

During the time with the operations team, the consultant will focus on:

  • Server deployment (number of servers, memory and disk, CPU, etc.)
  • Replica set configuration and deployment
  • Server load estimation and sizing
  • Disaster recovery and backup strategies
  • Key performance metrics for monitoring and alerting
  • Sharding deployment, if applicable

The report

During the onsite meetings, the consultant may make several suggestions for
changes, and questions may come up for the consultant or the client to
investigate further. For instance, there may be various load statistics that the
client needs to collect for the sizing exercise, or there may be detailed
operational or development questions that require further research on the part of
the consultant.

Within a week of the onsite meetings, Arborian will generate a detailed report
for use by the client. This report will give an overview of the various uses of
MongoDB that were the subject of the review, with any concerns noted. The report
will then give a series of recommendations based on the review, including code,
schema, deployment, and monitoring recommendations. Finally, the report will
contain a question and answer section for the questions identified during the
review that were not addressed in other sections of the report.

Follow-up and further questions

Once the report has been delivered and accepted by the client, Arborian remains
available for occasional questions and support. More substantial follow-on
consulting will typically be covered by a separate statement of work, or the
initial statement of work may contain a retainer provision for ongoing support.

Ready to sign up?

So whether you currently have a MongoDB deployment that you’d like an expert
second set of eyes on, or whether you’re just considering MongoDB and need help
evaluating different architectural options, Arborian is glad to be your partner.
To start the process, just send an email to info@arborian.com and we’ll
get started right away!

Schema Design at Scale

Slides from July Atlanta MongoDB User Group


Want to learn how to design your app to scale? Like, _really_ scale? On a single master database? To a global audience? Come hear Rick Copeland tell the story of the MongoDB Monitoring Service (MMS), a free realtime monitoring platform available for all MongoDB users, worldwide. You’ll learn how 10gen used MongoDB’s document-based storage to open up MMS to any and all of their customers, worldwide, for free, storing minute-by-minute statistics for every server process they run, all on a single-master MongoDB deployment.

MongoDB Books by Rick Copeland

Today I’m happy to make two book-related announcements.

MongoDB with Python and Ming Book


First off, I’ve collected, edited, and expanded upon the MongoDB and Python series I’ve been working on at Just a Little Python and released it as an eBook. MongoDB with Python and Ming is now available on Lulu as an epub and on Amazon (at MongoDB with Python and Ming). In the book, I cover the following topics:

  • Using PyMongo, from installing & basic queries to the new aggregation framework
  • Overview of tuning your application for the best performance under MongoDB
    including discussions of replication and sharding
  • Using the Ming toolkit to enforce your schema in MongoDB
  • Using Ming’s object-document mapper (ODM) to raise the abstraction level of your
    MongoDB programming
  • Various tips, tricks, and goodies with Ming including Mongo-in-Memory,
    extending the ODM, and schema migrations

MongoDB Applied Design Patterns

The second announcement is that I’ve agreed to write another book for O’Reilly, tentatively titled MongoDB Design Patterns. That one’s not done yet, and it’s still early in the authoring / publication process, but it should be available in early 2013. Sadly I don’t have a nice product page link to share with you yet, but I will leave you with a summary of the topics I plan to cover:

  • Part 1: MongoDB Design Patterns
    • Embedding versus referencing documents
    • Using polymorphic schemas
    • Using complex atomic updates
    • Optimistic updates with compensation (in lieu of multi-document transactions)
  • Part 2: Use Cases
    • Operational intelligence / real-time analytics
    • E-Commerce
    • Content management systems
    • Online advertising networks
    • Social Networking
    • Online Gaming

I hope that’s enough to whet your appetite. Of course, the book’s not finished, so if you have any other topics you’d love to see covered in such a book, I’d love to hear about it in the comments!

Python and MongoDB Training Classes

As promised earlier, we have now officially launched Python and MongoDB training
classes. The first batch of classes will be offered between August 7-10, 2012,
with early bird registration closing on July 31. We’d love to have you attend
one or more classes, so read on for more info!

The first two days of training (August 7-8) will be a
developer training class, focused on bringing you from little or no MongoDB
background to a proficient PyMongo developer. This is the class I’d recommend
you take if you’re completely new to MongoDB.

The third day (August 9th) consists of a case study in using
MongoDB for content management systems, from schema and index design to
common operations to scalability and sharding. Students in this class should
already have a basic knowledge and proficiency with MongoDB.

The final day (August 10th) consists of a case study in using
MongoDB for operational intelligence, giving you lots of information on
using MongoDB for various types of analytics. Of all the talks I’ve given on
MongoDB, the most popular by far was the one on “real time analytics,” so I
anticipate this class might sell out quickly.

Each class is limited to 12 participants in order to make sure I can spend a good
bit of time addressing the exact training needs of the attendees. The early bird
price (again, ending July 31st) for the 2-day training is $800, and the
one-day classes are only $400 each. I’d love to see you there, so please sign up
soon!

Oh, and one more thing: You’ll notice that the venue is still “Atlanta Metro
Area, TBD”. I currently have a facility we can use in Roswell, Georgia, just
north of Atlanta, but if it turns out we have a lot of out-of-town attendees,
we’ll end up moving the class to a location that’s closer to the downtown area.

Python and MongoDB Training Classes

At this past PyCon, I had the opportunity to lead a Python and MongoDB tutorial and had a great time, but I realized that the three-hour format is just not enough time. So I’m hoping to start some full- or multi-day training classes, but I need your input to know what and where to offer them.

So with all that said, I would love it — if you’re interested, of course — if you would sign up for updates about the classes as they progress by clicking the signup link. When signing up, please list your zip code so I can determine the best location to offer the classes. I live in Atlanta, so I know some venues here, but I’m more than willing to travel if there’s enough interest.

Even if the classes listed on the signup form don’t interest you and you don’t want updates, I’d love to hear in the comments below what kinds of classes/training you would be interested in. Thanks so much, and I look forward to hearing from you!