RailsTips by John Nunemaker http://railstips.org/blog/ 2020-12-17T08:52:09-05:00 Flipper Preloading 584981a8edb2f305ce30df2c 2016-12-08T12:26:56-05:00 2016-12-08T11:00:00-05:00 <p>In which I describe a new optimization for flipper.</p> <p><a href="https://github.com/jnunemaker/flipper">Flipper</a> is already pretty optimized for production usage (flipper does billions of feature checks a day at GitHub), but the latest release (0.10.2) just received a couple of new ones &mdash; all thanks to community contributions.</p> <p>In <a href="https://github.com/jnunemaker/flipper/issues/190">jnunemaker/flipper#190</a>, <a href="https://github.com/mscoutermarsh">@mscoutermarsh</a> said:</p> <blockquote> <p>Hi,<br /> <br>Would love if there&#8217;s a way to preload enabled features for an actor. For Product Hunt, we check several features in a single request for current_user. With activerecord this adds up to quite a few queries. Would love to get it down to one.<br /> <br>From browsing source I don&#8217;t believe this is currently available.</p> </blockquote> <p>I suggested using caching, as we do at GitHub, but also <a href="https://github.com/jnunemaker/flipper/issues/190#issuecomment-260132126">put some thoughts down</a> on how preloading features could work. <a href="https://github.com/gshutler">@gshutler</a> saw the conversation and put pen to paper on a <a href="https://github.com/jnunemaker/flipper/pull/198">great pull request</a> that made the idea concrete.</p> <h2>Some Background</h2> <p>Oftentimes users of flipper do many feature checks per request. Normally, this would require a network call for each feature check, but flipper comes with a <a href="https://github.com/jnunemaker/flipper/blob/dfdaa8265fe8ffc29666493663678d1fbc5bab87/lib/flipper/middleware/memoizer.rb">Memoizer middleware</a> that stores the gate values for a feature in memory for the duration of a request.
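The idea behind that middleware can be sketched in a few lines of plain Ruby. This is a hypothetical simplification, not flipper's actual Memoizer (the `MemoizingAdapter` name is invented for illustration): a wrapper holds a backing adapter and a request-scoped `Hash` of gate values.

```ruby
# Hypothetical sketch of request-scoped memoization (not flipper's real
# Memoizer middleware): wrap an adapter and cache gate values per feature.
class MemoizingAdapter
  def initialize(adapter)
    @adapter = adapter
    @cache = {} # feature key => gate values, lives for one request
  end

  # First lookup for a key hits the backing adapter;
  # repeats are plain Hash fetches.
  def get(feature)
    @cache.fetch(feature) { |key| @cache[key] = @adapter.get(key) }
  end
end
```

In a real middleware the cache is created when the request starts and discarded when it ends, so memoized values never outlive a single request.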
This makes it so the first feature check for a feature performs a network call, but subsequent ones just do in-memory <code>Hash</code> fetches (aka dramatic hamster faster).</p> <h2><span class="caps">DSL</span>#preload and Adapter#get_multi</h2> <p>The addition by gshutler was an adapter method <code>get_multi</code>, which takes an array of features and allows the adapter to load all features in one network call. He then added <code>DSL#preload</code>, along with a <code>:preload</code> option for the <code>Memoizer</code> middleware, which both use <code>get_multi</code> to load any provided features in one network call instead of N.</p> <p>By default, the Adapter implementation of <code>get_multi</code> performs one <code>get</code> per feature. Each adapter can then override this functionality with a more efficient means of fetching the data. For example, the active record adapter uses an <code>IN</code> query to select all gate values for all features.</p> <pre class="ruby"><code class="ruby">def get_multi(features)
  db_gates = @gate_class.where(feature_key: features.map(&amp;:key))
  grouped_db_gates = db_gates.group_by { |gate| gate.feature_key }
  result = {}
  features.each do |feature|
    result[feature.key] = result_for_feature(feature, grouped_db_gates[feature.key])
  end
  result
end</code></pre> <p>gshutler provided the redis implementation of <code>get_multi</code> and then I, as the maintainer of flipper, added <code>get_multi</code> for the other core adapters, along with updates to the <a href="https://github.com/jnunemaker/flipper/blob/dfdaa8265fe8ffc29666493663678d1fbc5bab87/docs/Adapters.md">Adapters</a> and <a href="https://github.com/jnunemaker/flipper/blob/dfdaa8265fe8ffc29666493663678d1fbc5bab87/docs/Optimization.md">Optimizations</a> documentation.</p> <h2>The Result</h2> <p>mscoutermarsh was kind enough to <a href="https://github.com/jnunemaker/flipper/issues/206#issuecomment-265628778">drop a graph of the result</a> for an endpoint of <a 
href="https://www.producthunt.com/">Product Hunt</a>.</p> <p><img src="/assets/5849875eedb2f305ce30e3a5/article_full/flipper_preload.png" alt="" /></p> <p>I will leave it up to the reader to determine when he deployed the preloading. :) Looks like ~50% improvement on that endpoint.</p> <p>I feel like this is a great example of the power of open source and how important it is to efficiently load data for requests, so I thought I would take the time to write it up. Hope you enjoyed it. Happy bulk loading!</p> John Nunemaker Flipping ActiveRecord 5679fb67a0b5dd535b15d753 2016-07-05T09:50:40-04:00 2015-12-22T21:05:00-05:00 <p>In which I release an official ActiveRecord adapter for Flipper.</p> <p>Originally, I did not like the idea of an ActiveRecord adapter for <a href="http://www.railstips.org/blog/archives/2015/08/03/flipper-insanely-easy-feature-flipping/">Flipper</a>. I work on <a href="https://github.com">GitHub.com</a> day to day, so everything I do has to be extremely performant. Using ActiveRecord for something like this felt like way too much overhead.</p> <p>In fact, at GitHub, we use a custom adapter for Flipper built on good old raw <span class="caps">SQL</span>. Not only that, but we also use a memcache adapter which wraps the pure <span class="caps">SQL</span> adapter to avoid hitting MySQL most of the time. The memcache wrapper (at the time of this writing) works similar to the memoizing adapter that is included with Flipper (for those that are curious).</p> <p>Over time, a few good options came out for using Flipper with ActiveRecord and they changed my mind. I realized that not every application is GitHub.com. Some applications value ease of integration over performance. 
I even wrote my own ActiveRecord adapter for <a href="https://speakerdeck.com">SpeakerDeck</a>, which is what I am now including in the core <a href="https://github.com/jnunemaker/flipper">flipper repo</a> (but available as a separate gem).</p> <h2>Installation</h2> <p>Drop the gem in your Gemfile:</p> <pre><code>gem "flipper-active_record"</code></pre> <p>Generate the migration:</p> <pre><code>rails g flipper:active_record</code></pre> <h2>Usage</h2> <pre class="ruby"><code class="ruby">require 'flipper/adapters/active_record'

adapter = Flipper::Adapters::ActiveRecord.new
flipper = Flipper.new(adapter)
# profit...</code></pre> <p>From there, you use flipper the same as you would with any of the previously supported adapters. Internally, all features are stored in a <code>flipper_features</code> table and all gate related values are stored in a <code>flipper_gates</code> table. You can see more about the <a href="https://github.com/jnunemaker/flipper/blob/e4b43c55dac3243aef79c8c9600ad9f15e3f2d95/examples/active_record/internals.rb">internals in the examples</a>.</p> <h2>Conclusion</h2> <p>As of Flipper 0.7.3, you can now flip features with the ease and comfort of ActiveRecord and the peace of mind that as new AR versions are released, your flipper adapter will be updated and ready to go.</p> <p>Happy flipping and happy holidays!</p> John Nunemaker Flipper: Insanely Easy Feature Flipping 55bf5b8ad4c96106840498fa 2015-12-22T20:51:27-05:00 2015-08-03T08:00:00-04:00 <p>In which I ramble about turning features on and off in a really easy way.</p> <p>Cross posted from <a href="http://johnnunemaker.com/flipper/">JohnNunemaker.com</a> as it seems relevant here too.</p> <pre><code> __ _.-~ ) _..--~~~~,' ,-/ _ .-'. . . .' ,-',' ,' ) ,'. . . _ ,--~,-'__..-' ,' ,'. . . (@)' ---~~~~ ,' /. . . . '~~ ,-' /. . . . . ,-' ; . . . . - . ,' : . . . . _ / . . . . . `-.: . . . ./ - . ) . . . 
| _____..---.._/ _____ ~---~~~~----~~~~ ~~ </code></pre> <p>Nearly <a href="https://github.com/jnunemaker/flipper/commit/8257cc68a9a2ff6fb6b3ae6c497b15309c4d0d7b">three years ago</a>, I started work on Flipper. Even though there were other feature flipping libraries out there at the time, most notably <a href="https://github.com/FetLife/rollout">rollout</a>, I decided to whip up my own. <a href="https://speakerdeck.com/jnunemaker/dont-repeat-yourself-repeat-others">Repeating others</a> is, after all, one of the better ways to level up your game.</p> <p>My main issue with rollout was that it was inflexible. You couldn&#8217;t change the ways in which a feature was enabled (ie: adding percentage of time rollout). You had to use redis. The list goes on. I poked around and couldn&#8217;t find anything like what I was looking for and I was in the mood to create, so I started flipper.</p> <p>Most of the work was done off and on over the course of a few weeks. At the time, I was working on traffic graphs for GitHub and I wanted a way to turn features on/off in a flexible way.</p> <h2 id="naming-is-hard">Naming is hard</h2> <p>Flipper started as a simple ripoff of rollout with the primary difference being the use of adapters for storage instead of forcing redis. I struggled through awkward terminology and messy code for a while, until a great conversation with <a href="http://opensoul.org">Brandon Keepers</a> led me to the lingo flipper uses today: Actor, Feature and Gate (thanks Brandon!)</p> <p>An <strong>actor</strong> is the thing trying to do something. It can be anything. On <a href="https://github.com">GitHub</a>, the actor can be a user, organization or even a repository. Actors must respond to <code>flipper_id</code>. 
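For example, a minimal actor might be nothing more than a plain object with a stable id (an illustrative class, not from the post):

```ruby
# Illustrative only: any object that responds to flipper_id can be an actor.
class User
  attr_reader :id

  def initialize(id)
    @id = id
  end

  # Any stable, unique string will do for flipper_id.
  def flipper_id
    id.to_s
  end
end
```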
If you plan on using multiple types of actors, you can namespace the flipper_id with the type (ie: &#8220;User:6&#8221;, &#8220;Organization:12&#8221;, or &#8220;Repository:2&#8221;).</p> <p>A <strong>feature</strong> is something that you want to control enabled-ness for. On <a href="https://speakerdeck.com">SpeakerDeck</a>, I have a feature for search. With the click of a button, I can disable search if it is causing issues. On GitHub, we do thousands of feature checks per second across nearly 30 features (at the time of this writing) in different states of enabled-ness. If I told you what they were for I would have to kill you.</p> <p>A <strong>gate</strong> determines if a feature is enabled for an actor. There are currently five gates &#8212; boolean, actor, group, % of actors and % of time. Amongst these you can roll out a new feature or control an existing one in whatever way you desire.</p> <h3 id="the-gates">The Gates</h3> <p>The <strong>boolean gate</strong> allows completely enabling or disabling a feature. Think of it as a shortcut to turning a feature fully on or fully off quickly. Enabling the boolean gate means the feature is on all the time for everyone. Disabling the boolean gate clears all enabled gates so the feature is completely off. Think of disable like a reset.</p> <pre class="ruby"><code class="ruby">flipper = Flipper.new(adapter)

flipper[:search].enable # turn on
flipper[:search].disable # turn off
flipper[:search].enabled? # check</code></pre> <p>The <strong>actor gate</strong> allows enabling a feature for one or more specific actors. If you wanted to enable a new feature for one of your friends, you could use this gate.</p> <pre><code class="ruby">flipper = Flipper.new(adapter)

flipper[:search].enable_actor user # turn on for actor
flipper[:search].enabled? user # true

flipper[:search].disable_actor user # turn off for actor
flipper[:search].enabled? user # false</code></pre> <p>The <strong>group gate</strong> allows enabling a feature for one or more groups. A group is a named block of code that returns true or false for a given actor. You could have a group for everyone in your company, or only engineering, or perhaps all users in the US or Europe. Anything your heart can imagine can be converted to a group and the entire group can be enabled at once.</p> <pre><code class="ruby">Flipper.register(:admins) do |actor|
  actor.respond_to?(:admin?) &amp;&amp; actor.admin?
end

flipper = Flipper.new(adapter)

flipper[:search].enable_group :admins # turn on for admins
flipper[:search].disable_group :admins # turn off for admins

person = Person.find(params[:id])
flipper[:search].enabled? person # check if enabled, returns true if person.admin? is true</code></pre> <p>The <strong>percentage of actors gate</strong> allows slowly enabling a feature for a percentage of actors. As long as you continue to increase the percentage, an actor will consistently remain enabled. This allows for careful rollouts of a feature to everyone without overwhelming the system as a whole.</p> <pre><code class="ruby">flipper = Flipper.new(adapter)

# turn search on for 10 percent of users in the system
flipper[:search].enable_percentage_of_actors 10

# checks if actor's flipper_id is in the enabled percentage by hashing
# user.flipper_id.to_s to ensure enabled distribution is smooth
flipper[:search].enabled? user

# turn search off for percentage of actors, other gates could return true still
flipper[:search].disable_percentage_of_actors # sets to 0</code></pre> <p>The <strong>percentage of time gate</strong> allows enabling a feature for a random percentage of time. This is great for dark shipping and load testing. We actually used something similar to this to launch traffic graphs. 
We wanted to be positive that we could stand up to real traffic, so we performed ajax requests behind the scenes based on a percentage of time to the new feature. This allowed us to crank up the traffic, hit a bottleneck, kill the traffic, fix the bottleneck and repeat.</p> <pre><code class="ruby">flipper = Flipper.new(adapter)

# turn on logging for 5 percent of the time
# could be on during one request and off the next
# could even be on first time in request and off second time
flipper[:logging].enable_percentage_of_time 5

# turn off logging for percentage of time
flipper[:logging].disable_percentage_of_time # sets to 0</code></pre> <p>All <a href="https://github.com/jnunemaker/flipper/blob/master/docs/Gates.md">the gates are fully documented in the flipper repo</a> as well.</p> <h2 id="adapters">Adapters</h2> <p>The adapter pattern is used to store which gates are enabled for a given feature. This means you can store flipper&#8217;s information however you desire. At the time of this writing, <a href="https://github.com/jnunemaker/flipper/blob/master/docs/Adapters.md">several adapters already exist</a>, such as in memory, pstore, mongo, redis, cassandra, and active record. If one of those doesn&#8217;t tickle your fancy, creating a new adapter is really easy. 
The <span class="caps">API</span> for an adapter is this:</p> <ul> <li><code>features</code> &#8211; Get the set of known features.</li> <li><code>add(feature)</code> &#8211; Add a feature to the set of known features.</li> <li><code>remove(feature)</code> &#8211; Remove a feature from the set of known features.</li> <li><code>clear(feature)</code> &#8211; Clear all gate values for a feature.</li> <li><code>get(feature)</code> &#8211; Get all gate values for a feature.</li> <li><code>enable(feature, gate, thing)</code> &#8211; Enable a gate for a thing.</li> <li><code>disable(feature, gate, thing)</code> &#8211; Disable a gate for a thing.</li> </ul> <p>At GitHub, we actually use a <span class="caps">SQL</span> adapter fronted by memcache for performance reasons.</p> <h2 id="instrumentation">Instrumentation</h2> <p>Flipper is wired to be <a href="https://github.com/jnunemaker/flipper/blob/master/docs/Instrumentation.md">instrumented out of the box</a>, using the ActiveSupport::Notifications <span class="caps">API</span> (though AS::Notifs are not specifically required). I even included automatic statsd instrumentation for those that are already using statsd.</p> <pre class="ruby"><code class="ruby">require "flipper/instrumentation/statsd"

statsd = Statsd.new # or whatever your statsd instance is
Flipper::Instrumentation::StatsdSubscriber.client = statsd</code></pre> <p>If statsd doesn&#8217;t work for you, <a href="https://github.com/jnunemaker/flipper/blob/master/examples/instrumentation.rb">you can easily customize</a> wherever you want to instrument to (ie: InfluxDB, New Relic, etc.).</p> <h2 id="performance">Performance</h2> <p>Flipper was built based on my time working on Words with Friends and to be used at GitHub, so you can rest easy that it was built with performance in mind. 
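To make that adapter surface concrete, here is a hypothetical, stripped-down in-memory adapter implementing the API listed above. It collapses gate value handling into one flat Hash per feature, which flipper's real memory adapter does not do:

```ruby
require 'set'

# Simplified sketch of the adapter API; real adapters distinguish
# boolean/set/integer gate value types, this one just stores a Hash.
class MemoryAdapter
  def initialize
    @features = Set.new
    @values = Hash.new { |hash, key| hash[key] = {} }
  end

  def features
    @features.to_a
  end

  def add(feature)
    @features.add(feature)
  end

  def remove(feature)
    @features.delete(feature)
    @values.delete(feature)
  end

  def clear(feature)
    @values.delete(feature)
  end

  def get(feature)
    @values[feature].dup
  end

  def enable(feature, gate, thing)
    @values[feature][gate] = thing
  end

  def disable(feature, gate, thing)
    @values[feature].delete(gate)
  end
end
```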
The adapter <span class="caps">API</span> is intentionally made to allow for fetching all gate values for a feature in one network call, and there is even (optional) built-in memoization of adapter calls, <a href="https://github.com/jnunemaker/flipper/blob/master/docs/Optimization.md">including a Rack middleware</a> which enables memoizing the fetching of a feature for the duration of a request.</p> <p>I&#8217;ve also thought about making it easy to allow for batch loading of features, though I haven&#8217;t needed this yet on any site I&#8217;ve worked on, so for now it remains a thought rather than an implementation.</p> <h2 id="web-ui">Web UI</h2> <p>As a cherry on top, I&#8217;ve also created a <a href="https://github.com/jnunemaker/flipper/tree/master/docs/ui">rack middleware web UI</a> for controlling flipper, which can be protected by any authentication you need. Below are a couple screenshots (at the time of this writing).</p> <h3 id="list-of-features">List of features</h3> <p><img src="/assets/55bf5bbbd4c9610665045cd0/article_full/features.png" alt="" /></p> <h3 id="viewing-individual-feature">Viewing individual feature</h3> <p><img src="/assets/55bf5bbbedb2f361bb042aa8/article_full/feature.png" alt="" /></p> <p>All the gates can be manipulated to enable features however you would like through the click of a button or the clack of a keyboard.</p> <h2 id="conclusion">Conclusion</h2> <p>Flipper is ready for the prime time. As I said earlier, we are now using it on GitHub.com for thousands of feature checks every second. The <span class="caps">API</span> changed a bit in 0.7, but is pretty stable now. Drop it in your next project and give it a try. 
If you do, please let me know (email or issue on the repo) as I love to know how people are using things I&#8217;ve worked on.</p> John Nunemaker Of Late 530b9aabf002ff02ea0001c1 2014-02-24T14:18:22-05:00 2014-02-24T14:00:00-05:00 <p>In which I link to a new place where I&#8217;ll be writing.</p> <p>A lot has changed over the years. I now do a lot more than just rails and having railstips as my domain seems to mentally put me in a corner.</p> <p>As such, I have revived <a href="http://johnnunemaker.com">johnnunemaker.com</a>. While I may still post a rails topic here once in a while, I&#8217;ll be posting a lot more varied topics over there.</p> <p>In fact, I just published my first post of any length, titled <a href="http://johnnunemaker.com/analytics-at-github/">Analytics at GitHub</a>. Head on over and give it a read.</p> John Nunemaker Let Nunes Do It 5170477f7a5072364c0026b6 2013-04-18T16:00:51-04:00 2013-04-18T15:20:00-04:00 <p>In which I release Nunes to a soon-to-be-more-instrumented world.</p> <p>In a moment of either genius or delirium I decided to name my newest project after myself. Why? Well, here is the story whether you want to know or not.</p> <h2>Why Nunes?</h2> <p>Naming is always the hardest part of a project. Originally, it was named Railsd. The idea of the gem is to automatically subscribe to all of the valuable Rails instrumentation events and send them to statsd in a sane way; thus Railsd was born.</p> <p>After working on it a bit, I realized that the project was just an easy way to send Rails instrumentation events to any service that supports counters and timers. With a few tweaks, I made Railsd support <a href="http://instrumentalapp.com">InstrumentalApp</a>, a favorite service of mine, in addition to Statsd.</p> <p>Thus came the dilemma. No longer did the (already terrible) name Railsd make sense. 
As I sat and thought about what to name it, I remembered joking one time about naming a project after myself, so that every time anyone used it they had no choice but to think about me. <strong>Thus <a href="https://github.com/jnunemaker/nunes">Nunes</a> was born</strong>.</p> <p>Lest you think I named it Nunes only so that you think of me, here is a bit more detail. Personally, I attempt to instrument everything I can. Be it code, the steps I take, or the calories I consume, I want to know what is going on. I have also noticed that which is automatically instrumented is the easiest to instrument.</p> <p><strong>I love tracking data so deeply that I want to instrument your code. Really, I do</strong>. I want to clone your repo, inject a whole bunch of instrumentation and deploy it to production, so you can know exactly what is going on. I want to sit over your shoulder and look at the graphs with you. Ooooooh, aren&#8217;t those some pretty graphs!</p> <p><strong>But I don&#8217;t work for you, or with you, so that would be weird</strong>.</p> <p>Instead, I give you Nunes. I give you Nunes as a reminder that I want to instrument everything and you should too. I give you Nunes so that instrumenting is so easy that you will feel foolish not using it, at least for a start. Go ahead, the first metric is free! Yep, I want you to have that first hit and get addicted, like me.</p> <h2>Using Nunes</h2> <p>I love instrumenting things. Nunes loves instrumenting things. To get started, just add Nunes to your gemfile:</p> <pre><code class="ruby"># be sure to think of me when you do :)
gem "nunes"</code></pre> <p>Once you have nunes in your bundle (be sure to think of bundling me up with a big hug), you just need to tell nunes to subscribe to all the fancy events and provide him with somewhere to send all the glorious metrics:</p> <pre><code class="ruby"># yep, think of me here too
require 'nunes'

# for statsd
statsd = Statsd.new(...)
Nunes.subscribe(statsd) # ooh, ooh, think of me!

# for instrumental
I = Instrumental::Agent.new(...)
Nunes.subscribe(I) # one moooore tiiiime!</code></pre> <p>With just those couple of lines, you get a whole lot of goodness. Out of the box, Nunes will subscribe to the following Rails instrumentation events:</p> <ul> <li><code>process_action.action_controller</code></li> <li><code>render_template.action_view</code></li> <li><code>render_partial.action_view</code></li> <li><code>deliver.action_mailer</code></li> <li><code>receive.action_mailer</code></li> <li><code>sql.active_record</code></li> <li><code>cache_read.active_support</code></li> <li><code>cache_generate.active_support</code></li> <li><code>cache_fetch_hit.active_support</code></li> <li><code>cache_write.active_support</code></li> <li><code>cache_delete.active_support</code></li> <li><code>cache_exist?.active_support</code></li> </ul> <p>Thanks to all the wonderful information those events provide, you will instantly get some of these counter metrics:</p> <ul> <li><code>action_controller.status.200</code></li> <li><code>action_controller.format.html</code></li> <li><code>action_controller.exception.RuntimeError</code> &#8211; where RuntimeError is the class of any exceptions that occur while processing a controller&#8217;s action.</li> <li><code>active_support.cache_hit</code></li> <li><code>active_support.cache_miss</code></li> </ul> <p>And these timer metrics:</p> <ul> <li><code>action_controller.runtime</code></li> <li><code>action_controller.view_runtime</code></li> <li><code>action_controller.db_runtime</code></li> <li><code>action_controller.posts.index.runtime</code> &#8211; where <code>posts</code> is the controller and <code>index</code> is the action</li> <li><code>action_view.app.views.posts.index.html.erb</code> &#8211; where <code>app.views.posts.index.html.erb</code> is the path of the view file</li> <li><code>action_view.app.views.posts._post.html.erb</code> &#8211; I can even do partials! 
woot woot!</li> <li><code>action_mailer.deliver.post_mailer</code> &#8211; where <code>post_mailer</code> is the name of the mailer</li> <li><code>action_mailer.receive.post_mailer</code> &#8211; where <code>post_mailer</code> is the name of the mailer</li> <li><code>active_record.sql</code></li> <li><code>active_record.sql.select</code> &#8211; also supported are insert, update, delete, transaction_begin and transaction_commit</li> <li><code>active_support.cache_read</code></li> <li><code>active_support.cache_generate</code></li> <li><code>active_support.cache_fetch</code></li> <li><code>active_support.cache_fetch_hit</code></li> <li><code>active_support.cache_write</code></li> <li><code>active_support.cache_delete</code></li> <li><code>active_support.cache_exist</code></li> </ul> <h2>But Wait, There is More!</h2> <p>In addition to doing all that work for you out of the box, Nunes will also help you wrap your own code with instrumentation. I know, I know, sounds too good to be true.</p> <pre><code class="ruby">class User &lt; ActiveRecord::Base
  extend Nunes::Instrumentable # OH HAI IT IS ME, NUNES

  # wrap save and instrument the timing of it
  instrument_method_time :save
end</code></pre> <p>This will instrument the timing of the User instance method save. What that means is when you do this:</p> <pre><code class="ruby"># the nerve of me to name a user nunes
user = User.new(name: "NUNES!")
user.save</code></pre> <p>An event named <code>instrument_method_time.nunes</code> will be generated, which in turn is subscribed to and sent to whatever you used to send instrumentation to (statsd, instrumental, etc.). The metric name will default to &#8220;class.method&#8221;. For the example above, the metric name would be <code>user.save</code>. 
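The mechanics of this style of method timing can be sketched in plain Ruby. This is a hypothetical simplification: Nunes actually emits an `instrument_method_time.nunes` event through the subscribed instrumenter, whereas this sketch (`MethodTimer`, `TIMINGS`) writes durations straight into a Hash:

```ruby
# Hypothetical sketch of timing-by-wrapping, not Nunes itself.
module MethodTimer
  TIMINGS = {} # metric name => last recorded duration in seconds

  # Redefine an instance method so every call records its duration.
  def time_method(method_name)
    original = instance_method(method_name)
    metric = "#{name.downcase}.#{method_name}" # default "class.method" naming

    define_method(method_name) do |*args, &block|
      start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      result = original.bind(self).call(*args, &block)
      TIMINGS[metric] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
      result
    end
  end
end

class User
  extend MethodTimer

  def save
    :saved
  end
  time_method :save # must come after the method is defined
end
```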
No fear, you can customize this.</p> <pre><code class="ruby">class User &lt; ActiveRecord::Base
  extend Nunes::Instrumentable # never

  # wrap save and instrument the timing of it
  instrument_method_time :save, 'crazy_town.save'
end</code></pre> <p>Passing a string as the second argument sets the name of the metric. You can also customize the name using a Hash as the second argument.</p> <pre><code class="ruby">class User &lt; ActiveRecord::Base
  extend Nunes::Instrumentable # gonna

  # wrap save and instrument the timing of it
  instrument_method_time :save, name: 'crazy_town.save'
end</code></pre> <p>In addition to name, you can also pass a payload that will get sent along with the generated event.</p> <pre><code class="ruby">class User &lt; ActiveRecord::Base
  extend Nunes::Instrumentable # give nunes up

  # wrap save and instrument the timing of it
  instrument_method_time :save, payload: {pay: "loading"}
end</code></pre> <p>If you subscribe to the event on your own, say to log some things, you&#8217;ll get a key named <code>:pay</code> with a value of <code>"loading"</code> in the event&#8217;s payload. Pretty neat, eh?</p> <h2>Conclusion</h2> <p>I hope you find Nunes useful and that each time you use it, you think of me and how much I want to instrument your code for you, but am not able to. Go forth and instrument!</p> <p>P.S. If you have ideas for Nunes, create an issue and start some chatter. Let&#8217;s make Nunes even better!</p> John Nunemaker An Instrumented Library in ~30 Lines 510006ed7a507277eb000ac0 2013-01-23T15:35:26-05:00 2013-01-23T15:00:00-05:00 <p>lmao if you don&#8217;t make it easy for users of your library to log, measure and graph everything.</p> <h2>The Full ~30 Lines</h2> <p>For the first time ever, I am going to lead with the end of the story. 
Here is the full ~30 lines that I will break down in detail during the rest of this post.</p> <pre><code class="ruby">require 'forwardable'

module Foo
  module Instrumenters
    class Noop
      def self.instrument(name, payload = {})
        yield payload if block_given?
      end
    end
  end

  class Client
    extend Forwardable
    def_delegator :@instrumenter, :instrument

    def initialize(options = {})
      # some other setup for the client ...
      @instrumenter = options[:instrumenter] || Instrumenters::Noop
    end

    def execute(args = {})
      instrument('client_execute.foo', args: args) { |payload|
        result = # do some work...
        payload[:result] = result
        result
      }
    end
  end
end

client = Foo::Client.new({
  instrumenter: ActiveSupport::Notifications,
})

client.execute(...) # I AM INSTRUMENTED!!!</code></pre> <h2>The Dark Side</h2> <p>A while back, <a href="http://www.railstips.org/blog/archives/2011/03/21/hi-my-name-is-john/">statsd grabbed a hold of the universe</a>. It swept in like an elf on a unicorn and we all started keeping track of stuff that previously was a pain to keep track of.</p> <p>Like any wave of awesomeness, it came with a dark side that was felt, but mostly overlooked. Dark side? Statsd? Graphite? You must be crazy! Nope, not me, definitely not crazy this one. Not. At. All.</p> <p>What did we all start doing in order to inject our measuring? Yep, <strong>we started opening up classes in horrible ways</strong> and creating hooks into libraries that sometimes change rapidly. Many times, updating a library would cause a break in the stats reporting and require effort to update the hooks.</p> <h2>The Ideal</h2> <p>Now that the wild west is settling a bit, I think some have started to reflect on that wave of awesomeness and realized something.</p> <blockquote> <p>I no longer want to inject my own instrumentation into your library. 
Instead, I want to tell your library where it should send the instrumentation.</p> </blockquote> <p>The great thing is that <a href="http://api.rubyonrails.org/classes/ActiveSupport/Notifications.html">ActiveSupport::Notifications</a> is pretty spiffy in this regard. By simply allowing your library to talk to an &#8220;instrumenter&#8221; that responds to <code>instrument</code> with an event name, optional payload, and optional block, you can make all your library&#8217;s users <strong>really</strong> happy.</p> <p>The great part is:</p> <ol> <li>You do not have to <strong>force your users to use active support</strong>. They simply need some kind of instrumenter that responds in similar fashion.</li> <li>They <strong>no longer have to monkey patch</strong> to get metrics.</li> <li>You can <strong>point them in the right direction as to what is valuable to instrument</strong> in your library, since really you know it best.</li> </ol> <p>There are a few good examples of libraries (faraday, excon, etc.) doing this, but I haven&#8217;t seen a great post yet, so here is my attempt to point you in what I feel is the right direction.</p> <h2>The Interface</h2> <p>First, like I said above, we do not want to force requiring active support. Rather than require a library, <strong>it is always better to require an interface</strong>.</p> <p>The interface that we will require is the one used by active support, but an adapter interface could be created for any instrumenter that we want to support. Here is what it looks like:</p> <pre><code class="ruby">instrumenter.instrument(name, payload) { |payload|
  # do some code here that should be instrumented
  # we expect payload to be yielded so that additional
  # payload entries can be included during the
  # computation inside the block
}</code></pre> <p>Second, we have two options.</p> <ol> <li>Either have an instrumenter or not. If so, then call <code>instrument</code> on the instrumenter. 
If not, then do not call <code>instrument</code>.</li> <li>The other option, which I prefer, is to <strong>have a default instrumenter that does nothing</strong>. Aptly, I call this the noop instrumenter.</li> </ol> <h2>The Implementation</h2> <p>Let&#8217;s pretend our library is named foo, therefore it will be namespaced with the module Foo. I typically namespace the instrumenters in a module as well. Knowing this, our noop instrumenter would look like this:</p> <pre><code class="ruby">module Foo
  module Instrumenters
    class Noop
      def self.instrument(name, payload = {})
        yield payload if block_given?
      end
    end
  end
end</code></pre> <p>As you can see, all this instrumenter does is yield the payload if a block is given. As I mentioned before, <strong>we yield payload so that the computation inside the block can add entries to the payload</strong>, such as the result.</p> <p>Now that we have a default instrumenter, how can we use it? Well, let&#8217;s imagine that we have a Client class in foo that is the main entry point for the gem.</p> <pre><code class="ruby">module Foo
  class Client
    def initialize(options = {})
      # some other setup for the client ...
      @instrumenter = options[:instrumenter] || Instrumenters::Noop
    end
  end
end</code></pre> <p>This code simply allows people to pass in the instrumenter that they would like to use through the initialization options. Also, by default if no instrumenter is provided, we use our noop version that just yields the block and moves on.</p> <p>Note: the use of || instead of #fetch is intentional. It ensures that even an explicitly passed nil instrumenter falls back to the noop version. There are other ways around this, but I have found using the noop instrumenter in place of nil better than complaining about nil.</p> <p>Now that we have an <code>:instrumenter</code> option, someone can quite easily pass in the instrumenter that they would like to use.</p> <pre><code class="ruby">client = Foo::Client.new({
  :instrumenter =&gt; ActiveSupport::Notifications,
})</code></pre> <p>Boom! 
Just like that we&#8217;ve allowed people to inject active support notifications, or whatever instrumenter they want, into our library. Anyone else getting excited?</p> <p>Once we have that, we can start instrumenting the valuable parts. Typically what I do is set up delegation of <code>instrument</code> to the instrumenter using ruby&#8217;s forwardable library:</p> <pre><code class="ruby">require 'forwardable' module Foo class Client extend Forwardable # forward instrument in this class to @instrumenter, for those unfamiliar # with forwardable. def_delegator :@instrumenter, :instrument def initialize(options = {}) # some other setup for the client ... @instrumenter = options[:instrumenter] || Instrumenters::Noop end end end</code></pre> <p>Now we can use the <code>instrument</code> method directly anywhere in our client instance. For example, let&#8217;s say that client has a method named <code>execute</code> that we would like to instrument.</p> <pre><code class="ruby">module Foo class Client def execute(args = {}) instrument('client_execute.foo', args: args) { |payload| result = do_work(args) # do_work is a stand-in for the real work payload[:result] = result result } end end end</code></pre> <p>With just a tiny wrap of the instrument method, the users of our library can do a ridiculous amount of instrumentation. For one, note that we pass the args and the result along with the payload. This means our users can create a log subscriber and log each method call with timing, argument, and result information. Incredibly valuable!</p> <p>They can also create a metrics subscriber that sends the timing information to <a href="http://instrumentalapp.com">instrumental</a>, <a href="https://github.com/eric/metriks">metriks</a>, statsd, or whatever.</p> <h2>The Bonus</h2> <p>You can even provide log subscribers and metric subscribers in your library, which means instrumentation for your users is simply a require away.
For example, here is the <a href="https://github.com/jnunemaker/cassanity/blob/master/lib/cassanity/instrumentation/log_subscriber.rb">log subscriber</a> I added to <a href="https://github.com/jnunemaker/cassanity">cassanity</a>.</p> <pre><code class="ruby">require 'securerandom' require 'active_support/notifications' require 'active_support/log_subscriber' module Cassanity module Instrumentation class LogSubscriber &lt; ::ActiveSupport::LogSubscriber def cql(event) return unless logger.debug? name = '%s (%.1fms)' % ["CQL Query", event.duration] # execute arguments are always an array where the first element is the # cql string and the rest are the bound variables. cql, *args = event.payload[:execute_arguments] arguments = args.map { |arg| arg.inspect }.join(', ') query = "#{cql}" query += " (#{arguments})" unless arguments.empty? debug " #{color(name, CYAN, true)} [ #{query} ]" end end end end Cassanity::Instrumentation::LogSubscriber.attach_to :cassanity</code></pre> <p>All the users of cassanity need to do to get logging of the <span class="caps">CQL</span> queries they are performing and their timing is to require a file (and have activesupport in their gemfile):</p> <pre><code class="ruby">require 'cassanity/instrumentation/log_subscriber'</code></pre> <p>And they get logging goodness like this in their terminal:</p> <p><img src="/assets/51003fd97a507223410006a6/article_full/cassanity_instrumentation.png" class="full image" alt="" /></p> <h2>The Accuracy</h2> <p>But! <span class="caps">BUT</span>, you say. What about the tests? Well, my friend, I have that all wrapped up for you as well. Since it is so easy to pass an instrumenter through to our library, we should probably also have an in-memory instrumenter that keeps track of the events instrumented, so you can test thoroughly, and ensure you don&#8217;t hose your users with incorrect instrumentation.</p> <p>The previous sentence was quite a mouthful, so my next one will be short and sweet.
For testing, I created an in-memory instrumenter that simply stores each instrumented event with name, payload, and the computed block result for later comparison. Check it:</p> <pre><code class="ruby">module Foo module Instrumenters class Memory Event = Struct.new(:name, :payload, :result) attr_reader :events def initialize @events = [] end def instrument(name, payload = {}) result = if block_given? yield payload else nil end @events &lt;&lt; Event.new(name, payload, result) result end end end end</code></pre> <p>Now in your tests, you can do something like this when you want to check that your library is correctly instrumenting:</p> <pre><code class="ruby">instrumenter = Foo::Instrumenters::Memory.new client = Foo::Client.new({ instrumenter: instrumenter, }) client.execute(...) payload = {... something .. } event = instrumenter.events.last assert_not_nil event assert_equal 'client_execute.foo', event.name assert_equal payload, event.payload </code></pre> <h2>The End Result</h2> <p>With two instrumenters (noop, memory) and a belief in interfaces, we have created immense value.</p> <p><img src="/assets/510046327a50722361001157/article_full/freakin_sweet.jpg" class="full image" alt="" /></p> <h2>Further Reading</h2> <p>Without any further ado, here are a few of the articles and decks that I read recently related to this.</p> <ul> <li><a href="http://railscasts.com/episodes/249-notifications-in-rails-3">RailsCasts: Notifications in Rails 3</a></li> <li><a href="https://speakerdeck.com/nextmat/digging-deep-with-activesupportnotifications">Digging Deep with ActiveSupport Notifications</a></li> <li><a href="https://speakerdeck.com/tenderlove/code-charcuterie">Code Charcuterie</a></li> <li><a href="https://gist.github.com/566725">Instrument Anything in Rails 3</a></li> <li><a href="http://www.paperplanes.de/2012/3/14/on-notifications-logsubscribers-and-bringing-sanity-to-rails-logging.html">On Notifications, Log Subscribers and Bringing Sanity to Rails Logs</a></li> 
</ul> <h2>Fin</h2> <p>Go forth and instrument all the things!</p> John Nunemaker Booleans are Baaaaaaaaaad 50759b74dabe9d400600aa8a 2017-12-11T17:03:37-05:00 2012-10-10T13:10:00-04:00 <p>In which I encourage the use of state machines because they rock.</p> <p>First off, did you pronounce the title of this article like a sheep? That was definitely the intent. Anyway, onward to the purpose of this here text.</p> <p>One of the things I have learned the hard way is that booleans are bad. Just to be clear, I do not mean that true/false is bad, but rather that using true/false for state is bad. Rather than rant, let&#8217;s look at a concrete example.</p> <h2>An Example</h2> <p>The first example that comes to mind is the ever-present user model. On signup, most apps force you to confirm your email address.</p> <p>To do this there might be a temptation to add a boolean, let&#8217;s say &#8220;active&#8221;. Active defaults to false and upon confirmation of the email is changed to true. This means your app needs to make sure you are always dealing with active users. Cool. Problem solved.</p> <p>It might look something like this:</p> <pre><code class="ruby">class User include MongoMapper::Document scope :active, where(:active =&gt; true) key :active, Boolean end</code></pre> <p>To prevent inactive users from using the app, you add a before filter that checks if the current_user is inactive. If they are, you redirect them to a page asking them to confirm their email or resend the email confirmation. Life is grand!</p> <h2>The Requirements Change</h2> <p>Then, out of nowhere comes an abusive user, let&#8217;s name him John. John is a real jerk. He starts harassing your other users by leaving mean comments about their moms.</p> <p>In order to combat John, you add another boolean, let&#8217;s say &#8220;abusive&#8221;, which defaults to false. You then add code to allow marking a user as abusive. Doing so sets &#8220;abusive&#8221; to true.
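</p> <p>Here is a small illustrative sketch (plain Ruby, not from the original post) of where two independent booleans lead: every user now sits in one of four combinations, including one that should never exist.</p>

```ruby
# Two independent booleans produce four possible combinations of state.
combinations = [true, false].product([true, false]).map do |active, abusive|
  { active: active, abusive: abusive }
end

combinations.each { |c| p c }
# Among them is { active: true, abusive: true } -- an "active abusive" user,
# a state the app never intended to allow.
```

<p>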
You then add code that disallows users who have abusive set to true from adding comments.</p> <h2>The Problem</h2> <p>You now have split state. Should an abusive user really be active? Then a new idea pops into your head. When a user is marked as abusive, let&#8217;s also set active to false, so they just can&#8217;t use the system. Oh, and when a user is marked as active, let&#8217;s make sure that abusive is set to false. Problem solved? Right? <span class="caps">RIGHT</span>? Wrong.</p> <p><strong>You are now maintaining one state with two switches</strong>. As requirements change, you end up with more and more situations like this and weird edge cases start to sneak in.</p> <h2>The Solution</h2> <p>How can we improve the situation? Two words: state machine. State machines are awesome. Let&#8217;s rework our user model to use the <a href="https://github.com/pluginaweek/state_machine">state_machine</a> gem.</p> <pre><code class="ruby">class User include MongoMapper::Document key :state, String state_machine :state, :initial =&gt; :inactive do state :inactive state :active state :abusive event :activate do transition all =&gt; :active end event :mark_abusive do transition all =&gt; :abusive end end end</code></pre> <p>With just the code above, we can now do all of this:</p> <pre><code class="ruby">user = User.create user.active? # false because initial is set to inactive user.activate! user.active? # true because we activated user.mark_abusive! user.active? # false user.inactive? # false user.abusive? # true User.with_state(:active) # scope to return active User.with_state(:inactive) # another scope User.with_state(:abusive) # driving the example home</code></pre> <p>Pretty cool, eh? You get a lot of bang for the buck. I am just showing the beginning of what you can do; head on over to the <a href="https://github.com/pluginaweek/state_machine">readme</a> to see more. You can add guards and all kinds of neat things. Problem solved. Right? <span class="caps">RIGHT</span>?
Wrong.</p> <h3>Requirements Change Again</h3> <p>Uh oh! Requirements just changed again. Mr. <span class="caps">CEO</span> decided that instead of calling people abusive, we want to refer to them as &#8220;spammy&#8221;.</p> <p>The app has been wildly successful and you now have millions of users. You have two options:</p> <ol> <li>Leave the code as it is and just change the language in the views. This sucks because then you are constantly translating between the two.</li> <li>Put up the maintenance page and accept downtime, since you have to push out new code and migrate the data. This sucks, because your app is down, simply because you did not think ahead.</li> </ol> <h3>A Better State Machine</h3> <p>Good news. With just a few tweaks, you could have built in the flexibility to handle changing your code without needing to change your data. The state machine gem supports changing the value that is stored in the database.</p> <p>Instead of hardcoding strings in your database, use integers. Integers allow you to change terminology willy-nilly in your app and only change app code. Let&#8217;s take a look at how it could work:</p> <pre><code class="ruby">class User include MongoMapper::Document States = { :inactive =&gt; 0, :active =&gt; 1, :abusive =&gt; 2, } key :state, Integer state_machine :state, :initial =&gt; :inactive do # create states based on our States constant States.each do |name, value| state name, :value =&gt; value end event :activate do transition all =&gt; :active end event :mark_abusive do transition all =&gt; :abusive end end end</code></pre> <p>With just that slight change, we are now storing state as an integer in our database.
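</p> <p>The payoff can be seen in a plain-Ruby sketch (illustrative only, no gems involved): the database only ever sees the integer, so the name attached to a value lives entirely in app code.</p>

```ruby
# The value stored in the database for each state name.
STATES = { :inactive => 0, :active => 1, :abusive => 2 }

stored = STATES[:abusive]   # the database stores 2, not "abusive"

# Rename the state in app code only; the stored integer is untouched.
RENAMED = { :inactive => 0, :active => 1, :spammy => 2 }
RENAMED.key(stored)         # => :spammy
```

<p>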
This means changing from &#8220;abusive&#8221; to &#8220;spammy&#8221; is just a code change like this:</p> <pre><code class="ruby">class User include MongoMapper::Document States = { :inactive =&gt; 0, :active =&gt; 1, :spammy =&gt; 2, } key :state, Integer state_machine :state, :initial =&gt; :inactive do States.each do |name, value| state name, :value =&gt; value end event :activate do transition all =&gt; :active end event :mark_spammy do transition all =&gt; :spammy end end end</code></pre> <p>Update the language in the views, deploy your changes, and you are good to go. <strong>No downtime. No data migration. Copious amounts of flexibility for little to no more work.</strong></p> <p>Next time you reach for a boolean in your database, think again. Please! Whip out the state machine gem and wow your friends with your wisdom and foresight.</p> John Nunemaker Four Guidelines That I Feel Have Improved My Code 4ff5abc2dabe9d4a7201446c 2012-07-05T15:51:28-04:00 2012-07-05T15:00:00-04:00 <p>In which I share some tips based on recent trial and error.</p> <p>I have been thinking a lot about isolation, dependencies and clean code of late. I know there is a lot of disagreement with people vehemently standing in both camps.</p> <p>I certainly will not say either side is right or wrong, but what follows is what I feel has improved my code. I post it here to formalize some recent thoughts and, if I am lucky, get some good feedback.</p> <p>Before I rush into the gory details, I feel I should mention that I went down this path not as an architecture astronaut, but out of genuine pain in what I was working on.</p> <p>My models were growing large. My tests were getting slow. Things did not feel &#8220;right&#8221;.</p> <p>I started watching Gary Bernhardt&#8217;s <a href="http://destroyallsoftware.com">Destroy All Software</a> screencasts. He is a big proponent of testing in isolation.
Definitely go get a subscription and take a day to get caught up.</p> <p>On top of <span class="caps">DAS</span>, I started reading everything I could on the subject of growing software, clean code and refactoring. When I say reading, I really should say devouring.</p> <p>I was literally prowling about like a lion, looking for the next book I could devour. Several times my wife asked me to get off my hands and knees and to kindly stop roaring about <span class="caps">SRP</span>.</p> <p>Over the past few months as I have tried to write better code, I have definitely learned a lot. <strong>Learning without reflection and writing is not true learning for me</strong>.</p> <p>Reflecting on why something feels better and then writing about it <strong>formalizes it in my head</strong> and has the added benefit of being available for anyone else who is struggling with the same.</p> <p>Here are a few guidelines that have jumped out at me over the past few days as I reflected on what I have been practicing the past few months.</p> <h2>Guideline #1. One responsibility to rule them all</h2> <p>Single responsibility principle (<span class="caps">SRP</span>) is really hard. I think a lot of us are frustrated and feeling the pain of our chubby &lt;insert your favorite <span class="caps">ORM</span>&gt; classes. Something does not feel right. Working on them is hard.</p> <p>The problem is context. You have to load a lot of context in your brain when you crack open that <strong><span class="caps">INFAMOUS</span></strong> user model. That context takes up the space where we would normally create and come up with new solutions.</p> <h3>Create More Classes</h3> <p>So what are we to do? <strong>Create more classes</strong>. Your models do not need to inherit from ActiveRecord::Base, or include MongoMapper::Document, or whatever.</p> <p>A model is something that has business logic. 
Start breaking up your huge models that have persistence bolted on into plain old Ruby classes.</p> <p>I am not going to lie to you. If you have not been doing this, <strong>it will not be easy</strong>. Everything will seem like it should just be tucked as another method in a model that also happens to persist data in a store.</p> <h3>Naming is Hard</h3> <p>Another pain point will be naming. Naming is fracking hard. You are welcome for the <span class="caps">BSG</span> reference there. I would like to take that statement a step further though.</p> <p><strong>Naming is hard because our classes and methods are doing too much</strong>. The fewer responsibilities your class has, the easier it will be to name, especially after a few months of practice.</p> <h3>An Example</h3> <p>Enough talk, let&#8217;s see some code. In our track processors, which pop tracks off a queue and store reports in a database, we query for the gauge being tracked before storing reports. The purpose of this query is to ensure that the gauge is in good standing and that we should, in fact, store reports in the database for it.</p> <p>A lot of people throw the tracking code on their site and never remove it or sign up for a paying account. We do this find to make sure that tracking for those people is a noop, instead of creating tons of data that no one is paying for.</p> <p>This query happens for each track and it is pulling information that rarely, if ever, changes. It seemed like a prime spot for a wee bit of caching.</p> <p>First, I created a tiny service around the memcached client I decided to use. This only took an hour and it means that my application now has an interface for caching (<code>get</code>, <code>set</code>, <code>delete</code>, and <code>fetch</code>). I&#8217;ll talk more about this in guideline #3.</p> <p>Once I had defined the interface Gauges would use for caching, I began to integrate it.
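</p> <p>The post does not show that tiny service, but the shape of such an interface might look something like this sketch (a plain hash stands in for the real memcached client; the names are assumptions, not Gauges&#8217; actual code):</p>

```ruby
class TinyCacheService
  def initialize(store = {})
    @store = store # stand-in for a real memcached client
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end

  # Read-through fetch: return the cached value, or compute the block,
  # store the result, and return it.
  def fetch(key)
    hit = get(key)
    return hit unless hit.nil?
    set(key, yield)
  end
end
```

<p>With <code>fetch</code> in place, read-through caching becomes a one-liner at the call site.</p> <p>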
After much battling and rewriting of the caching code, each piece felt like it was doing too much and things were getting messy.</p> <p>I stepped back and thought through my plans. I wanted to cache only the attributes, so I threw everything away and started with that. First, I wanted to be able to read attributes from the data store.</p> <pre><code class="ruby">class GaugeAttributeService def get(id) criteria = {:_id =&gt; Plucky.to_object_id(id)} if (attrs = gauge_collection.find_one(criteria)) attrs.delete('_id') attrs end end end</code></pre> <p>Given an id, this class returns a hash of attributes. That is pretty much one responsibility. Sweet action. Let&#8217;s move on.</p> <p>Second, I knew that I wanted to add read-through caching for this. Typically read-through caching uses some sort of fetch pattern. Fetch is basically a shortcut for: look in the cache first and, if the value is not there, compute the block, store the computed result in the cache, and return it.</p> <p>If I had added caching in the <code>GaugeAttributeService</code> class, I would have violated <span class="caps">SRP</span>. Describing the class would have been &quot;checks the cache and if not there it fetches from database&quot;.
Note the use of &quot;and&quot;.</p> <p>As <a href="http://www.amazon.com/dp/B002TIOYVW/">Growing Object Oriented Software</a> states:</p> <blockquote> <p>Our heuristic is that we should be able to describe what an object does without using any conjunctions (“and,” “or”).</p> </blockquote> <p>Instead, I created a new class to wrap (or decorate) my original service.</p> <pre><code class="ruby">class GaugeAttributeServiceWithCaching def initialize(attribute_service = GaugeAttributeService.new) @attribute_service = attribute_service end def get(id) cache_service.fetch(cache_key(id)) { @attribute_service.get(id) } end end</code></pre> <p>I left a few bits out of this class so we can focus on the important part, which is that all we do with this class is wrap the original one with a cache fetch.</p> <p>As you can see, naming is pretty easy for this class. It is a gauge attribute service with caching and is named as such. It initializes with an object that must respond to <code>get</code>. Note also that it defaults to an instance of <code>GaugeAttributeService</code>.</p> <p>Unit testing this class is easy as well. We can isolate the dependencies (<code>attribute_service</code> and <code>cache_service</code>) in the unit test and make sure that they do what we expect (<code>fetch</code> and <code>get</code>).</p> <p><small><strong>Note</strong>: There definitely could be a point made that &quot;with&quot; is the same as &quot;and&quot; and therefore means that we are breaking <span class="caps">SRP</span>. Naming is hard, really hard. Rather than get mired forever in naming, I rolled with this convention and, at this point, it does not bother me. I am definitely open to suggestions.
Another name I played with was CachedGaugeAttributeService.</small></p> <p>Below is an example setup with new dependencies injected in the test that help us verify this class&#8217;s behavior in isolation.</p> <pre><code class="ruby">attributes = {'title' =&gt; 'GitHub'} attribute_service = Class.new do # define_method, rather than def, so the block can see the attributes local define_method(:get) { |id| attributes } end.new cache_service = Class.new do def fetch(key) get(key) || yield end def get(key) end end.new service = GaugeAttributeServiceWithCaching.new(attribute_service) service.cache_service = cache_service </code></pre> <p>Above I used dynamic classes. Instead of dynamic classes, one could use stubbing or whatever. I&#8217;ll talk more about <code>cache_service=</code> later.</p> <p>Decorating in this manner means we can easily fetch attributes without caching by using GaugeAttributeService, or with caching by using GaugeAttributeServiceWithCaching.</p> <p>The important thing to note is that we added new functionality to our application by extending existing parts instead of changing them. I read recently, but cannot find the quote, that if you can add a new feature purely by extending existing classes and creating new classes, you are winning.</p> <h2>Guideline #2. Use accessors for collaborators</h2> <p>In the example above, you probably noticed that when testing <code>GaugeAttributeServiceWithCaching</code>, I changed the cache service used by assigning a new one. What I often see is others using some top level config, or even worse, they actually use a <code>$</code> global.</p> <pre><code class="ruby"># bad Gauges.cache = Memcached.new class GaugeAttributeServiceWithCaching def get(id) Gauges.cache.fetch(cache_key(id)) { … } end end # worse $cache = Memcached.new class GaugeAttributeServiceWithCaching def get(id) $cache.fetch(cache_key(id)) { … } end end</code></pre> <p>What sucks about this is you are coupling this class to a global and coupling leads to pain. Instead, what I have started doing is using accessors to set up collaborators.
Here is the example from above, but now with the cache service accessors included.</p> <pre><code class="ruby"> class GaugeAttributeServiceWithCaching attr_writer :cache_service def cache_service @cache_service ||= CacheService.new end end</code></pre> <p>By doing this, we get a sane, memoized default for our cache service (<code>CacheService.new</code>) and the ability to change that default (<code>cache_service=</code>), either in our application or when unit testing.</p> <p>Finding ourselves doing this quite often, we created a library, aptly named <a href="https://github.com/bkeepers/morphine">Morphine</a>. Right now it does little more than what I just showed (memoized default and writer method to change).</p> <p>As I have started to use this gem, I am getting more ideas for things that would be helpful. Here is the same code as above, but using Morphine. What I like about it, over a memoized method and an attr&#95;writer, is that it feels a little more declarative and creates a standard way of declaring collaborators for classes.</p> <pre><code class="ruby"> class GaugeAttributeServiceWithCaching include Morphine register :cache_service do CacheService.new end end</code></pre> <p>Note also that I am not passing these dependencies in through initialize. At first I started with that and it looked something like this:</p> <pre><code class="ruby">class GaugeAttributeServiceWithCaching def initialize(attribute_service = GaugeAttributeService.new, cache_service = CacheService.new) @attribute_service = attribute_service @cache_service = cache_service end end</code></pre> <p>Personally, over time I found this method tedious. My general guideline is <strong>pass a dependency through initialize when you are going to decorate it, otherwise use accessors</strong>.
Let&#8217;s look at the attribute service with caching again.</p> <pre><code class="ruby">class GaugeAttributeServiceWithCaching include Morphine register :cache_service do CacheService.new end def initialize(attribute_service = GaugeAttributeService.new) @attribute_service = attribute_service end end </code></pre> <p>Since this class is decorating an attribute service with caching, I pass in the service we want to decorate through initialize. I do not, however, pass in the cache service through initialize. Instead, the cache service uses Morphine (or accessors).</p> <p>First, I think this <strong>makes the intent more obvious</strong>. The intent of this class is to wrap another object, so that object should be provided to initialize. Defaulting the service to wrap is merely a convenience.</p> <p>Second, the cache service is a dependency, but not one that is being wrapped. It purely <strong>needs a sane default and a way to be replaced</strong>, therefore it uses Morphine (or accessors).</p> <p>I cannot say this is a hard and fast rule that everyone should follow and that you are wrong if you do not. I can say that through trial and error, <strong>following this guideline has led to the least amount of friction</strong> while maintaining flexibility and isolation.</p> <h2>Guideline #3. Create real interfaces</h2> <p>As I mentioned above, the first thing I started with when working on the caching code was an interface for caching for the application, rather than just using a client directly. Occasionally what I see people do is create an interface, but wholesale pass arguments through to a client like so:</p> <pre><code class="ruby"># bad idea class CacheService def initialize(driver) @driver = driver end def get(*args) @driver.get(*args) end def set(*args) @driver.set(*args) end def delete(*args) @driver.delete(*args) end end </code></pre> <p>In my opinion, this is abstracting at the wrong level. 
All you are doing is adding a layer of indirection on top of a driver. It makes it harder to follow, and any exceptions that the driver raises will be raised in your application. Also, any parameters that the driver works with, your interface will work with. There is no point in doing this.</p> <p>Instead, create a real interface. Define the methods and parameters you want your application to be able to use and make that work with whatever driver you end up choosing or changing to down the road.</p> <h3>Handling Exceptions</h3> <p>First, I created the exceptions that would be raised if anything goes wrong.</p> <pre><code class="ruby">class CacheService class Error &lt; StandardError attr_reader :original def initialize(original = $!) if original.nil? super else super(original.message) end @original = original end end class NotFound &lt; Error; end class NotStored &lt; Error; end end</code></pre> <p>CacheService::Error is the base that all other errors inherit from. It wraps whatever the original error was, instead of discarding it, and defaults to the last exception that was raised, <code>$!</code>. I will show how these are used in a bit.</p> <h3>Portability and serialization</h3> <p>I knew that I wanted the cache to be portable, so instead of just defaulting to Marshal&#8217;ing, I used only raw operations and ensured that I wrapped all raw operations with serialize and deserialize, where appropriate.</p> <p>In order to allow this cache service class to work with multiple serialization methods, I registered a serializer dependency, instead of just using MultiJson&#8217;s <code>dump</code> and <code>load</code> directly.
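</p> <p>The serializer object itself is not shown in the post, but a minimal version might look like the following sketch (the stdlib <code>json</code> stands in for MultiJson here to keep the example self-contained, and the class name is chosen to match the <code>Serializers::Json</code> the cache service registers):</p>

```ruby
require 'json'

module Serializers
  # Minimal JSON serializer: the cache service only needs an object that
  # responds to serialize and deserialize.
  class Json
    def serialize(value)
      JSON.generate(value)
    end

    def deserialize(value)
      value.nil? ? nil : JSON.parse(value)
    end
  end
end
```

<p>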
I then added convenience methods (<code>serialize</code> and <code>deserialize</code>) that handle a few oddities induced by the driver I am wrapping.</p> <pre><code class="ruby">class CacheService include Morphine register :serializer do Serializers::Json.new end private def serialize(value) serializer.serialize(value) end def deserialize(value) if value.is_a?(Hash) # get with multiple keys value.each { |k, v| value[k] = deserialize(v) } value else serializer.deserialize(value) end end end</code></pre> <h3>Handling exceptions (continued)</h3> <p>I then created a few private methods that hit the driver and wrap exceptions. These private methods are what the public methods use to ensure that exceptions are properly handled and such.</p> <pre><code class="ruby">class CacheService private def driver_read(keys) deserialize(@driver.get(keys, false)) rescue Memcached::NotFound raise NotFound rescue Memcached::Error raise Error end def driver_write(method, key, value) @driver.send method, key, serialize(value), DefaultTTL.call, false rescue Memcached::NotStored raise NotStored rescue Memcached::Error raise Error end def driver_delete(key) @driver.delete(key) rescue Memcached::NotFound raise NotFound end end</code></pre> <p>At this point, no driver-specific exceptions should ever bubble outside of the cache service. When using the cache service in the application, I need only worry about handling the cache service exceptions and not the specific driver exceptions.</p> <p><strong>If I change to a different driver, only this class changes</strong>. The rest of my application stays the same. Big win.
How many times have you upgraded a gem and then had to update pieces all over your application because they willy-nilly changed their interface?</p> <h3>The public interface</h3> <p>All that is left is to define the public methods and parameters that can be used in the application.</p> <pre><code class="ruby">class CacheService def get(keys) driver_read(keys) rescue NotFound nil end def set(key, value) driver_write :set, key, value end def delete(key) driver_delete key rescue NotFound nil end end</code></pre> <p>At this point, the application has a defined interface that it can work with for caching and for the most part does not need to worry about exceptions, as they are wrapped and, in some cases, even handled (i.e. nil for NotFound).</p> <p>Creating real interfaces ensures that expectations are set and upgrades are easy. Defined interfaces give other developers on the project confidence that if they follow the rules, things will work as expected.</p> <h2>Guideline #4. Test the whole way through</h2> <p>Whatever you want to call them, you need tests that prove all your components are wired together and working as expected, in the same manner as they will be used in production.</p> <p>The reason a lot of developers have felt pain with pure unit testing and isolation is because they forget to add that secondary layer of tests on top that ensure that the way things are wired together works too.</p> <p>Unit tests are there to drive our design. Acceptance tests are there to make sure that things are actually working the whole way through. Each of these is essential and not to be skipped over.</p> <p>If you are having problems testing, it may be your design. If you are getting burned by isolation, you are probably missing higher level tests. You should be able to kill your unit tests and still have reasonable confidence that your system is working.</p> <p>Nowadays, I often start with a high level test and then work my way in, unit testing the pieces as I make them.
I&#8217;ve found this keeps me focused on the value I am adding and ensures that my coverage is good.</p> <h2>Conclusion</h2> <p>While it has definitely taken a lot of trial and error, I am starting to find the right balance between flexibility, isolation and overkill.</p> <ol> <li>Stick to single responsibilities.</li> <li>Inject decorated dependencies through initialization and use accessors for other dependencies.</li> <li>Create real interfaces.</li> <li>Test in isolation <strong>and</strong> the whole way through.</li> </ol> <p>Follow these guidelines and I believe you will start to feel better about the code you are writing, as I have over the past few months.</p> <p>I would love to hear what others of you are doing and see examples. Comment below with gists, github urls, and other thoughts. Thanks!</p> John Nunemaker Misleading Title About Queueing 4f54f0ecdabe9d1bb400cf11 2012-03-05T13:37:12-05:00 2012-03-05T11:00:00-05:00 <p>In which I discuss queueing Gauges track requests in Kestrel.</p> <p>I don&#8217;t know about you, but I find it super frustrating when people blog about cool stuff at the beginning of a project, but then as it grows, <strong>they either don&#8217;t take the time to teach or they get all protective about what they are doing</strong>.</p> <p>I am going to do my best to <strong>continue to discuss the strategies we are using</strong> to grow <a href="http://gaug.es">Gauges</a>. I hope you find them useful and, by all means, if you have tips or ideas, hit me. Without any further ado&#8230;</p> <p>March 1st of last year (2011), we <a href="http://orderedlist.com/blog/articles/gauges/">launched Gauges</a>. March 1st of this year (a few days ago), we finally switched to a queue for track requests. Yes, for one full year, we did all report generation in the track request.</p> <h2>1. In the Beginning</h2> <p>My goal for Gauges in the beginning was realtime. I wanted data to be so freakin&#8217; up-to-date that it blew people&#8217;s minds. 
What I&#8217;ve realized over the past year of talking to customers is that sometimes Gauges is so realtime, it is too realtime.</p> <p>That is definitely not to say that we are going to work on slowing Gauges down. What it means, rather, is that my priorities are shifting. As more and more websites use Gauges to track, availability moves more and more to the front of my mind.</p> <h3>Gut Detects Issue</h3> <p>A few weeks back, with much help from friends (<a href="http://twitter.com/#!/bkeepers">Brandon Keepers</a>, <a href="http://twitter.com/#!/jnewland">Jesse Newland</a>, <a href="http://twitter.com/#!/hwaet">Kyle Banker</a>, <a href="http://twitter.com/#!/lindvall">Eric Lindvall</a>, and the top notch dudes at <a href="http://twitter.com/#!/fastestforward">Fastest Forward</a>), I started digging into some performance issues that were getting increasingly worse. They weren&#8217;t bad yet, but I had this gut feeling they would be soon.</p> <p>My gut was right. Our disk IO utilization on our primary database doubled from January to February, which was also our biggest growth in terms of number of track requests. If we doubled again from February to March, <strong>it was not going to be pretty</strong>.</p> <h3>Back to the Beginning</h3> <p>From the beginning, Gauges built all tracking reports on the fly in the track request. When a track came in, Gauges did a few queries and then performed around 5-10 updates.</p> <p>When you are small, this is fine, but as growth happens, updating live during a track request can become an issue. I had no way to throttle traffic to the database. This meant if we had enough large sites start tracking at once, most likely our primary database would say uncle.</p> <p>As you can guess, if your primary says uncle, you start losing tracking data. In my mind, <strong>priority number one is now to never lose tracking data</strong>. 
In order to do this effectively, I felt we were finally at the point where we needed to separate tracking from reporting.</p> <h2>2. Availability Takes Front Seat</h2> <p>My goal is for tracking to never be down. If, occasionally, you can&#8217;t get to your reporting data, or if, occasionally, your data gets behind for a few minutes, I will survive. If, however, tracking requests start getting tossed to the wayside while the primary screams for help, I will not.</p> <p>I talked with some friends and found Kestrel to be very highly recommended, particularly by Eric (linked above). He swore by it, and was pushing it harder than we needed to, so I decided to give it a try.</p> <p>A few hours later, my lacking <span class="caps">JVM</span> skills (Kestrel is Scala) were rearing their head big time. I still had not figured out how to build or run the darn thing. I posted to the mailing list, where someone quickly pointed out that Kestrel defaults to /var for logging, data, etc. and, unfortunately, spits out no error on startup about lacking permissions on <span class="caps">OSX</span>. One sudo !! later and I was in business.</p> <h2>3. Kestrel</h2> <p>Before I get too far along with this fairy tale, let&#8217;s talk about Kestrel &#8212; what is it and why did I pick it?</p> <p><a href="https://github.com/robey/kestrel">Kestrel</a> is a simple, distributed message queue, based on Blaine Cook&#8217;s Starling. Here are a few great paragraphs from the readme:</p> <blockquote> <p>Each server handles a set of reliable, ordered message queues. When you put a cluster of these servers together, with no cross communication, and pick a server at random whenever you do a set or get, you end up with a reliable, loosely ordered message queue.</p> </blockquote> <blockquote> <p>In many situations, loose ordering is sufficient. 
Dropping the requirement on cross communication makes it horizontally scale to infinity and beyond: no multicast, no clustering, no &#8220;elections&#8221;, no coordination at all. No talking! Shhh!</p> </blockquote> <p>It features the memcached protocol, is durable (journaled), has fanout queues, item expiration, and even supports transactional reads.</p> <p>My favorite thing about Kestrel? <strong>It is simple, soooo simple</strong>. Sound too good to be true? Probably is, but the honeymoon has been great so far.</p> <p>Now that we&#8217;ve covered what Kestrel is and that it is amazing, let&#8217;s talk about how I rolled it out.</p> <h2>4. Architecture</h2> <p>Here is the general idea. The app writes track requests to the tracking service. Workers process off those track requests and generate the reports in the primary database.</p> <p>After the primary database writes, we send the information through a pusher proxy process, which sends it off to <a href="http://pusher.com">pusher.com</a>, the service that provides all the live web socket goodness that is in Gauges. Below is a helpful sketch:</p> <p><img src="/assets/4f54f4e9dabe9d46ae0032f6/article_full/sketch.jpg" class="image full" alt="" /></p> <p>That probably all makes sense, but remember that we weren&#8217;t starting from scratch. We already had servers set up that were tracking requests, and I needed to ensure that was uninterrupted.</p> <h2>5. Rollout</h2> <p>Brandon and I have been on a <a href="http://railstips.org/blog/archives/2012/02/06/more-tiny-classes/">tiny classes</a> and services kick of late. <strong>What I am about to say may sound heretical, but we&#8217;ve felt that we need a few more layers in our apps</strong>. We&#8217;ve started using Gauges as a test bed for this stuff, while also spending a lot of time reading about clean code and design patterns.</p> <p>We decided to create a tiny standardization around exposing services and choosing which one gets used in which environment. 
Brandon took the standardization and <a href="https://github.com/bkeepers/morphine">moved it into a gem</a> where we could start trying stuff and share it with others. It isn&#8217;t much now, but we haven&#8217;t needed it to be.</p> <h3>Declaring Services</h3> <p>We created a Registry class for Gauges, which defined the various pieces we would use for Kestrel. It looked something like this:</p> <pre><code class="ruby">class Registry include Morphine register :track_service do KestrelTrackService.new(kestrel_client, track_config['queue']) end register :track_processor do KestrelTrackProcessor.new(blocking_kestrel_client, track_config['queue']) end end</code></pre> <p>We then store an instance of this registry in Gauges.app. We probably should have named it Gauges.registry, but we can worry about that later.</p> <p>At this point, what we did probably seems pointless. The kestrel track service and processor look something like this:</p> <pre><code class="ruby">class KestrelTrackService def initialize(client, queue) @client = client @queue = queue end def record(attrs) @client.set(@queue, MessagePack.pack(attrs)) end end class KestrelTrackProcessor def initialize(client, queue) @client = client @queue = queue end def run loop { process } end def process record @client.get(@queue) end def record(data) Hit.record(MessagePack.unpack(data)) end end</code></pre> <p>The processor uses a blocking kestrel client, which is just a decorator of the vanilla kestrel client. 
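</p> <p>The blocking client itself is not shown in the post, but the decorator idea can be sketched like this. BlockingClient and FakeClient here are hypothetical stand-ins, not the actual Gauges code:</p>

```ruby
# Hypothetical sketch of the decorator idea: BlockingClient wraps any
# object with the same #get/#set interface and turns #get into a
# blocking call by polling until an item shows up.
class BlockingClient
  def initialize(client, interval = 0.1)
    @client = client
    @interval = interval
  end

  def set(queue, data)
    @client.set(queue, data)
  end

  # Poll until the wrapped client hands back an item, then return it.
  def get(queue)
    loop do
      item = @client.get(queue)
      return item if item
      sleep @interval
    end
  end
end

# An in-memory stand-in for the real kestrel-client.
class FakeClient
  def initialize
    @queues = Hash.new { |hash, key| hash[key] = [] }
  end

  def set(queue, data)
    @queues[queue] << data
  end

  def get(queue)
    @queues[queue].shift
  end
end

client = BlockingClient.new(FakeClient.new, 0.01)
client.set('tracks', 'payload')
client.get('tracks') # => "payload"
```

<p>With something like that in place, the processor&#8217;s loop { process } waits in get instead of spinning on an empty queue.</p> <p>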
As you can see, all we are doing is wrapping the kestrel-client and making it send the data to the right place.</p> <h3>Using Services</h3> <p>We then used the track_service in our TrackApp like this:</p> <pre><code class="ruby">class TrackApp &lt; Sinatra::Base get '/track.gif' do # stuff Gauges.app.track_service.record(track_attrs) # more stuff end end</code></pre> <p>Then, in our track_processor.rb process, we started the processor like so:</p> <pre><code class="ruby">Gauges.app.track_processor.run</code></pre> <p>Like any good programmer, I knew that we couldn&#8217;t just push this to production and cross our fingers. Instead, I wanted to roll it out to work like normal, but also push track requests to Kestrel. This would allow me to see Kestrel receiving jobs.</p> <p>On top of that, I also wanted to deploy the track processors to pop track requests off. At this point, I didn&#8217;t want them to actually process those track requests and write to the database; I just wanted to make sure the whole system was wired up correctly and stuff was flowing through it.</p> <p>Another important piece was seeing how many track requests we could store in memory with Kestrel, based on our configuration, and how it performed when it used up all the allocated memory and started going to disk.</p> <h3>Service Magic</h3> <p>The extra layer around tracking and processing proved to be super helpful. Note that the above examples used the new Kestrel system, but that I wanted to push this out and go through a verification process first. To kick off the verification process, we created a real-time track service:</p> <pre><code class="ruby">class RealtimeTrackService def record(attrs) Hit.record(attrs) end end </code></pre> <p>This would allow us to change the track_service in the registry to perform as it currently was in production. Now, we have two services that know how to record track requests in a particular way. 
What I needed next was to use both of these services at the same time, so I created a multi track service:</p> <pre><code class="ruby">class MultiTrackService include Enumerable def initialize(*services) @services = services end def record(attrs) each { |service| service.record(attrs) } end def each @services.each do |service| yield service end end end </code></pre> <p>This multi track service allowed me to record to both services for a single track request. The updated registry looked something like this:</p> <pre><code class="ruby">class Registry include Morphine register :track_service do which = track_config.fetch(:service, :realtime) send("#{which}_track_service") end register :multi_track_service do MultiTrackService.new(realtime_track_service, kestrel_track_service) end register :realtime_track_service do RealtimeTrackService.new end register :kestrel_track_service do KestrelTrackService.new(kestrel_client, track_config['queue']) end end</code></pre> <p>Note that now, track_service selects which service to use based on the config. All I had to do was update the config to use &#8220;multi&#8221; as the track service and we were performing realtime track requests while queueing them in Kestrel at the same time.</p> <p>The only thing left was to beef up failure handling around the Kestrel service so that it was limited in how it could affect production. 
For this, I chose to catch failures, log them, and move on as if they didn&#8217;t happen.</p> <pre><code class="ruby">class KestrelTrackService def initialize(client, queue, options={}) @client = client @queue = queue @logger = options.fetch(:logger, Logger.new(STDOUT)) end def record(attrs) begin @client.set(@queue, MessagePack.pack(attrs)) rescue =&gt; e log_failure(attrs, e) :error end end private def log_failure(attrs, exception) @logger.info "attrs: #{attrs.inspect} exception: #{exception.inspect}" end end</code></pre> <p>I also had a lot of instrumentation in the various track services, so that I could verify counts at a later point. These verification counts would prove whether or not things were working. I left that out as it doesn&#8217;t help the article, but you definitely want to verify things when you roll them out.</p> <p>Now that the track service was ready to go, I needed a way to ensure that messages would flow through the track processors without actually modifying data. I used a similar technique as above. 
I created a new processor, aptly titled NoopTrackProcessor.</p> <pre><code class="ruby">class NoopTrackProcessor &lt; KestrelTrackProcessor def record(data) # don't actually record # instead just run verification end end </code></pre> <p>The noop track processor just inherits from the kestrel track processor and overrides the record method to run verification instead of generating reports.</p> <p>Next, I adjusted the registry to allow flipping the processor that is used based on the config.</p> <pre><code class="ruby">class Registry include Morphine register :track_processor do which = track_config.fetch(:processor, :noop) send("#{which}_track_processor") end register :kestrel_track_processor do KestrelTrackProcessor.new(blocking_kestrel_client, track_config['queue']) end register :noop_track_processor do NoopTrackProcessor.new(blocking_kestrel_client, track_config['queue']) end end</code></pre> <p>With those changes in place, I could now set the track service to multi, the track processor to noop, and I was good to deploy. So I did. And it was wonderful.</p> <h2>6. Verification</h2> <p>For the first few hours, I ran the multi track service and turned off the track processors. This created the effect of queueing and never dequeueing. The point was to see how many messages Kestrel could hold in memory and how it performed once messages started going to disk.</p> <p>I used Scout realtime to watch things during the evening while enjoying some of my favorite TV shows. A few hours and almost 530k track requests later, Kestrel hit disk and hummed along like nothing happened.</p> <p><img src="/assets/4f54ffa2dabe9d2bc4010ff0/article_full/kestrel_hits_disk.jpg" class="image full" alt="" /></p> <p>Now that I had a better handle on Kestrel, I turned the track processors back on. Within a few minutes they had popped all the messages off. Remember, at this point, I was still just noop&#8217;ing in the track processors. 
All reports were still being built in the track request.</p> <p>I let the multi track service and noop track processors run through the night and by morning, when I checked my graphs, I felt pretty confident. I removed the error suppression from the Kestrel service and flipped both track service and track processor to Kestrel in the config.</p> <p>One more deploy and we were queueing all track requests in Kestrel and popping them off in the track processors, after which the reports were updated in the primary database. This meant our track request now performed a single Kestrel set, instead of several queries and updates. As you would expect, response times dropped like a rock.</p> <p><img src="/assets/4f5500eadabe9d5b6e0035e7/article_full/response_times.jpg" class="image full" alt="" /></p> <p>It is pretty obvious when Kestrel was rolled out as the graph went perfectly flat and dropped to ~4ms response times. <span class="caps">BOOM</span>.</p> <p>You might say, yeah, your track requests are now fast, but your track processors are doing the same work that the app was doing before. You would be correct. Sometimes growing is just about moving slowness into a more manageable place, until you have time to fix it.</p> <p>This change did not just move slowness to a different place though. It separated tracking and reporting. We can now turn the track processors off, make adjustments to the database, turn them back on, and instantly, they start working through the backlog of track requests queued up while the database was down. No tracking data lost.</p> <p>I only showed you a handful of things that we instrumented to verify things were working. Another key metric for us, since we aim to be as close to realtime as possible, is the amount of time that it takes to go from queued to processing.</p> <p>Based on the numbers, it takes us around 500ms right now. 
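</p> <p>One way to measure that queued-to-processing time is to stamp each payload on the way in and compare on the way out. A sketch, where the queued_at attribute and the plain Array queue are illustrative (the real payloads go through MessagePack and Kestrel):</p>

```ruby
# A sketch of measuring queued-to-processing latency by stamping each
# payload when it is queued. The `queued_at` attribute and the plain
# Array queue are illustrative stand-ins.
require 'time'

def enqueue(queue, attrs)
  queue << attrs.merge('queued_at' => Time.now.utc.iso8601(6))
end

# Returns the attributes plus how long the item sat in the queue.
def dequeue(queue)
  attrs = queue.shift
  latency = Time.now.utc - Time.iso8601(attrs.fetch('queued_at'))
  [attrs, latency]
end

queue = []
enqueue(queue, 'path' => '/track.gif')
attrs, latency = dequeue(queue)
latency >= 0.0 # => true
```

<p>In production the subtraction happens on the processor side, so clock skew between the queueing and processing boxes is worth keeping an eye on.</p> <p>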
I believe as long as we keep that number under a second, most people will have no clue that we aren&#8217;t doing everything live.</p> <h2>7. Conclusion</h2> <p>By no means are we where I want us to be availability-wise, but at least we are one more step in the right direction. Hopefully this article gives you a better idea of how to roll things out into production safely. Layers are good. Whether you are using Rails, Sinatra, or something else entirely, layer services so that you can easily change them.</p> <p>Also, we are now a few days in and Kestrel is a beast. Much thanks to <a href="https://github.com/robey">Robey</a> for writing it and Twitter for open sourcing it!</p> John Nunemaker More Tiny Classes 4f2d9117dabe9d290b0076af 2012-02-06T07:00:10-05:00 2012-02-06T07:00:00-05:00 <p>In which I share how we are using more tiny classes to make Gauges more maintainable.</p> <p>My last post, <a href="http://railstips.org/blog/archives/2012/02/04/keep-em-separated/">Keep &#8217;Em Separated</a>, made me realize I should start sharing more about what we are doing to make Gauges maintainable. This post is another in the same vein.</p> <p><a href="http://gaug.es">Gauges</a> allows you to share a gauge with someone else by email. That email does not have to exist prior to your adding it, because nothing is more annoying than wanting to share something with a friend or co-worker, but first having to get them to sign up for the service.</p> <p>If the email address is found, we add the user to the gauge and notify them that they have been added.</p> <p>If the email address is not found, we create an invite and then send an email to notify them they should sign up, so they can see the data.</p> <h2>The Problem: McUggo Route</h2> <p>The aforementioned sharing logic isn&#8217;t difficult, but it was just enough that our share route was getting uggo. 
It started off looking something like this:</p> <pre><code class="ruby">post('/gauges/:id/shares') do gauge = Gauge.get(params['id']) if user = User.first_by_email(params[:email]) Stats.increment('shares.existing') gauge.add_user(user) ShareWithExistingUserMailer.new(gauge, user).deliver {:share =&gt; SharePresenter.new(gauge, user)}.to_json else invite = gauge.invite(params['email']) Stats.increment('shares.new') ShareWithNewUserMailer.new(gauge, invite).deliver {:share =&gt; SharePresenter.new(gauge, invite)}.to_json end end</code></pre> <p>Let&#8217;s be honest. We&#8217;ve all seen Rails controller actions and Sinatra routes that are fantastically worse, but this was really burning my eyes, so I charged our <a href="http://theprogrammingbutler.com">programming butler</a> to refactor it.</p> <h2>The Solution: Move Logic to Separate Class</h2> <p>We talked some ideas through, and once he had finished, the route looked more like this:</p> <pre><code class="ruby">post('/gauges/:id/shares') do gauge = Gauge.get(params['id']) sharer = GaugeSharer.new(gauge, params['email']) receiver = sharer.perform {:share =&gt; SharePresenter.new(gauge, receiver)}.to_json end</code></pre> <p>Perfect? Who cares. Waaaaaaaaay better? <strong>Yes</strong>. The concern of a user existing or not is <strong>moved away to a place where the route could care less</strong>.</p> <p>Also, the bonus is that <strong>sharing a gauge can now be used without invoking a route</strong>.</p> <p>So what does GaugeSharer look like?</p> <pre><code class="ruby">class GaugeSharer def initialize(gauge, email) @gauge = gauge @email = email end def user @user ||= … # user from database end def existing? user.present? end def perform if existing? share_with_existing_user else share_with_invitee end end def share_with_existing_user # add user to gauge ShareWithExistingUserMailer.new(@gauge, user).deliver user end def share_with_invitee invite = ... 
# invite to db ShareWithNewUserMailer.new(@gauge, invite).deliver invite end end</code></pre> <p>Now, instead of having several higher-level tests to check each piece of logic, we can just ensure that GaugeSharer is invoked correctly in the route test and then test the crap out of GaugeSharer with unit tests. We can also use GaugeSharer anywhere else in the application that we want to.</p> <p><strong>This isn&#8217;t a dramatic change in code, but it has a dramatic effect on the coder</strong>. Moving all these bits into separate classes and tiny methods improves <strong>ease of testing</strong> and, probably more importantly, <strong>ease of grokking</strong> for another developer, including yourself at a later point in time.</p> John Nunemaker Keep 'Em Separated 4f2d73dddabe9d5bed005b52 2012-02-04T16:14:59-05:00 2012-02-04T13:00:00-05:00 <p>In which I share a quick tale of refactoring.</p> <p><strong>Note</strong>: If you end up enjoying this post, you should do two things: <a href="http://pusher.com/">sign up for Pusher</a> and then <a href="https://www.destroyallsoftware.com/">subscribe to destroy all software screencasts</a>. I&#8217;m not telling you to do this because I get referrals; I just really like both services.</p> <p>For those that do not know, <a href="http://get.gaug.es">Gauges</a> currently uses <a href="http://pusher.com">Pusher.com</a> for flinging around all the traffic live.</p> <p>Every track request to Gauges sends a request to Pusher. We do this using EventMachine in a thread, as I have <a href="http://railstips.org/blog/archives/2011/05/04/eventmachine-and-passenger/">previously written about</a>.</p> <h2>The Problem</h2> <p>The downside of this is that when you get to the point we were at (thousands of requests a minute), there are so many pusher notifications to send (thousands a minute) that <strong>the EM thread starts stealing a lot of time</strong> from the main request thread. 
You end up with random slow requests that have one to five seconds of &#8220;uninstrumented&#8221; time. <strong>Definitely not a happy scaler does this make</strong>.</p> <p>In the past, we had talked about keeping track of which gauges were actually being watched and only sending a notification for those, but never actually did anything about it.</p> <h2>The Solution</h2> <p>Recently, Pusher added <a href="http://pusher.com/docs/webhooks">web hooks</a> on channel occupy and channel vacate. This, combined with a growing number of slow requests, was just the motivation I needed to come up with a solution.</p> <p>We (<a href="http://opensoul.org/">@bkeepers</a> and I) started by mapping a simple route to a class.</p> <pre><code class="ruby">class PusherApp &lt; BaseApp post '/pusher/ping' do webhook = Pusher::WebHook.new(request) if webhook.valid? PusherPing.receive(webhook) 'ok' else status 401 'invalid' end end end</code></pre> <p>Using a simple class method like this moves all logic out of the route and into a place that is easier to test. The receive method iterates the events and runs each ping individually.</p> <pre><code class="ruby">class PusherPing def self.receive(webhook) webhook.events.each do |event| new(event, webhook.time).run end end end</code></pre> <p>At first, we had something like this for each PusherPing instance.</p> <pre><code class="ruby">class PusherPing def initialize(event, time) @event = event || {} @time = time @event_name = @event['name'] @event_channel = @event['channel'] end def run case @event_name when 'channel_occupied' occupied when 'channel_vacated' vacated end end def occupied update(@time) end def vacated update(nil) end def update(value) # update the gauge in the # db with the value end end</code></pre> <p>We pushed out the change so we could start marking gauges as occupied. 
We then forced a browser refresh, which effectively vacated and re-occupied all gauges people were watching.</p> <p>Once we knew the occupied state of each gauge was correct, we added the code to only send the request to pusher on track if a gauge was occupied.</p> <p>Deploy. Celebrate. Booyeah.</p> <h2>The New Problem</h2> <p>Then, less than a day later, we realized that pusher doesn&#8217;t guarantee the order of events. Imagine someone vacating and then occupying a gauge, but receiving the occupy first and then the vacate.</p> <p><strong>This situation would mean that live tracking would never turn on</strong> for the gauge. Indeed, it started happening to a few people, who quickly let us know.</p> <h2>The New Solution</h2> <p>We figured it was better to send a few extra notifications than never send any, so we decided to &#8220;occupy&#8221; gauges on our own when people loaded up the Gauges dashboard.</p> <p>We started in and quickly realized the error of our ways in the pusher ping. Having the database calls directly tied to the PusherPing class meant that we had two options:</p> <ol> <li>Use the PusherPing class to occupy a gauge when the dashboard loads, which just felt wrong.</li> <li>Re-write it to separate the occupying and vacating of a gauge from the PusherPing class.</li> </ol> <p>Since we are good little developers, we went with 2. We created a GaugeOccupier class that looks like this:</p> <pre><code class="ruby">class GaugeOccupier attr_reader :ids def initialize(*ids) @ids = ids.flatten.compact.uniq end def occupy(time=Time.now.utc) update(time) end def vacate update(nil) end private def update(value) return if @ids.blank? 
# do the db updates end end</code></pre> <p>We tested that class on its own quite quickly and refactored the PusherPing to use it.</p> <pre><code class="ruby">class PusherPing def run case @event_name when 'channel_occupied' GaugeOccupier.new(gauge_id).occupy(@time) when 'channel_vacated' GaugeOccupier.new(gauge_id).vacate end end end</code></pre> <p>Boom. PusherPing now worked the same and we had a way to &#8220;occupy&#8221; gauges separate from the PusherPing. We added the occupy logic to the correct point in our app like so:</p> <pre><code class="ruby">ids = gauges.map { |gauge| gauge.id } GaugeOccupier.new(ids).occupy</code></pre> <p>At this point, we were now &#8220;occupied&#8221; more than &#8220;vacated&#8221;, which is good. However, you may have noticed that we still had the issue where someone loads the dashboard, we occupy the gauge, but then receive a delayed, or what I will now refer to as &#8220;stale&#8221;, hook.</p> <p>To fix the stale hook issue, we simply added a bit of logic to the PusherPing class to detect staleness and simply ignore the ping if it is stale.</p> <pre><code class="ruby">class PusherPing def run return if stale? # do occupy/vacate end def stale? return false if gauge.occupied_at.blank? gauge.occupied_at &gt; @time end end</code></pre> <h2>Closing Thoughts</h2> <p>This is by no means a perfect solution. There are still other holes. For example, a gauge could be occupied by us after we receive a vacate hook from pusher and stay in an &#8220;occupied&#8221; state, sending notifications that no one is looking for.</p> <p>To fix that issue, we can add a cleanup cron or something that occasionally gets all occupied channels from pusher and vacates gauges that are not in the list.</p> <p>We decided it wasn&#8217;t worth the time. We pushed out the occupy fix and are now reaping the benefits of sending about 1/6th of the pusher requests we were before. 
This means our EventMachine thread is doing less work, which gives our main thread more time to process requests.</p> <p>You might think us crazy for sending hundreds of http requests in a thread that shares time with the main request thread, but it is actually working quite well.</p> <p>We know that some day we will have to move this to a queue and an external process that processes the queue, but <strong>that day is not today</strong>. Instead, we can <strong>focus on the next round of features that will blow people&#8217;s socks off</strong>.</p> John Nunemaker What a Year 4f00aa8adabe9d3fb000d149 2012-01-01T14:13:58-05:00 2012-01-01T13:00:00-05:00 <p>In which I share about a crazy year.</p> <p>The last 12 months have been nuts. My health and professional/personal life were completely at odds.</p> <p>Between January and August, I had three hernia surgeries. As if that wasn&#8217;t enough for one year, the last few months of the year I&#8217;ve been plagued by a few other ailments (which are still giving me a hard time). Definitely a rough stretch. I will never take health for granted again and really look forward to getting back to &#8220;normal&#8221;.</p> <p>Quite the contrary to my health, Ordered List grew from 2 to 5 people, helped Zynga launch Words with Friends on Facebook, launched <a href="http://gaug.es">Gauges</a> and <a href="http://speakerdeck.com">Speaker Deck</a> while improving <a href="http://harmonyapp.com">Harmony</a>, and, finally, was acquired by the only other company in the world I wanted to be a part of, <a href="http://github.com">GitHub</a>.</p> <p>Here is to a healthy 2012.</p> John Nunemaker Acquired 4edd006ddabe9d380e005b02 2011-12-05T13:03:56-05:00 2011-12-05T12:00:00-05:00 <p>In which I announce my first day at GitHub.</p> <p>Several times over the past few years, I have stated that GitHub is probably the only other place I could see myself working. Today, it is official. 
All of Ordered List has joined GitHub.</p> <p><a href="https://github.com/blog/993-ordered-list-is-a-githubber"><img src="/assets/4edcff3edabe9d1f7800e4bc/article_full/oloctocat.png" alt="" /></a></p> <p>Maybe someday I&#8217;ll write about what Ordered List has meant to me, but today I am going to fully enjoy the present, instead of rambling about the past. I have no doubt great things will come of this.</p> <p>You can read more at <a href="https://github.com/blog/993-ordered-list-is-a-githubber">GitHub</a> and <a href="http://orderedlist.com/blog/articles/ordered-list-acquired-by-github/">Ordered List</a>.</p> John Nunemaker Creating an API 4ed78dd8dabe9d5ded01f52c 2011-12-01T12:43:33-05:00 2011-12-01T09:00:00-05:00 <p>In which I share a few things we used while building the Gaug.es <span class="caps">API</span>.</p> <p>A few weeks back, we publicly released the <a href="http://get.gaug.es/documentation/api/">Gauges <span class="caps">API</span></a>. Despite building <a href="http://gaug.es">Gauges</a> from the ground up as an <span class="caps">API</span>, it was a lot of work. You really have to cross your t&#8217;s and dot your i&#8217;s when releasing an <span class="caps">API</span>.</p> <h2>1. Document as You Build</h2> <p>We made the mistake of documenting after most of the build was done. The problem is documenting sucks. Leaving that pain until the end, when you are excited to release it, makes doing the work twice as hard. Thankfully, we have a <a href="http://theprogrammingbutler.com">closer</a> on our team who powered through it.</p> <h2>2. Be Consistent</h2> <p>As we documented the <span class="caps">API</span>, we noticed a lot of inconsistencies. For example, in some places we return a hash and in others we returned an array. Upon realizing these issues, we started making some rules.</p> <p>To solve the array/hash issue, we elected that every response should return a hash. This is the most flexible solution going forward. 
It allows us to inject new keys without having to convert the response or release a whole new version of the <span class="caps">API</span>.</p> <p>Changing from an array to a hash meant that we needed to namespace the array with a key. We then noticed that some places were namespaced and others weren&#8217;t. Again, we decided on a rule. In this case, all top level objects should be namespaced, but objects referenced from a top level object or a collection of several objects did not require namespacing.</p> <pre><code class="javascript">{users:[{user:{...}}, {user:{...}}]} // nope {users:[{...}, {...}]} // yep {username: 'jnunemaker'} // nope {user: {username:'jnunemaker'}} // yep </code></pre> <p>You get the idea. Consistency is important. It is not so much how you do it as that you always do it the same.</p> <h2>3. Provide the URLs</h2> <p>Most of my initial open source work was wrapping APIs. The one thing that always annoyed me was having to generate URLs. Each resource should know the URLs that matter. For example, a user resource in Gauges has a few URLs that can be called to get various data:</p> <pre><code class="javascript">{ "user": { "name": "John Doe", "urls": { "self": "https://secure.gaug.es/me", "gauges": "https://secure.gaug.es/gauges", "clients": "https://secure.gaug.es/clients" }, "id": "4e206261e5947c1d38000001", "last_name": "Doe", "email": "john@doe.com", "first_name": "John" } }</code></pre> <p>The previous <span class="caps">JSON</span> is the response of the resource /me. /me returns data about the authenticated user and the URLs to update itself (self), get all gauges (/gauges), and get all <span class="caps">API</span> clients (/clients). Let&#8217;s say next you request /gauges. 
Each gauge returned has the URLs to get more data about the gauge.</p> <pre><code class="javascript">{
  "gauges": [
    {
      // various attributes
      "urls": {
        "self":"https://secure.gaug.es/gauges/4ea97a8be5947ccda1000001",
        "referrers":"https://secure.gaug.es/gauges/4ea97a8be5947ccda1000001/referrers",
        "technology":"https://secure.gaug.es/gauges/4ea97a8be5947ccda1000001/technology",
        // ... etc
      },
    }
  ]
}</code></pre> <p>We thought this would prove helpful. We&#8217;ll see in the long run if it turns out to work well.</p> <h2>4. Present the Data</h2> <p>Finally, never ever use to_json and friends from a controller or Sinatra get/post/put block. As a bare minimum rule: the second you start calling to_json with :methods, :except, :only, or any of the other options, you probably want to move that logic to a separate class.</p> <p>For Gauges, we call these classes presenters. For example, here is a simplified version of the UserPresenter.</p> <pre><code class="ruby">class UserPresenter
  def initialize(user)
    @user = user
  end

  def as_json(*)
    {
      'id'         =&gt; @user.id,
      'email'      =&gt; @user.email,
      'name'       =&gt; @user.name,
      'first_name' =&gt; @user.first_name,
      'last_name'  =&gt; @user.last_name,
      'urls'       =&gt; {
        'self'    =&gt; "#{Gauges.api_url}/me",
        'gauges'  =&gt; "#{Gauges.api_url}/gauges",
        'clients' =&gt; "#{Gauges.api_url}/clients",
      }
    }
  end
end</code></pre> <p>Nothing fancy. Just a simple Ruby class that sits in app/presenters. Here is what the /me route looks like in our Sinatra app.</p> <pre><code class="ruby">get('/me') do
  content_type(:json)
  sign_in_required
  {:user =&gt; UserPresenter.new(current_user)}.to_json
end</code></pre> <p>This simple presentation layer makes it really easy to test the responses in detail using unit tests and then just have a single integration test that makes sure overall things look good.
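</p>

<p>To show how easy this layer is to unit test: the sketch below uses a condensed copy of the presenter above, with <code>Gauges.api_url</code> swapped for a hard-coded constant and an <code>OpenStruct</code> standing in for a real user record. No controller, no HTTP, just a plain object in and a hash out:</p>

```ruby
require "ostruct"

# Condensed version of the article's UserPresenter so this sketch is
# self-contained. API_URL replaces Gauges.api_url for illustration only.
class CondensedUserPresenter
  API_URL = "https://secure.gaug.es"

  def initialize(user)
    @user = user
  end

  def as_json(*)
    {
      'id'    => @user.id,
      'email' => @user.email,
      'name'  => @user.name,
      'urls'  => { 'self' => "#{API_URL}/me" }
    }
  end
end

# A unit test needs nothing but a stub user.
stub_user = OpenStruct.new(id: "4e206261", email: "john@doe.com", name: "John Doe")
json = CondensedUserPresenter.new(stub_user).as_json

puts json['urls']['self'] # => https://secure.gaug.es/me
```

<p>Because the presenter is a plain object, asserting on every key is cheap, and the integration suite only has to confirm the route wires it up.</p>

<p>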
I&#8217;ve found this tiny layer a breath of fresh air.</p> <p>I am sure that nothing above was shocking or awe-inspiring, but I hope that it saves you some time on your next public <span class="caps">API</span>.</p> John Nunemaker Stupid Simple Debugging 4e5fa602dabe9d2e8d007bf0 2011-09-01T11:49:52-04:00 2011-08-31T23:00:00-04:00 <p>In which I talk about old school debugging.</p> <p>There are all kinds of fancy debugging tools out there, but personally, I get the most mileage out of good old puts statements.</p> <p>When I started with Ruby, several years ago, I used puts like this to debug:</p> <pre><code class="ruby">puts account.inspect</code></pre> <p>The problem with this is twofold. First, if you have a few puts statements, you can&#8217;t tell which output belongs to which object. This always led me to do something like this:</p> <pre><code class="ruby">puts "account: #{account.inspect}"</code></pre> <p>Second, depending on whether you are just in Ruby or running an app through a web server, puts is sometimes swallowed. This often led me to do something like this when using Rails:</p> <pre><code class="ruby">Rails.logger.debug "account: #{account.inspect}"</code></pre> <p>Now, not only do I have to think about which method to use to debug something, I also have to think about where the output will be sent so I can watch for it.</p> <h2>Enter Log Buddy</h2> <p>Then, one fateful afternoon, I stumbled across log buddy (<code>gem install log_buddy</code>).
In every project, whether it be a library, Rails app, or Sinatra app, <strong>one of the first gems I throw in my Gemfile is log_buddy</strong>.</p> <p>Once you have the gem installed, you can tell log buddy where your log file is and whether or not to actually log like so:</p> <pre><code class="ruby">LogBuddy.init({
  :logger   =&gt; Gauges.logger,
  :disabled =&gt; Gauges.production?,
})</code></pre> <p>Simply provide log buddy with a logger, tell it whether to be silenced in a given situation or environment, and you get some nice bang for your buck.</p> <h2>One Method, One Character</h2> <p>First, log buddy adds a nice and short method named <code>d</code>. <code>d</code> is 4X shorter than <code>puts</code>, so right off the bat you get some productivity gains. The <code>d</code> method takes any argument and calls <code>inspect</code> on it. Short and sweet.</p> <pre><code class="ruby">d account        # will puts account.inspect
d 'Some message' # will puts "Some message"</code></pre> <p>The cool part is that on top of printing the inspected object to stdout, it also logs it to the logger provided in <code>LogBuddy.init</code>. No more thinking about which method to use or where the output will go. One method, output sent to multiple places.</p> <p>This is nice, but it won&#8217;t win you any new friends. Where log buddy gets really cool is when you pass it a block.</p> <pre><code class="ruby">d { account } # puts and logs account = &lt;Account ...&gt;</code></pre> <p>Again, one method, output to stdout and your log file, but when you use a block, it does magic to print out the variable name along with the inspected value. You can also pass in several objects, separating them with semicolons.</p> <pre><code class="ruby">d { account; account.creator; current_user }</code></pre> <p>This gives you each variable on its own line with its name and inspected value. Nothing fancy, but log buddy has saved me a lot of time over the past year.
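</p>

<p>For the curious, the non-magic half of that trick is tiny. Below is a stripped-down sketch (<code>MiniBuddy</code> is a name invented for this post, and this is <em>not</em> log_buddy&#8217;s actual implementation; the real gem also parses the block&#8217;s source to recover variable names, which is omitted here):</p>

```ruby
require "logger"

# MiniBuddy (hypothetical): one method that sends the inspected value to
# both stdout and a logger, in the spirit of log_buddy's d. The
# variable-name magic from the real gem is intentionally left out.
module MiniBuddy
  def self.logger
    @logger ||= Logger.new($stderr)
  end

  def self.d(obj)
    line = obj.inspect
    puts line          # immediate feedback on stdout
    logger.debug(line) # durable copy wherever the logger points
    line
  end
end

MiniBuddy.d("Some message") # puts and logs "Some message".inspect
```

<p>One method, two destinations, no thinking about where the output went.</p>

<p>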
I figured it was time I sent it some love.</p> John Nunemaker