
Performance Doesn't Matter (Unless You Can Prove That It Does)

I get bombarded with questions like these all the time from my fellow developers:

  • "Which is faster, .Count() or .Any()?"
  • "Should I use a Redis database for my reads rather then SQL Server? I heard Redis is amazingly fast."
  • "Won't [Snippet X] be faster than [Snippet Y] because it's more optimized?"

The interesting thing about all of these questions is that they each have a defined, measurable answer. Almost certainly, .Any() will be faster than .Count() (for an IEnumerable, as we'll see below). Almost certainly, in simple cases, Redis will be faster for reads than SQL Server. Optimized code will be faster than non-optimized code (because, really, that's kinda what optimized means). All of these questions can be answered in a definable way.

In each of these cases, though, the developer asking is probably posing the wrong question. The dev is concerned about performance, forgetting that 97% of the time performance doesn't matter. Spending time making your app perform better is often time and effort wasted, as the energy you put into solving that problem might have been better spent elsewhere.

In The Line Of Fire Business

Most developers (including me) work on line-of-business apps. A line-of-business app is one that directly contributes to a customer transaction or business need. A great many of these apps are internal, meaning that no one outside that company will ever see them.

The vast majority of the time, programmers (including myself) are working on software that will never be seen by the outside world at large, never used by anyone other than our coworkers, direct customers, and peers. In these situations, what matters most is getting the user requirements implemented correctly. In fact, micro-optimizing these kinds of apps actually hurts your team, as the time spent on those micro-optimizations could be better spent implementing the core requirements.

A naval cadet looks down the sight of her training weapon.

Let's take one of the questions from earlier. Here's some sample code that uses Count() and Any() to determine if a User has any Email Addresses:

//attempt1
var user = userRepository.Get(id);  
if(user.EmailAddresses.Count() > 0)  
{
    //Send an email to all addresses
}

//attempt2
var user = userRepository.Get(id);  
if(user.EmailAddresses.Any())  
{
    //Send an email to all addresses
}

What's interesting about these two snippets is that the answer to "which is faster" depends on the type of the collection EmailAddresses. If it is a lazily-evaluated IEnumerable, then Any() is faster, since it can stop after finding a single element while Count() has to walk the entire sequence. For a List or other ICollection, though, Count() is faster, since it short-circuits to the collection's stored Count property while Any() still has to spin up an enumerator.
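
You can see the IEnumerable case for yourself with a minimal sketch like this one (LazyAddresses here is a made-up stand-in for a lazily-evaluated collection):

using System;
using System.Collections.Generic;
using System.Linq;

class AnyVsCountDemo
{
    // Hypothetical lazy sequence; each element is produced on demand.
    static IEnumerable<string> LazyAddresses(int count)
    {
        for (int i = 0; i < count; i++)
            yield return "user" + i + "@example.com";
    }

    static void Main()
    {
        // Any() pulls at most one element from the sequence, then stops.
        Console.WriteLine(LazyAddresses(1000000).Any());         // cheap

        // Count() > 0 must walk all 1,000,000 elements to get a total.
        Console.WriteLine(LazyAddresses(1000000).Count() > 0);   // expensive

        // For a materialized List<T>, Count() short-circuits to the stored
        // Count property, so it's the (marginally) cheaper call there.
        List<string> list = LazyAddresses(1000000).ToList();
        Console.WriteLine(list.Count > 0);
    }
}

Either way, the point stands: the difference is real and measurable, and almost always irrelevant to your app.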

Let's say we, in our capacity as lead developer, ran across this code in a code review for an internal app. Should we flag either version and tell the developer to change their code for performance reasons? No, because there isn't sufficient justification for changing it.

The mental algorithm I use to determine if I should refactor a given code snippet goes like this:

  1. If the code doesn't work, change it.
  2. If the code doesn't fulfill the customer requirements, change it.
  3. If the code doesn't match team coding standards, change it.
  4. If you think the code isn't fast enough, prove it.
  5. If you can't prove it or don't have time to, leave it alone.

If I can't prove that the code is a performance problem, then I probably don't care about fixing it. And yet, lots of devs will still try to optimize everything, even snippets that don't really need it (I know because I'm very much guilty of this). The question becomes: why do we like trying to optimize everything, even when we suspect that it probably isn't that important to do so? The answer comes down to one thing: it feels good to solve problems.

Solve All The Things

Software development is 100% about solving problems. Without problems there wouldn’t be a need for software.

John Sonmez

Programmers are, at their core, problem solvers. We take pride in our ability to break down problems into composable parts, solve the issue each of those parts represents, then put the parts back together into a functional, working application. We do this because a) we get paid for it, but more importantly b) it's fun! Solving problems makes us happy.

Sometimes, though, our problem-solving drive misfires and manufactures solutions when, really, there wasn't a problem to solve in the first place.

My coworker (we'll call her Claire) came to me with a problem the other day. She had a snippet of code that looked something like this:

public string EmailBody(string salutation, string name, string bodyText, string closing, string signatureName)  
{
    string body = "";
    body += salutation + " " + name + ":\n\n" + bodyText + "\n" + closing + "\n" + signatureName;
    return body;
}

Claire had read this answer on StackOverflow and wanted to refactor this method to use StringBuilder so that, in her words, "it would be faster". The refactored method looked like this:

public string EmailBody(string salutation, string name, string bodyText, string closing, string signatureName)
{
    StringBuilder body = new StringBuilder();
    body.Append(salutation);
    body.Append(" ");
    body.Append(name);
    body.AppendLine(":");
    body.AppendLine();
    body.AppendLine(bodyText);
    body.AppendLine(closing);
    body.Append(signatureName);
    return body.ToString();
}

She hadn't run any benchmarks on these two snippets, and was basing her supposition that the refactored code would be "faster" on a hunch and that StackOverflow question. This method was called fairly often in the app we were building, but that app had very few users and had shown no need for performance testing. In short, she'd solved a problem she couldn't prove existed.

I told her to go back and benchmark the two methods. The difference between them turned out to be minuscule (something like 10ms), and so I told her not to worry about it, as we had bigger fish to fry. She objected, saying that this could come back to bite us if the app ever came under heavy load. I agreed, saying that if it did bite us we'd fix it then, but until then we had 40% of the requirements still unimplemented and we needed to get those done before doing any optimization at this level. She wasn't happy about it, but she understood.
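
For what it's worth, setting up a benchmark like Claire's takes only a few lines. Here's a minimal sketch using the BenchmarkDotNet package (the class name and sample inputs are made up):

using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class EmailBodyBenchmarks
{
    // Hypothetical sample inputs
    private const string Salutation = "Dear", Name = "Claire",
        Body = "Thanks for reading.", Closing = "Regards,", Signature = "The Team";

    [Benchmark(Baseline = true)]
    public string Concatenation()
    {
        return Salutation + " " + Name + ":\n\n" + Body + "\n" + Closing + "\n" + Signature;
    }

    [Benchmark]
    public string WithStringBuilder()
    {
        StringBuilder body = new StringBuilder();
        body.Append(Salutation).Append(' ').Append(Name).Append(":\n\n")
            .Append(Body).Append('\n').Append(Closing).Append('\n').Append(Signature);
        return body.ToString();
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<EmailBodyBenchmarks>();
}

BenchmarkDotNet handles the warmup and iteration counts that make micro-measurements like this meaningful, which a hand-rolled Stopwatch loop usually gets wrong.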

All of which is to say: don't fix things performance-related because you think they are a problem. Rather, fix things you can prove are a problem.

Correct, Readable, Fast, In That Order

None of this is to say that performance never matters (after all, those who speak in absolutes are invariably wrong), just that performance doesn't matter as much as we might like to think it does. When it does matter, though, it can matter A LOT.

A closeup shot of a steel ruler. Steel ruler closeup from Wikimedia, used under license

Let's take StackOverflow, for instance: its co-founder Jeff Atwood explicitly writes that performance is a feature for the site. Just from using it over the last seven years, I can tell the team has spent a LOT of time tweaking and improving the site's performance, to the point where whether StackOverflow will respond quickly is simply a non-issue. The site is remarkably performant on relatively little hardware (and yes, I know that post is two years old and they've probably gotten some updates since). StackOverflow takes immense pride in providing one of the most responsive web experiences around, and I'd be willing to bet that its performance is a major reason why it's become the go-to site for developers needing answers.

But the question is, do you work on huge customer-facing sites like StackOverflow?

If you don't, and you probably don't, then performance is not an important issue for you. Yes, this means you (and me). You probably don't need those performance tweaks; you don't need sub-10ms response times; you don't need to waste energy and effort plunging into the depths of your libraries to save a few milliseconds of processing. It's just extra work with no tangible benefit.

The most important thing for you, the line-of-business app developer, to focus on is acquiring and implementing the correct requirements for your site. After that, it's critical that you make your code readable so that other devs can correct or expand on it. Only after those are done should you focus on improving your app's performance.

In other words, the performance of your application is at best a tertiary concern, behind making it correct and making it readable.

Get it right, get it beautiful, get it fast. In that order.

Joris Timmermans

Performance doesn't matter, unless you can prove that it does. Until then, work on your other pressing problems, and don't worry about shaving that extra 10 ms off your response time.

What do you think about performance in general? Should it be more important, and where does its importance fall relative to getting correct requirements and making beautiful code? Have you worked on applications where performance unquestionably was important, and how did you ensure that the application measured up? Let me know in the comments!

Happy Coding!

Between Two Stacks: The Consequences of a Data-Less Decision

We've been having an ongoing debate on our team about which architecture to use to implement our new enterprise-level application. There are two possible solutions, one familiar, one fast, but we can't seem to reach a conclusion as to which to use. A lack of applicable data is forcing us to make this key decision on intuition and guesswork, and I can't help but wonder how else we might decide which path to take.

A nighttime long-exposure photo of lights from speeding cars on a seaside highway, leaving bright colored lines in their wake. Speed lights 2 from Flickr, used under license

Familiarity vs Performance

Our new teammate Jerry, my boss Frank, and I have been kicking around ways to ensure that this new service will be blazing fast and thoroughly scalable, since much of our company's infrastructure will depend on it. Specifically, we're trying to determine the best (read: fastest) way of accessing the information in this system's database, since we believe reads will outnumber writes by orders of magnitude. It was partly for this reason that I benchmarked the performance of Entity Framework vs Dapper vs ADO.NET.

Throughout all of this, Jerry, Frank, and I have collectively tried to determine which assortment of technologies will allow the system to be both blazing fast and scalable, as well as not too different from what we already know. This, as you might imagine, is more difficult than we thought it would be.

There are two possible architectures we have bandied about. The first one is the one my group is most familiar with: the Microsoft stack of SQL Server, Entity Framework, and ASP.NET Web API. We build almost all of our other apps using this stack, and so development time would be much quicker if we use this setup.

The second possible architecture involves a less-familiar but theoretically more-performant stack: Redis, Dapper, RabbitMQ, and Web API, implemented using the Command Query Responsibility Segregation (CQRS) pattern. In theory, this architecture would allow the system to be more redundant, more scalable, more performant, more testable, more everything (at least according to Jerry). Problem is, with the exception of Web API, nobody on my team has ever developed any products using these technologies or patterns.
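
If you haven't run into CQRS before, the core idea is just that writes go through a command model and reads through a separate, read-optimized query model. A highly simplified sketch of the split (all names here are hypothetical) might look like this:

using System.Collections.Generic;

// Commands mutate state and return nothing.
public interface ICommandHandler<TCommand>
{
    void Handle(TCommand command);
}

// Queries return data and mutate nothing.
public interface IQueryHandler<TQuery, TResult>
{
    TResult Handle(TQuery query);
}

public class CreateOrder { public int CustomerId; public decimal Total; }

public class CreateOrderHandler : ICommandHandler<CreateOrder>
{
    public void Handle(CreateOrder command)
    {
        // Write side: persist to SQL Server (e.g. via Dapper), then publish
        // an event on RabbitMQ so the read side can update its own store.
    }
}

public class GetOrdersForCustomer { public int CustomerId; }

public class OrderSummary { public int OrderId; public decimal Total; }

public class GetOrdersForCustomerHandler : IQueryHandler<GetOrdersForCustomer, List<OrderSummary>>
{
    public List<OrderSummary> Handle(GetOrdersForCustomer query)
    {
        // Read side: serve from a denormalized, read-optimized store
        // (e.g. Redis) that the events above keep up to date.
        return new List<OrderSummary>();
    }
}

The appeal is that the read side can be scaled and optimized independently of the write side; the cost is exactly the unfamiliarity we were wrestling with.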

So, since we lack the experience to make an educated guess as to which technology stack is "better", we wanted to use metrics to help us make a more informed decision. That, unfortunately for us, proved to be impossible.

Blindfolded Decision-Making

There's an implicit assumption in the desire to use data to make a decision, and that is that said data exists.

Our thought process went like this: if we can determine the amount of load this system will need to handle, we could make a better decision on which architecture to use (moderate load = familiar stack, heavy load = performance stack). Say we choose to go with the full MS-stack (SQL Server, Entity Framework, Web API), which many (including both Jerry and Frank) have argued will be less optimized and less performant than the theoretically-optimized stack (Redis, Dapper, RabbitMQ, Web API). In an absolute sense, we will be picking the slower option. Do we care? Even if it is the slower of the two options, would it be fast enough for our purposes?

We have no data, no metrics, no information of any kind that can give us an idea of what our load expectation will be. There's no infrastructure in place, no repository of statistics and metrics that we can review, parse, and draw conclusions from. How can we make a decision as to which architecture to use if we don't have any pertinent data?

It's a Catch-22. We need the metrics to choose the best architecture, but we need to actually implement the damn thing in order to get metrics, and implementation requires us to select an architecture. In the best case, the metrics could reveal a clear path for us to venture down. In the worst case, well, we'd be in the same situation we are now, having to make an important decision while blindfolded due to lack of supporting data.

So how will we break this impasse? We're just gonna have to pick one.

There's no other choice left to us; we'll need to pick whichever stack we think is best for now, implement it, and improve it later as we start to collect metrics. Given this, it seems likely that we'll go with the performance-optimized stack, since we believe it will provide scalability and responsiveness benefits into the future.

Still, I have to wonder whether the metrics we needed might have clearly shown us the path to take. Without evidence, this decision will be made out of hope, not proof; for now, we'll just have to hope we choose correctly.

Have you ever encountered a decision like this, where the "best" solution wasn't clear and the methods by which you could determine which solution was better didn't exist or weren't thorough enough? How did you pick a solution? Let me know in the comments!

Dapper vs Entity Framework vs ADO.NET Performance Benchmarking

We have an upcoming project in my group that is going to need to be very, very performant. This project will involve us querying for data from a SQL database, transforming that data into strongly-typed objects, then returning those objects to the calling system through a service layer. Eventually, this service will be a central pillar in our organization's service-oriented architecture (SOA), and as such it absolutely has to be fast.

We generally want to use Entity Framework for our ORM, but just a little searching reveals StackExchange questions and blog post after blog post detailing how EF is simply not up to par for high-performance systems. Into that gap step so-called "micro-ORMs" like Dapper.NET (which is used on the StackExchange family of sites, including StackOverflow), which promise performance at the cost of maintainability. As always, we also have the option of using straight ADO.NET queries.

Thing is, because performance needs to be front-and-center in this app, I'd like to be really sure which of these ORMs provides the best bang for my buck. So I worked up a sample project over on GitHub that takes each of these three data access methods and beats them till they beg for mercy (er, tests them) using the same sample data and the same queries (with some caveats, as we'll see below).

Methodology

This test uses a database schema that looks like this:

A database diagram, showing that a Sport has many Teams, and a Team has many Players.

In other words, a Sport has many Teams, and a Team has many Players.

I needed some sample data to test against. The sample project has an entire section dedicated to producing this data, but suffice to say that you can select how many sports, how many teams per sport, and how many players per team you want for each test.

Now what I needed was a set of queries that I could create in each ORM and test against. I chose three different queries:

  • Player by ID
  • Players per Team
  • Teams per Sport (including Players)

For each query, I run the test against all data in the database (e.g. for Player by ID, I select each player by their ID) and average the total time it takes to execute the query (including setting up the DbContext or SqlConnection, as the case may be) across those executions. I then do multiple runs over the same data so I can average them out and get a set of numbers that should clearly show which of the ORMs is fastest.
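
In code, the per-query driver might look something like this hedged sketch (the real harness lives in the GitHub project, and the TestRunner name is hypothetical; ITestSignature is the shared interface each ORM's test class implements, shown in the next section):

using System.Collections.Generic;

public static class TestRunner
{
    // Run one ORM's Player-by-ID test against every player ID and
    // average the elapsed milliseconds across all calls.
    public static double AveragePlayerByID(ITestSignature test, IEnumerable<int> playerIds)
    {
        long totalMs = 0;
        int count = 0;
        foreach (int id in playerIds)
        {
            // Each call times its own connection/context setup plus the query.
            totalMs += test.GetPlayerByID(id);
            count++;
        }
        return (double)totalMs / count;
    }
}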

Test Setup

As an example, here's the code for the Entity Framework, ADO.NET, and Dapper.NET test classes. Each implements a common ITestSignature interface which, judging from the methods used, presumably looks like this:
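
public interface ITestSignature
{
    long GetPlayerByID(int id);
    long GetPlayersForTeam(int teamId);
    long GetTeamsForSport(int sportId);
}

And here are the three test classes themselves: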

public class EntityFramework : ITestSignature  
{
    public long GetPlayerByID(int id)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using (SportContext context = new SportContext())
        {
            var player = context.Players.Where(x => x.Id == id).First();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public long GetPlayersForTeam(int teamId)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using (SportContext context = new SportContext())
        {
            var players = context.Players.Where(x => x.TeamId == teamId).ToList();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public long GetTeamsForSport(int sportId)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using (SportContext context = new SportContext())
        {
            var players = context.Teams.Include(x=>x.Players).Where(x => x.SportId == sportId).ToList();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
public class ADONET : ITestSignature  
{
    public long GetPlayerByID(int id)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using(SqlConnection conn = new SqlConnection(Constants.ConnectionString))
        {
            conn.Open();
            using(SqlDataAdapter adapter = new SqlDataAdapter("SELECT Id, FirstName, LastName, DateOfBirth, TeamId FROM Player WHERE Id = @ID", conn))
            {
                adapter.SelectCommand.Parameters.Add(new SqlParameter("@ID", id));
                DataTable table = new DataTable();
                adapter.Fill(table);
            }
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public long GetPlayersForTeam(int teamId)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using(SqlConnection conn = new SqlConnection(Constants.ConnectionString))
        {
            conn.Open();
            using(SqlDataAdapter adapter = new SqlDataAdapter("SELECT Id, FirstName, LastName, DateOfBirth, TeamId FROM Player WHERE TeamId = @ID", conn))
            {
                adapter.SelectCommand.Parameters.Add(new SqlParameter("@ID", teamId));
                DataTable table = new DataTable();
                adapter.Fill(table);
            }
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public long GetTeamsForSport(int sportId)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using(SqlConnection conn = new SqlConnection(Constants.ConnectionString))
        {
            conn.Open();
            using(SqlDataAdapter adapter = new SqlDataAdapter("SELECT p.Id, p.FirstName, p.LastName, p.DateOfBirth, p.TeamId, t.Id as TeamId, t.Name, t.SportId FROM Player p INNER JOIN Team t ON p.TeamId = t.Id WHERE t.SportId = @ID", conn))
            {
                adapter.SelectCommand.Parameters.Add(new SqlParameter("@ID", sportId));
                DataTable table = new DataTable();
                adapter.Fill(table);
            }
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
public class Dapper : ITestSignature  
{
    public long GetPlayerByID(int id)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using (SqlConnection conn = new SqlConnection(Constants.ConnectionString))
        {
            conn.Open();
            var player = conn.Query<PlayerDTO>("SELECT Id, FirstName, LastName, DateOfBirth, TeamId FROM Player WHERE Id = @ID", new{ ID = id});
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public long GetPlayersForTeam(int teamId)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using (SqlConnection conn = new SqlConnection(Constants.ConnectionString))
        {
            conn.Open();
            var players = conn.Query<PlayerDTO>("SELECT Id, FirstName, LastName, DateOfBirth, TeamId FROM Player WHERE TeamId = @ID", new { ID = teamId }).ToList();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public long GetTeamsForSport(int sportId)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        using (SqlConnection conn = new SqlConnection(Constants.ConnectionString))
        {
            conn.Open();
            var players = conn.Query<PlayerDTO, TeamDTO, PlayerDTO>("SELECT p.Id, p.FirstName, p.LastName, p.DateOfBirth, p.TeamId, t.Id as TeamId, t.Name, t.SportId FROM Team t "
                + "INNER JOIN Player p ON t.Id = p.TeamId WHERE t.SportId = @ID", (player, team) => { return player; }, splitOn: "TeamId", param: new { ID = sportId });
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}

Note that in the Dapper.NET and ADO.NET cases, we will be selecting a row for each Player in the GetTeamsForSport query. This is not an exact comparison against the EF query, but for my purposes it works fine.

Results

The following results are for 10 iterations, each containing 8 sports, 30 teams in each sport, and 100 players per team.

Entity Framework Results

Run       Player by ID   Players for Team   Teams for Sport
1         1.64ms         4.57ms             127.75ms
2         0.56ms         3.47ms             112.5ms
3         0.17ms         3.27ms             119.12ms
4         1.01ms         3.27ms             106.75ms
5         1.15ms         3.47ms             107.25ms
6         1.14ms         3.27ms             117.25ms
7         0.67ms         3.27ms             107.25ms
8         0.55ms         3.27ms             110.62ms
9         0.37ms         4.4ms              109.62ms
10        0.44ms         3.43ms             116.25ms
Average   0.77ms         3.57ms             113.45ms

ADO.NET Results

Run       Player by ID   Players for Team   Teams for Sport
1         0.01ms         1.03ms             10.25ms
2         0ms            1ms                11ms
3         0.1ms          1.03ms             9.5ms
4         0ms            1ms                9.62ms
5         0ms            1.07ms             7.62ms
6         0.02ms         1ms                7.75ms
7         0ms            1ms                7.62ms
8         0ms            1ms                8.12ms
9         0ms            1ms                8ms
10        0ms            1.17ms             8.88ms
Average   0.013ms        1.03ms             8.84ms

Dapper.NET Results

Run       Player by ID   Players for Team   Teams for Sport
1         0.38ms         1.03ms             9.12ms
2         0.03ms         1ms                8ms
3         0.02ms         1ms                7.88ms
4         0ms            1ms                8.12ms
5         0ms            1.07ms             7.62ms
6         0.02ms         1ms                7.75ms
7         0ms            1ms                7.62ms
8         0ms            1.02ms             7.62ms
9         0ms            1ms                7.88ms
10        0.02ms         1ms                7.75ms
Average   0.047ms        1.01ms             7.94ms

Analysis

As we can see in the data above, Entity Framework is markedly slower than either ADO.NET or Dapper.NET: on these queries, anywhere from roughly 3 times slower (Players for Team) to over 10 times slower (Teams for Sport).

Let's be clear: the methodology used in this test had something to do with that, particularly in the "Teams per Sport" query. There, Entity Framework was materializing both the teams in a given sport and the players on each team (via an Include() call), whereas the ADO.NET and Dapper.NET queries were just selecting joined rows. In a more rigorous statistical study, the test would either be improved or these results thrown out.
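
For what it's worth, a fairer apples-to-apples EF version of that query would project the same flat, joined shape the SQL queries return, instead of materializing full entity graphs. Assuming Player has a Team navigation property (which the sample project may or may not define), something like:

using (SportContext context = new SportContext())
{
    // Projects one flat row per player, like the SQL join, rather than
    // building Team objects with fully populated Players collections.
    var rows = context.Players
        .Where(p => p.Team.SportId == sportId)
        .Select(p => new { p.Id, p.FirstName, p.LastName, p.DateOfBirth, p.TeamId, TeamName = p.Team.Name })
        .ToList();
}

That change alone would likely close some (though certainly not all) of the gap in the Teams-per-Sport numbers.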

What's more interesting to me is that Dapper.NET was, on average, faster than ADO.NET for the more complex queries. It appears to me that there is a performance hit the first time you use Dapper.NET (as also appears to happen with EF) but once you get past that, Dapper.NET is amazingly fast. I suspect that this has something to do with the fact that in the ADO.NET test cases we are using a SqlDataAdapter, though I cannot prove this.

Even if you do throw out the "Teams per Sport" query, you're still left with EF being at least 3 times slower than either Dapper.NET or ADO.NET. The data shows that, at least in terms of raw speed and with these queries, Entity Framework will be the slowest option, and Dapper.NET will (narrowly) be the fastest. Which is why my ultimate conclusion might surprise you.

Conclusion

We're going to use Dapper.NET on our project; that much is not in doubt. However, we're not going to start development with it, and it will not be the only ORM in use. The plan is to develop this project using Entity Framework, and later optimize to use Dapper.NET in certain scenarios where the system needs a performance boost. Yes, we are going with the slowest option to start. Why would we do this?

Because the major drawback to using Dapper.NET is that you have naked SQL queries in your code. If anybody fat-fingers a query, we won't know about it until that code actually runs under test. Plus, the members of my group are more familiar with EF than Dapper.NET, so development time will be quicker.
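
To make that concrete: misspell a property in a LINQ query and the build breaks immediately; misspell a column in a raw SQL string and everything compiles fine, failing only when the query executes. A contrived illustration, reusing the types from the benchmark code above (the misspellings are intentional):

// EF/LINQ: a misspelled property is a compile error, caught instantly.
// var players = context.Players.Where(x => x.TeamIdd == teamId).ToList();
//                                            ^ does not compile

// Dapper: a misspelled table or column compiles happily; the
// SqlException only surfaces at runtime, when the query is executed.
var players = conn.Query<PlayerDTO>(
    "SELECT Id, FirstName, LastName FROM Playerr WHERE TeamId = @ID",
    new { ID = teamId });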

In short, Dapper.NET is unquestionably faster than EF and, at least on the more complex queries, slightly faster than straight ADO.NET; but we'll be doing the majority of development in EF and then optimizing with Dapper.NET where needed. We think this will strike a balance between ease of development and performance (and hopefully allow us to both get it done and do it right).

Don't believe me? Good! Go get the app and see for yourself!

(Also, if you see anything obviously wrong in my methodology, tell me about it! I'd be happy to update this post and the sample project if problems are found in it.)

Happy Coding!