
NServiceBus 6.0 Public Beta


We've been hard at work on the next major version of NServiceBus for a long time, and we're really excited to show it to you for the first time today! With major improvements in performance, top-to-bottom async support, and an even cleaner API, this version takes NServiceBus to a whole new level. And as of today, NServiceBus 6.0 is now available for public beta.

If that's all you needed to hear and you want to get going right away, open Visual Studio's Package Manager Console and get started:

PM> Install-Package NServiceBus -Pre

If you'd like to know more about what the new version can do for you, read on.

Async/await

The async and await keywords make it easier to write asynchronous code effectively, which in turn makes heavily IO-bound processes more efficient. NServiceBus has to spend a lot of time waiting for IO from queues and databases, so it makes sense that the biggest change in NServiceBus 6.0 is that the framework has been rewritten to be fully async. Message handlers now return async Task, and you can await asynchronous operations within your handler logic.
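
For example, a V6 message handler looks roughly like this (a minimal sketch; MyMessage and OtherMessage are placeholder types):

public class MyMessageHandler : IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        // IO-bound work is awaited instead of blocking the thread.
        await context.Send(new OtherMessage());
    }
}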

If you're new to async/await, you can get up to speed quickly by checking out our async/await webinar series.

Clearer and more focused API

In addition to async/await, we've changed some APIs to make working with NServiceBus easier.

First, we've changed how message handlers work, making it easier to do the right thing and harder to make mistakes. Instead of injecting an instance of IBus into your handlers, which allows for all possible messaging operations, message handlers will instead have a special context parameter containing only the operations that are valid within that context. This will make it very hard to call Unsubscribe() from within a message handler (almost always a bad idea).

Second, we've made sagas a lot easier to develop. Through small changes to the API, we've been able to eliminate all of the common mistakes people typically make when authoring a saga. Have you ever been burned by forgetting to add the [Unique] attribute to your saga data? Well, that won't happen anymore.

A full rundown of the changes is available in the NServiceBus 5 to 6 upgrade guide.

Azure

The Azure transports (Azure Storage Queues and Azure Service Bus), along with Azure Storage Persistence, received the most upgrades in this release, as Microsoft has lately been shipping most new features for Azure with async-only APIs. Everything has been completely rewritten for async, which has also opened up the use of new APIs. This provides a phenomenal improvement in performance.

Azure Storage Queues

Azure Storage, being one of the oldest and most mature Azure services, has not received many upgrades related to queues. However, the upgrade to an async API has significantly improved the performance of NServiceBus. We are now able to completely saturate a storage account (2000 messages per second) when excluding the overhead of message size and handler logic.

Check out the Azure Storage Queues upgrade guide for more information.

Azure Service Bus

In Azure Service Bus, NServiceBus can now be more efficient when sending multiple messages. Outgoing messages are sent as batches rather than individually, and the reduction in roundtrips to Azure servers results in better performance.

A big addition in Azure Service Bus gives you stronger message consistency guarantees with atomic send+receive within a single namespace. This means that the processing of a message and the sending of messages within the message handler are atomic; outgoing messages are not really sent unless the incoming message is processed successfully. The result is that you don't have to worry about ghost messages being sent when a message handler rolls back. This is game-changing for Azure developers, as you can now have the same level of consistency in messaging operations previously enjoyed only by on-premises developers.

Check out the Azure Service Bus upgrade guide for more information.

Azure Storage Persistence

Azure Storage Persistence has also received a big upgrade. In Azure persistence, where storing or retrieving data involves a round-trip to the Azure cloud, the move to async is a big benefit. Azure persistence has also received a major improvement in its saga persister. The new version will create indexing entries to prevent duplicate saga entities, and the elimination of a full table scan will provide a significant performance increase.

Check out the Azure Storage Persistence upgrade guide for more information.

MSMQ

A major pain point with the MSMQ transport has been scaling out. Other transports can use the competing consumers pattern, but scaling out MSMQ required the Distributor component. The Distributor routes all messages through a central point. As a result, it has inherent limitations and is difficult to cluster for high availability.

For NServiceBus 6.0, we've created a new method of scaling out MSMQ endpoints: sender-side distribution. In this scheme, all senders are aware of the available endpoint instances that can process a given message. They will alternate between configured instances without the need for a distributor component.

For more information on sender-side distribution, see our Scale Out with Competing Consumers and Sender-Side Distribution sample.

We've also done some major performance tuning on MSMQ. We think you'll be very happy with how much faster the MSMQ transport is in NServiceBus 6.0.

RabbitMQ

The RabbitMQ transport has received a massive overhaul, which will make it perform much better overall. It will be easier to tell what's going on from the RabbitMQ admin page, especially when running multiple endpoint instances in a competing consumer pattern. We've also revamped the connection management for higher reliability and introduced support for TLS-encrypted RabbitMQ servers (using AMQPS). This will allow you to securely connect to cloud-hosted RabbitMQ instances.

Check out the RabbitMQ upgrade guide for more information.

SQL Server Transport

The SQL Server Transport has also been redesigned to be more efficient. Because the ADO.NET API supports async/await, we can take full advantage of the benefits of asynchronous programming and the performance enhancements that it yields. We also made it easier to configure connection strings, especially when using a multi-schema topology.
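
To get a sense of what that enables, here's a generic async ADO.NET sketch (illustrative only, not actual transport code; the table name and SQL are made up):

using System.Data.SqlClient;
using System.Threading.Tasks;

public static async Task SendMessageAsync(string connectionString, string body)
{
    using (var connection = new SqlConnection(connectionString))
    {
        // Neither opening the connection nor executing the command blocks a thread.
        await connection.OpenAsync();

        using (var command = new SqlCommand(
            "INSERT INTO dbo.MyQueue (Body) VALUES (@body)", connection))
        {
            command.Parameters.AddWithValue("@body", body);
            await command.ExecuteNonQueryAsync();
        }
    }
}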

Check out the SQL Server Transport upgrade guide for all the details.

Try it out!

We'd love for you to go get the new beta packages and take them for a spin. Just check the Include prerelease checkbox in the NuGet Package Manager, or include the -Pre flag when installing packages from the NuGet Package Manager Console.

While we've done some pretty thorough testing on the beta, please note that you shouldn't use it in production just yet.

We would love your feedback on the new version. Please let us know what you think in the comments below. We'll be writing a lot more about all the improvements in NServiceBus 6.0 as we get closer to a final release, so stay tuned!


Productivity power-ups in the August Platform release


This month, we released an update to our platform, designed primarily to give you a much better user experience. If you have been frustrated by having to deal with large numbers of failed messages, or if you routinely deal with relatively large messages, we have included new features specifically to address your pain points. We've also added multiple productivity enhancements across the platform, so there's sure to be something in it for everybody.

Head over to our downloads page for the latest versions, or read on for the details.

Failed message grouping

At times, some event might lead to a large number of failed messages within ServicePulse. Maybe something failed within your infrastructure: a web service was unreachable, a disk filled up, a server restarted, or maybe a new version of an endpoint was released with a bug. This isn't a problem; this is your system dealing with failure as designed.

But dealing with all of those failed messages can be a real pain. Previously, ServicePulse gave you the ability to archive or retry all messages, or retry selected messages. Unfortunately, ServicePulse would only show you 50 messages at a time! This made it very difficult to retry all failed messages of a given type if they numbered in the hundreds. Worse yet, the retry process was slow and error-prone, and the more messages you attempted to retry at a time, the worse it became.

It would be much better to group similar failed messages, letting you retry an entire group and know that every message in it would be retried reliably, regardless of the number of messages involved.

We've been doing a lot of work on failed message handling across our platform, and we think you're going to like our new message grouping features.

Failed message groups

The new Failed Messages screen in ServicePulse automatically groups similar failed messages so that they can be handled in bulk. If two or more failed messages were caused by the same exception in the same place, then it is likely that they can be dealt with in the same way. Each group can be retried or archived as a single operation. You can get details for each group to see a full list of messages and deal with them individually if you still need fine-grained control.

When you first install the updated platform components, all of your unresolved failed messages will be automatically grouped as a background process. This may take some time, depending on how many unresolved failed messages you have, during which you will continue to see the old Failed Messages screen. Once the background process has completed, the new Failed Messages screen will appear.

When you choose to retry messages, whether an entire group or just a subset of messages, you will benefit from the updated retries capability of ServiceControl. We've invested a lot of effort to speed up the retry process and make it more robust. Large groups of messages will be split up into batches and dispatched separately, with progress reported for each batch.

Batched retries

The new retries capability uses a new queue named particular.servicecontrol.staging. Messages to be retried get batched and loaded into this queue. When the batch is fully staged, the entire contents are forwarded to their destination.

With these new changes to failed message groups, retrying or archiving hundreds of messages can now be handled with just a few clicks. The next time you have a flood of failed messages for whatever reason, the new failed message grouping features are going to save you a ton of time and headaches.

Very large messages

To audit all messages flowing through your system, you need to store copies of them. Most of the time, messages are fairly small, and this isn't a problem. But, to keep very large messages from affecting the performance of the system, previous versions of ServiceControl did not store the message body for audited messages larger than 100KB. Anything larger got discarded, and attempting to view the message within ServiceInsight would show only a blank window.

But 100KB is an arbitrary setting. You might regularly have messages that are just over that 100KB limit, and still would like to be able to view them in ServiceInsight. Maybe you are even willing to sacrifice some performance to have more information.

In order to allow auditing larger messages, the maximum message body size is now configurable. Additionally, if an audited message's body is larger than the maximum configured size, ServiceInsight will now present a meaningful message.

Message body was too long

Now you can make the decision on how large is too large for yourself, based on what makes sense for your organization. If you ever run into a message that goes above the limit, ServiceInsight will tell you what's going on and what to do about it.

More focused tooling

In this release, we've paid a lot of attention to common workflows. We have optimized our tools based on your suggestions, and we've made a bunch of improvements intended to help you get your job done quicker, more efficiently and with fewer distractions.

Stack trace coloring

Stack traces are invaluable for debugging, but previously we presented them as a wall of plain black text, which can be difficult for the brain to make sense of quickly. It would be ideal if the structure of the stack trace were readily apparent so that you could understand it more quickly and continue with debugging.

To make stack traces in ServiceInsight easier to read, we are now using syntax highlighting to display them in color. Stack traces displayed in the Headers tab, as well as in the Exception popup of the Flow Diagram tab (shown after clicking Exception type), will both be colored.

Stack trace coloring

With this small change, it's now much easier to see the method name, file path, and line number where an exception occurred, so you can pull up the offending code and fix the problem.

Better 3rd party integration

It's possible for NServiceBus to ingest messages created manually on the queuing infrastructure, perhaps as part of a 3rd party integration, but these messages commonly lack information in specific headers. Missing headers created very odd output in some places, with values either absent or falling back to defaults like DateTime.MinValue, which caused a lot of confusion.

It would be much clearer if these missing values were communicated better by providing meaningful descriptions, showing the true nature of the missing values. We have made these changes in ServicePulse and ServiceInsight.

In ServicePulse, Unknown will be displayed for missing Time Sent and Message Type values in the Failed Messages list.

Unknown headers in ServicePulse

The same goes for ServiceInsight in the Messages list, Flow Diagram, and Saga View.

Unknown headers in ServiceInsight

With this change, it's clear that the information is indeed not available, and not due to a random system error.

Streamlining ServiceInsight message list

In the previous version of ServiceInsight, the message list contained seven columns: Status, Message ID, Message Type, Time Sent, Critical Time, Processing Time and Delivery Time. This was too much information, making for a crowded grid, especially at lower monitor resolutions, where space is at a premium and horizontal scroll bars serve only to hide information from view.

ServiceInsight is primarily a tool to help you while you are developing and debugging your systems, so it would be more helpful to focus only on the information necessary for that purpose.

To accomplish this, we have streamlined the information displayed in ServiceInsight. Critical Time (the amount of time a message has to wait to be processed) and Delivery Time (the amount of time it takes for a message to be delivered to its destination) are related to monitoring, and are not that useful in ServiceInsight. Additionally, at times these calculated values were inaccurate due to clock drift between different servers, causing unnecessary confusion.

For these reasons, we have removed Critical Time and Delivery Time from the Messages list view to allow you an unobstructed view of the information that you really need to get an understanding of what your system is doing.

Usability improvements

We've made a lot of other improvements to make our tools easier to use.

In previous versions of ServiceInsight, message nodes were all the same size, regardless of content. Now, each node's width adapts to the length of its content.

Auto-sizing of message nodes in ServiceInsight

Now you can see everything you need, without resorting to a rollover tooltip.

In ServicePulse, we've reorganized the Endpoints Overview and Configuration screens to display their contents alphabetically, making the desired content easier to find. We've enabled text to wrap, where appropriate, so that you can see the full names of things without extra hassle. We've also introduced a new version notification, so you can always be sure you're running the latest and greatest version of ServicePulse.

Summary

A lot of great things went into this release of the platform, with the aim of removing some common pain points and frustrations, and optimizing things so that you can do your job better. Of course, we fixed a bunch of not-so-common bugs too. For the complete details, you can refer to the release notes for ServiceInsight 1.3.0, ServicePulse 1.2.0, and ServiceControl 1.6.0.

You can get the newest versions of these tools on our downloads page. We would love to hear what you think!


About the authors: Weronika Łabaj, Mike Minutillo, and John Simons are part of the Particular Software engineering team and are passionate about writing software that makes other developers happy.

NSBCon 2015: Platform-Oriented Architecture


In the second keynote of NSBCon 2015, Ted Neward introduces the concept of Platform-Oriented Architecture (POA) as the logical successor to the currently used SOA/REST architectural approaches. POA is a developer-focused approach that has an established communication backplane, an entity definition, a built-in agent model, and a set of expectations around various execution topics. Ted also talks about the relationship between POA and operating systems, programming languages, and database engines.

If you want to learn more about Ted's vision of POA, watch his video here:


Ted Neward is "The Dude of Software."

The network is reliable


This article is an excerpt from the book Dr. Harvey and the 8 Fallacies of Distributed Computing which chronicles the misadventures of Dr. Harvey Fallacious, a 19th-century explorer and researcher, and also my great-great-great-great-great-grandfather. In it, he details his travels and explorations in remote, then-uncharted regions of South America, unknowingly dealing with the repercussions of the 8 Fallacies long before they were written down by Peter Deutsch, let alone the birth of the computer age.

On the use of tin can phones in primitive culture

In the spring of that year, my travels brought me upon a previously undiscovered civilization. The people called themselves Ossians, and they lived in an isolated collection of villages in a remote part of South America.

Being remote as they were, their level of technology was understandably primitive. But I was surprised by the locals' recent obsession with new forms of communication. It all started, they told me, when one of them discovered that by attaching a rope between two clay pots and stretching the rope taut, a voice uttered into one side could be heard on the other. (I neglected to tell them that even as a boy I had done this very thing with tin cans.)

At first it was children doing it, but quickly the village elders recognized the advantages that communication over distances could bring, and so they strung ropes all throughout the village. "A clay pot in every home!" became a rallying cry for modernization.

While the clay pot network worked well enough locally, it didn't take long for problems to begin cropping up as they attempted to distribute the network to surrounding villages.

Proper rope tension could not be maintained over long distances, necessitating the placement of repeater stations (simple one-room huts) across the landscape, manned by a human operator who would listen to messages from one pot and relay them down the line toward the receiving end.

This all worked quite well — until a relay operator would miss an incoming message while outside the hut answering a call of nature or until a predator chewed through a transmission rope. Clearly not a perfect system, but improvements are ongoing.

Dr. Harvey Fallacious
November 25, 1834

The network is not reliable

Anyone with a cable or DSL modem knows how temperamental network connections can be. The Internet just stops working, and the only way to get it going again is to unplug it for 15 seconds. (Or, put another way, "Have you tried turning it off and on again?")

Thankfully, better solutions exist for professional data centers than consumer-grade modems, but problems can persist.

As a company that does reliable messaging, we've really heard it all. Don't worry, the names will be changed to protect the innocent.

One ISP had two routers: a primary and a backup. One day, the primary router malfunctioned. They switched to the backup, only to find that its routing tables had not been updated in a very long time. For many customers, that was the day the Internet died.

In another case, a project used an Oracle database. Everything worked great in the development environment, but in production there was an additional load balancer and firewall. Every once in a while, the load balancer would silently drop TCP connections to the database. These faulty connections continued to sit in a connection pool, so the next time somebody needed a connection, they would get an exception.

Just recently, on June 12, a single ISP in Asia broke the Internet for a big section of the world, creating Internet problems in Europe.

In general, you can't trust any network, no matter how local or global. Hardware, software, and security can all cause issues. This is codified in the 1st fallacy of distributed computing: the network is reliable.

This is especially problematic for HTTP communication or any request/response or remote procedure call (RPC) style of communication.

Consider the following simple web service call:

var svc = new MyService();
var result = svc.Process(data);

How do you handle an HttpTimeoutException? This is an exception that's generated on the client side when there's a problem, but you can't know what went wrong because you haven't gotten a response.

Data can get lost when sent over the wire. It's possible that the web service call actually succeeded but the response got lost somewhere on the Internet. If the web service represents an idempotent operation (a process that can be repeated without any adverse side effects), then it can simply be retried. But what if that process charges a credit card?
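
To see why this is so thorny, consider the same call wrapped in error handling (a sketch; HttpTimeoutException stands in for whatever timeout exception your client library actually throws):

var svc = new MyService();

try
{
    var result = svc.Process(data);
}
catch (HttpTimeoutException)
{
    // Did the server process the request and lose the response,
    // or did the request never arrive at all? There's no way to know.
    // Retrying blindly here could charge the credit card twice.
}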

Solutions

To provide a truly reliable system, you must accept that cross-network communication will not always be possible. Because we can't guarantee that an attempt at communication will be successful, we need to provide a facility to automatically retry after failures. To protect against failure while in the midst of a retry, we can use a pattern called store and forward. Instead of directly sending data to a remote server, we can store it in local storage. This way, when we boot up again, we are ready to continue right where we left off. We can use transactions to ensure that we keep retrying until processing succeeds.

This rises above the level of a simple retry loop around a web service invocation, which would fail if the server it was running on crashed. We need additional infrastructure to make these guarantees.
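
Sketched in pseudo-C#, the idea looks like this (localStore, TrySendToRemoteServer, and friends are illustrative names, not a specific library):

// Sending is a purely local, durable operation -- fast and crash-safe.
void SendReliably(Message message)
{
    localStore.Append(message);
}

// A separate forwarding loop survives restarts and retries until delivery succeeds.
void ForwardPendingMessages()
{
    foreach (var message in localStore.PendingMessages())
    {
        if (TrySendToRemoteServer(message))
        {
            localStore.MarkAsSent(message);
        }
        // On failure or a crash, the message stays in local storage
        // and is picked up again on the next pass.
    }
}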

Asynchronous messaging

There are many different technologies out there, called reliable messaging or message queuing systems, that solve these types of problems. On the Microsoft platform, the best known one is Microsoft Message Queuing (MSMQ), and on Azure, there is Azure Queue Storage and Azure Service Bus. Outside the Microsoft ecosystem, there is RabbitMQ, ActiveMQ, and ZeroMQ. Basically anything with "MQ" at the end is an indication that the product fits within this family of technologies.

These queuing technologies wrap up something like a web service call into an isolated, discrete unit of work called a message. The message queue employs store and forward to ensure that the message gets to where it needs to go. It can facilitate automatic retry, as message processing can be attempted over and over, even after a system crash. Some even support transactions, so that the message is only fully consumed if the business transaction is successful. This way, a guarantee can be made that each message is successfully processed exactly once.

Techniques exist to enable "exactly once" processing in environments like the cloud, where distributed transactions are not feasible, but that is outside the scope of this text.

Queuing technologies hold an additional advantage over a simple retry loop. For example, if we were attempting to create a customer and received an HTTP timeout, we would have no way of knowing if the server received the data and was simply unable to respond or if, rather, the data never arrived at all.

Retrying brings with it the possibility of server-side duplication. If we retry the attempt to create the customer, we may accidentally create the customer twice.

Message queues bring with them the concept of a message ID so that the server can decide whether an attempt is a retry or not. In essence, messaging allows deduplication on the server side.
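
A sketch of what that server-side check might look like (processedMessageIds and CreateCustomer are illustrative; a real implementation would keep the IDs in durable storage):

void OnMessageReceived(IncomingMessage message)
{
    // The message ID lets the server recognize a retry of work it already did.
    if (processedMessageIds.Contains(message.Id))
    {
        return; // duplicate delivery -- safe to ignore
    }

    CreateCustomer(message.Body);

    // In practice, the business change and the recorded ID are committed together,
    // e.g., in one transaction, so a crash between them can't cause duplicates.
    processedMessageIds.Add(message.Id);
}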

Abandon request/response

Asynchronous messaging requires a slight change in thinking because it does not provide the ability to do the traditional request/response seen in typical web service calls.

Sending a message to a message queue is a fire-and-forget operation. You drop a message in a queue, and eventually it makes its way to the server and the server will process it. You do not get an immediate return value on the next line of code.

Ultimately, by solving certain infrastructure issues, asynchronous messaging forces you to redesign the logical flow of your system.

This is the difficult leap of queueing technologies: not that they have an API that is difficult to use (they don't), but that they require letting go of more traditional request/response programming models.

Unfortunately, you can't just take a system using HTTP, plug in a queue, and ship it. It requires a significant redesign, and sometimes rewrite, of your system.

That can be scary, but the results are worth it.

Summary

Compared with a few decades ago, networks are fairly reliable — except for when they're not. As we continue to build larger and more globally distributed systems, we make ourselves susceptible to all the bad things that can happen.

In order to deal with this, we're going to have to move away from synchronous request/response-type programming. The object-oriented model of invoking a method (known as remote procedure call, or RPC) tends to break down when the network is unreliable, putting our system into a non-deterministic state that is very difficult to get out of.

In the last several decades since the creation of the first computer networks, we have been unable to completely solve the problem of network reliability. It stands to reason that this will not change in the next 5–10 years. We need to learn to build systems that will work in this environment today.


About the author: David Boike is a developer at Particular Software who first got into computers because he couldn't find a long enough string for his tin can phone.

What does your particular system look like?


Have you ever been pulled into a software project and had to figure out how everything works? Often, your options are limited to either sifting through piles of documentation or diving into thousands of lines of code. Unfortunately, the documentation probably became obsolete as the software grew and evolved and, while the code is accurate by definition, it requires a lot of concentration to trace through and figure out how everything fits together.

Ideally, what we really want is a method that reflects the accuracy of the code but is more accessible and easier to understand. For large software systems, the best way to achieve this is to create a kind of living documentation: documentation generated from the existing codebase. This way, the documentation is accurate and can be easily kept up-to-date -- just regenerate it as needed (for example, as part of an automated build).

Visualizing NServiceBus Systems

In an NServiceBus system, endpoints can be developed, maintained, and run as independent processes that exchange messages with each other. Creating new ways for these endpoints to communicate can be as simple as creating a new message type. This level of loose coupling comes at a cost, though. It can be difficult to get a high-level picture of the endpoints that make up a system and the messages that flow between them.

One approach could be to analyze your codebase to see how messages flow through your system. But in order for any tool to do this, it would need to examine the code to see when and where messages are sent and where they are handled. This is not trivial, and the problems are compounded if the messages and endpoints are spread across more than one solution.

In an earlier blog post, we mentioned a new visualization technique that we've been working on to show how messages flow through a system. By extracting the endpoints, message types, and message-exchange patterns of your system from the runtime stream of audited messages, we are able to provide a higher level and up-to-date visualization, true to the principles of living documentation.

What does it look like?

This graph, generated from one of our samples, gives you a sense of the kind of documentation that can be generated automatically, showing the endpoints and the flow of messages between them.

Also, the graph isn't just an image -- it's a kind of XML file[1] that Visual Studio can render, allowing you to alter the layout and formatting however you like.

How does it work?

When an NServiceBus endpoint with auditing enabled processes a message, it will forward that message to an audit queue, along with some metadata about how it was processed. Then ServiceControl reads all of these messages and stores them in an embedded database.

The new visualization tool aggregates data from all of the messages in the ServiceControl database, extracting the message types as well as the endpoints responsible for processing them. From this information, the tool constructs a directed graph showing the flow of messages between endpoints.

In closing

Imagine having high-level documentation of how your system works that's always up to date. No more arguments among developers saying, "No, that's how it used to work. Now things are different." New developers could get up to speed much faster.

So go ahead, download the visualization tool and try it. Oh, and don't forget to tell us what you think -- we'd really appreciate your input.

Footnotes

[1] Directed Graph Markup Language (DGML)


About the author: Mike Minutillo is a developer at Particular Software with a passion for generating living documentation, mostly so he doesn't have to spend time writing it by hand.

Latency is zero


This article is an excerpt from the book Dr. Harvey and the 8 Fallacies of Distributed Computing which chronicles the misadventures of Dr. Harvey Fallacious, a 19th-century explorer and researcher, and also my great-great-great-great-great-grandfather. In it, he details his travels and explorations in remote, then-uncharted regions of South America, unknowingly dealing with the repercussions of the 8 Fallacies long before they were written down by Peter Deutsch, let alone the birth of the computer age. You can get your own free copy of the eBook here.

On the efficacy of messages in bottles

Upon the return from my visit with the newly-discovered Ossian society, it happened that my ship capsized and I was marooned for a time on a tiny deserted island somewhere in the Caribbean Sea.

One might think that the biggest problem with being marooned would be finding food for survival, but that was not the case on this island. Coconut and pineapple trees grew all over, and as luck would have it, I stumbled across a hidden cache of rations likely left behind by Spanish explorers or pirates. Among the supplies were dried and preserved meats and a collection of rum in glass bottles.

The biggest problem when stranded, after finding food and shelter, is keeping one's own mind occupied. One day, I took a discarded rum bottle, inserted a note, and threw it into the sea.

Imagine my surprise when, approximately six months later (to my count), I observed the selfsame rum bottle washing to shore. Pulling the stopper out of the bottle, I removed the note and was astonished to find that it was not the one I had originally written!

My excitement waned when I read its contents: "Received your message. Happy to help. Where are you?"

Furious with myself, I wrote another message, describing my island's position in relation to the moon and stars, and returned the bottle once again to the surf. I began drinking rum at a furious pace, emptying bottles so I could send a new message once per week.

Luckily, ocean currents in the region proved to be consistent, for six months later I observed white sails on the horizon. Soon thereafter, my rescuers made landfall. I thanked them profusely but found myself severely irritated that, had I only included my position in my first message, my rescue could have been hastened by six months.

Dr. Harvey Fallacious
August 3, 1835

Latency is not zero

The speed of light is actually quite slow. Light emitted from the sun this very instant will not reach us here on Earth for 8.3 minutes. It takes a full 5.5 hours for sunlight to reach Pluto and 4.24 years to reach our closest neighboring star, Proxima Centauri. And we cannot communicate at the speed of light; we must bounce data around between Ethernet switches, slowing things down considerably.

Meanwhile, human expectations of speed are pretty demanding. A 1993 Nielsen usability study found that, for web browsing, a 100-millisecond delay was perceived as reacting instantly, while a one-second delay was the limit for maintaining uninterrupted flow. Anything more than that is considered a distraction.

What is latency?

Latency is the inherent delay present in any communication, regardless of the size of the transmission. The total time for a communication will be

TotalTime = Latency + (Size / Bandwidth)

While bandwidth is limited by infrastructure, latency is primarily bounded by geography and the speed of light. Geography we can control. The speed of light we cannot.

This latency occurs in every communication across a network. This includes, of course, users connecting to our web server, but it also governs the communications our web server must make to respond to that request. Web service calls, or even requests to the database, are all affected by latency. Frequently, this means that making many requests for small items can be drastically slower than requesting one item of the combined size.
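
A quick worked example using the formula above, assuming a 50 ms round trip and items small enough that transfer time is negligible:

100 separate requests: 100 × 50 ms = 5,000 ms
1 batched request: 1 × 50 ms = 50 ms (plus a slightly longer transfer time)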

Ignore at your own risk

Latency is something that should always be considered when developing applications. To neglect it can have disastrous consequences.

One of our developers once worked with a client that was building a system that dealt with car insurance. In the client's specific country, automobile insurance coverage was required by law. The software needed to check if drivers had car insurance already. If they did not, they were automatically dumped into an expensive government-provided insurance pool.

The team decoupled this dependence on an external insurance check by stubbing it behind a web service. During development, this web service simply returned true or false immediately. Unbeknownst to the team, the production system used a dial-up modem to talk to the government system. Clearly, this was not going to work at scale.

This is an extreme example, but it illustrates the point. The time to cross the network in one direction can be small for a LAN, but for a WAN or the Internet, it can be very large — many times slower than in-memory access.

Careful with objects

In earlier days of object-oriented programming, there was a brief period when remote objects were fashionable. In this programming style, a local in-memory reference would be a proxy object for the "real" version on a server somewhere else. This would sometimes mean that accessing a single property on the proxy object would result in a network round trip.

Now, of course, we use data transfer objects (DTOs) that pack all of an object's data into one container, shipping it across the network all at once and thus eliminating multiple round trips associated with remote objects. This way, the latency penalty only has to be paid once, rather than on each property access.

So, it's important to be careful of how your data objects are implemented, especially those generated by O/RM tools. Because of object-oriented abstraction, it's not always easy to know if a property will just access local data already in memory or if it will require a costly network round trip to retrieve. SELECT N+1 queries are just one specific example of the second fallacy of distributed computing rearing its ugly head.
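
Here's a sketch of the difference (Order, Lines, and session.Query are stand-ins for a typical O/RM API; the exact calls vary by tool):

// SELECT N+1: one query for the orders, then one lazy-load round trip per order.
var total = 0m;
foreach (var order in session.Query<Order>())        // 1 round trip
{
    total += order.Lines.Sum(line => line.Amount);   // +1 round trip per order
}

// DTO-style: everything needed, fetched in a single round trip.
var orderTotals = session.Query<Order>()
    .Select(order => new { order.Id, Total = order.Lines.Sum(line => line.Amount) })
    .ToList();                                       // 1 round trip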

Solutions

To combat latency, we can first optimize the geography so that the distance to be crossed is as short as possible. Once we have committed to making the round trip and paying the latency penalty, we can optimize to get as much out of it as possible.

Accounting for geography

For minimum latency, communicating servers should be as close together as possible. Shorter overall network round trips with fewer hops will reduce the effect of latency within our applications.

One strategy specifically for web content involves the use of a content delivery network (CDN) to bring resources closer to the client. This is especially useful with resources that don't often change, such as images, videos, and other high-bandwidth items.

Latency typically isn't much of a problem within an on-premises data center with gigabit Ethernet connections. But when designing for the cloud, special attention should be paid to having our cloud resources deployed to the same availability zone. If we are to fail over to a secondary availability zone, all of the related resources should fail over together so that we do not encounter a situation in which an application server in one zone is forced to communicate with a database in another.

Go big or don't go

After accounting for geography, the best strategy to avoid latency involves optimizing how and when those communications take place.

When you're forced to cross the network, it's advisable to take all the data you might need with you. Clearly this is a double-edged sword because downloading 100 MB of data when you need 5 KB isn't a good solution, either. At the very least, inter-object "chit-chat" should not need to cross the network. Ultimately, though, experience is required to analyze how the data is used. Access to that data can then be optimized to balance the need for a small download size against a minimal number of round trips. This is related to the 3rd fallacy, bandwidth is infinite, which will be covered next.

However, a better strategy is to remove the need to cross the network in the first place. The latency delay of a request you don't have to make is zero.

One obvious strategy is to utilize in-memory caching so that the latency cost is paid by the first request. This will be to the benefit of all those that come afterward; they can share the same saved response. But of course, caching isn't always a possibility, and it introduces its own problems. As it has been said, "There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors." Knowing when to invalidate a cached item, at least without asking the source and negating the benefit of the cache, is a hard thing to do.
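
As a sketch, using .NET's System.Runtime.Caching (LoadCustomerNameFromRemoteService and the 5-minute expiration are illustrative assumptions):

using System;
using System.Runtime.Caching;

static string GetCustomerName(int customerId)
{
    var cache = MemoryCache.Default;
    var key = "customer-name:" + customerId;

    var cached = cache.Get(key) as string;
    if (cached != null)
    {
        return cached; // no network round trip -- the latency cost is zero
    }

    // The latency penalty is paid once, by the first request.
    var name = LoadCustomerNameFromRemoteService(customerId);

    // An absolute expiration is a crude stand-in for real cache invalidation.
    cache.Set(key, name, DateTimeOffset.UtcNow.AddMinutes(5));
    return name;
}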

Harness the power of events

With asynchronous messaging, you can publish events about information updates immediately, when they happen. Other systems that are interested in this data can then subscribe to receive these events and cache the data locally. This ensures that when the data is needed, it can be provided without requiring a network round trip.

Not every scenario requires this additional complexity, but when used appropriately, it can be very powerful. Instead of having to wait to request updated data, always-up-to-date data can be served instantly.

Summary

We can use a variety of strategies to minimize the number of times we must cross the network. We can carefully architect our system to minimize differences in geography, and we can optimize our network communications so that we return all the information that might be needed, minimizing cross-network chit-chat. This requires experience in carefully analyzing system use patterns in order to optimize how information is delivered between systems.

We can also use a variety of caching strategies, including in-memory cache and content delivery networks, to minimize repeated requests for information. CDNs in particular are useful for moving content "closer" to the end user and minimizing the latency for those specific items.

Publishing events when data changes can be instrumental in managing a distributed infrastructure. Being notified of changes can help disconnected systems remain in sync.

The speed of light may be slow, but it doesn't have to keep you down.


About the author: David Boike is a developer at Particular Software who has never been on a deserted island but is glad to hear there's rum.

NServiceBus Sagas, Simplified


This post is part of a series that describes the improvements in NServiceBus Version 6.

In the 1960s, Shigeo Shingo, a Japanese manufacturing consultant from Saga City, Japan, pioneered the concept of poka-yoke, a Japanese term that means "mistake-proofing." In a nutshell, poka-yoke involves saving time and cutting waste by reducing the possibility of defects. Although some number of mistakes will always occur, processes can be put in place to catch those mistakes before they turn into actual customer-facing defects.

This is a model we've been trying to follow with NServiceBus -- not only with regard to our internal development processes, but also in our efforts to guide developers toward building message-driven systems. Through countless API design decisions over the years, we've been making it ever easier to use NServiceBus the right way and ever more difficult to use it wrongly. This way, developers naturally fall into the pit of success.[1]

During the development of NServiceBus 6.0 (V6), we realized that there were some common mistakes that developers tend to make when working with sagas. Just for a bit of background, a saga is an NServiceBus pattern to model long-running processes. It does this by combining multiple message handlers together with a shared "memory" retained between handling messages that are correlated together in some way. For instance, in a typical e-commerce application, a saga might bring together message handlers to respond when an order is placed and the credit card is processed so that it is only shipped when both events have been received.

In the saga API, we found that some users were inadvertently making mistakes when constructing their sagas, especially in defining how messages relate to the saga data. This resulted in some hard-to-find bugs. By changing the API slightly in V6, we found we could prevent some of these mistakes from being made in the first place. Let's take a look at some examples.

Mapping messages to saga data

In our theoretical e-commerce example, when a message related to a specific order arrives and that message should be handled by a saga, the saga relating to that order needs to be found and invoked.

If we take the OrderId from the message, then we can find the matching saga data by querying the data store for an instance with the matching OrderId. We could even say that there's a kind of mapping between the message's OrderId and the saga's OrderId -- or more generally, a property on the message and a property on the saga data.

protected override void ConfigureHowToFindSaga(SagaPropertyMapper<OrderSagaData> mapper)
{
    mapper.ConfigureMapping<OrderPlaced>(message => message.OrderId)
        .ToSaga(sagaData => sagaData.OrderId);
}

Back in NServiceBus 5.0, we made this a little more discoverable by switching the method signature on the saga base class from virtual to abstract, ensuring that the compiler would complain if the method did not exist.

However, developers often need to change sagas for existing business processes, like taking additional events into account. For example, while we were previously not shipping an order until it had been both placed and charged, we might have a new requirement to wait until the order had been approved by a customer support agent as well.

To fulfill this requirement, we would add a new message handling method to the saga for the new OrderApproved event. In this situation, it's easy to forget that you need to add a mapping for that OrderApproved message. If forgotten, the saga infrastructure would not be able to find the correct saga data, and the saga wouldn't end up handling the message.
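
Concretely, the fix is one more mapping in ConfigureHowToFindSaga (assuming OrderApproved also carries the OrderId):

protected override void ConfigureHowToFindSaga(SagaPropertyMapper<OrderSagaData> mapper)
{
    mapper.ConfigureMapping<OrderPlaced>(message => message.OrderId)
        .ToSaga(sagaData => sagaData.OrderId);

    // The line that's easy to forget when adding the new OrderApproved handler:
    mapper.ConfigureMapping<OrderApproved>(message => message.OrderId)
        .ToSaga(sagaData => sagaData.OrderId);
}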

Now, with V6, we make sure you can't forget to add the mappings needed for messages to be processed successfully. To do that, we added a startup check verifying that mappings exist for all messages that start a saga, throwing a helpful exception right away if anything is missing.

For messages that don't start a saga, it's a little more complicated due to auto correlation.[2] But if it's impossible to map an incoming message to a saga due to a missing mapping, we will throw a runtime exception. The unmappable message will go through automatic retries and arrive in your error queue so that once you fix the mapping issue according to the instructions in the exception, you can replay that message and pick up right where you left off.

The bottom line is this: instead of the saga behaving unexpectedly or incorrectly because of a missing mapping, an exception will provide an early warning and guidance on exactly what to do.

Being [Unique]

One of the guarantees NServiceBus provides you with, through sagas, is the same level of consistency you'd expect from a database. Those guarantees need to hold up even when multiple messages are processed by the same saga in parallel. NServiceBus must make sure that no duplicate database records are ever created, as that would disrupt the correctness of the business process the saga was modeling.

In our example, that means the OrderId column needs to have a unique constraint on it to ensure that two different sets of data can't be created for the same OrderId. The constraint guarantees that only one of the messages would end up creating the saga data. The other message would hit a unique constraint violation exception when it tried to create the same saga data, causing it to fail and retry. On that retry, the infrastructure would then be able to find the saga created by the first message, processing that second message as if the messages had arrived in sequence rather than in parallel.

Previous versions of NServiceBus enforced uniqueness by decorating one of the saga data properties with the [Unique] attribute. This attribute enabled the underlying data persister to create a unique constraint for the decorated property.

Unfortunately, it was far too easy to forget to do that:

public class OrderSagaData : ContainSagaData
{
    // Oops, forgot [Unique] !!
    public string OrderId { get; set; }
}

Omitting the [Unique] attribute would cause the saga data model to be created without the correct constraint. Then, if multiple messages were processed in parallel for that same saga, when each of the threads queried the saga storage and couldn't find an existing instance, they would each go and create a new one. This would result in a defect: duplicate saga data and nondeterministic behavior.
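
For reference, the correct V5 version was just one attribute away:

public class OrderSagaData : ContainSagaData
{
    [Unique]
    public string OrderId { get; set; }
}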

Now, in V6, NServiceBus performs extra checks on the message mappings in the ConfigureHowToFindSaga method on endpoint startup. It ensures that all of the mappings point to a single property on the saga data class. Since this "correlation property" is used to identify the saga, by definition this property must be unique. And the saga infrastructure defaults to treating it as unique so you don't need to put the attribute on it anymore.

Auto-population

With message mappings correctly defined, a developer's attention will turn to the logic in the message handling methods. However, one important task related to message mappings still remains in order to have a correctly-functioning saga.

In earlier versions of NServiceBus, you needed to remember to populate the saga data with information from the message in each Handle method for messages that start the saga:

// V5
public void Handle(OrderPlaced message)
{
    this.Data.OrderId = message.OrderId;

    // Continue with business logic
}

This is pure boilerplate and easy to forget. If omitted, Data.OrderId will be left at its default value when saved to the database, which means the saga won't be found when querying by OrderId later on.

However, based on the already established mappings, the saga infrastructure already knows that the OrderPlaced event should start the saga, and the OrderId value in the message must be the same as the OrderId value in the saga data. Requiring a line of code to initialize the saga data based on the value in the message shouldn't really be necessary.

In V6, we automatically set the value of the correlation property for newly created sagas according to the mappings so that you don't have to.

// V6
public async Task Handle(OrderPlaced message, IMessageHandlerContext context)
{
    // Data.OrderId has already been set. Go ahead with your business logic.
}

Summary

Forgetting to properly map saga data, forgetting a [Unique] attribute, or failing to populate the saga data from the mapped messages are all simple mistakes. Unfortunately, they were mistakes that were all too easy to make for a developer with a thousand other things on their mind.

True to the concept of poka-yoke, NServiceBus 6.0 makes it practically impossible to make these common mistakes, eliminating some hard-to-diagnose issues. We can't prevent all bugs in code, but we'll take these off the table so you won't have to track them down ever again.

And, in case you were wondering, the saga infrastructure is fully backwards compatible, so you can take your existing sagas and start running them in V6 right away. So go ahead. Take NServiceBus 6.0 for a spin now, and see if any of the other developers on your team (wink) made any of these mistakes.

Happy coding!

Footnotes

[1] Falling Into The Pit of Success by Jeff Atwood
[2] Auto correlation is a feature that embeds the SagaId in messages sent from the saga, which is then reflected back to the saga when a handler processing that message replies. As a result, the reply message also carries the SagaId and does not require a property mapping in order to find the saga.

NService... umm, where's the bus?


This post is part of a series describing the improvements in NServiceBus 6.0.

After many years of service, we bid our trusty IBus farewell. It's served us well almost since the very beginning, providing access to many operations like sending messages, subscribing to events, and even manipulating headers.

Over the years, the IBus interface has become like a kitchen knife block, with knives for paring, chopping, cutting steak, and maybe even a pair of herb scissors. Although it's convenient for storing many knives and makes them easily accessible, you don't need all of them when you're having a steak dinner. So it is with IBus. Methods like Reply and ForwardCurrentMessageTo are always available on the interface but only make sense in the context of handling an incoming message. If you use them in other scenarios, such as during endpoint startup, they may throw an exception.

With version 6 of NServiceBus, we've removed IBus entirely and replaced it with focused, context-specific interfaces that provide clear guidance on what can and can't be done based on the methods they expose. This ensures that, in any given circumstance, you'll have the exact tools needed for the job and nothing else cluttering up your workspace.

Handlers

When implementing IHandleMessages<T>, you'll notice an additional parameter on the Handle method. This parameter is of type IMessageHandlerContext, which exposes methods such as Publish and Send. More importantly, the IMessageHandlerContext type also excludes methods like Subscribe and Unsubscribe—actions that you really shouldn't be performing in a message handler anyway. Migrating your handlers should only require replacing IBus with the newly provided IMessageHandlerContext. For example:

public class v5_MyMessageHandler : IHandleMessages<MyMessage>
{
    private IBus bus;

    public v5_MyMessageHandler(IBus bus)
    {
        this.bus = bus;
    }

    public void Handle(MyMessage message)
    {
        var messageId = bus.CurrentMessageContext.Id;

        bus.Publish<MyEvent>(e => e.Value = messageId);
    }
}

becomes

public class v6_MyMessageHandler : IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        await context.Publish<MyEvent>(e => e.Value = context.MessageId);
    }
}

As you can see, in addition to the removal of IBus in version 6 of NServiceBus, the API for handling messages is now async as well. More information on this change can be found in our documentation on asynchronous handlers.

Moving off IBus

As you can imagine, removing IBus introduces many changes. For example, you no longer have to inject IBus into your classes using dependency injection. Instead, you're provided with a context-specific object in the method you're working with. Let's take a look at how much simpler our endpoint startup and shutdown code can be without IBus:

public class v5_Startup : IWantToRunWhenBusStartsAndStops
{
    private readonly IBus bus;

    public v5_Startup(IBus bus)
    {
        this.bus = bus;
    }

    public void Start()
    {
        bus.Publish<MyEvent>();
    }

    public void Stop()
    {
    }
}

becomes:

public class v6_Startup : IWantToRunWhenEndpointStartsAndStops
{
    public async Task Start(IMessageSession session)
    {
        await session.Publish<MyEvent>();
    }

    public async Task Stop(IMessageSession session)
    {
    }
}

The IMessageSession parameter presents you with just the operations that are valid at this specific extensibility point.

Endpoint configuration

The NServiceBus configuration API remains very similar, but it underwent some renaming. Instead of an IBus, you receive an IEndpointInstance, which offers all available bus operations outside the message processing pipeline. For example, here is how you would initialize an endpoint prior to sending a message or publishing an event:

BusConfiguration v5_busConfiguration = new BusConfiguration();
// other endpoint configuration code goes here
IStartableBus v5_startableBus = Bus.Create(v5_busConfiguration);
IBus v5_bus = v5_startableBus.Start();

//use v5_bus to .Send and/or .Publish as required

becomes

EndpointConfiguration v6_endpointConfiguration = new EndpointConfiguration();
// other endpoint configuration code goes here
IEndpointInstance v6_endpoint = await Endpoint.Start(v6_endpointConfiguration);

//use v6_endpoint to .Send and/or .Publish as required

Note that with the removal of IBus, Bus.Create has been replaced with the Endpoint.Start method, which allows you to start your endpoint with less ceremony.
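
And when it's time to shut down, the endpoint instance is stopped asynchronously as well:

await v6_endpoint.Stop();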

Impact on Dependency Injection

In previous versions of NServiceBus, the IBus would be registered in the IoC container automatically. With IBus no longer in use, this registration doesn't need to happen. Additionally, the new context-specific interfaces will not be registered in the IoC container, either. Instead, as we saw in the samples above, they will be provided to the NServiceBus extensibility points as parameters.

That said, if you do need access to an IEndpointInstance from an IoC container—for example, if you need to send a message from an ASP.NET MVC Controller or a WPF ViewModel—it can be registered as shown here using the Ninject container:

IEndpointInstance endpoint = await Endpoint.Start(config);
kernel.Bind<IEndpointInstance>().ToConstant(endpoint);

Once the object instance is registered with the container, IEndpointInstance will be available via dependency injection wherever you may need it:

public class DefaultController : Controller
{
    IEndpointInstance endpoint;

    public DefaultController(IEndpointInstance endpoint)
    {
        this.endpoint = endpoint;
    }

    public ActionResult Index()
    {
        return View();
    }

    [AllowAnonymous]
    public async Task<ActionResult> Send()
    {
        await endpoint.Send("Samples.Mvc.Endpoint", new MyMessage())
            .ConfigureAwait(false);
        return RedirectToAction("Index", "Default");
    }
}

Testing within a context

Context-specific interfaces also have a nice effect on the unit testing experience with NServiceBus, enabling the Arrange-Act-Assert pattern many developers know and love. It changes this...

Test.Handler<MyHandler>()
    .ExpectSend<OutgoingMessage>(m => true)
    .OnMessage(new IncomingMessage());

...to this:

var handler = new MyHandler();
var testableMessageHandlerContext = new TestableMessageHandlerContext();

await handler.Handle(new IncomingMessage(), testableMessageHandlerContext);

testableMessageHandlerContext.SentMessages.Should().HaveCount(1);

Although this may look like more effort to achieve the same result, this approach provides a lot more testing flexibility. The example above uses the FluentAssertions package to verify results instead of relying on the predefined assertions in the NServiceBus.Testing package. Also, if you don't like the provided testing classes, you're now able to create alternatives using your favorite mocking library.
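
If you'd rather not use the provided testable classes at all, a handler can be exercised against a mock instead. Here is a minimal sketch using the NSubstitute library (an assumption on our part; any mocking framework that can fake an interface works the same way). The single-argument Send extension ultimately calls the Send(object, SendOptions) overload being verified:

var handler = new MyHandler();
var context = Substitute.For<IMessageHandlerContext>();

await handler.Handle(new IncomingMessage(), context);

// verify exactly one outgoing message of the expected type was sent via the context
await context.Received(1).Send(Arg.Is<object>(m => m is OutgoingMessage), Arg.Any<SendOptions>());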

Summary

We're really excited about the evolution from the IBus interface to more contextual ones. Besides making your code more testable, it simplifies your configuration and makes using the API a much richer experience.

So come and see how much easier it is to develop with the new context-specific interfaces—check out NServiceBus Version 6 now.

Actual knives not included ;-)

Footnotes

1. Unit Testing NServiceBus 6

About the Author: Tim Bussmann is a developer at Particular Software who loves building intention-revealing APIs. When not at his computer, he enjoys hiking in the mountains in his native Switzerland.


MSMQ performance improvements in NServiceBus 6.0


This post is part of a series describing the improvements in NServiceBus 6.0.

MSMQ was the very first NServiceBus message transport, and while not overly flashy, it got the job done. You could almost call MSMQ a finished product because, while it's updated with each Windows release, it doesn't really change much. It's solid, reliable, dependable, and overall, it Just Works™.

One of the biggest changes we made in version 6.0 of NServiceBus (V6) is that the framework is now fully async1. The thing is, the MSMQ API in the .NET Framework hasn't been updated to support async/await, so what could we do for the MSMQ transport in NServiceBus?

Make it go faster anyway. That's what.

The switch from reserved threads operating synchronously but in parallel to truly async tasks has created an opportunity to squeeze every last drop of performance out of the MSMQ transport.

V5: Threads

In previous versions of NServiceBus, we used multiple threads to process messages in parallel. In NServiceBus V5, you would configure parallel processing settings using the TransportConfig element in an App.config file like this:

<TransportConfig MaximumConcurrencyLevel="10" />

This would prepare 10 physical threads within the endpoint. Each thread would host its own message-processing pipeline that would operate relatively independently of the others. These independent NServiceBus threads would call the MessageQueue.Peek() method to check if there were any messages to be processed.

NServiceBus 5.x Pipeline

Each V5 processing thread is an independent message processor. As a result, each thread does its own peeking and then processes the message it discovers.

This pattern works well, but it doesn't go as far as we'd like. Within each thread, all processing is synchronous, so any IO-bound tasks are destined to slow things down. All of these synchronous blocking calls in the message-processing pipeline also block the thread that's doing the peek, and that prevents the thread from going back to the queue to check for additional messages—or doing anything else for that matter. Each thread is stuck in a silo, unable to do anything to help out its neighbors.

Peek() also throws an exception if no messages are available, and while exceptions are great for exceptional circumstances, message queues empty out all the time. Also, this particular exception, thrown whenever the queue is empty, can be very annoying to a developer when Visual Studio is configured to break on all exceptions.

V6: Tasks

In V6, we've made things work a little differently. Instead of fixed threads, we use tasks throughout the framework, and the .NET TaskScheduler ensures that all tasks are executed in the most efficient way possible.

This also means that it's easier for NServiceBus to manage parallel processing to maximum effect. The message pump now fetches waiting messages using the MessageQueue.GetMessageEnumerator2() method. For each message found, a new Task is created, and the .NET TaskScheduler determines the most efficient way to manage the overall workload, rather than relying upon a fixed number of processing threads.

NServiceBus 6.0 Message Pump

In V5, the thread peeking the message queue becomes repeatedly blocked by all of the synchronous work that has to happen afterwards in order to fully process the message before peeking again. With all of the processing work represented as tasks in V6, the peeking thread is free to go right back to the queue to obtain more work. This ensures that the processing pipeline is always full of work, which in turn results in the best possible throughput.

Although there's no fixed thread count like in V5, you'll still be able to control the number of concurrently processed messages. We do that by using a semaphore2 to limit how many messages are dispatched via tasks to be processed, which controls the number of messages that can be "in flight" at any given time.
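
To make the mechanics concrete, here is a minimal sketch of that semaphore-limited dispatch pattern. It illustrates the technique rather than NServiceBus's actual message pump; ReceiveNextMessage and ProcessMessage are hypothetical stand-ins for the transport and pipeline code:

async Task PumpMessages(int maxConcurrency, CancellationToken cancellationToken)
{
    // limits the number of "in flight" messages
    var concurrencyLimiter = new SemaphoreSlim(maxConcurrency);

    while (!cancellationToken.IsCancellationRequested)
    {
        // wait until one of the in-flight slots frees up
        await concurrencyLimiter.WaitAsync(cancellationToken);

        var message = ReceiveNextMessage();

        // hand the message off to a task so the pump can go straight back to the queue
        var processingTask = Task.Run(async () =>
        {
            try
            {
                await ProcessMessage(message);
            }
            finally
            {
                concurrencyLimiter.Release();
            }
        });
    }
}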

While we default to max(2, NumberOfLogicalProcessors) concurrent messages, you can configure the value to whatever you like:

endpointConfiguration.LimitMessageProcessingConcurrencyTo(10);

This feature allows you to control simultaneous connections to an overloaded SQL server or rate-limited web service. In these situations, you may even want to limit an endpoint to a single concurrent message. Of course, you can also use it to control the amount of CPU and memory consumed by an endpoint, which is important when those resources are limited.

At first blush, controlling the number of in-flight messages in V6 might sound similar enough to controlling the number of fixed threads in V5. Given that, you might assume that the performance characteristics between versions would be roughly the same as well, but the results are pretty surprising.

Results

In informal performance tests with no-op message handlers, we were able to achieve about twice as many messages per second in message throughput using a V6 endpoint with its Task-based message pump as compared to its V5-based counterpart.

We think this is a pretty significant speed boost, and it wouldn't have been possible without the async API.

Of course, it's important to note that most message handlers will not be no-ops but will instead be full of IO-bound code for interacting with databases, calling web services, and sending other messages, so the difference in your own code's performance will vary. However, what is clear is that the more you are able to take advantage of async APIs within your own code, the better the throughput you'll get.

Summary

Although Microsoft provides no async API for MSMQ, we were able to achieve considerable performance improvements by using tasks instead of fixed threads to minimize wasted downtime during IO-bound operations. It's just one reason you should check out NServiceBus 6.0 right now.

More importantly, the performance improvements in the MSMQ transport illustrate an important lesson: switching to an asynchronous API can result in better application performance, even when some parts of the application are still synchronous. The bottom line is that the best time to adopt async/await is right now.

If you're unsure how to get started, take a look at our async/await webinar series. When you make the jump to async/await, we think you'll see the benefits too.

Footnotes

1. Async/Await: It’s time!
2. Semaphore (programming) - Wikipedia


About the author: David Boike is a developer at Particular Software who likes things that are fast, like cheetahs, the Millennium Falcon, and the NServiceBus message pipeline.

A promise is only as good as the code that makes it


When I make a promise to someone, I do my best to keep it. If I'm pretty sure I won't be able to do something, I don't make any promises about it. Instead, I say I'll try to address it eventually. It's all about managing expectations. In some ways, a promise is like a software interface — a kind of contract between the other person and me.

With asynchronous computations, we make promises in software too. They are similar to the promises you and I make, representing operations that haven't happened yet but are expected to happen in the future. In JavaScript, for example, there is an explicit Promise1 construct. In .NET, this is done with the System.Threading.Task class.

Unfortunately, not everyone takes promises seriously — both in real life and in software. Sometimes people promise but don't deliver, violating the implicit contract of a promise. In software, null references are examples of implicit contract violators. Tony Hoare once called null references his billion dollar mistake.2 While C# doesn't explicitly prevent it, you should avoid returning null at all costs when using Task-based APIs. The reason for this is that returning null from such an API can result in a NullReferenceException, and that can end up masking other production problems. Let's see how to avoid these kinds of "null promises."

Null doesn't deliver on its promise

Returning null from Task-based APIs can be tempting. Even within a code base that's mostly asynchronous, we might write some synchronous code. Let's take a look at how we might model the real-world promises our friends make to us in software:

interface IFriend {
    Task PromiseMeSomething();
}

The above IFriend interface should be implemented by my friends. Every time I ask them to promise something, I would call the interface's PromiseMeSomething method. Good friends never let you down, but there's always this one "friend" who just doesn't deliver:

class SomeoneWhoPromisesButDoesntDeliver : IFriend {
    public Task PromiseMeSomething() {
        Console.WriteLine("I promise you...");
        return null;
    }
}

Here we can see that the SomeoneWhoPromisesButDoesntDeliver class returns null from the PromiseMeSomething method. Since the code is all synchronous, that seems fine, at least to the compiler. After all, null is a valid return type for reference types like Task. Unfortunately, there are consequences to this implementation for the caller. Let's see what happens when I ask for a promise from my soon-to-be-former friend:

class Me {
    public static async Task WithALittleHelpFromMyFriends() {
        var exFriend = new SomeoneWhoPromisesButDoesntDeliver();
        await exFriend.PromiseMeSomething();
    }
}

When we execute this code, our "friend" causes a NullReferenceException (that son of a...). Unfortunately, analyzing the stack trace of the thrown NullReferenceException doesn't give us much:

  at Program#10.<Me>d__1.MoveNext()
  --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
   at Program#11.<<Initialize>>d__0.MoveNext()

Since the invocation of PromiseMeSomething is done asynchronously, the compiler generates a state machine to execute it. It's in that generated code that the NullReferenceException is thrown. The problem here is that this stack trace doesn't give us any idea as to which code caused the exception. Was it in the Me class or in SomeoneWhoPromisesButDoesntDeliver? Or was it some other code that SomeoneWhoPromisesButDoesntDeliver called? There's no way to know other than stepping through the code, line by agonizing line. Now imagine how much more frustrating it would be if we were iterating through a collection of IFriend objects rather than calling just one.

Promise and deliver!

As we just saw, trying to await a null object results in a NullReferenceException. So, even if your code isn't async, you should always return a proper Task object in case your caller is async. One way of doing this (in .NET 4.6 or higher) is by returning Task.CompletedTask — a preallocated and already-completed Task object that can be awaited.

Now, let's take a look at an implementation of IFriend from a good friend who would not let me down, even if implemented synchronously:

class SomeoneWhoPromisesAndDelivers : IFriend {
    public Task PromiseMeSomething() {
        Console.WriteLine("I promise you...");
        return Task.CompletedTask;
    }
}

Avoiding null

It's all well and good to tell our friends not to return null in their promises, but our code should be safe for any of our callers — meaning we should check for null before awaiting any task. Modifying our earlier method, we would have this:

public static async Task WithALittleHelpFromMyFriends() {
    var exFriend = new SomeoneWhoPromisesButDoesntDeliver();
    var promise = exFriend.PromiseMeSomething();
    if (promise != null)
        await promise;
    else
        throw new SomeMeaningfulException();
}

But it's annoying to always have to check for null, so consider using extension methods to clean things up:

public static async Task WithALittleHelpFromMyFriends() {
    var exFriend = new SomeoneWhoPromisesButDoesntDeliver();
    await exFriend.PromiseMeSomething().ThrowIfNull();
}

The extension methods can then handle the boilerplate code:

public static class TaskExtensions
{
    const string TaskIsNullExceptionMessage = "A Task-returning method returned null instead of a Task.";

    public static Task<T> ThrowIfNull<T>(this Task<T> task)
    {
        if (task != null)
        {
            return task;
        }
        throw new Exception(TaskIsNullExceptionMessage);
    }

    public static Task ThrowIfNull(this Task task)
    {
        if (task != null)
        {
            return task;
        }
        throw new Exception(TaskIsNullExceptionMessage);
    }
}

With these extension methods, if our IFriend returns null, we'll see exactly where it happens.

Summary

If a method requires you to return a Task and you need to implement the method synchronously, return Task.CompletedTask instead of null to avoid a NullReferenceException happening deep in the call stack of the compiler-generated code. You can also use extension methods to protect yourself from code that might return null instead of a Task.

If you want to find out more about avoiding pitfalls in asynchronous codebases, check out our webinar series.

Footnotes

1. Promise - JavaScript reference, MDN
2. Null References: The Billion Dollar Mistake - Tony Hoare, QCon 2009

About the Author: Daniel is a solution architect at Particular Software and leads the asyncification of NServiceBus and the ecosystem around it. He doesn't await for good things to happen. He makes them happen.

A new era for MSMQ scale-out


This post is part of a series describing the improvements in NServiceBus 6.0.

Scaling out a web server is easy. All you have to do is stand up a bunch of web servers behind a load balancer like HAProxy and you're covered. Unfortunately, it hasn't been quite as easy to scale out MSMQ-based NServiceBus systems.

That is, until now.

But first, let's take a look at how things currently work. The MSMQ Distributor component uses roughly the same model as a load balancer, but for MSMQ messages rather than HTTP requests. The main difference is that the distributor can hold messages in a queue, waiting for an available worker to be ready to process it.

The thing is, setting up the distributor so that you can scale out an MSMQ endpoint is quite a bit more complicated than scaling out when using HTTP load balancers or even a broker-based message transport like RabbitMQ or Azure. In those transports, you can scale out simply by adding another instance of an endpoint1. Granted, the other message transports that use a centralized broker have to worry about the uptime and availability of that broker, whereas MSMQ is inherently distributed and doesn't have to worry about a single point of failure.

But it would still be nice if it were easier to scale out MSMQ.

Anatomy of a Distributor

The distributor receives messages from senders and passes them on to various worker nodes. Just like regular load balancers, the distributor doesn't know how to process messages; it only routes them to the available workers. Since web requests need to be processed right away, they are all assigned to a web server immediately. But MSMQ messages can wait in a queue. For this reason, the distributor doesn't assign a message to a worker until the worker node tells the distributor that a thread has finished processing a message and is now able to handle another one.

Distributor

When a worker checks in, the distributor notes the availability in its own storage. That way, it can route incoming work items to workers that are ready for them. This requires extra messages, which can create a certain amount of overhead.

Just as it's possible to overwhelm a normal load balancer, there are only so many messages per second that a distributor can handle. At some point, the distributor will hit a maximum limit, and, unfortunately, it wasn't designed to be scaled out. For simple tasks that aren't CPU or I/O intensive, adding more workers isn't effective because the distributor becomes the bottleneck. The single-node distributor simply can't get messages to the workers fast enough.

The distributor is also a single point of failure for the messages it handles. It should really be set up for high availability, the way load balancers are. Load balancers are usually made highly available through a kind of active/passive configuration between two servers. The two servers monitor each other through heartbeats on a private network connection, with the standby server ready to jump into action when it detects the primary server is not responsive. Unfortunately, this is a bit harder to do for the distributor, since MSMQ needs to be made highly available as well.

Finally, you need to plan to have a distributor in place for each endpoint before you can scale it out. It's a lot easier with the other message transports. Take, for instance, the RabbitMQ transport. Once you set up the message broker for high availability, any endpoint can be scaled out just by standing up a new worker. When using the distributor, every scaled-out endpoint requires additional work and reconfiguration.

Removing the bottleneck

NServiceBus Version 6 introduces a new feature called sender-side distribution to replace the distributor as a method to scale out with the MSMQ transport. Before, each worker would check in with its distributor. This way, the distributor was the only actor that knew where all the workers were located. Now, with sender-side distribution, the knowledge of where endpoint instances are deployed gets distributed throughout the system so that all endpoints can collaborate with each other directly.

Sender-side distribution

With sender-side distribution, the routing layer at the sending endpoint knows about all the possible worker instance destinations upfront. This makes it possible to rotate between destinations on every message that's sent. It's as if a DNS server returned all possible IP addresses for a domain, and the browser rotated between them on different requests. There's no need for another piece of infrastructure (e.g., a load balancer or even a DNS server) in the middle. This autonomy results in a big payoff for messaging-based systems.

With the distributor removed as a single choke point, scaling out pays off even for small tasks. There's no limit to how many processing nodes can be used and no diminishing returns from adding more. The infrastructure doesn't have to be planned as carefully. Even if you didn't plan for high load from the start, it's easy to add more instances to handle the work without needing to set up a distributor first. High availability is no longer an issue and Windows Failover Clusters are not needed because there is no distributor acting as a single point of failure.

This new topology enables ultra-wide scalability, as you could conceivably stand up hundreds of nodes to collectively process a ridiculously high number of messages per second. Messages would be spread out among all these instances without any centralized actor needed to coordinate anything.

The default message distribution strategy uses a simple round-robin algorithm, rotating through the known endpoint instances for each message sent. However, you can plug in your own custom implementation2.

The configuration for sender-side distribution could not be simpler. One XML file defines each endpoint name and its collection of instances:

<endpoints>
  <endpoint name="Sales"> <!-- Scaled-out endpoint -->
    <instance machine="Sales1"/>
    <instance machine="Sales2"/>
  </endpoint>
  <endpoint name="Shipping">
    <instance machine="Shipping1"/>
  </endpoint>
</endpoints>

Each endpoint instance can have its own instance mapping file, or all endpoints can read the same file from a centralized file server so that changes to system topology can be made in one convenient location. The new routing features would also allow you to integrate directly with service discovery solutions like consul.io, zookeeper, or etcd. Combined with these tools, the routing table would dynamically update as soon as endpoint instances either become available or are removed.

Of course, we won't force you to use this new distribution scheme. The distributor is still supported in legacy mode in NServiceBus Version 6 to ensure backward compatibility. However, we're sure you'll like the sender-side distribution a whole lot more once you give it a try.

Summary

For the MSMQ transport, sender-side distribution reduces the amount of upfront infrastructure planning you need to do. One shared file can keep all endpoints updated on what instances can process messages. Adding and removing endpoint instances can be done without the need to reconfigure the topology to make room for a distributor. Just add more instances when you need them, and remove them when you don't.

You won't need a distributor in a Windows Failover Cluster anymore. To ensure high availability, simply add two or more instances of every endpoint. It's that simple.

Simpler is better. So go ahead, get rid of the distributor and use sender-side distribution instead. Grab NServiceBus Version 6 today and check it out.

Footnotes / Further Reading

Upgrading NServiceBus handlers to async/await


This post is part of a series describing the improvements in NServiceBus 6.0.

In order to support the async/await keywords in NServiceBus 6.0, we had to make the first ever breaking change to the message handler API. We realize that this is a big change, and it's not one that we made lightly. The move to async/await required that the Handle method signature return a Task instead of void. At the same time, we replaced IBus with context-specific parameters to make it clearer which messaging operations are available from within a message handler.

So now instead of this:

private IBus bus;

public void Handle(MyMessage message)
{
    // Your code here
}

We now have this:

public async Task Handle(MyMessage message, IMessageHandlerContext context)
{
    // Your code here
}

In order to make the conversion process as easy as possible, we've prepared a screencast that demonstrates how to convert a message handler from the previous syntax to the async-enabled API in NServiceBus 6.0.

Take a look:

Of course, we realize that it isn't always fun to pause and replay a video, so here are the same basic instructions in text form:

  1. Because the definition for IHandleMessages<T> has changed in NServiceBus 6.0, reimplement the interface so you can see the new method signature.
  2. Visual Studio will place the new Handle method at the bottom of the class. Copy the method signature, and then remove the empty method.
  3. Paste the method signature from the clipboard over your existing method signature so that it's replaced with the new signature.
  4. Notice the new context method parameter, which replaces the IBus instance.
  5. To ease replacement, rename the current IBus field from bus (or whatever you use in your project) to context. The rename refactoring updates every reference for you, so you won't need to change each one by hand.
  6. After renaming, remove the old IBus instance from the class.
  7. Add the async keyword to the handler method so that you have public async Task Handle.
  8. All of the messaging operations on the context parameter are async methods. Add the await keyword to all these calls, and add ConfigureAwait(false) to the end. As an example, context.Publish(new MyEvent()); becomes await context.Publish(new MyEvent()).ConfigureAwait(false);. A fully converted handler is sketched after this list.
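
Putting the steps together, a converted handler ends up looking something like this (a minimal sketch; MyMessage and MyEvent are placeholder types):

public class MyMessageHandler : IHandleMessages<MyMessage>
{
    public async Task Handle(MyMessage message, IMessageHandlerContext context)
    {
        // all messaging operations now hang off the context parameter and are awaited
        await context.Publish(new MyEvent()).ConfigureAwait(false);
    }
}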

Keep in mind that NServiceBus 6.0 is fully backwards compatible with NServiceBus 5.x, meaning an older endpoint can send messages to a 6.0 endpoint and vice versa. This means you can take your time and convert one endpoint at a time with minimal risk.

So now that you've learned how to update your handlers, pick an endpoint to convert over and get started with NServiceBus 6.0 right away. For even more complete guidance, check out Asynchronous Handlers in our documentation.

Async/await tips and tricks


Many .NET developers have been busy upgrading their code to take advantage of the async and await keywords. These keywords make asynchronous programming a lot easier by allowing us to represent a call to an asynchronous method almost as if it were a synchronous one. We just add the await keyword, and the compiler does the hard work of dividing the method into sections and keeping track of where to resume execution once async work completes.

However, it's difficult to hide all the complexity of asynchronous programming behind a couple of keywords, and there are a host of pitfalls and gotchas to be aware of. Without proper tooling, it's all too easy for any one of them to sneak up and bite you.

We've experienced this firsthand. Through the development of NServiceBus 6.0, we've learned quite a bit about async/await. In the latest version, NServiceBus was completely upgraded to support async/await from top to bottom, for every API that could potentially be I/O bound. There's so much async in NServiceBus that we're intentionally not using the -Async suffix (for good reason) in any of our APIs.

In this article, we'll share some of the tools and techniques we've found that can help make your transition to async/await as smooth as possible.

Treat warnings as errors

When you're writing async code, it can be really easy to forget to await a Task-returning method. You might accidentally or absentmindedly write code like this:

public async Task Handle(DoSomething message, IMessageHandlerContext context)
{
    context.Publish(new SomethingHappened { Id = message.Id });
    // Oops, forgot to await!
}

When you do this (and you will eventually), the IDE will quietly remind you with a blue squiggly line and compiler warning CS4014: "Because this call is not awaited, execution of the current method continues before the call is completed. Consider applying the 'await' operator to the result of the call."

The await keyword is what brings the result of the asynchronous operation back into the calling method, both in cases of success and failure. That means that if you forget the await keyword, an exception raised within that method may be silently ignored. Now, we all know to avoid creating empty catch blocks, but that's exactly what's caused by forgetting the await keyword.

The thing is, it's just too easy to miss these warnings. To make sure they aren't forgotten, consider enabling the Treat warnings as errors option in your build configuration.
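
In an MSBuild project file, this is a single property. The snippet below turns all warnings into errors, with a commented alternative that promotes only the fire-and-forget warning:

<PropertyGroup>
  <!-- turn every compiler warning, including CS4014, into a build error -->
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
  <!-- or, more surgically: <WarningsAsErrors>CS4014</WarningsAsErrors> -->
</PropertyGroup>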

ConfigureAwait

Microsoft's async/await best practices state that you should use ConfigureAwait(false) almost anytime you await something. This way, the TaskScheduler knows that it can use any available thread when the method resumes after the await. The only exception to the rule is when execution must continue on a specific thread, such as the UI thread in a client application.

When you use ConfigureAwait in your code, it often looks something like this, with the .ConfigureAwait(false) call tacked on to the end of the method that returns a Task:

public async Task AlreadyAsyncMethod()
{
    await DoSomethingAsync().ConfigureAwait(false);
}

The reason ConfigureAwait is used here is to enable tasks to continue on any available thread. For more information on why this is important, check out the article Context Matters. Suffice it to say, for library code that doesn't care which thread it executes on, you should use ConfigureAwait(false) every time you use await. Eventually, you will find that adding ConfigureAwait all over the place becomes tiresome and is easy to forget.

There are two ways you can deal with this. You can use a Roslyn-based tool to make sure you never forget to add ConfigureAwait, or you can use IL weaving so you don't have to type it manually at all. Either way, you'll come out ahead.

Never forget

The best way to ensure you never forget about ConfigureAwait is to have the compiler remind you. The .NET Compiler Platform ("Roslyn") is the perfect tool for this.

We created the Particular.CodeRules NuGet package to enforce our own usage of ConfigureAwait using Roslyn. With this package in place as a project dependency, forgetting to use ConfigureAwait will result in a compile-time error that fails the build — and nobody will ignore that.

If you like, you can use the NuGet package directly or check out the source code on GitHub.

Set it and forget it

All those calls to ConfigureAwait(false) can get pretty distracting when littered throughout your code. If you'd rather not deal with them at all, there's a way to include them with tooling instead.

You can do this with the ConfigureAwait add-in for Fody, which allows you to set your ConfigureAwait preference at a global level. You can add an attribute to an individual method, a class, or even an entire assembly, and then Fody will add the .ConfigureAwait(value) to the assembly's intermediate language (IL) during compilation at any point where the await keyword is used.
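
Assuming the add-in's attribute lives in the Fody namespace (check the project readme for the exact usage in your version), an assembly-wide opt-in looks roughly like this:

using Fody;

// Fody weaves .ConfigureAwait(false) into the IL at every await point in this assembly
[assembly: ConfigureAwait(false)]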

With that preference set, the tooling takes care of the rest, and you can forget about typing .ConfigureAwait(false) ever again.

Optimize async usage

Anytime you're implementing a method that returns a Task or a Task<T>, you have the option of dealing exclusively in tasks or adding the async and await keywords and letting the compiler generate a state machine to run the async continuations for you.

And so, these two methods are functionally equivalent. The only difference is the decision to return the Task or await it:

// Return the Task from the Publish operation directly
public Task Handle(DoSomething message, IMessageHandlerContext context)
{
    return context.Publish(new SomethingHappened { Id = message.Id });
}

// Add the async keyword and await the Publish
public async Task Handle(DoSomething message, IMessageHandlerContext context)
{
    await context.Publish(new SomethingHappened { Id = message.Id });
}

The thing is, the state machine created by the second example is not entirely free. Every bit of executed code, whether created by you or by the compiler, has a cost.

When a method only awaits one operation and does so at the end of the method, it's usually not worth it. Instead, we can directly return the task, as shown in the first method, and avoid the cost of the async state machine altogether. In addition, we don't need ConfigureAwait(false) when the Task isn't awaited either.

These are only small differences in performance, and in the large majority of business code out there, it's probably premature optimization and won't really matter. However, if you are writing code on the hot path in a performance-sensitive system, these things can add up.

Luckily, there is a tool that can easily point out these cases for you. It's called AsyncFixer and is available either as a Visual Studio extension or a NuGet package. In addition to pointing out unnecessary usages of async/await (and offering to fix them for you), the extension will save you from breaking other async/await best practices such as using async void methods.

Summary

Writing asynchronous code doesn't have to require a state of constant vigilance to make sure we avoid pitfalls. Smart developers use tooling to make writing asynchronous code as simple and smooth as possible.

These tools have certainly been helpful to us as we've been implementing async in NServiceBus Version 6. We hope that you find them useful when designing your own systems as well.

And, finally, if you're interested in more guidance on async/await, check out our async/await webinar series to see how you can avoid some of the more common problems in asynchronous codebases.

Do you have any other tools you find useful for wrangling async code? If so, let us know in the comments. We'd love to hear from you.


About the author: David Boike is a developer at Particular Software who used to forget to use ConfigureAwait at least once per week until he found the tooling to handle that for him.

NServiceBus on .NET Core - Why not?


.NET Core is out. Officially. After a lot of waiting, it's finally a real, finished, RTM thing, and as developers who are passionate about the .NET ecosystem, we're very excited about it. How cool is it to know that we can write code in C# (or VB.NET, or F#, etc.) and have it run on Windows, Linux, and even macOS?

So why on earth doesn't NServiceBus support .NET Core yet? Well, good question.

Yeah, why not?

NServiceBus is a framework that provides reliable messaging. Stability and reliability are core features that we can't compromise on. These goals simply aren't served by trying to chase the bleeding edge.

And, even though .NET Core has reached its 1.0 release, it's still bleeding edge. Many familiar APIs are not yet supported, and not enough time has passed for there to be a proven track record of mission-critical .NET Core systems running on non-Windows OSs. At least, not yet. The time will come, but it's currently too early to dive into .NET Core headfirst.

Instead, we've been working very hard on delivering NServiceBus 6.0, which delivers on our promise to make the framework fully async. Unlike .NET Core, async/await is no longer bleeding edge. It's mainstream and broadly used, so that's what we've been focusing on.

However, that doesn't mean we've turned a blind eye to .NET Core. On the contrary, we've been tracking it very closely and, where possible, we've made decisions that will make future support for .NET Core easier to achieve.

Steps in the right direction

Based on our knowledge of what was coming in .NET Core, there are a few things we've already been able to do to work toward full .NET Core support.

For example, in .NET Core, there is no ConfigurationSection class. Prior to Version 6, NServiceBus relied upon multiple configuration sections to control a variety of things like message routing rules, number of message processing threads, and where to send poison messages.

In Version 6, we've already started moving away from XML-based configuration sections, using a code-first configuration style instead. As a result, configuration settings are more discoverable than in XML configuration files, and they can provide IntelliSense support to help developers find and understand the available settings. Going code-first also makes it easy for developers to pull this configuration data from wherever they please, whether that be a config file, a database, or a web service.
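
For a flavor of the code-first style, a sketch like the following replaces what used to live in several XML sections (the specific settings shown are illustrative):

var endpointConfiguration = new EndpointConfiguration();

// transport and concurrency, formerly XML configuration sections
endpointConfiguration.UseTransport<MsmqTransport>();
endpointConfiguration.LimitMessageProcessingConcurrencyTo(4);

// replaces the old section that controlled where to send poison messages
endpointConfiguration.SendFailedMessagesTo("error");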

One side effect of starting to use code-first configuration is that .NET Core will be that much easier to support. Although we will be continuing our support of ConfigurationSection-based settings in NServiceBus 6.0, we've set the stage for a future where configuration can be controlled optimally for each platform.

There are other features of NServiceBus that have dependencies not available in .NET Core. Luckily, the NServiceBus architecture implements these features as independent components that can be individually enabled or disabled. We're continuing to refine the boundaries of these components so that those dependent upon capabilities not provided by .NET Core can be easily broken off into external packages. This will allow customers who are content to run on Windows and the full .NET Framework to continue using those features just by pulling in additional packages where needed.

An example of this kind of split is the NServiceBus.Callbacks package, which has been split apart from the main NServiceBus library in Version 6. We will use this model to split off additional features, like Windows performance counters, that can't work on non-Windows platforms.

What's left to do

NServiceBus isn't a simple library that can be ported to .NET Core in an afternoon. While not exhaustive, here are some of the larger items that will need to be addressed to fully support .NET Core.

MSMQ Transport

Even if you aren't using the MSMQ transport in your NServiceBus project, you still end up hosting the code for it in your process. The reason for this is that MSMQ support has been embedded into the NServiceBus core library as the default transport from the very beginning. MSMQ is a Windows-only technology, so in order to support .NET Core, the MSMQ transport will need to be split off into a separate NuGet package, similar to the other transports.

The thing is, a built-in, default transport is a good thing to have because it makes it that much easier to get started with NServiceBus. We don't want that to change. So when the MSMQ transport is moved to a separate package, we'll ship a new file-based transport in the NServiceBus core library instead. This will make getting started even simpler than with MSMQ since it won't require any system-level installations.

Of course, you'll still be able to use MSMQ as a transport. You'll just need to use the full .NET Framework and stick with hosting on Windows. And if you do want to run on non-Windows platforms, you'll eventually need to pick a production-ready transport that will support .NET Core, like RabbitMQ, Azure Service Bus, or SQL Server.

Other dependencies

These are some of the smaller items we know will also need some attention in order to complete our support for .NET Core:

  • System.Transactions — Although used mostly for the MSMQ transport, which can't support .NET Core anyway, TransactionScope and related classes have tendrils into other corners of the framework as well.
  • AppDomain — This is used for assembly scanning to wire up handlers, sagas, and other components identified at runtime. The AppDomain class is not available in .NET Core, so this functionality will need to be rewritten to use new APIs.
  • Windows Performance Counters — As these obviously will not work on non-Windows systems, this functionality will need to be split into a separate package. We also need to find a way to enable this functionality on other platforms.
  • Encryption — NServiceBus uses the RijndaelManaged class for message property encryption. This may become available in .NET Core 1.2 but, in any case, it may make sense to split off the encryption feature into a separate package.

And of course, we're sure this is just the tip of the iceberg. A conversion project like this rarely goes exactly as planned. We're certain there will be surprises, but we're confident that the technical problems are all solvable. There's more to the solution than just technical issues, though.

It's not just about tech

There are a bunch of libraries out on NuGet that have been ported to .NET Core fairly quickly, and you could take those libraries, build an app, and deploy it on Ubuntu without much fuss.

Your scientists were so preoccupied with whether you could, you didn't stop to think if you should.

But let's be honest with each other for a minute here. Nobody has sufficient experience running production .NET workloads on any of these Linux distributions. If you have trouble in production, the maintainers of those libraries may or may not be able to help you. There may be slight variations in behavior between the .NET implementations on each platform that may mean the success or the failure of your mission-critical system.

But that's not how we do things around here. Before we release support for .NET Core, we'll make sure to put each supported platform through the same rigorous testing process we apply with every release. It will also mean that we'll be there for you — even at 3 AM on the weekend — to help you through any issue on each of those platforms.

We won't ship with support for .NET Core until we can make good on those promises.

Summary

.NET Core is here and shows a lot of promise. It's the future of the .NET platform, so we understand all of the excitement around it. In particular, the .NET Standard 2.0 specification seems to do a lot to unify the full .NET Framework with .NET Core, and the .NET Core version that implements that standard will probably benefit from a lot of stability that typically makes its way into "Version 2.0" products. But despite the excitement, it's just too risky for us to go all in on .NET Core just yet.

Users depend upon the reliability of NServiceBus for the backbone of their mission-critical systems, and you can't provide that reliability on an unproven platform. We won't release .NET Core support until it's fully tested and meets the high standards you've come to expect from us. So, for now, we'll continue the gradual, evolutionary approach we've been taking so far.

In the meantime, you can subscribe to receive email updates when we have more information to share.

Batch Dispatch #0


Happy holidays to all of you readers out there! Lately, several of our staff members have been active in the community, sharing valuable insights into building distributed systems. We thought we'd take this opportunity to share some of those resources, as well as other fun stuff, with you. With a bit of async/await, Azure Service Bus, AngularJS, code-driven visualizations, and even a throwback to the Commodore 64, there's something in here for everyone. We hope you enjoy!

Context Matters

Using async and await can make asynchronous code easier to write because it hides some of the details. But Tim Bussmann points out that, sometimes, knowing where your application is running can make the difference between functional code and a race condition.

Progress bars aren't all liars

Creating progress bars for web applications can be troublesome. Using a bit of AngularJS and NServiceBus, Colin Higgins shows us how to create them so that they are responsive and accurate.

Bend Message Deduplication on Azure Service Bus to Your Will

Detecting duplicates when sending messages to Azure Service Bus is as important as it is non-trivial. But Sean Feldman has your back. He describes how you can leverage Azure's built-in deduplication capabilities to make sure your messages aren't processed more than once.

Getting the most from Azure Service Bus

Thanksgiving might be over, but shopping season is still upon us. Daniel Marbach has a series of posts on how to maximize your throughput on Azure Service Bus to make sure you never miss a sale.

Video: A picture is worth 1000 lines of code

In his presentation at NDC Australia, Mike Minutillo shows a variety of tools and techniques that can be used to visualize software systems, from the macro scale all the way down to the nitty gritty details. In it he shows off C4 diagramming with Structurizr, as well as a method we've developed at Particular Software for visualizing all of the relationships between messages and handlers in your NServiceBus system.

Slack looks great running on a Commodore 64

Since we use Slack for our internal communications, when we saw that Jeff Harris wrote a Slack client (of sorts) for the Commodore 64, we thought it was damn cool—though still not as entertaining as Impossible Mission.


— The team in Particular


RabbitMQ updates in NServiceBus 6


This post is part of a series describing the improvements in NServiceBus 6.

The new RabbitMQ transport for NServiceBus 6 has one overriding theme: speed.

Although we've added a few other features as well, the biggest news is how much faster we've made the new version of the RabbitMQ transport. We've redesigned the message pump to be more efficient, so it can handle more incoming messages. Outgoing messages are sent faster. We've even contributed changes to the official RabbitMQ Client project to increase its performance. Almost everything we've done was focused on making your systems faster and more efficient.

Let's take a closer look at the improvements to the RabbitMQ transport and find out just how much faster the new version is.

Faster receiving

In order to handle multiple incoming messages more efficiently, we redesigned the RabbitMQ message pump. Now it's a lot easier to scale a single endpoint for maximum performance while receiving messages.

Previously, if you set the NServiceBus concurrency settings to have up to five messages processed concurrently, the message pump would create five separate polling loops, channels, and queue consumers. In addition to this, a PrefetchCount value in the connection string controlled the number of messages the broker would send a consumer before waiting for message acknowledgement. This prefetch count made sure each consumer always had enough work to keep itself busy, but each consumer would apply this value separately. As a result, an endpoint could end up prefetching more messages than might be expected (Concurrency × PrefetchCount).

While this approach worked, it was more complex than it needed to be. The use of multiple polling loops and consumers put an upper limit on the amount of concurrent work that could be done, and finding that optimal balance between the concurrency level and prefetch count could require a lot of trial and error.

The new design makes this much simpler. Instead of creating multiple polling loops, the new message pump doesn’t have any loops at all. Now it uses event-based polling to create a Task that handles each incoming message. And it does that without any extra channels or consumers either.

The new design also sets the consumer's PrefetchCount to three times the Concurrency value by default so that the endpoint can continue processing messages without waiting for more to be fetched from the server. This default makes it easier to scale effectively without as much trial and error, as there aren't as many switches and levers you need to experiment with. However, for those intent on tweaking for maximum performance, the default multiplier can be changed — or, if you like, you can override the whole formula with a specific value.
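
As a sketch of those knobs (method names per the 4.x transport API at the time of writing; treat them as an assumption if you're on a different version):

var transport = endpointConfiguration.UseTransport<RabbitMQTransport>();

// prefetch = multiplier x concurrency, instead of the default multiplier of 3
transport.PrefetchMultiplier(4);

// or ignore the formula entirely and use a fixed prefetch count
transport.PrefetchCount(100);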

Increasing receive performance and efficiency is a big win for any message-based system. But we didn't stop there. We wanted to make sending messages faster too.

Faster sending

In the previous version of the transport, whenever you sent a message outside of a message handler, we had to create and open a new channel, use that channel to send the message, and then close the channel. Closing the channel also blocked the calling thread so it could wait for confirmation that the broker received the message. All this resource allocation and thread blocking was a significant drain on performance.

In the new version of the transport, we keep a pool of open channels for sending messages. If there is an unused channel in the pool when one is needed, it's reused. Otherwise, a new channel is created and added to the pool when the sending code finishes with it. This means there's no longer a channel opening/closing cost incurred per message.

Instead, we now create a Task per message and can verify that messages are delivered to the broker without blocking any threads. The use of tasks also allows you to send messages in parallel by starting all your send operations, collecting the tasks, and then waiting for them all to finish with a single await Task.WhenAll(tasks). This is extremely useful in fan-out situations, such as when you process files coming from a third party and send out an individual message for each record in the file.
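
Here's a minimal sketch of that fan-out pattern, where ImportRecord is a placeholder message type, records stands in for the parsed file contents, and endpoint is an IEndpointInstance:

var sendTasks = new List<Task>();

foreach (var record in records)
{
    // start every send without waiting for each broker confirmation individually
    sendTasks.Add(endpoint.Send(new ImportRecord { RecordId = record.Id }));
}

// a single await covers all of the outstanding confirmations
await Task.WhenAll(sendTasks);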

With these improvements, send performance is going to be much faster across the board. But there were even more performance gains to be made deeper down the call stack.

Faster internals

While we were doing performance testing on the new version of the RabbitMQ transport, we noticed a serious performance drop on more modern CPUs. On a machine with a brand new Skylake i7-6700K, we saw performance that was five times worse than on an older Sandy Bridge i7-2600K. Even an older Core 2 Duo machine was outperforming the brand new Skylake CPU.

Upon further testing, we discovered that performance suffered for every processor generation after Sandy Bridge. The effect was the most pronounced on the newest Skylake chipset, which really should have been the fastest of the bunch.

Since the RabbitMQ .NET client is also open source, we were able to track down some nasty lock contention occurring in its ConsumerWorkService, developed a fix, and got it accepted into their 4.1.0 release. The result is faster performance for all developers using the .NET RabbitMQ library, including those on NServiceBus.

How much faster?

The results are pretty amazing. The RabbitMQ transport is over five times as fast as before, both at sending and receiving messages.

Prior to the release of NServiceBus 6, we ran comprehensive performance tests between NServiceBus versions 5 and 6. In both cases, we used RabbitMQ Server 3.6.5 on Erlang/OTP 18.3. The hardware used for the benchmark doesn't matter much1 because we compared throughput performance of different versions using the same hardware. The only important detail on the hardware setup is that we used a workstation with a Skylake CPU, which suffers from the performance bug mentioned earlier.

We specifically compared three versions of the RabbitMQ transport:

  • 3.4– NServiceBus 5 with the RabbitMQ client containing the lock contention bug
  • 3.5– NServiceBus 5 with the updated RabbitMQ client fixing the lock contention bug
  • 4.1– NServiceBus 6 with the RabbitMQ client lock contention fix as well as all of the performance improvements in NServiceBus 6 and the newest version of the transport

Each test case was run three times, and the fastest result for each scenario was used.

The following table shows the throughput improvement in each comparison. For instance, a value of 2.0 would mean that the newer version handled twice as many messages per second.

Matchup                      Versions Compared   Send Throughput Improvement   Receive Throughput Improvement
V6 improvements only         3.5 => 4.1          5.45                          1.69
V6 + lock contention fix     3.4 => 4.1          5.54                          6.66

The message here is clear. The RabbitMQ transport is fast – more than five times as fast as before, both at sending and receiving messages.

Other features

Even though making things go fast is one of our favorite things to do, we managed to make a few other improvements in the RabbitMQ transport as well.

Security

As more infrastructure moves to the cloud, it is becoming increasingly important for systems to be able to communicate securely, whether running in the same rack or on opposite sides of the planet.

To enable secure communication with RabbitMQ brokers, we added support for the AMQPS protocol. If your broker already has a certificate installed, securing the connection is as simple as adding UseTls=true to your connection string. For additional security, we also support client-side authentication by using client certificates. These features were back-ported to the 3.2.0 release of the transport, so it's usable with NServiceBus 5 as well.
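
For instance, a connection string along these lines enables TLS (the host and credentials below are placeholders):

host=broker1.example.com;username=sales;password=secret;UseTls=true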

As a result, you can now easily use a hosted RabbitMQ provider like CloudAMQP to manage and maintain the servers and configuration for you. This makes RabbitMQ a much more attractive transport when it comes to deploying solutions to the cloud.

Built-in connection auto recovery

Another improvement makes the RabbitMQ connection recovery process more efficient. When there is a connection interruption between the endpoint and the broker, the transport needs to be able to reestablish the connection, recreate the channels on that connection, and resume message consumption.

When we first built the RabbitMQ transport, the .NET RabbitMQ client did not have any concept of automatic connection recovery. So we built our own. Over time, the RabbitMQ client has added this feature, so we are now using the built-in auto-recovery in the transport. Being closer to the metal, the RabbitMQ client is able to handle connection recovery much more transparently, without having to destroy and recreate components left in a faulted state. From the outside, the connection more or less appears to pause and then resume automatically, rather than spamming the logs with error messages.

Summary

Rabbits should be fast, so we made the RabbitMQ transport go fast. Suffice it to say, this is the fastest RabbitMQ transport we've ever delivered.

Together with the improvements in security and connection auto-recovery, there are now a lot more reasons to consider building an NServiceBus solution with RabbitMQ.

So go ahead and give NServiceBus 6 a try today.


About the author: Brandon Ording is a developer at Particular Software who maintains both the NServiceBus core and the RabbitMQ transport. He used to have rabbits as a kid, and finds the digital ones to be much easier to care for.

1 The actual hardware used for the performance benchmark was a workstation with an Intel Core i7-6700K "Skylake" CPU at 4.4 GHz, a RAID 1 hard disk array (non-SSD), and 32GB of RAM. RAM utilization was quite low during the test and shouldn't be considered an important factor. Since RabbitMQ is heavily I/O bound in terms of performance, faster SSD disks would be a great way to improve overall throughput. But since the benchmark compares relative throughput on identical hardware, the exact disks used largely don't matter either.

The new and improved NServiceBus testing framework


This post is part of a series describing the improvements in NServiceBus 6.

Tests are the lifeblood of many large codebases. They protect you from introducing bugs and, in some cases, are instrumental in your code's design. Because of this, the maintenance of those tests is every bit as crucial as the underlying code being tested. Like the rest of the project, your tests should be clear, concise, and consistent with your code style. Otherwise, tests might fall into disrepair and end up in a large bucket called technical debt, never to be heard from again.

Crossroads: Success or Failure by Chris Potter, reused via CC by 2.0 license

With NServiceBus 6, we've made it even easier to build tests that follow established conventions and that align more closely with your existing code. Before we see what's new, let's take a quick look at how tests for NServiceBus handlers were written in previous versions.

Testing with the fluent API

Let's say we want to test to make sure that when a handler receives a StartProcess event, it will publish the Started event but not send a ContinueProcess command. Using the fluent API in NServiceBus 5, we initialize our handler in the Test entry class, set up what we expect to happen, then trigger the action. Behind the scenes, the framework ensures all the expectations have been met and reports back if they haven't:

[Test]
public void TestHandler()
{
    Test.Handler(new SampleHandler(new SomeDependency()))
        .ExpectPublish<Started>(msg => msg.SomeId == 123)
        .ExpectNotSend<ContinueProcess>(msg => true)
        .OnMessage(new StartProcess { SomeId = 123 });
}

Although this style of unit testing is very compact and readable, the fluent API does introduce some challenges. For example, with all API calls rolled into a single statement, it can be difficult to step through a test and pinpoint exactly where it is failing.

While we still support that fluent API, we introduced another way of testing your handlers in NServiceBus 6. Let's take a look at it now.

Tests that match your style

Starting with NServiceBus 6, the new version of the testing framework has less ceremony and doesn't force you to use the fluent API. Because of this, you can write tests for your message handlers that can adapt better to your existing test suite.

For example, you might be familiar with the Arrange-Act-Assert pattern of writing unit tests. With this testing pattern, we arrange the subject under test with all required parameters and dependencies. Then we act by invoking the method we want to test. Finally, we assert that everything worked as expected. With NServiceBus 6, it's easy to follow this pattern to test your message handlers, sagas, and other custom NServiceBus components.

Let's rewrite our previous example using the Arrange-Act-Assert approach:

[Test]
public async Task TestHandlerAAA()
{
    // Arrange
    var handler = new SampleHandler(new SomeDependency());
    var context = new NServiceBus.Testing.TestableMessageHandlerContext();

    // Act
    await handler.Handle(new StartProcess { SomeId = 123 }, context);

    // Assert
    Assert.AreEqual(1, context.PublishedMessages.Length);
    Assert.AreEqual(0, context.SentMessages.Length);
    var started = context.PublishedMessages[0].Message as Started;
    Assert.AreEqual("test", started.SomeId);
}

While this is more verbose than the fluent API, it's easy to see exactly what we're testing: the Handle method on our SampleHandler. Furthermore, we can more easily see which conditions must be met for this test to pass. And if it fails, we'll know precisely which condition caused the failure.

You may have noticed in the Act stage above that message handlers now receive an additional context parameter. From the message handler, the context parameter allows us to send or publish messages, as well as access message headers. With the testable implementations provided by the NServiceBus.Testing package, we can inspect the results of these operations to ensure our handler behaved correctly.
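
For reference, here's a rough sketch of what a handler like SampleHandler might look like in NServiceBus 6 (how SomeDependency is used inside the handler is an assumption for illustration):

// requires using NServiceBus; and using System.Threading.Tasks;
public class SampleHandler : IHandleMessages<StartProcess>
{
    SomeDependency dependency;

    public SampleHandler(SomeDependency dependency)
    {
        this.dependency = dependency;
    }

    public async Task Handle(StartProcess message, IMessageHandlerContext context)
    {
        // publish through the context parameter rather than an injected IBus
        await context.Publish(new Started { SomeId = message.SomeId });
    }
}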

Another advantage of testing with NServiceBus 6 is that you can extend it to be even more expressive. For example, we previously saw how we could make assertions on various elements of the SentMessages and PublishedMessages collections. Assertions on these collections can also be made more intention-revealing. We could, for instance, change the Assert.AreEqual(0, context.SentMessages.Length); above to context.SentMessages.Should().BeEmpty(); by using the FluentAssertions library.
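
Putting that together, the Assert stage of the test above might look like this with FluentAssertions (a sketch, assuming the FluentAssertions NuGet package is referenced):

// requires using FluentAssertions;

// Assert
context.SentMessages.Should().BeEmpty();
context.PublishedMessages.Should().HaveCount(1);

var started = context.PublishedMessages[0].Message as Started;
started.SomeId.Should().Be(123);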

Do I have to rewrite my old tests?

Not at all. We recognize that it's not practical to migrate an entire suite of fully functioning tests from one framework to another, so your existing tests will continue to work as-is. The new approach is not intended to replace your existing NServiceBus tests but to let you write new tests that match the conventions and style of your current non-NServiceBus tests.

The advantage of having both styles of testing available is that it removes the pressure of upgrading your tests from previous versions of NServiceBus to version 6. You can migrate your tests from one approach to the other gradually. Over time, your tests will become more maintainable, as developers old and new can follow the same conventions for the entire test suite.

Summary

NServiceBus 6 makes writing tests for your NServiceBus message handlers and other components easier than ever. They can be written more expressively with the same conventions and libraries used in the rest of your code, making them easier to understand and more maintainable. We can't wait for you to try it out!

The new testing library is part of NServiceBus Version 6. For some examples of testing with NServiceBus 6, check out our unit testing sample.

Give it a try – your tests will thank you for it.


About the author: Tim Bussmann is a developer at Particular Software. When not at his computer, he enjoys hiking in the mountains in his native Switzerland.

How to build a Babel fish in NServiceBus

This post is part of a series describing the improvements in NServiceBus 6.

I'll admit it: I'm a huge fan of The Hitchhiker's Guide to the Galaxy. I've voted for Zaphod Beeblebrox in more than one election and had a cat in college named The Ravenous Bugblatter Beast (Rav, for short). True to her name, she was an adept and ruthless hunter of cockroaches.

One of the more clever beasts in the book is the Babel fish, a small leech-like creature that, when inserted into your ear, allows you to communicate with any other creature in the universe in any language. It feeds on brain wave energy, and no matter what language is spoken to you, it is automatically converted into one you understand.

[Image: Fish Bowl Fish Tank Aquarium Goldfish Jump by audiencestack.com, reused via CC BY 2.0 license. Caption: Try this at home, kids!]

If you've done any significant amount of systems integration in your career, this may sound familiar. As your system grows and integrates with more and more applications, it needs to understand more and more formats. Initially, it understands JSON natively. Later, through a merger, it has to accept input from an accounting system that exports only in XML. Then the company purchases an HR system that speaks some proprietary format: say, the binary language of moisture vaporators.

Adding support for new formats to a system can be a challenge as time goes on. If you don't plan ahead, you can end up with a lot of custom code sprinkled throughout your application, increasing the cost of maintenance.

So with NServiceBus 6, we set out to make this process easier.

Building a better Babel fish

In NServiceBus 6, endpoints can be easily configured to support multiple message deserializers. In this example, we register both a JSON and an XML deserializer on an endpoint:

var salesEndpointConfiguration = new EndpointConfiguration("SalesEndpoint");

// set up the primary serializer and deserializer
salesEndpointConfiguration.UseSerialization<JsonSerializer>();

// set up an additional deserializer
salesEndpointConfiguration.AddDeserializer<XmlSerializer>();

With this configuration, an endpoint can receive messages in either JSON or XML format and they will get processed in the same way.
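
For instance, a hypothetical legacy AccountingEndpoint could keep serializing its outgoing messages as XML, and the SalesEndpoint above would happily process them alongside JSON messages from everyone else:

var accountingEndpointConfiguration = new EndpointConfiguration("AccountingEndpoint");

// outgoing messages are serialized as XML, which SalesEndpoint
// can now deserialize thanks to its additional deserializer
accountingEndpointConfiguration.UseSerialization<XmlSerializer>();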

The support for multiple message deserializers in NServiceBus 6 is really just an expression of the robustness principle, also known as Postel's Law: be conservative in what you send, be liberal in what you accept. In the example above, our endpoint can accept messages serialized as either JSON or XML, and as you can see, we could easily add support for additional formats with further AddDeserializer<T>() calls.

How does it work?

Each message serializer declares an identifying value: its content type. When a message is serialized, this value is included in the message headers so that receiving endpoints can tell how the message was serialized. By registering deserializers on our receiving endpoints, the content type of each incoming message is matched to the correct deserializer for that message.

By default, .UseSerialization<T>() will configure both a serializer and a deserializer for that NServiceBus endpoint. Enabling additional deserializers for an endpoint is done via the AddDeserializer<T>() method on the EndpointConfiguration, as we saw above. This method can be called multiple times---once for each additional deserializer you want to add. Presto! Instant Babel fish.

In NServiceBus 6, support for multiple deserializers is a first-class citizen, but you can still make it work in NServiceBus 5 by customizing the message pipeline. A full code sample showing how to do this is available in our documentation.

The sample also shows how to specify a serializer based on the message type. This can be useful in scenarios where you have no control over the format that is accepted by an external system.

Other formats

While we've highlighted the built-in XML and JSON deserializers because these formats are the most common, there are several others you could use, depending on the needs of your project. Protocol Buffers, ZeroFormatter, and MessagePack are three that focus on speed. These are external deserializers created by the community, and each lets you quickly add support for its format with only a call to AddDeserializer<T>().

For cases where none of the built-in serializers meet your exact requirements, such as a proprietary binary format, you can even implement your own custom serializer. Check out our custom serializer sample for more information.
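
As a rough sketch, the core of a custom serializer is an implementation of IMessageSerializer, whose ContentType property supplies the identifying value discussed earlier (the format name and the omitted read/write logic are hypothetical):

// requires using System; System.IO; System.Collections.Generic; and NServiceBus.Serialization;
class ProprietarySerializer : IMessageSerializer
{
    // identifying value used to match incoming messages to this (de)serializer
    public string ContentType => "x-proprietary-binary";

    public void Serialize(object message, Stream stream)
    {
        // hypothetical: write the message to the stream in the proprietary format
        throw new NotImplementedException();
    }

    public object[] Deserialize(Stream stream, IList<Type> messageTypes = null)
    {
        // hypothetical: read the proprietary format back into message instances
        throw new NotImplementedException();
    }
}

Wiring this into an endpoint also requires a SerializationDefinition; the sample linked above walks through the remaining pieces.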

Summary

NServiceBus 6 simplifies support for multiple deserialization formats. Come see how easy it is to put a Babel fish in your ear: download NServiceBus 6 now.


About the authors: Hadi Eskandari and Kyle Baley are engineers at Particular Software. One of them is an avid photographer and the other likes playing show tunes on the piano, but they aren't saying which is which.

Batch Dispatch #1

Welcome back. Over the last few months, the Particular Slack channels have been awash with interesting links, thoughts, and blog posts. Here are the ones that bubbled to the surface.

Building in resilience

In the first post in a series on refactoring towards resilience, Jimmy Bogard looks at an existing application and challenges us to think about how it should deal with failures.

Long-running tasks, transaction timeouts, and messaging

Processing long-running jobs in a manner that provides meaningful user feedback is a challenging task. Mauro Servienti explains how to handle a long-running OCR process using a messaging architecture and SignalR.

The end of Enterprise IT

Organizational structures are a common discussion point here at Particular Software. Mary Poppendieck explains how ING Netherlands transformed their structure to be flatter and more agile.

Words matter

Creating software is an act that encompasses much more than just writing code. Weronika Łabaj points out that code just might be the easiest part of our jobs.

People play a big part

While having lunch one day, David Boike had an interaction that led to some interesting thoughts about the business bits that surround an app.

There are limits

We set limits all the time: speed limits, work-in-progress (WIP) limits, circuit breaker thresholds, etc. Tyler Treat talks about explicitly declaring limits in your software and why that is especially important in distributed systems.

Commute optional

Have you ever wondered what a remote workday at Particular looks like? Donald Belcham walks you through what his day is like, the tools he uses, and the problems he faces.

Think of the children!

The engineers at Particular Software have talked about teaching our offspring to program on numerous occasions. We revisited the topic when someone pointed out this game-based approach. Honestly, it was for the kids...we swear.

Async always surprises

Just when you think you've learned everything there is to know about async/await in .NET, something will jump out and surprise you. Szymon Kulec had this happen to him, and he reflects on how he missed it.

Events in your system

Outside of UI interactions, many developers struggle with using events in their code. Cesar de la Torre takes a look at domain and integration events and how they relate to Domain Driven Design and microservices.

You want us to pay how much?

Software pricing is a dark art. The folks at Reify talk about some of the things they consider contributors to the final price tag you see.

See a need, fill a need

Most OSS projects are started because people can't find a solution to a problem. Hadi Eskandari created this one to provide better support for Farsi in WinForms and WPF software.

Building a foundation

Confused about what a service bus really is? Let Dennis van der Stelt give you a primer on the fundamentals of a service bus architecture.


— The team, in Particular

Azure Storage Persistence now faster in NServiceBus 6

If you're using Azure Storage Persistence and haven't upgraded to NServiceBus 6 yet, get ready for a tremendous performance boost for your application when you do---especially if you make use of sagas. In the previous version of Azure Storage Persistence, looking up a saga by a correlation property was not as fast as looking it up by SagaId. In the new version, both the correlation property and SagaId are indexed, so retrieving a saga is much quicker regardless of whether it's looked up by SagaId or the correlation property.

If you're the TL;DR type, you can stop here and head over to the NServiceBus 6 page to try it out. But if you're interested in how we did it, read on!

Azure Table Storage, where saga data is stored, is limited to indexing on two columns: the Partition Key and the Row Key. To boost the performance of saga retrieval, we now create two rows in the table for each saga entry rather than one. One row is indexed on the SagaId and contains all of the saga data. The second row is indexed on the correlation property and contains only the SagaId. To retrieve saga data using the correlation property, we first look up the row based on the correlation property, which gives us the SagaId. Then we look up the saga data directly, based on the retrieved SagaId.

Here's how it might look in practice:

[Image: Secondary indices in Azure Storage Persistence]

In this example, the first two rows represent a saga with a SagaId of 9bc38..., where the correlation property is the telephone number. To look up the saga by phone number, we search for the row with Partition Key = 555.1234, which gives us a SagaId of 9bc38.... Then we can look up the full saga data using this ID by querying where Partition Key = 9bc38.... Both lookups are very fast since they use the natively indexed Partition Key values.
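
In raw Azure Storage SDK terms, the two-step lookup amounts to something like this sketch (the sagaTable variable, key choices, and property names are illustrative; the actual persister differs in its details):

// requires using Microsoft.WindowsAzure.Storage.Table; sagaTable is a CloudTable

// step 1: find the secondary index row keyed on the correlation property
var indexLookup = TableOperation.Retrieve<DynamicTableEntity>("555.1234", string.Empty);
var indexRow = (DynamicTableEntity)sagaTable.Execute(indexLookup).Result;
var sagaId = indexRow.Properties["SagaId"].StringValue;

// step 2: fetch the full saga data row using the SagaId from step 1
var dataLookup = TableOperation.Retrieve<DynamicTableEntity>(sagaId, sagaId);
var sagaRow = (DynamicTableEntity)sagaTable.Execute(dataLookup).Result;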

This secondary index pattern provides great performance for our saga persistence. The amount of improvement depends on how much data your persister is managing at one time, but suffice it to say that the improvement was so good that we implemented it for Timeout lookups too.

So if you want to experience these performance boosts in your own system, come download NServiceBus 6.
