
What does idempotent mean?


When you start to accumulate home theater components to go with your TV, you end up with a bunch of remote controls. One remote for the TV, one for the receiver, and one for the DVD/Blu-Ray player. One for a cable box, satellite, or set-top streaming box. Maybe you've got a multiple-disc CD player, or even a turntable. (Vinyl is making a comeback!) Maybe you've even still got a VHS?…no I'm kidding, nobody has those anymore.

The solution, rather than dealing with a stack of remotes, is to get one universal remote to rule them all. But once you do, you find that not all components are created equal, at least when it comes to the infrared codes (the actual signals sent by the remote control) that they accept.

A cheap A/V receiver might accept codes like PowerToggle or InputNext. These make it nearly impossible to reliably turn the unit on and set the right input, especially if someone has messed with the receiver manually.

A good receiver will have additional codes for PowerOn and PowerOff, even though those buttons don’t exist on the remote, and will also have an independent code for each input, like InputCable, InputSatellite, and InputBluRay.

This is the essence of idempotence. No matter how many times your universal remote sends the PowerOn and InputBluRay commands, the receiver will still be turned on and ready to watch Forrest Gump. PowerToggle and InputNext, however, are not idempotent. If you repeat those commands multiple times, the component will be left in a different (and unknown) state each time.

So what?

The concept of idempotence is extremely important in distributed systems because it’s hard to get really strong guarantees over how many times a command will be invoked or processed.

Because networks are fundamentally unreliable, most distributed systems cannot guarantee exactly-once message delivery or processing, even when using a message broker like RabbitMQ or SQS. Most brokers offer at-least-once delivery instead, relying on the consuming endpoint to retry processing as many times as necessary until it acknowledges that message processing is complete.

That means if a message fails to process for any reason, it’s going to be retried. Let's say we have a message handler like the A/V receiver above. If the message is unambiguous like InputBluRay then it's fairly easy to write code that will handle it as the user intended, no matter how many times the message is reprocessed. On the other hand, if the message is InputNext, it can be very difficult to write logic to fulfill the user intent under conditions of unknown numbers of retries.
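To put the analogy into code: a handler for an absolute command is naturally idempotent, while a handler for a relative command is not. This is a contrived sketch in the same pseudocode style used below; the receiver object and the GetNextInput helper are made up for illustration.

Handle(InputBluRay message)
{
  // Idempotent: processing this once or five times leaves the receiver in the same state
  receiver.Input = Input.BluRay;
}

Handle(InputNext message)
{
  // Not idempotent: every retry moves the receiver to a different input
  receiver.Input = GetNextInput(receiver.Input);
}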

In short, if every single message handler in our system is idempotent, we can retry any message as many times as we want, and it won’t affect the overall correctness of the system.

That sounds great, so why don’t we do that?

Idempotence is hard

Imagine you need to do a fairly simple operation: create a new user in the database, and then publish a UserCreated event to let other parts of the system know what happened.

Seems simple enough; let's try some pseudocode:

Handle(CreateUser message)
{
  DB.Store(new User());
  Bus.Publish(new UserCreated());
}

This looks good in theory, but what if your message broker doesn’t support any form of transaction? (Spoiler alert: most don’t!) If a failure occurs between these two lines of code, then the database record will be created, but the UserCreated message won't be published. When the message is retried, a new database record will be written, and then the message will be published.

These extra zombie records are created in the database, most of the time duplicating valid records, without any message ever going out to the rest of the system. It can be difficult to even notice this happening, and even more difficult to clean up the mess later on.

So this should be easy to fix, right? Let's just flip the order of the statements to fix our zombie record problem:

Handle(CreateUser message)
{
  Bus.Publish(new UserCreated());
  DB.Store(new User());
}

Now we've got the inverse problem. If something bad happens after the publish but before the database call, we produce a ghost message, an announcement to the rest of our system about an event that never really happened. If anyone tries to look up that User, they won't find it, because it was never created. But the rest of the downstream processes continue to run on that message, perhaps even billing credit cards but failing to actually ship orders!

If you believe transactions will save you, think again. Wrapping the entire message handler in a database transaction only reorders all of the database operations to the end of the process, where the Commit() occurs. Effectively, a database transaction will turn the first code snippet (with the database first) into the second snippet (with the database last) when it executes.
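To make that concrete, here's the first snippet wrapped in a transaction. Nothing is actually written to the database until Commit() runs, which happens after the publish, so the ghost message problem returns (a sketch in the same pseudocode style as above):

Handle(CreateUser message)
{
  using(var transaction = DB.StartTransaction())
  {
    DB.Store(new User());           // only queued up inside the transaction
    Bus.Publish(new UserCreated()); // goes out to the broker immediately
    transaction.Commit();           // the user row is only really written here
  }
}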

When designing a reliable system you want to think in terms of what would happen if, on any given line of code, someone pulled out the server’s power cable. There are three operations at play here–receiving the incoming message, the database operation, and sending the outgoing message–and due to the lack of a common transaction it's very difficult to code this without the possibility for zombie records, ghost messages, and other gotchas.

If only there were a better way…

The Outbox pattern

What we need is database-like consistency between our messaging operations (both consuming the incoming message, and sending outgoing messages) and our changes to business data in our database. Using the Outbox pattern, we can piggyback on a local database transaction to do just that, and turn the message broker's at-least-once delivery guarantee into an exactly-once processing guarantee.

To implement the outbox pattern, the message handling logic is divided into two phases: the message handler phase, and the dispatch phase.

During the message handler phase, we don't immediately dispatch outgoing messages to the message broker; instead, we hold them in memory until the end of the message handler. At that point, we store all accumulated outgoing messages in a database table, in the same transaction as our business data, with the MessageId as the primary key.

insert into OutboxData (MessageId, TransportOperations)
values (@MessageId, @OutgoingMessagesJson)

This data is committed to the database in the very same transaction as the business data. This concludes the message handler phase.

Next, in the dispatch phase, all the outgoing messages stored in the Outbox data are physically sent to the message broker. If all goes well, the outgoing messages are sent and the incoming message is consumed. However, it's still possible for a problem to occur at this point, leaving only some of the messages dispatched and forcing us to try again.

This can actually generate duplicate messages, but this is by design.

The Outbox pattern is paired with an Inbox, so that when any duplicate message is processed (or a message that fails in the dispatch phase is retried) the Outbox data is retrieved from the database first. If it exists, that means the message has already been successfully processed, and we should skip over the message handling phase and proceed directly to the dispatch phase. If the message happens to be a duplicate, and the outgoing messages have already been dispatched, then the dispatch phase can be skipped over as well.

Expressed in pseudocode, the entire Outbox+Inbox process looks like this:

var message = PeekMessage();

// Check for deduplication data
var outbox = DB.GetOutboxData(message.Id);

// Message Handler Phase
if (outbox == null)
{
  using (var transaction = DB.StartTransaction())
  {
    var transportOperations = ExecuteMessageHandler(message);
    outbox = new OutboxData(message.Id, transportOperations);
    DB.StoreOutboxData(outbox);
    transaction.Commit();
  }
}

// Dispatch Phase
if (!outbox.IsDispatched)
{
  Bus.DispatchMessages(outbox.TransportOperations);
  DB.SetOutboxAsDispatched(message.Id);
}

AckMessage(message);

Using this pattern, we get idempotence on the message handling side of the equation, as long as a duplicate can be identified just by looking at the MessageId. After all, if you actually pressed PowerToggle multiple times, that would be more like sending duplicate messages with different MessageId values. In truth, an operation like PowerToggle is inherently not idempotent, and there's nothing the infrastructure can do to help with that, but that's a topic for another post.

Summary

Idempotence is an important attribute of distributed systems but can be tricky to implement reliably. The bugs that crop up as a result of doing it wrong are often easy to overlook, and then difficult to diagnose, appearing to be the result of race conditions that are impossible to reproduce under any sort of controlled conditions.

It's much easier to utilize infrastructure like the Outbox that can take advantage of the local database transaction already in use for storing business data, and use that transaction to build consistency between incoming/outgoing messaging operations and the business data being stored in the database.

If you're interested in taking a look at this yourself, check out our Using Outbox with RabbitMQ sample, which shows how to get exactly-once message processing using RabbitMQ for the message queuing infrastructure, and SQL Server for the business data storage. Don't worry if you don't have RabbitMQ or SQL Server installed–the sample includes a docker-compose file with instructions so that you can run all the dependencies in Docker containers.


About the author: David Boike is a developer at Particular Software who refuses to juggle multiple remote controls.


Fallacy #4: The network is secure


There are a myriad of security-obsessed organizations scattered throughout the world that take security concerns to the verge of paranoia.

In one such organization I've heard of, there existed two separate networks. Everyone had two computers without external disk drives of any kind. Inserting a USB drive would not work, and trying to use one would instantly alert the sysadmins that a workstation was compromised. To get data from a different network, you needed to browse in a separate room, as workstations did not have access to the Internet.

Once you found the data you needed, you could download it to a floppy disk and then hand the floppy over to a sysop. The sysop would copy the contents to a mirror folder, where they would be scanned with every virus scanner imaginable before being mirrored to the development network. But that sync only occurred once per hour.

Paranoid? Maybe. If you're just selling widgets on a website, then probably. But if your organization is working on defense contracts or controls critical infrastructure like electrical grids, perhaps the paranoia is justified.

The only truly secure computer is one that is disconnected from any and all networks, turned off, buried in the ground, and encased in concrete. But that computer isn't terribly useful.

From Udi Dahan's Advanced Distributed Systems Design Course, Day 1

Solutions

Unfortunately, security tends to be one of those areas that goes completely overlooked until it's too late.

Check it twice

We have best practices and checklists, but all too often, these go unfollowed. The OWASP Top 10 shows us what some of the most common threats are and how to mitigate them. It's very sad that, in this modern era, the vulnerability at the top of the list is a simple injection attack, which is trivially avoided using parameterized database queries.
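As a quick illustration, here's the difference in ADO.NET terms. The connection and userInput variables are assumed to exist, and the table and column names are made up:

// Vulnerable: user input is pasted directly into the SQL text
var unsafeCommand = new SqlCommand(
    "select * from Users where Name = '" + userInput + "'", connection);

// Safe: the value travels as a parameter and is never interpreted as SQL
var safeCommand = new SqlCommand(
    "select * from Users where Name = @name", connection);
safeCommand.Parameters.AddWithValue("@name", userInput);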

The bigger problem is that our understanding of the types of threats we are under hasn't evolved very much. By their very nature, our systems are fundamentally exposed, and we start from a position of weakness. Some major attack will hit the news, and then we'll update our best practices to deal with that threat, but it's impossible to be prepared for everything. The attacks will come. It's not a matter of if but a matter of when, and we simply don't understand the amount of computing power in the hands of attackers.

Because of this, the conversation needs to change. We can guarantee that specific security holes will be plugged, but we can't guarantee 100% security.

What have we got to lose?

We need to perform a threat model analysis on our system. What are the possible consequences of a breach? We could be talking about losing competitive advantage, being sued and losing our reputation, having sensitive financial or health data leaked — or we could be talking about hackers seeing our pictures of cats. If someone were to get access to our data, how much would it be worth to them, and how much would the loss mean to us? Based on that, what is their incentive to attempt to hack us? What resources can they marshal to perform that attack? How much will it cost to protect ourselves against it?

After we analyze the threat, we need to bring this to the attention of business stakeholders. Include the public relations department. When the system is breached, who is going to talk to customers? Who is going to talk to the press? What are they going to say? Bring legal in as well to help determine the ramifications of exposure under these circumstances. Perhaps the end-user license agreement (EULA) can be modified to mitigate some damage.

Hopefully, an attack will never happen, but it's good to be prepared.

Attackers won't hesitate to spend thousands of dollars to breach a system if they stand to gain millions. That's just simple economics. But they may not even have to. With LinkedIn, it's easy to find out who your organization's database administrators (DBAs) are. An attack could be as simple as applying social pressure to a DBA to get them to "misplace" a database backup. If that occurred, would you even notice?

Ultimately, ensuring a high level of security is a large cost, and it's all about tradeoffs. We need to have an honest conversation about what those tradeoffs are.

Summary

Security in our software. It's hard, it's expensive, and it's complicated by the fact that attackers are always one step ahead of us. Attacks will occur. Sometimes we will win the day. Sometimes we will not be so lucky. Breaches will happen, despite our best efforts.

While we need to do our best to follow our industry best practices for security, we also need to have conversations with business, public relations, and legal teams so that the risks are well understood and to ensure that a plan is in place if a breach occurs.

Because it's not a matter of if, it's a matter of when.


About the author: David Boike is a developer at Particular Software who sometimes forgets to lock his car doors when he parks on the driveway.

Multi-tenancy support in SQL Persistence


Multi-tenant systems are a popular way to use the same codebase to provide services to different customers while minimizing the effect they have on each other. In a distributed message-based system you need to partition customer information and segregate messages from different customers as well. Additionally, you have to make sure different system components are tenant-aware.

In NServiceBus SQL Persistence 4.6, we have added new features that make it a bit easier to create multi-tenant systems. Let's see how it all works.

Flowing tenant identifier

The basic building block of a multi-tenant system is the notion of a Tenant Identifier, which is passed from message to message so that it flows through an entire conversation of messages. Once you flow this TenantId, you know the active tenant in the context of the running code.

In a distributed message-based system, the natural place to start is wherever the conversation or workflow begins. Usually, when the first message that initiates a conversation is sent, you know who is triggering it, either from an external endpoint or from data available at runtime. That allows setting the TenantId as a message header. We use a message header for this purpose because the tenant identifier is not part of the message itself but an infrastructure concern.

As an example, when sending a message from a WebAPI endpoint, the code would look something like the following snippet. Note that there’s no tenant information in the message body:

var options = new SendOptions();
options.SetHeader("tenant_id", tenantId);
await endpointInstance.Send(new PlaceOrder(), options)
    .ConfigureAwait(false);

From this point on, you can access the header and read back the TenantId further down in your message handling pipeline. On systems that rely minimally on the tenant information, doing this manually might be enough. However, the better approach is to automate it with a pair of incoming and outgoing behaviors that propagate the TenantId from an incoming message to any outgoing messages sent by the message handler.
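A rough sketch of the outgoing half of that pair might look like the following. This is a simplified version of what the documentation samples do; the behavior name is ours, error handling is omitted, and the exact extension method for reaching the incoming message may differ slightly between NServiceBus versions:

class PropagateTenantIdBehavior : Behavior<IOutgoingLogicalMessageContext>
{
    public override Task Invoke(IOutgoingLogicalMessageContext context, Func<Task> next)
    {
        // If we're sending from inside a message handler, copy the tenant header
        // from the incoming message onto the outgoing one
        if (context.TryGetIncomingPhysicalMessage(out var incomingMessage)
            && incomingMessage.Headers.TryGetValue("tenant_id", out var tenantId))
        {
            context.Headers["tenant_id"] = tenantId;
        }
        return next();
    }
}

// Registered once at endpoint startup
endpointConfiguration.Pipeline.Register(new PropagateTenantIdBehavior(), "Propagates tenant_id to outgoing messages");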

As long as we have these infrastructure behaviors, we have seamless access to the tenant information throughout the system. The message processing pipeline with automated behaviors would look like the following diagram:

Message Processing Pipeline

Tenant data isolation

When hosting multiple tenants, proper data isolation is a key ingredient. One common (but problematic) approach is to store everything in one single database with a discriminator column (like CustomerId) to separate tenant-specific data. This method can be dangerous: it is not possible to treat important customers differently–for example to move them to a faster database–even if the business demands it.

A better approach is to have a dedicated database for each tenant, which provides ideal isolation and allows fine-tuning each database separately.

The new version of SQL Persistence in NServiceBus now allows you to set up an automatic mapping from the tenant identifier in the message header to the tenant database connection string, so that the database-per-tenant pattern can be achieved. In this mode, the SQL Persistence stores saga and outbox data in the same tenant database so they are also isolated.

Setup is straightforward, requiring a factory like this:

var persistence = endpointConfiguration.UsePersistence<SqlPersistence>();
persistence.MultiTenantConnectionBuilder(
    tenantIdHeaderName: "tenant_id",
    buildConnectionFromTenantData: tenantId =>
    {
        var connection = $@"Data Source=.\SqlExpress;Initial Catalog=DatabaseForTenant_{tenantId};Integrated Security=True";
        return new SqlConnection(connection);
    });

There are also more complex ways to construct a tenant connection, such as using multiple headers to determine the tenant, which are covered in the documentation.

Tenant-aware components

Good data isolation, along with tenant information flowing automatically through all messages, makes a very good foundation for a multi-tenant system, but we can do even more. We can piggyback on the incoming/outgoing behaviors and inject tenant-aware components into the system so we don't need to litter our code with lots of conditional tenant-based checks.

Imagine a system where you have different business rules for each tenant. Trying to embed those rules directly in message handlers would make for some messy, hard-to-maintain code:

class PlaceOrderHandler : IHandleMessages<PlaceOrder>
{
    public async Task Handle(PlaceOrder message, IMessageHandlerContext context)
    {
        var tenant = context.MessageHeaders["tenant_id"];
        ITenantRuleComponent tenantRules;
        switch (tenant)
        {
            case "TenantA":
                tenantRules = LoadRulesForTenantA();
                break;
            case "TenantB":
                tenantRules = LoadRulesForTenantB();
                break;
            default:
                throw new NotImplementedException($"Rules for {tenant} are not implemented");
        }

        if (tenantRules.ShouldDispatchOrderImmediately())
        {
            await context.Send(new DispatchOrder())
                .ConfigureAwait(false);
        }
    }
}

A better approach is to separate the concerns where your infrastructure code does the plumbing work and your handler is centered around the business logic. This can be achieved using a behavior in the message processing pipeline, similar to the ones mentioned above. The behavior would set up a tenant-specific component for use while processing a message, based on the TenantId that message contains. The message handler above after the refactoring would look like this:

class PlaceOrderHandler : IHandleMessages<PlaceOrder>
{
    ITenantRuleComponent tenantRuleComponent;

    public PlaceOrderHandler(ITenantRuleComponent tenantRuleComponent)
    {
        // Injected component already knows the TenantId of the message being processed
        this.tenantRuleComponent = tenantRuleComponent;
    }

    public async Task Handle(PlaceOrder message, IMessageHandlerContext context)
    {
        if (tenantRuleComponent.ShouldDispatchOrderImmediately())
        {
            await context.Send(new DispatchOrder())
                .ConfigureAwait(false);
        }
    }
}

Our Injecting tenant-aware components to message handlers sample explains this approach in more detail.

Summary

Developing effective multi-tenant systems is hard. You must distinguish which messages belong to which tenant, determine what data belongs to whom, and always be aware of the current tenant when executing code. The new 4.6 version of SQL Persistence in NServiceBus along with the multi-tenant samples we've created should make your multi-tenant systems development just that little bit easier.


About the author: Hadi Eskandari is a developer at Particular Software who dreams of web-scale system architecture.

Fixing malformed messages with ServicePulse


Did you ever look at a failed message in ServicePulse and think "I wish I could just edit this quickly and dump it back in the queue"? Yes, you should follow best practices and investigate the root cause and make sure the error doesn't come back, but the real world moves faster than that and sometimes you just need to get &#@%! done and figure it out later.

Now you can. Using ServicePulse 1.21.0+ and ServiceControl 4.1.0+, you can edit the headers and body of a failed message and quickly get back on track.

How it works

When a message fails and you want to edit it, you can open the failed message details in ServicePulse via the Failed Message Group page or from the All Failed Messages page. Once you’re browsing the failed message details page, click the Edit Message button in the toolbar and modify what you need.

Editor screenshot

Some headers are locked and cannot be modified, either because changing them would break your system or because there's no need to edit them and it's best to guarantee consistency with the rest of the system. ServicePulse will also warn you when editing certain headers that you may need to edit but risk causing unexpected behavior. There are some additional limitations to editing and retrying messages in ServicePulse, so we encourage you to read the docs.

When you’re done and retry the edited message, the original message is marked as processed, and a new message (with the modified contents) is created and queued for processing.

Accessing the feature

At the moment, this feature is experimental. Editing messages can be dangerous, so we want to make sure we have a good understanding of why customers need it, and the use cases they are facing, so that we can refine the feature further.

As a result, we've designed this feature to be disabled by default for now. If you'd like to use it, please contact us so that we can understand how the feature is going to be used.

Summary

We know you'd prefer to run your system according to best practices all the time, but we also know that real life isn't always so easy. Sometimes it's not worth the effort to truly fix something that was a one-off situation in the first place and should never happen again anyway.

Now, the ability to edit messages in ServicePulse before retrying them will give you the flexibility to do that.

This is only the first step. We're looking forward to talking with you when you contact us to enable the feature in your systems, so that we can learn from your use cases and extend this feature to be even more useful to you in the future.

The new MongoDB support for NServiceBus


If you're using (or considering) MongoDB with NServiceBus, we've got good news: we've just released our official MongoDB persister. This replaces the previous community-created (but now abandoned) MongoDB persistence options and is now a fully supported part of the Particular Service Platform.

As NoSQL databases have gained in popularity, MongoDB has emerged as one of the go-to choices in the field. MongoDB databases provide high availability and scalability, features which are near and dear to our hearts here at Particular, so it was an easy decision to provide a MongoDB package. As more and more people have embraced MongoDB, we felt it was important to provide an officially supported option for our customers.

With the new package, you can now use MongoDB with NServiceBus version 7. The package supports sagas, subscriptions, and the outbox for all message transports. Full details are available in our MongoDB documentation.
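Configuration follows the usual NServiceBus persistence pattern. A minimal sketch, assuming the MongoPersistence settings type documented for the NServiceBus.Storage.MongoDB package (see the documentation for connection and database options):

var endpointConfiguration = new EndpointConfiguration("Sales");
// Enables the official MongoDB persistence for sagas, subscriptions, and the outbox
endpointConfiguration.UsePersistence<MongoPersistence>();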

Nice work making the process super easy! The documentation is quite clear about how to make use of the persistence, as well as how to perform a migration from the community NServiceBus.MongoDB package. Billy Wolfington, Senior Software Engineer

Migrating from the community packages

There are two existing community MongoDB packages, both of which work only with NServiceBus version 6. Luckily, migrating from either one is straightforward, and the MongoDB documentation covers the migration steps for each.

The existing community packages are still available, but they are no longer actively developed. We strongly encourage everyone to migrate to NServiceBus version 7 and the official package to take advantage of the new features and our support.

It has been completely seamless for our existing micro-services to migrate to the new mongodb persistence package and "just work". This was a huge concern for us with the amount of micro-services we have running in production using the legacy package. In addition, we now have first class support for transactions and full confidence in the maintainability of the package itself. Matt Biddle, Lead Software Engineer, eMarketer

Summary

If you've been on the fence about using MongoDB with NServiceBus, now's the time to take a look. The new package is fully supported, complete with documentation and samples.

Get the official NServiceBus.Storage.MongoDB package today and let us know what you think.

Infrastructure soup


When it starts to get colder outside I start to think about soup. I love soup, especially chili. But I don't want any of that watery gunk that's just tomato soup with a few lonely beans floating in there somewhere. No sir! I want it thick and chunky. Load it up with ground meat, beans, onions, tomatoes, cheese, green peppers, jalapeños, pineapple–it's all good!

Just like with chili, we sometimes see code that feels kind of "thick and chunky." It's got validation, logging, exception handling, database communication, business logic, and so much more. But unlike chili, the result does not taste good.

We see this kind of bloated, muddled code all over the place, regardless of what language or framework is being used, and NServiceBus is no exception. Here's an example where someone has stuffed an NServiceBus message handler full to the breaking point:

public class UserCreator : IHandleMessages<CreateUser>
{
  MySessionProvider sessionProvider;
  ILog log = LogManager.GetLogger(typeof(UserCreator));

  public UserCreator(MySessionProvider sessionProvider)
  {
    this.sessionProvider = sessionProvider;
  }

  public async Task Handle(CreateUser message, IMessageHandlerContext context)
  {
    log.Info($"Starting message handler: {nameof(UserCreator)}");
    var stopwatch = Stopwatch.StartNew();

    var signature = context.MessageHeaders["MessageSignature.SHA256"];
    if (!Utilities.ValidateMessageSignature(message, signature))
    {
      throw new MessageSignatureValidationException(message);
    }

    using (var session = await sessionProvider.Open())
    {
      try
      {
        await session.Store(new User());
        await context.Publish(new UserCreated());
        await session.Commit();
      }
      catch (Exception)
      {
        await session.Rollback();
        throw;
      }
      finally
      {
        var elapsedMilliseconds = stopwatch.ElapsedMilliseconds;
        log.Info($"Message handler UserCreator complete: {elapsedMilliseconds}ms taken");
      }
    }
  }
}

Yikes! There must be a better way to go about this. Most frameworks have a way to counter this kind of bloat, pushing infrastructure concerns into separate components that can be reused.

Let's take a closer look at the problem and see what we can do about it.

Single responsibility

That code had a lot going on. Let's break down everything happening in that block of code:

  1. A session provider is injected from a dependency injection container. This is used to create a database session object implementing a Unit of Work pattern.
  2. A logger and stopwatch are set up to log the beginning and end of the method execution, as well as how long the handler took to execute.
  3. An SHA256 signature is retrieved from a message header and validated.
  4. A user is created in the database, and an event is published to announce that action.

The 4th item, creating a user, is the whole point of this message handler, but it's buried in a jumble of infrastructure soup! It looks like somebody was trying to treat this code like chili and throw all the stuff into the message handler in the hopes that it would taste good.

But when it comes to code, I want it to be simple, bland, and easy to digest, like this:

public class BetterUserCreator : IHandleMessages<CreateUser>
{
  public async Task Handle(CreateUser message, IMessageHandlerContext context)
  {
    await context.Store(new User());
    await context.Publish(new UserCreated());
  }
}

This is the point of the single responsibility principle, making it so that code is much more understandable and easier to maintain. Any developer can look at this code and instantly see what's going on. Developers don't need a checklist each time they create a new message handler, and can focus only on the business task at hand.

So how do we get there?

Where infrastructure belongs

If this code were in an ASP.NET Core app, we could use a Filter to separate our infrastructure concerns. In classic ASP.NET MVC, it was called an Action Filter. In Express for Node.js, it's called middleware.

In NServiceBus systems, we can bundle infrastructure into pipeline behaviors for reuse throughout the system. Pipeline behaviors are deceptively simple yet extremely powerful.

Let's look at a simple behavior that implements the logging and stopwatch infrastructure concerns from the code snippet in the introduction:

public class LoggingBehavior : Behavior<IInvokeHandlerContext>
{
  ILog log = LogManager.GetLogger(typeof(LoggingBehavior));

  public override async Task Invoke(IInvokeHandlerContext context, Func<Task> next)
  {
    log.Info($"Starting message handler: {context.MessageHandler.HandlerType.FullName}");
    var stopwatch = Stopwatch.StartNew();
    try
    {
      // Invokes the rest of the message handling pipeline
      await next();
    }
    finally
    {
      var elapsedMilliseconds = stopwatch.ElapsedMilliseconds;
      log.Info($"Message handler {context.MessageHandler.HandlerType.FullName} complete: {elapsedMilliseconds}ms taken");
    }
  }
}

A behavior will first identify a pipeline stage to act upon, in this case, IInvokeHandlerContext. Many different stages are available just on the incoming message pipeline:

  • ITransportReceiveContext: To control the very beginning and end of the message processing operation.
  • IIncomingPhysicalMessageContext: To act on the raw message body before it is deserialized. This is the ideal place to implement the SHA256 signature validation.
  • IIncomingLogicalMessageContext: To act after the message has been deserialized. This is the ideal place to implement the database session and unit of work requirements.
  • IInvokeHandlerContext: To act once per message handler, with information available about the handler, like the logging handler above.

The magic comes with awaiting next(), which instructs NServiceBus to call the remaining pipeline behaviors. At runtime, all the behaviors are compiled down into a single method using expression trees, so it executes just as fast as if someone had taken the time to hand-craft a single, monolithic RunPipeline() method.

That's sufficient for simple behaviors, but behaviors are also able to share data through the context parameter, enabling you to do just about anything to modify and customize how NServiceBus handles messages.
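For instance, the SHA256 signature check from the bloated handler belongs at the physical stage, where the raw message body is available. A rough sketch, assuming the validation helper and exception from the earlier snippet have overloads that accept the raw data:

public class ValidateSignatureBehavior : Behavior<IIncomingPhysicalMessageContext>
{
  public override Task Invoke(IIncomingPhysicalMessageContext context, Func<Task> next)
  {
    var signature = context.Message.Headers["MessageSignature.SHA256"];

    // Validate against the raw body, before the message is ever deserialized
    if (!Utilities.ValidateMessageSignature(context.Message.Body, signature))
    {
      throw new MessageSignatureValidationException(context.MessageId);
    }

    return next();
  }
}

Registering it once with endpointConfiguration.Pipeline.Register(...) removes that concern from every message handler in the system.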

Home grown

The flexibility offered by behaviors is exactly why there are certain features that NServiceBus deliberately does not implement. The complexity required to create a one-size-fits-all solution for everybody is immense compared to the simplicity of a few lines of infrastructure code in a behavior.

Consider the questions that would need to be answered before implementing a full-message encryption feature:

  • Which ciphers will be supported?
  • Where will keys be stored?
  • How will keys be loaded?
  • Will certificates be supported in addition to keys?
  • How many keys will be allowed at the same time?
  • How will keys be rotated given that in-flight messages may be encrypted with an out-of-date key?

Trying to provide a universal solution that would cover all the answers to the above questions would result in a complex beast of an implementation. Luckily, developers can create their own custom encryption and decryption behaviors addressing only the specific data security policies of their own organization with just a few lines of code–a much more sustainable approach, all things considered.

Summary

Code and soup are not the same thing. All your code, and your message handlers in particular, should not be complex and flavorful like a five-layer brisket chili. Quite the opposite, message handler code should be as simple and bland as canned chicken broth…the low-sodium kind. After all, your message handler is but one raw ingredient. Combine it with a few infrastructure behaviors specifically tailored to your environment, plus all the other message handlers in your system–that's how you create a rich, flavorful meal.

If you're interested in the infrastructure bits I teased in this article, we have samples available showing how to implement all of them:

  • The Add handler timing pipeline sample shows how to implement the logging and timing, in the case of the sample, logging a warning when a handler takes longer than 500ms to process.
  • The Unit of work using the pipeline sample shows how to implement a database session and unit of work.
  • The Message signing using the pipeline sample shows how to implement cryptographic signing of outgoing messages with signature validation on incoming messages. The same pattern could be extended to implement symmetric encryption of the entire message body.

About the author: David Boike is a developer at Particular Software who is still learning when to stop adding spicy peppers to his chili, much to his wife's chagrin.

Fallacy #5: Topology doesn't change


It's easy for something that started out simply to become much more complicated as time wears on. I once had a client who started out with a very uncomplicated server infrastructure. The hosting provider had given them ownership of an internal IP subnet, and so they started out with two load-balanced public web servers: X.X.X.100 and X.X.X.101 (Public100 and Public101 for short). They also had a third public web server, Public102, to host an FTP server and a couple of random utility applications.

And then, despite the best laid plans, the slow creep of chaos eventually took over.

In an era before virtualized infrastructure made allocating additional server resources much easier, Public102 became somewhat of a "junk drawer" server. Like that drawer in your kitchen that contains the can opener, pot holders and an apple corer, the Public102 server continued to accumulate small, random tasks until it reached a breaking point. Public102 was nearly overloaded, and at the same time, it needed an OS upgrade because Windows 2000 was nearing the end of extended support.

It's easy enough to upgrade load-balanced web servers. You create a new server with the same software and add it to the load balancer. Once it's proven, you can remove the old one and decommission it. It's not so easy with a junk drawer server.

Decommissioning Public102 was an exercise in the mundane, gradually transitioning tiny service after tiny service to new homes over the course of weeks, as the development schedule allowed. It was made even more difficult by the discovery that, because of the public FTP server, several random jobs held configuration values (both hardcoded and in configuration files) that referred to Public102 by a UNC path containing the server's internal IP address.

When we finally had all the processes migrated, we celebrated as we decommissioned Public102. Unfortunately, the network operations team had a cruel surprise for us. For a reason I fail to recall now, they needed to change the subnet that all of our servers occupied.

And so we started it all again.

From Udi Dahan's Advanced Distributed Systems Design Course, Day 1

Embrace change

The only constant in the universe is change. This maxim applies just as well to servers and networks as it does to the entirety of existence.

In any interesting business environment, we try to think in terms of server and network diagrams that do not change. But eventually, a server will go down and need to be replaced, or a server will move to a different subnet.

Even if server infrastructures are relatively static or changes are planned in advance, we can still get into trouble with changing network topology. Some protocols can run afoul of this changing topology — even something simple, like a wireless client disconnect.

For example, in WCF duplex communication (which, thanks to not being included in .NET Core, appears to be on its way out), multiple clients connect to a server. The server creates a client proxy for each instance and holds it in a list. Whenever the server needs to communicate something to the client, it runs through the list of client proxies and sends information to each one.

If one of these clients is on a wireless connection that gets interrupted, the client proxy continues to exist. As activity continues to happen on the server, multiple threads can all become blocked attempting to contact the same client, waiting up to 30 seconds for it to time out.

This creates a miniature denial of service attack. One client can connect to a system, do a little bit of work, shut down unexpectedly, and then cause a 30 second disruption to all other clients.

Cloudy with a chance of containers

With the advent of cloud computing, deployment topologies have become even more subject to constant change. Cloud providers allow us to change server topology on a whim and to provision and deprovision servers as total system load changes.

Software container solutions such as Docker and Kubernetes, along with managed container hosting solutions such as Azure Kubernetes Service, Amazon's Elastic Container Service and Elastic Kubernetes Service, and Google Kubernetes Engine, allow applications to live within an isolated process not tied to any infrastructure, enabling them to run on any computer, on any infrastructure or in any cloud. This level of freedom enables deployment topology to change with almost reckless abandon.

When taking these technologies into account, it's clear that we must not only accept that network topology might change, but indeed, we need to plan ahead for it. Unless we enable our software infrastructure to adapt to a constantly changing network layout, we won't be able to take advantage of these new technologies and all the benefits they promise.

Solutions

The solution to hardcoded IP addresses is easy: don't do it! And for the sake of this discussion, let's assume that configuration files alone don't solve the problem. An IP address in a config file is still hardcoded; it's just hardcoded in XML instead of code.

Additionally, we need to think about how changes to topology, even minor changes like the disconnection of a wireless client, can affect the systems we're creating. When topology does change, will your system be able to maintain its response-time requirements?

Ultimately, this comes down to testing. Beyond just checking how your system behaves when everything is working, try turning things off. Make sure the system maintains the same response times. If I change where things are on the network, how quickly will that be discovered?

Some companies, like Netflix, will take this to the extreme. Netflix uses Chaos Monkey, a service that randomly terminates production systems so that they can ensure their systems are built to withstand disruptions.

Design systems that adjust to changing topology: See what Udi has to say in Day 1 of his Advanced Distributed Systems Design video course.

DevOps

Along with the rise of the cloud and containers, we have also seen the rise of DevOps, which is becoming a common part of the software development process at many organizations.

Some of the concepts of DevOps include infrastructure as code and the concept of throwaway infrastructure. With this level of deployment automation, you can recreate your entire infrastructure within minutes, then throw it away and recreate it on a whim.

These approaches, using tools like Octopus Deploy, Chef, Puppet, and Ansible, mean that addresses and ports for locating services become a variable in code that is determined whenever the infrastructure is deployed.

This alone makes a system much more resilient to change.
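In practice, that often boils down to something as simple as resolving an address from the environment at startup instead of baking it into the code. A trivial sketch; the variable name is made up:

// Set per environment by the deployment tooling (Octopus, Chef, Puppet, Ansible, ...)
var ordersServiceUrl = Environment.GetEnvironmentVariable("ORDERS_SERVICE_URL")
    ?? throw new InvalidOperationException("ORDERS_SERVICE_URL is not configured");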

Service discovery

At their core, service discovery tools map a named service to an IP address and port, something that DNS is unfortunately unequipped to handle. In dynamic environments, especially in the cloud or when using software containers like Docker, a service discovery mechanism allows a client to reliably connect to a service even as that service redeploys to a new physical location.

Zookeeper is an open-source lock and coordination service, built on a Paxos-like consensus protocol (ZAB), which acts as a consistent and highly available key/value store, suitable for storing centralized system configuration and service directory information.

The Paxos consensus algorithm is decidedly non-trivial to implement, so more recently, Raft was published as a simpler alternative. The CoreOS etcd is an example of a Raft implementation, and Consul also builds on these ideas.

It's important to acknowledge how complex it is to properly implement the distributed consensus necessary in order to maintain a highly-available service discovery mechanism. The Paxos algorithm is insanely complex, and even though the Raft algorithm is simpler, it's no walk in the park.

It would be foolish to try implementing a service discovery scheme on your own when good options already exist, unless service discovery is specifically your business domain.

Summary

If you know there will be a potential problem with a technology, test for it in advance and see how it behaves. Even if you don't know there will be a problem, test anyway, early in your development cycle. Don't wait until your system is in production to blow up.

Even for smaller deployments, it's worth investigating DevOps practices of continuous delivery and infrastructure as code. This level of deployment automation gives you a competitive advantage and increases your overall agility as a software producer. Especially if you want to take advantage of cloud computing or software container technology, deployment automation is a must.

For the most complex and highly dynamic scenarios, service discovery tools can allow a system to keep running even when topology is changing every minute.

A network diagram on paper is a lie waiting to happen. The topology will change, and we need to be prepared for it.


About the author: David Boike is a developer at Particular Software who moved within the last year and has discovered that the postal service is the worst service discovery implementation of all.

MSMQ is dead


We gather here today to mourn the passing of a dear friend. Microsoft Message Queuing, better known by its nickname MSMQ, passed away peacefully in its hometown of Redmond, Washington on October 14, 2019, at the age of 22. It was born in May 1997 and, through version 6.3, lived a very full life, bringing the promise of reliable messaging patterns to users all around the globe. It is preceded in death by Windows Communication Foundation and Windows Workflow Foundation, and survived by Azure Queues and Azure Service Bus. It will be greatly missed.

Ok, but seriously…

So MSMQ isn't dead as a doornail dead. As a Windows component, MSMQ is technically "supported" as long as it's carried by a supported version of Windows. Since it exists in Windows 10 and Windows Server 2019, MSMQ will continue to live on until at least 2029—and much longer assuming it isn't removed from future versions of Windows. The System.Messaging namespace lives on in .NET Framework 4.8 (which will be its last release ever, being completely supplanted by .NET 5), so likewise that is likely to be supported for decades to come.

But make no mistake. For all practical purposes, MSMQ is dead.

The real cause of MSMQ's demise is .NET Core. On October 14, 2019, Microsoft announced that .NET Core 3.0 concludes the .NET Framework API porting project, and closed the issue for adding .NET Core support for the System.Messaging namespace. But even before the official announcement, Damian Edwards basically said as much at an NDC Oslo panel discussion on the future of .NET in June 2019.

According to Microsoft, .NET Core is the future of .NET:

New applications should be built on .NET Core. .NET Core is where future investments in .NET will happen. Existing applications are safe to remain on .NET Framework which will be supported. Existing applications that want to take advantage of the new features in .NET should consider moving to .NET Core. As we plan into the future, we will be bringing in even more capabilities to the platform.

So as a company dedicated to making our customers better at building, maintaining, and running complex software systems, it's basically impossible for us to recommend building a new system using MSMQ.

So what do I need to do?

What if you have a current system already running on MSMQ? What do you do then? First of all…

Don't Panic

If you're comfortable with your system continuing to run on .NET Framework, then you don't really need to do anything. Microsoft will continue to support MSMQ for years and years, as long as you don't care about any of the improvements promised in .NET 5 and beyond.

If you do want to stay with the times, and your MSMQ system happens to run on NServiceBus, you're already a bit ahead of the curve. The NServiceBus API gives you an abstraction away from the raw MSMQ API, making it quite a bit easier to transition to some other message transport than if you'd written directly against MSMQ to start with.

The first thing you'll need to do is select an alternate message transport. Microsoft would of course prefer you use MSMQ's younger sibling, Azure Service Bus, but there are a lot of other options out there. We have a resource to help you make a decision in our documentation.
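As a rough illustration of the starting point, the transport is a single configuration choice in NServiceBus, so the swap itself looks something like this (a sketch assuming the NServiceBus.RabbitMQ transport package; the real work lies in the differences listed below):

// Before: MSMQ
// endpointConfiguration.UseTransport<MsmqTransport>();

// After: RabbitMQ, as one example of a broker-based transport
var transport = endpointConfiguration.UseTransport<RabbitMQTransport>();
transport.ConnectionString("host=localhost");
transport.UseConventionalRoutingTopology();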

Once you've made that decision, there are a few main things to keep in mind when considering a move off MSMQ.

  • MSMQ is a distributed or federated messaging system where every server hosts the queue infrastructure, and outgoing messages are sent locally before being delivered to their ultimate destination. In contrast, most other messaging systems are centralized or broker-based, meaning there is only one logical queue system, which usually exists in a cluster to provide for high availability. This makes scale-out quite a bit easier, as multiple physical endpoint instances can all compete for messages on the same queue using the competing consumers pattern.
  • MSMQ was fairly unique in that it supported distributed transactions, meaning that database operations in SQL Server and messaging operations through the queues could all be combined in one atomic transaction so that either everything succeeded, or everything failed. Most other message transports do not support distributed transactions, but if you're using NServiceBus, we have a component called the Outbox that simulates the reliability of a distributed transaction by piggybacking on a local database transaction. See our blog post What does idempotent mean? for more details, and the configuration sketch after this list.
  • While not exactly easy, it is possible to gradually move a system from one transport to another while bridging between them. Check out our MSMQ-to-SQL Relay sample for a simple example of how to bridge across transports, or the NServiceBus Router for a more comprehensive solution.
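For reference, enabling the Outbox mentioned above is a one-line configuration change on the endpoint, as long as you're using a persistence package that supports it (a minimal sketch):

var endpointConfiguration = new EndpointConfiguration("Sales");
// Requires a persistence that supports the Outbox, such as SQL Persistence
endpointConfiguration.EnableOutbox();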

Our solution architects are ready to help you come up with the best strategy for your specific situation. Contact us any time and we'll talk.

Our future plans

As far as Particular Software is concerned, we don't yet have any concrete plans to discontinue our own support for MSMQ systems.

At this time, you can still build and maintain NServiceBus systems with the MSMQ transport using NServiceBus version 7, which targets .NET Framework 4.5.2+ and .NET Standard 2.0. We will support the MSMQ transport for as long as NServiceBus 7 is supported.

While we don't have any immediate plans to do so, the most likely event that would affect our support for MSMQ would be the release of a new major version of NServiceBus where we felt we needed to drop support for the .NET Framework altogether. However, our goal is to be backward compatible for as long as possible.

If we did release a major version of NServiceBus without MSMQ, then according to our support policy the previous major version of NServiceBus would have mainstream support for all customers for 2 additional years, with an opportunity to purchase extended support for 2 additional years after that.

However, it bears repeating: We don't have any concrete plans to start that process yet. NServiceBus will still be supporting MSMQ systems for some time to come.

But what about open source?

One attempt has already been made to create a .NET Core version of System.Messaging using code harvested from .NET Framework reference source code. So, why not attempt to use that to create a .NET Core version of the NServiceBus MSMQ transport?

First, the license for that code only covers reference use, meaning "use of the software within your company as a reference, in read only form, for the sole purposes of debugging your products, maintaining your products, or enhancing the interoperability of your products with the software, and specifically excludes the right to distribute the software outside of your company."

Even if that weren't so, or if Microsoft were to make System.Messaging available under the MIT license, our customers value the reliability they get from our software, and the support guarantees we have in place to back that up. The risk of taking ownership of that code, in order to support our customers, would be too great.

And it wouldn't be the same anyway. As noted above, one of the defining characteristics of MSMQ was its support of distributed transactions, which aren't supported in .NET Core. Users would need to make significant changes to their systems just to account for this—it would by no means be a smooth transition. We don't think it's likely that companies would take that risk (if available) knowing that Microsoft has already rung the bells for MSMQ's funeral.

Summary

In 2007, the very first version of NServiceBus was a wrapper around MSMQ. Support for other message transports was added through the years. MSMQ is an important part of our history, so we have mixed feelings about its passing.

But we feel a measure of relief as well, because for about a year the future of MSMQ was murky at best. Customers would ask us if MSMQ was dead and we would answer that honestly, we don't know. It might be a bit painful, but at the very least Microsoft's announcement gives some certainty and finality to that question, so that we can all move forward.

Breaking up is hard to do, but for all practical purposes, MSMQ is gone and it's time to move on.

Rest in peace MSMQ. So long and thanks for all the fish.


About the author: David Boike is a developer at Particular Software who began his journey with distributed systems and messaging a decade ago with MSMQ and NServiceBus 2.0. It's been quite the ride.


NServiceBus now supports Microsoft Extensions DependencyInjection


The history of dependency injection (DI) containers in .NET is almost as long as .NET itself. StructureMap and Castle Windsor were released in 2004, Spring.NET in 2005, and more after that, each with their own unique API, some more opinionated than others.

And yet each of these libraries fulfills a fairly simple task: a place to hold on to dependencies so that objects can get them when they need them.

With .NET Core, Microsoft has created a DI container abstraction that is quickly becoming a de facto standard: Microsoft.Extensions.DependencyInjection.
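At its core, the abstraction boils down to registering services with an IServiceCollection and resolving them from the resulting IServiceProvider. A minimal sketch (IClock and SystemClock are made-up types for illustration):

var services = new ServiceCollection();
services.AddSingleton<IClock, SystemClock>();      // registration

var provider = services.BuildServiceProvider();
var clock = provider.GetRequiredService<IClock>(); // resolution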

NServiceBus now supports this same container abstraction via our new NServiceBus.Extensions.DependencyInjection package, which means you can use any container that conforms to the Microsoft abstraction with NServiceBus. This has a ton of advantages, but also means the time has come to retire our existing container adapters.

A common abstraction

The new Microsoft container abstraction provides a common abstraction that all container authors can conform to. But what does that really mean?

For the most part, we don't care what container you use, as long as you're happy with it. We don't want to be a blocker to you using the container of your choice…except many times, that's exactly what happens.

For example, in order to support Autofac, we have an NServiceBus.Autofac package that acts as an adapter between NServiceBus and Autofac. We have similar packages for CastleWindsor, Ninject, Spring, StructureMap, and Unity.

So when Autofac ships a new major version…you can't use it with NServiceBus! At least, not until we release a new version of our NServiceBus.Autofac adapter package, which we aren't always able to do quickly. Not that we don't want to, but like everyone else, we have a limited staff of engineers and have to prioritize what we work on.

But thanks to the widespread adoption of the new Microsoft abstraction, when Autofac 5.0 was released, there was also a release of Autofac.Extensions.DependencyInjection on the same day.

With the common abstraction and implementations provided by the container authors, you aren't dependent upon us to upgrade your container version. When a new version is released, you can upgrade immediately…or not. It's completely up to you.

More choices

Now that NServiceBus supports the Microsoft container abstraction, that means you don't have to be limited to the container adapters we've previously supported.

Previously-supported containers that can now be used via the Microsoft container abstraction include Autofac, Castle Windsor, StructureMap, and Unity, each through the adapter package published by that container's own authors.

In addition to these, at the time of this writing, you can also use several other containers that have never had a Particular-supplied adapter package.

And all of these packages should work great with the .NET Generic Host, making them easy to use in all your .NET Core applications, including with NServiceBus.

Moving forward

Using the Microsoft abstraction gives our customers so much more freedom and choice that we've realized the time has come for those old adapter packages to be retired.

We've released new minor versions of the following packages, which obsolete the adapter API with a warning:

  • NServiceBus.Autofac
  • NServiceBus.CastleWindsor
  • NServiceBus.Unity
  • NServiceBus.StructureMap

We will continue to fix bugs in these packages according to our support policy, but the next major version (which we would likely release whenever we release NServiceBus 8.0) will mark the API as obsolete with a build error, prompting you to switch to the Microsoft abstraction.

If you're using one of these packages, you have three choices:

  1. Continue using your existing version of your container adapter, and do not upgrade to the newest version that includes the deprecated API.
  2. Upgrade to the minor version, and let the build warning serve as a TODO for later. (This is not an option if your project uses the TreatWarningsAsErrors project property.)
  3. Use the NServiceBus.Extensions.DependencyInjection package along with the adapter package provided by your container of choice.

Unfortunately, there are a couple of containers that do not support the new abstraction:

  • Ninject, at the time of this writing, has decided not to provide an adapter.
  • Spring.NET does not currently support .NET Core, which means it cannot support the Microsoft abstraction either.

Since there is no clear upgrade strategy for these containers, we aren't taking the step of deprecating those packages yet. However, it's very unlikely that we would release new versions of these packages for a new major version of NServiceBus, and we may decide to add compile warnings to the packages in the future (as we have with the other packages) in order to guide users toward something else.

Ultimately, the choice of how to proceed is up to you.

Summary

With support for the new Microsoft dependency injection abstraction (say that three times fast), we've removed ourselves from the DI equation. That means you now have a much wider choice of DI containers to use, and the freedom to upgrade them whenever it makes sense for you.

Effectively, we've gotten ourselves out of the way, so now you can do what you need to do.

Most of the existing NServiceBus container adapters can be replaced with the new package and the adapter provided by the container author.

To see how this is done, check out the documentation for the new NServiceBus.Extensions.DependencyInjection package, the generic sample, or one of our newly updated container-specific samples:


About the author: Sean Feldman is a generalist at Particular Software who can appreciate simple and elegant code, as well as how much work it takes to create and maintain that simplicity.

Microsoft's new SQL client is here!


A few months back, Microsoft released a brand new SQL client library called Microsoft.Data.SqlClient on NuGet. Wait…didn't we already have a stable SQL client library shipped as a part of .NET Framework? Why reinvent the wheel?

Let's see why, which one you should choose, and how this affects our NServiceBus packages.

Update Cycles

Have you noticed that database servers are getting more frequent updates? To make use of new features in the server, you need an updated client library. This is especially true in the cloud with services such as Azure SQL Database that tend to update very often.

Unfortunately, the System.Data.SqlClient library is also shipped as part of the .NET Framework and can only be updated when a new version of the .NET Framework is released. Releasing new versions of the .NET Framework is a big task and is limited by the Microsoft support and backward compatibility guarantees. New versions of the .NET Framework are not released as often as the client library needs to be updated.

To resolve this situation Microsoft released a new client library called Microsoft.Data.SqlClient. This client library is shipped outside of the .NET Framework, only as a NuGet package. The NuGet package can be released much more frequently than the .NET Framework, allowing clients to keep up to date with the latest changes on the server.

In most cases, Microsoft.Data.SqlClient is the one you should use for any project moving forward, and you will probably need to convert existing projects to use it over time as well.

Migration

As there is feature parity between the old and the new SQL clients at the time of writing this post, it should be fairly easy to migrate to the new SQL client.
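At the code level, the switch is often little more than a namespace change. Here is a minimal sketch, assuming the Microsoft.Data.SqlClient NuGet package is installed (the Orders table is hypothetical):

// Before: using System.Data.SqlClient;  (ships inside the .NET Framework)
// After: the client that ships as a NuGet package
using Microsoft.Data.SqlClient;

static async Task<int> CountOrders(string connectionString)
{
    // The type names (SqlConnection, SqlCommand, ...) stay the same; only the
    // namespace and the package that provides them change.
    using var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();

    using var command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection);
    return (int)await command.ExecuteScalarAsync();
}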

For NServiceBus users, we have a few packages that are affected:

  • For the SQL Server Transport, we now have two separate packages:
    • NServiceBus.SqlServer has been updated to version 6, and will continue to use the System.Data.SqlClient.
    • NServiceBus.Transport.SqlServer is a new package that contains the same code, but uses the new Microsoft.Data.SqlClient package. The first release of this package is also version 6 and we will continue to release both packages in lockstep.
  • For SQL Persistence, the NServiceBus.Persistence.Sql package has been updated to version 5, which supports the new SQL client. Because SQL Persistence uses a connection builder approach to creating the SqlConnection object, you can use your choice of SQL client with the same SQL Persistence package, as shown in the sketch below.
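Here's a rough sketch of that connection builder approach, assuming a connection string defined elsewhere:

var persistence = endpointConfiguration.UsePersistence<SqlPersistence>();
persistence.SqlDialect<SqlDialect.MsSqlServer>();

// Hand SQL Persistence a connection built from whichever client you prefer,
// in this case the new Microsoft.Data.SqlClient package.
persistence.ConnectionBuilder(
    connectionBuilder: () => new Microsoft.Data.SqlClient.SqlConnection(connectionString));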

As indicated in our support policy, the older packages that rely on the old SQL client will remain supported until Feb 2021.

If you are using an older version of libraries such as EntityFramework, Dapper, or other third-party libraries that rely on the SQL client, the best approach is to upgrade them all at once, as the chances are high that the newer versions of those libraries won't support the old SQL client.

Summary

There is a newer SQL client library from Microsoft that will be receiving faster updates. We have made that available in our SQL Transport and Persistence libraries so you can start using it in your solutions now.

For more information, please see:

About the Authors: Hadi Eskandari and Mike Minutillo are both developers at Particular Software. When not at his computer, Hadi is on the lookout to take pictures of the amazing landscapes in Australia and beyond. Mike can be found playing tabletop games with people all around the world using online collaboration tools. As parents, they are both excited to see SqlClient growing up and striking out on its own.

How the Swedish Transport Agency learned distributed systems in six months


For better or for worse, online education has become the most common (and in many cases, the only) way to learn these days. Schools and universities have had to adapt to a remote environment while companies such as Pluralsight have stepped up and offered online courses for free or at a discount. In-person training and conferences have been postponed, cancelled, or transitioned to the virtual world.

We’ve certainly been affected by this shift at Particular with the cancellation of our popular Advanced Distributed Systems Design (ADSD) course this year. In response, we’ve made the videos from the course free for a limited time for online viewing.

But the real reason for this post is to tell you about Martin Hägerås and Mattias Olgerfelt from the Swedish Transport Agency. Martin attended the live ADSD course in London in 2016 and the two recently held their own version of the course in a unique way. Over the course of a few months, they and 40-50 others went through the video course together as a group (using the Team Lifetime Access version of our full-featured video course), stopping after each section for a group discussion.

We had a chance to chat with Martin about their experience.

Q: I understand you watched the videos together as a group. How was that structured?

We divided the participants into two groups of about 20-25 people. We thought of using a meeting room at the office, but instead we booked a largish room at a local conference center, in order to get away from everyday interruptions. It was the right choice as it made it possible for people to focus completely on the course.

Q: How long did it take?

We dedicated entire days to the course, and in total it took us 7 days per group. We spent maybe 5 hours each day watching the videos. We skipped a few sessions (the exercises), but in retrospect we should have watched them all I think. We plan on revisiting the ones we skipped later.

We had 3+ weeks between sessions, so the 7 course days were spread out across 6 months or so.

Q: How did you handle discussion?

We paused for discussion after each video clip, unless they were really short. Usually I would summarize the episode we’d just watched and try to explain especially tricky parts, especially how it relates to our specific situation at work. And we also answered questions of course.

There were a lot of questions, especially from more senior architects. So there was a lot of time for discussion. I think about half the time was just discussions. I think it helped people understand the concepts better. We had a lot of good feedback about these summaries and discussions. Many people said that it really helped them.

Q: How was the experience different than if everyone simply watched the videos on their own time?

A few things. First, getting away from the day-to-day work schedule is really important for you to be able to absorb material of this complexity. Second, the in-between discussions and extra clarifications helped people understand, and made it less abstract.

And finally, there’s just something about learning together as a group that feels good I think. You get the feeling that you’re in this together. It’s also more fun than watching by yourself.

Q: What advice do you have for any group that wants to do this themselves?

Allow for time between each video for discussions. It’s even better if someone has already attended the course and had some time to think about it.

Q: How was the experience different than attending the live course?

I attended the course in London with three colleagues. Since then we have spent a lot of time debating the concepts and thinking about how to apply them to our daily work.

For me it was interesting to watch the whole course again (twice even). The recording is from the course I attended so I got to listen to all my own questions, which is interesting too :)

I think it’s a course that is worth attending more than once. My colleagues who’d attended it before felt the same way.

It also worked very well as a video course. Though maybe it would have worked less well if none of us had attended it before.

Takeaways

  1. Information-dense videos on software architecture can be difficult to consume on your own. It’s better to learn together in a group. Other group members encourage you to keep progressing, and provide an outlet for questions and discussion to deepen your learning.
  2. Even if you can’t gather together in one place, group discussion to help cement the concepts is very helpful. If necessary, the videos can be watched together and discussed over a regularly-scheduled Zoom call.
  3. The number of hours per session and the spacing of the sessions is up to you, but being able to truly step away from the daily grind to focus exclusively on the content is important.

Now it’s your turn

In our current global situation, of course, the idea of 20-25 people in the same room together might sound like a fantasy and our goal isn’t to highlight the glory days of eight weeks ago. And few people will claim that watching densely-packed architectural videos all day is a substitute for in-person training. But it shows how a bit of organization, enthusiasm, and out-of-the-box thinking can help you get the most out of the resources available to you. By combining the accessibility of online videos with the interactivity of social contact (even over the internet), you can learn more than you might by watching the videos on your own.

We hope during this period of change you can take full advantage of the opportunity to watch, in the words of one participant, “the best course I’ve ever taken in my 20 plus years of IT”. When you do, tell us about it at https://discuss.particular.net/.

Getting NServiceBus ready for the cloud


More and more companies are realizing that it’s in their best interest to move to the cloud. As software consultants at Headspring, we’re often tasked with helping organizations migrate their highly complex legacy systems to modern cloud environments. For clients using NServiceBus to facilitate messaging, moving to the cloud without it seems untenable. So how do we prepare to take NServiceBus with us?

For complex software systems like the ones we’re used to working with, moving to the cloud often involves more than a simple “lift and shift.” NServiceBus proves to be just as valuable in a cloud environment, but various aspects of its configuration and integration into the system need to be modified in order to take full advantage.

One of our recent clients—a government agency—was already using NServiceBus and wanted to migrate their system to Azure. The steps we took to get NServiceBus ready to run in Azure may be helpful to you as you map your own cloud moves. I’ll walk you through exactly what we had to do differently in the cloud, explain the hybrid-cloud solution we built with NServiceBus for reporting, and demonstrate how we handled logging in the cloud, as well as our client’s need to run locally on occasion.

Making the move to cloud computing

We were looking to move a software system that had been running successfully on premises for a decade. The system uses NServiceBus to facilitate heavy auditing required by government regulations, without slowing down the site, and has run without a hitch since deployment.

In order to get up to date with changing regulatory requirements in their space, embrace new efficiencies, and set themselves up for the future, the client decided to move to the cloud. They went with Azure because the Azure Government Cloud environment is compliant with new hosting regulations that they needed to meet. The client’s existing servers had, over time, become an elaborate house of cards that people were somewhat afraid to touch. They loved the idea of a simple, configured app service instance that could be spun up or down, grown or shrunk, or even destroyed as needed without losing anything valuable.

A focused approach to getting cloud-ready

This was a legacy system, and there was a lot that had remained unchanged for far too long. In scoping the cloud migration, we had to decide what needed to change in order to get them to the cloud, leaving the nice-to-haves to future efforts.

We were on a limited timeline and budget, and the client was looking for the final switchover to be smooth. To keep the surface area of change minimal, we decided to make only the modifications necessary to support the migration—no unrelated library updates that could introduce behavior changes, or unnecessary UI improvements. We’ve found these projects tend to go better if you touch one thing at a time, and the first thing to do was support the cloud migration.

Even with this policy in place, as we’ll see, there would still be a few areas to address to get it ready to run in the cloud.

Cloud-ready message transport

The first aspect we needed to upgrade was the old application’s use of MSMQ. MSMQ is not a cloud-friendly technology—it only provides local queues. Any queue traffic between applications on different machines would be between MSMQ instances, using MSMQ’s specific communication. There is no single queue server, just queues that live on each application server—and that violates the idea of instances being able to be spun up or down with nothing lost. Plus, MSMQ is pretty much dead anyway.

The natural replacement for MSMQ in Azure is Azure Service Bus. Unlike MSMQ, Azure Service Bus provides a single, centralized broker that is separate from any individual instance of the application. The next step was to wire in this new message broker, and the first thing we’d have to do was upgrade our existing version of NServiceBus.

Getting versions up to speed

NServiceBus has made a lot of great improvements over the last few years—not only in supporting Azure, but also in embracing async/await, configuration improvements, supporting .NET Core, and many others. However, we had some changes to make in the system to support the latest version of NServiceBus. The application was also on a very old version of .NET, so, driven by our guiding principle of only making changes that would support the cloud, we bumped the application up to .NET 4.6.2 (the minimum version we could get away with) and snapped in a modern version of NServiceBus.

The specific updates we had to make as part of our NServiceBus upgrade were related to the bus metaphor and the configuration. Since NServiceBus was added to the system, the product has moved away from the catch-all IBus to more granular, intention-revealing interfaces. Swapping in these new interfaces meant looking at how we actually used NServiceBus in each location and matching it to one of the new interfaces. That took work, but resulted in message handlers that better reflected their actual use. The configuration of NServiceBus changed as well: We moved from the older, specialized XML-Based configuration to the modern code-based configuration that is common to newer tools.
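As a rough sketch of the end result (with a hypothetical endpoint name and message type, and assuming the current Azure Service Bus transport package), the configuration that used to live in XML is now expressed directly in code:

var endpointConfiguration = new EndpointConfiguration("MyApp.Audit"); // hypothetical name

var transport = endpointConfiguration.UseTransport<AzureServiceBusTransport>();
transport.ConnectionString(Environment.GetEnvironmentVariable("AzureServiceBus_ConnectionString"));

endpointConfiguration.EnableInstallers();

// IBus is gone; the endpoint instance implements IMessageSession for sending,
// and handlers receive IMessageHandlerContext instead.
var endpoint = await Endpoint.Start(endpointConfiguration);
await endpoint.SendLocal(new AuditPageVisit()); // hypothetical message type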

Creating a hybrid cloud solution for reports

Of course, unexpected things always come up in big projects like this, and in this case, our surprise “gotcha” was SQL Server Reporting Services (SSRS) reports.

The Azure environment we were in didn’t have a great story for migrating the existing SSRS reports over—other than upgrading them all to Power BI. We just didn’t have time to tackle that kind of upgrade, so we ended up tying together our all-cloud infrastructure with a client-owned server to run the actual reports.

We used NServiceBus from our cloud environment to send a message to a small application on the client-owned server, had that server generate an SSRS report for us, and then send the report back to our main cloud-hosted application. Since this was a government-specific system, outbound traffic was quite locked down—we were unable to reach anything hosted on premises. However, we were able to access the cloud transport (the queue in Azure) from anywhere, including from this on-premises server. This hybrid cloud solution quickly got us past what could have been a huge blocker for a critical part of the project.
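Under the covers, that round trip is ordinary NServiceBus request/reply. Here is a minimal sketch, with hypothetical message types, destination, and report-rendering code:

// Cloud side: ask the on-premises server to produce the report.
await messageSession.Send("OnPrem.Reporting", new GenerateSsrsReport { OrderId = orderId });

// On-premises side: a small endpoint renders the report and replies to the cloud endpoint.
class GenerateSsrsReportHandler : IHandleMessages<GenerateSsrsReport>
{
    public async Task Handle(GenerateSsrsReport message, IMessageHandlerContext context)
    {
        byte[] report = RenderSsrsReport(message.OrderId); // calls SSRS locally
        await context.Reply(new SsrsReportGenerated { OrderId = message.OrderId, Content = report });
    }

    static byte[] RenderSsrsReport(int orderId) => throw new NotImplementedException(); // placeholder
}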

Centralized logging

Like message queues, application logging requires special consideration in the cloud. NServiceBus comes with logging out of the box, so it logs quickly and easily to the filesystem without any configuration. This is fantastic when you’re getting started, and it works fine on a traditional server.

As we saw with MSMQ, the cloud is fundamentally different, so appropriate configuration is required. In the cloud, the filesystem is part of the whole system that can be spun up, spun down, or thrown away. We want to keep our logs somewhere else in order to be sure we preserve them even if a particular instance of the site were to disappear.

Luckily, NServiceBus has no problem working in this environment, and there are several options for implementation. The open source tool, Serilog, for example, includes the concept of “sinks,” which are destinations for logs. We added a sink for Azure Monitor, which provides modern, centralized, searchable structured logs in Azure. Since NServiceBus has an adapter to log into Serilog, we were good to go to log in the cloud. For those on current versions of NServiceBus, you can now take advantage of the NServiceBus.Extensions.Logging package, which lets you use just about anything you want. If we had been able to more fully modernize the code, we would have likely used this  to avoid any direct ties into a particular logging framework.
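Here's a rough sketch of the wiring, assuming the NServiceBus.Serilog adapter package (the console sink below stands in for the Azure Monitor sink we actually configured):

// Configure Serilog with whichever sinks you need; in our case a sink that
// ships log events to Azure Monitor replaced the console sink shown here.
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .WriteTo.Console()
    .CreateLogger();

// Route all NServiceBus logging through Serilog (provided by NServiceBus.Serilog).
NServiceBus.Logging.LogManager.Use<SerilogFactory>();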

Running locally

NServiceBus is critical to our application, but it is only one piece of the whole. Sometimes, it is useful to have the option of running the application locally and disconnected while developing functions unrelated to messaging, such as when working on business logic or UI.

Certain scenarios make a local option attractive: When traveling, connecting via a restrictive VPN, or working from a location with an unreliable internet connection such as hotels—or even home when there is a lot of neighborhood internet traffic. In our client’s case, we added an alternate configuration to run the Learning Transport provided by Particular, and we can switch between these modes using application configuration.
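Here's a sketch of that switch, where configuration is an IConfiguration instance and the UseLocalTransport flag is a hypothetical app setting:

var endpointConfiguration = new EndpointConfiguration("MyApp.Audit"); // hypothetical name

if (configuration.GetValue<bool>("UseLocalTransport"))
{
    // File-system-based transport for disconnected development only.
    endpointConfiguration.UseTransport<LearningTransport>();
}
else
{
    var transport = endpointConfiguration.UseTransport<AzureServiceBusTransport>();
    transport.ConnectionString(configuration["AzureServiceBus:ConnectionString"]);
}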

While using a local transport can come in handy, I would not recommend it as a way to test or validate your NServiceBus system. Different transports have very different properties and failure modes, so confirming that a system works with one transport says almost nothing about how that system will perform when configured with a different transport. Removing blockers to working on other parts of the system is the only reason I’d use this kind of alternate configuration—in this case, testing your system in a staging environment equivalent to production before launch is even more critical.

If you plan to develop with a consistent internet connection, you should consider a simpler development configuration that does without any switching. Instead, try using developer-specific namespaces in Azure Service Bus to develop locally—you will get to see how your code will do in Azure while you are developing, with no extra effort.

So, how’d we do?

We launched the updated system in December, and it’s been running successfully ever since. The system has been running on its more updated version of .NET, Azure Service Bus has been receiving and distributing messages, Azure Monitor has been receiving NServiceBus logging messages, and we’ve even run locally on occasion as well. We’re a long way away from fully modernizing the entire system, but the move to Azure has set us up for success.

This example scenario isn’t a comprehensive guide for moving your own software system to the cloud with NServiceBus. However, it does prove that it is possible, doable, and not too hard after all! I hope this story, at minimum, helps you navigate some of the challenges you may run into if you decide to make a similar move. In the end, you’ll end up with all the benefits of NServiceBus working for you in the cloud—which these days, is the place to be.

Usability enhancements in ServicePulse 1.25


One of the challenges in building a distributed software system is that with so many processes doing their own thing, it can be hard to get the bird’s-eye view of what’s going on. Is everything healthy? Is the performance ok? Is there anything going on that could cause a problem?

ServicePulse is our application for monitoring your entire distributed system. It’s where heartbeats tell you if endpoints are healthy or have failed. It’s where our monitoring tools tell you how many messages are in each queue, how long it’s taking to process those messages, and if the system is able to keep up with the load or is falling behind. It’s also where you find out if messages have failed processing, see the error (without having to dig through log files), and get the system back on track once the error is fixed.

In this post we’ll tell you about some of the usability enhancements we’ve added in our new version, ServicePulse 1.25.

Managing endpoint instances

First, from the Monitoring page we can now get a quick glimpse at how many instances of each logical endpoint are running in our system.

Here we can see that the Sales endpoint has two running endpoint instances, while every other endpoint is only running a single instance:

If one of these instances fails and is no longer sending monitoring information to ServiceControl, that number will adjust to 1 to show that you’re only seeing the metrics from one endpoint instance.

When this occurs, you can click on the endpoint name to view the detail, and then below the large summary graphs, click on the Instances tab.

In this case, the Sales endpoint’s instance-2 has failed. If the endpoint has the Heartbeat plugin installed, you’ll also see this failure in the Heartbeats tab.

Sometimes this is expected, especially in a dynamic-scale scenario where additional endpoint instances are added to cover peaks and then later retired. In that case the instance hasn’t failed—you shut it down on purpose, and you simply want to remove the retired endpoint instance. With ServicePulse version 1.25, now you can.

Hover over the failed instance, and you’ll see a trash can icon, enabling you to remove it:

Once the retired instance is removed, the data will return to the graphs on the main monitoring page.

Heartbeats filtered

Complex systems can have dozens upon dozens of endpoints, and scrolling to find a specific one isn’t fun. Some users will organize these by endpoint name into a kind of hierarchy, such as LogicalService.Category.SpecificEndpoint which can help to organize them, but doesn’t always make it much easier to zero in on the necessary information.

In the new version, the Heartbeats page now contains sorting controls allowing you to sort by endpoint name or by last received heartbeat, in both ascending and descending orders, as well as a text filter for the endpoint name.

The text filter can make it easy to find all the endpoints within a specific logical service or category if using hierarchy-based names as shown above, and sorting by last heartbeat can make it easy to identify endpoints in trouble.

Failed messages

Failed messages is one of the most powerful features of ServicePulse, presenting failed messages in groups, and allowing you to view the details of the failures and retry messages through their original endpoints.

Sorting and grouping

We’ve now added control over how failed message groups are sorted and grouped:

In addition to grouping failed messages, we’ve added the ability to group archived messages in the same way.

When there are a lot of archived messages, this makes it a lot easier to sort through them in case you need to retry one of the messages or even just to investigate. Similar to normal failed message groups, when you click on a group, you can see the details of each failed message. There are additional filtering controls to show only messages archived within the last two hours, one day, or seven days if displaying everything is still too much.

Quick access to details

Finally, for failed messages, if you click on a failed event message from the Dashboard or Events page, it will now take you directly to the details of the failed message.

See all events

The Dashboard page has always shown you the last 10 monitoring events that have happened in your system. This includes things like endpoints starting and stopping, confirmed heartbeats, failures to receive heartbeats, failed custom checks, etc. However, in a busy system with lots of endpoints, it can be easy for things to get lost.

Now, we’ve added an Events page to the top navigation that will display the full history of events, so that events don’t get lost. You can also get to this page from a button we’ve added to the bottom of the Dashboard.

Small improvements

A variety of tiny improvements round out ServicePulse 1.25:

  • In the case where ServicePulse has successfully connected to ServiceControl but there happens to be no monitoring data available, the difference between that and “no connection” is now more obvious. This is most helpful when originally setting up monitoring, before the monitoring plugin has been added to the endpoints.
  • We moved the Heartbeats configuration to the Heartbeats page, to keep all heartbeat-related information together.
  • We added a version check for ServiceControl Monitoring which will let you know when that service requires a software update. (This feature requires ServiceControl version 4.9.0 or later.)

Summary

ServicePulse is an indispensable tool to help you keep tabs on your NServiceBus system, giving you peace of mind that everything is operating as it should and helping you to fix things when problems arise.

With our new licensing model, all customers can use ServicePulse—once you migrate, it makes no difference what tier you’re on. So if you haven’t tried using ServicePulse in your system before, now’s the time. Check out our monitoring demo for an easy way to see ServicePulse in action without having to install anything.

If you’re already running ServicePulse in your system, get the latest patch version from our downloads page.

We hope the improvements in ServicePulse 1.25 will make ServicePulse more enjoyable to use, so you can get your work done faster.

The end of the legacy Azure Service Bus transport


A while back, we introduced a brand new transport for use with Azure Service Bus. This transport was a necessary step in our Azure offering to allow users to target .NET Standard and .NET Core. It also used the new Microsoft.Azure.ServiceBus client rather than the older, deprecated client.

More importantly, it started the process of deprecating the now-legacy Azure Service Bus transport. At the time, we didn’t have details about how we would do that; now we do.

The legacy Azure Service Bus transport will be deprecated as of May 1, 2021.

What does that mean?

After May 1, 2021, the legacy transport will no longer be supported and won’t receive updates, patches, or fixes. Until that date, it will receive only critical updates and bug fixes, just as it has over the last eighteen months.

If you’re still on the legacy Azure Service Bus transport, we have an upgrade guide to help you migrate. You have just shy of a year to do so, but we strongly recommend you don’t wait until the last minute. Of course, if you run into problems, our support is just a click away.

Why are we deprecating the legacy transport?

The legacy transport was built against the WindowsAzure.ServiceBus client, which is old, not actively developed and all but abandoned. Microsoft has stopped short of saying this client is not officially supported, but the fact that it doesn’t (and will never) support .NET Core speaks volumes about its future.

Summary

We know upgrading a transport is never high on someone’s to-do list but the countdown has started on the legacy Azure Service Bus transport.

If you haven’t migrated to the new transport yet, we encourage you to do so soon. When you do, we’ve got you covered with both a migration guide and full support for the new transport.

Support for .NET Core 3.1 is here!


Supporting NServiceBus on .NET Core 3.1 has been a focus for us here at Particular in 2020. Well, the day is finally here. We’re happy to announce that with the latest version of NServiceBus, you can build and deploy your NServiceBus endpoints on .NET Core 3.1, taking advantage of great features like built-in hosting, logging, and dependency injection.

Support for .NET Core 3.1

Microsoft has been investing in .NET Core as their next major framework, going as far as announcing that .NET Framework 4.8 will be the last major version of the .NET Framework. All of their efforts have been focused on the .NET Core space, and they are heading towards .NET 5 as we speak.

Our last major release of NServiceBus introduced support for .NET Core 2.1. In the meantime, Microsoft has brought us another LTS (long term support) version of .NET Core, version 3.1.

Even though you could technically run NServiceBus on .NET Core 3.1, as it’s backwards compatible with .NET Core 2, we weren’t comfortable recommending doing so to our customers until we had tested it for ourselves. In order for us to add .NET Core 3.1 to our list of supported platforms, there were a lot of boxes we wanted to tick off to make sure we continue to offer our users the best experience possible. So we went on a journey of updating our build infrastructure and running our extensive test suite targeting .NET Core 3.1. This exposed a handful of edge cases, which were then tackled and fixed.

To further improve the user experience, we collected more than a hundred samples available on our documentation website, and made them available as downloads specifically targeting .NET Core 3.1 as shown here.

Target platforms for samples

If you were holding back on upgrading, now is the time. If you need help upgrading from another version of the .NET Core framework, this article might be useful to you. To upgrade to the latest version of NServiceBus, there are upgrade guides available here.

But wait, there’s more!

NServiceBus has provided abstractions for Hosting, Logging, and Dependency Injection for over a decade. Over the last few years, Microsoft has invested heavily in these areas, bringing us brand new built-in APIs under the Microsoft.Extensions namespaces and packages. Now that Microsoft’s Extensions APIs are quickly becoming industry standards, we’re more than happy to retire our implementations and rely on theirs. Don’t panic, we haven’t retired anything yet. When we actually do, you will have up to five years to upgrade.

One of those APIs is their built-in implementation of dependency injection in ASP.NET Core. This API reduces the need to reference an external package and covers the dependency injection needs of most projects.

With this new version, your handlers can take dependencies on services registered in ASP.NET Core services:

class SomeHandler : IHandleMessages<SomeMessage>
{
    ICalculateStuff stuffCalculator;

    public SomeHandler(ICalculateStuff stuffCalculator)
    {
        this.stuffCalculator = stuffCalculator;
    }

    public async Task Handle(SomeMessage message, IMessageHandlerContext context)
    {
        await stuffCalculator.Calculate(message.Input);
        // Do some more stuff if needed
    }
}
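For completeness, here is a rough sketch of how such a dependency might be registered on the Generic Host. ICalculateStuff comes from the handler above, while StuffCalculator and the endpoint name are hypothetical:

var host = Host.CreateDefaultBuilder()
    .UseNServiceBus(context =>
    {
        var endpointConfiguration = new EndpointConfiguration("Sample.Endpoint"); // hypothetical name
        endpointConfiguration.UseTransport<LearningTransport>();
        return endpointConfiguration;
    })
    .ConfigureServices(services =>
    {
        // The handler's constructor dependency is resolved from this registration.
        services.AddSingleton<ICalculateStuff, StuffCalculator>();
    })
    .Build();

await host.RunAsync();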

We are continuing our efforts to integrate with the Microsoft.Extensions packages in order to streamline our usage of these APIs. In retiring our own implementation of these concerns, we can evolve more quickly down the line, and increase our focus on the essence of our platform: messaging.

If you’re interested in how all of these parts function together, check out this sample, or take our latest release for a spin.


Support for Azure Functions


Microsoft Azure Functions provide a simple way to run your code in the Azure Cloud. They are easy to deploy, scale automatically, and provide many out-of-the-box ways to trigger your code. Rather than pay for an entire virtual machine, you only pay for compute while your code is being executed.

NServiceBus makes Azure Functions even better. You get a simple but powerful API to consume and dispatch messages, robust reliability features like delayed retries, an extensible message processing pipeline, and a suite of tools to help you monitor and maintain your message-driven system. We think they go together like milk and cookies.

Let’s take a look at how using NServiceBus and Azure Functions together can make your serverless applications even better.

Better together

While you can use Azure Functions HTTP triggers to create a web API, what’s more interesting with NServiceBus is to set up a queue-triggered Function using Azure Service Bus.

Normally, you’d need to set up a function on a queue and then handle that message natively:

[FunctionName("MyFunction")]public static async Task Run(    [ServiceBusTrigger(queueName: "queuename")]    string message,    ILogger logger,    ExecutionContext executionContext){    // Process the message}

This is fine but has its limits. The message can be just about any type, but you can only support one type. If you need another message type, you need another queue for that, or you have to get fancy and handle the message serialization and routing yourself. It can get complex pretty quickly.

NServiceBus solves all that with a little bit of code that converts the function into a full-fledged NServiceBus endpoint that can handle multiple message types:

[FunctionName("MyFunction")]public static async Task Run(    [ServiceBusTrigger(queueName: "queuename")]    Message message,    ILogger logger,    ExecutionContext executionContext){    await endpoint.Process(message, executionContext, logger);}static readonly FunctionEndpoint endpoint = new FunctionEndpoint(executionContext =>{    var serviceBusTriggeredEndpointConfiguration = ServiceBusTriggeredEndpointConfiguration.FromAttributes();    return serviceBusTriggeredEndpointConfiguration;});

After this, to handle any message, you add a handler, just like any other NServiceBus endpoint:

class MyRequestHandler : IHandleMessages<MyRequestClass>
{
    static ILog Log = LogManager.GetLogger<MyRequestHandler>();

    public Task Handle(MyRequestClass message, IMessageHandlerContext context)
    {
        Log.InfoFormat("Hey, a message with id {0} arrived!", message.Id);
        return Task.CompletedTask;
    }
}

Now you can take advantage of all of the powerful features of NServiceBus in your Azure Function.

Evolving message contracts

Due to the way Azure Functions encodes the incoming message type in the function’s method signature, things can get dicey when you need to evolve that contract without breaking in-flight messages.

In this example, the function processes a MyRequestClass message and emits a MyNextRequestClass message:

[FunctionName("MyFunction")][return: ServiceBus("nextreceiverqueue")]public static async Task Run(    [ServiceBusTrigger(queueName: "receiverqueue")]    MyRequestClass message,    ILogger logger){    logger.LogInformation("Hey, a message with id {0} arrived!", message.Id);        return new MyNextRequestClass { Id = message.Id };}

If your message is JSON serialized, this just works. However, message contracts are not always stable. Any burgeoning project is sure to need to change a message contract eventually due to evolving business requirements, changes to third-party integrations, or just plain old growing pains.

Since you may have messages with the old contract still in-flight, it is safest to create a new Function with a new queue for the new contract version. Otherwise, you will have to handle the serialization yourself in the Function handler and add the alternate paths to process each contract version.

With NServiceBus, we can just add another handler to the project:

class MyRequestHandlers :
    IHandleMessages<MyRequestClass>,
    IHandleMessages<MyRequestClassV2>
{
    static ILog Log = LogManager.GetLogger<MyRequestHandlers>();

    public async Task Handle(MyRequestClass message, IMessageHandlerContext context)
    {
        Log.InfoFormat("Hey, a v1 message with id {0} arrived!", message.Id);
        await context.Send(new MyNextRequestClass { Id = message.Id });
    }

    public async Task Handle(MyRequestClassV2 message, IMessageHandlerContext context)
    {
        Log.InfoFormat("Hey, a v2 message with id {0} arrived!", message.Id);
        await context.Send(new MyNextRequestClassV2 { Id = message.Id });
    }
}

Simple message dispatching

NServiceBus also makes it much easier to send messages within the Function. For example, what if we wanted to send another message at the end of the previous handler?

In a normal Azure Function, the extremely flexible bindings offer a lot of options for sending outgoing messages. An attribute must decorate the method to use the function’s return value as an outgoing message, but can also decorate method parameters instead. Those parameters could be a normal type if you want to send one message, but if you need to send multiple messages then you must use an ICollector<T> or an IAsyncCollector<T>. In the attribute parameters, you need to specify a different EntityType if you want to publish to a topic rather than send directly to a queue.

[FunctionName("MyFunction")][return: ServiceBus("nextreceiverqueue")]public static async Task Run(    [ServiceBusTrigger(queueName: "receiverqueue")]    MyRequestClass message,    [ServiceBus("topic", Connection = "ConnectionString", EntityType = EntityType.Topic)]IAsyncCollector<SomethingHappenedEvent> collector,    ILogger logger){    logger.LogInformation("Hey, a message with id {0} arrived!", message.Id);    await collector.AddAsync(new SomethingHappenedEvent { Id = message.Id });    return new MyNextRequestClass { Id = message.Id };}

With NServiceBus, all these messaging operations are available via the handler’s IMessageHandlerContext, and NServiceBus will get the messages where they need to go without requiring a change to the function definition itself:

public async Task Handle(MyRequestClass message, IMessageHandlerContext context)
{
    Log.InfoFormat("Hey, a message with id {0} arrived!", message.Id);
    await context.Publish(new SomethingHappenedEvent { Id = message.Id });
    await context.Send(new MyNextRequestClass { Id = message.Id });
}

Rapid prototyping

One of the biggest advantages to Azure Functions is the ability to easily and cheaply deploy a proof of concept solution and then scale it up to production, especially when you’re unsure of what scale the solution will ultimately need to fulfill.

Eventually, solutions that start off on Azure Functions may reach a level of maturity where hosting on Azure Functions is no longer preferable. The cost of Azure Functions, while flexible, may be higher than running the system on dedicated resources.

In this case, using NServiceBus as an abstraction layer over Azure Functions really pays off. The message transport (Azure Service Bus) stays the same, and the message handlers (classes that implement NServiceBus’s IHandleMessages interface) all stay the same. Meanwhile, hosting can easily be shifted to WebJobs in an Azure App Service or to containers running on Azure Kubernetes Service. Only the minimal boilerplate code that creates endpoints out of queues is coupled to the Azure Functions stack.

Other benefits

A few other benefits of integrating NServiceBus with Azure Functions include:

Introducing our Preview program

Our support for Azure Functions is being released as the first result of our new preview program.

But make no mistake: Preview does not mean substandard quality. Our goal is to be enterprise-grade, which means stability. All our previews are production-ready capabilities, but they’re licensed separately from the rest of our tools, and come with their own support policy.

We get a lot of requests to support new technologies with the Particular Service Platform. As you can imagine, it is hard to keep up with them all. Deciding what to add and when to add it has been hard. Over the last year or so we have been working internally on how to become more innovative while still maintaining the high standards we hold ourselves to. This is why we have created the new program we are calling Previews.

Previews are new software that either expands the technologies that our Platform integrates with or adds entirely new ideas to our platform. Since it is new, that comes with some risk. Even with our research, we can't be sure our customers will be willing to adopt the software we release. The Preview program is designed to be the final test of any new innovation we have been exploring. By releasing a minimally featured version, with support in your production environments, we can validate that the solution we are bringing forward is the correct one and that it will be well received by our current and future customers.

The current Preview software we are offering and the status of each one can be found on our Platform previews page. Let us know if you are using any of our Preview software and keep an eye out here and on our Twitter feed as we launch additional Previews.

Summary

With support for Azure Functions, we are bringing the Particular Service Platform to the cost-efficient serverless environment. NServiceBus enables simple message handling while taking care of serialization, routing, error handling, recoverability, and other complexities of messaging in distributed systems.

For another look at NServiceBus and Azure Functions together, check out Azure Functions for NServiceBus by Simon Timms.

To get started with some code yourself, check out our Using NServiceBus in Azure Functions with Service Bus triggers sample. We also have a version of the sample for Azure Queues if that’s more your style.

What's new in NServiceBus 7.4


At Particular Software, we don’t believe in packaging all the cool new features into major versions that make it hard to upgrade. We prefer to deliver improvements incrementally in minor releases that are easy upgrades, deferring breaking changes until the next major release to clean things up before the next set of incremental improvements.

That means that when it comes to new goodies, minor releases are where it’s at, and this minor release is no exception.

NServiceBus 7.4 includes some great new enhancements that make mapping sagas easier and more powerful, enable multiple different strategies for message conventions, and subdivide a message flow into multiple conversations.

Let’s take a closer look.

Finding sagas

Saga mapping is a configuration activity that tells the platform how to find a particular saga instance when it is processing a message. We’ve created a new API that makes mapping sagas more straightforward and less duplicative.

This is what saga mapping used to look like:

protected override void ConfigureHowToFindSaga(SagaPropertyMapper<MySagaData> mapper)
{
    mapper.ConfigureMapping<OrderShipped>(msg => msg.CustomerId).ToSaga(saga => saga.CustomerId);
    mapper.ConfigureMapping<OrderBilled>(msg => msg.CustomerId).ToSaga(saga => saga.CustomerId);
}

This API works, and there are thousands of sagas out there that are defined in this way. However, the existing API isn’t very obvious, and it results in a lot of duplication, especially with multiple mapped messages.

For historical reasons, each message mapping has to define how it correlates to a saga. That’s the ToSaga() call, and if you leave it out the message doesn’t get mapped. The API also implies that each message can correlate to the saga with a different property, but that doesn’t work. Sagas can only have one correlation property, so we check for that and throw an exception if you try.

We introduced a new API that maps the saga correlation property first and then uses that correlation to create a mapping for each message.

protected override void ConfigureHowToFindSaga(SagaPropertyMapper<MySagaData> mapper)
{
    mapper.MapSaga(saga => saga.CustomerId)
        .ToMessage<OrderShipped>(msg => msg.CustomerId)
        .ToMessage<OrderBilled>(msg => msg.CustomerId);
}

This new API better represents the intent of the mapping and results in cleaner, simpler saga mapping with no unnecessary duplication.

Note: If you use SQL Persistence, that will also need to be upgraded due to how SQL Persistence analyzes the source code of the ConfigureHowToFindSaga method to determine the correlation property.

Saga mapping by headers

While we were working with the saga mapping API, we took the opportunity to enable saga mapping via a message header. In this example, when processing an OrderShipped message, the system will use the Shipping.CustomerId header to find a saga instance.

protected override void ConfigureHowToFindSaga(SagaPropertyMapper<MySagaData> mapper)
{
    mapper.MapSaga(saga => saga.CustomerId)
        .ToMessageHeader<OrderShipped>("Shipping.CustomerId")
        .ToMessage<OrderBilled>(msg => msg.CustomerId);
}

The sample also demonstrates that you can mix and match message property and message header correlation within a single saga. For more complex saga mappings, you can always create a behavior that performs your correlation logic and elevates the correlation property into a header on the message.
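As a rough illustration of that last point (the message type, header name, and lookup logic are all hypothetical), a pipeline behavior can compute the correlation value and promote it into a header before the saga is invoked:

class PromoteCustomerIdHeaderBehavior : Behavior<IIncomingLogicalMessageContext>
{
    public override Task Invoke(IIncomingLogicalMessageContext context, Func<Task> next)
    {
        if (context.Message.Instance is OrderShipped message)
        {
            // Hypothetical correlation logic; the saga then maps on this header.
            context.Headers["Shipping.CustomerId"] = message.CustomerId.ToString();
        }
        return next();
    }
}

// Registered during endpoint configuration:
endpointConfiguration.Pipeline.Register(
    new PromoteCustomerIdHeaderBehavior(),
    "Promotes the customer id into a header for saga correlation");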

It’s important to note that the existing saga mapping API is not being deprecated, so if your sagas are working just fine, there’s no need to do anything. But we think you’ll like the more concise API for all of your new sagas going forward.

Multiple message conventions

In NServiceBus version 7.4, we’ve introduced a way to support multiple rules for message conventions.1

NServiceBus comes with easy-to-use message conventions based on marker interfaces. Commands implement the ICommand interface, and events implement the IEvent interface. (Some other messages, such as replies, implement the IMessage interface.) For many projects this is good enough, but you’ve always been able to override these conventions if you want to.

Here’s how you might currently identify messages based on a type name suffix:

endpointConfiguration.Conventions()
    .DefiningCommandsAs(t => t.FullName.EndsWith("Command"))
    .DefiningEventsAs(t => t.FullName.EndsWith("Event"))
    .DefiningMessagesAs(t => t.FullName.EndsWith("Message"));

The problem is that this overrides the current convention every time it is called, meaning that you can only have a single message convention that all messages must adhere to. As a system grows over time, or merges with other systems, message conventions are likely to shift and change. Dealing with that using a single message convention can lead to some complicated and tightly-coupled code. To handle these cases, we have introduced a new message convention abstraction. Here’s the same suffix message convention applied using this new abstraction.

class TypeNameSuffixConvention : IMessageConvention
{
    public string Name { get; } = "Type name suffix";

    public bool IsCommandType(Type t) => t.FullName.EndsWith("Command");

    public bool IsEventType(Type t) => t.FullName.EndsWith("Event");

    public bool IsMessageType(Type t) => t.FullName.EndsWith("Message");
}

endpointConfiguration.Conventions()
    .Add(new TypeNameSuffixConvention());

The Add(IMessageConvention) method doesn’t override the existing convention, so now the endpoint will recognize types that match either the built-in default convention (based on marker interfaces) or the type name suffix convention above. As the system grows over time and messages are added from other areas of the solution, you can define new message type conventions and add them to the list.

endpointConfiguration.Conventions()
    .Add(new SalesMessageConvention())
    .Add(new ShippingMessageConvention())
    .Add(new BillingMessageConvention());

If you need to know which message conventions are configured for an endpoint, this info has been added to the startup diagnostics file.

{
  // ...
  Messages: {
    CustomConventionsUsed: true,
    MessageConventions: [
      "NServiceBus Marker Interfaces",
      "Sales messages",
      "Shipping messages",
      "Billing messages"
    ]
  }
  // ...
}

If you need to know which convention was applied to each message type, turn on debug logging:

2020-07-20 13:20:01.576 DEBUG TestMultiConventions.PlaceOrder identified as message type by Sales messages convention.

New conversations

A message conversation is a set of related messages. Each message sent by NServiceBus contains a header that states what conversation that message is a part of. This header is sticky. When a message is processed, the conversation header is copied to any outgoing messages. This is one of the key headers that is used to build the visualizations in ServiceInsight. All of the messages in a single diagram share the same conversation id.

ServiceInsight Flow Diagram

You’ve always been able to control the conversation id by setting a custom conversation id strategy for the endpoint or setting the header manually for an outgoing message. Both of these techniques only work for the very first message—the one that starts the conversation. Once that first message is sent, you can’t alter the conversation id for any subsequent messages as that would break the concept of the logical conversation—including the visualizations in ServiceInsight.

There are some scenarios where it does make sense to start a new logical conversation:

  • Sagas are sometimes used as schedulers to trigger some recurring task. In this case, every instance of the recurring task would be part of the same giant never-ending conversation.
  • An integration handler could load data from a data store and initiate a process for each record. It would make sense to start new conversations rather than having all of these messages be a part of the same massive conversation.

Now we have a declarative API for starting a new conversation:

public async Task Handle(UpdatePricing message, IMessageHandlerContext context)
{
    var pricingRecords = LoadPricingRecords();
    foreach (var record in pricingRecords)
    {
        var sendOptions = new SendOptions();
        sendOptions.StartNewConversation();

        await context.Send(new UpdatePrice(record.ProductId), sendOptions);
    }
}

This API clearly shows the intent of starting a new message conversation, but it also allows us to record the original conversation id using a new header NServiceBus.PreviousConversationId. In the future, we may upgrade ServiceInsight to use this header to show cross-conversation links.

Summary

There are several other more minor improvements in NServiceBus version 7.4.0, mostly intended to guide you toward the pit of success by keeping you out of trouble, providing feedback through helpful exceptions, and making reporting more helpful for when things go wrong and you need to contact our support. You can find out all about them in the full release notes.

Our documentation has been updated for all of the new APIs. Where code snippets are used, you will see a new NServiceBus 7.4 option in the Switch Version dropdown button.

As always, NServiceBus version 7.4 is available on NuGet. Happy coding!


  1. Message conventions determine what classes in an assembly are considered messages.

Enhancements in NServiceBus Hosting


The .NET Generic Host has become the de facto method of hosting applications built on .NET Core. To support the generic host, we released the NServiceBus.Extensions.Hosting package to make it simple to host NServiceBus within a .NET Core process or ASP.NET Core web application using the .UseNServiceBus(…) extension method.

While this was a massive improvement for easily hosting NServiceBus in a .NET Core process, a few rough edges remained. For version 1.1, we aimed to make it easier than ever.

Let’s have a look at what has changed.

Logging integration

Since the release of the NServiceBus.Extensions.Logging package, people have been asking how to use their logging framework of choice when hosting endpoints using the .NET Generic Host. This was possible before, but there was a division between the .NET Core logging and the NServiceBus logging that made things difficult.

If you wanted to use a specific logging library, you’d need to add a specific dependency for that library within the NServiceBus code. Additionally, if you wanted to set the logging level to DEBUG, you’d have to do that specifically within the NServiceBus APIs.

Now, any logging using the NServiceBus LogManager will be automatically forwarded to Microsoft.Extensions.Logging. So if you want to pick your own logging library, NServiceBus cooperates with whatever the generic host has set up. If you want to set the logging level, you do it the same as you would with any other .NET Core application. There’s no longer any need to add extra dependencies or specifically call NServiceBus APIs to handle things that are supposed to be the host’s job.
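For example, assuming the standard Generic Host logging APIs and a hypothetical endpoint name, turning up NServiceBus output is now just host configuration rather than an NServiceBus API call:

var host = Host.CreateDefaultBuilder()
    .ConfigureLogging(logging =>
    {
        // Standard Microsoft.Extensions.Logging configuration; forwarded
        // NServiceBus log entries respect these settings as well.
        logging.SetMinimumLevel(LogLevel.Debug);
    })
    .UseNServiceBus(context =>
    {
        var endpointConfiguration = new EndpointConfiguration("Sample.Endpoint"); // hypothetical name
        endpointConfiguration.UseTransport<LearningTransport>();
        return endpointConfiguration;
    })
    .Build();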

This also means that if you’re using the NServiceBus.Extensions.Hosting package, you don’t need to use the NServiceBus.Extensions.Logging or NServiceBus.MicrosoftLogging packages anymore. Those are only needed if you are self-hosting an endpoint without using the Generic Host. If you aren’t, you can safely remove both packages with the latest update, and everything will just work without additional dependencies.

Improved message session management for WebAPI and MVC

With the rise of the Generic Host, use of IWebHostBuilder is discouraged for ASP.NET Core 3.1 and higher.1 We’ve seen many of our customers adopting the generic host to combine the power of NServiceBus with ASP.NET Core. A primary use case is to send messages by using the IMessageSession interface.

A typical WebAPI controller might look like this:

[ApiController]
[Route("")]
public class SendMessageController : Controller
{
    IMessageSession messageSession;

    public SendMessageController(IMessageSession messageSession)
    {
        this.messageSession = messageSession;
    }

    [HttpPost]
    public async Task<string> Post()
    {
        var message = new MyMessage();
        await messageSession.Send(message);
        return "Message sent to endpoint";
    }
}

In order to get NServiceBus with ASP.NET Core working, you might use the following host builder configuration:

var host = Host.CreateDefaultBuilder()
   .ConfigureWebHostDefaults(c => c.UseStartup<Startup>())
   // Not ideal ordering!
   .UseNServiceBus(context =>
   {
      var endpointConfiguration = new EndpointConfiguration("ASPNETCore.Sender");
      return endpointConfiguration;
   })

Unfortunately, the host builder is order-dependent. What that means for the above example is that because we have ConfigureWebHostDefaults declared before UseNServiceBus, the ASP.NET Core pipeline is started first and can accept incoming HTTP requests before NServiceBus gets the chance to start. With the previous version of the package, under those circumstances, the following exception was thrown:

System.InvalidOperationException: The message session can only be used after the endpoint is started.

This was a race condition: whether the exception occurred depended on whether incoming HTTP requests attempted to use NServiceBus before it was ready.

In addition, it was impossible to recover from this problem at runtime; you had to restart the service. With this update, the exception is no longer permanent, so HTTP retries from the client will eventually acquire a valid message session once NServiceBus has started. In other words, if your app uses the incorrect ordering, it won’t leave the entire process permanently broken just because requests happened to arrive in an unlucky order.

We also improved the exception message to point you toward the fix: reorder the builder calls so that NServiceBus is configured first, like this:

var host = Host.CreateDefaultBuilder()
    // NServiceBus gets properly configured before accepting web requests
    .UseNServiceBus(context =>
    {
        var endpointConfiguration = new EndpointConfiguration("ASPNETCore.Sender");
        return endpointConfiguration;
    })
    .ConfigureWebHostDefaults(c => c.UseStartup<Startup>())

Whenever possible, make sure the NServiceBus configuration precedes the configuration of the web host.

Improved message session management in hosted services

The generic host provides runtime extensibility via hosted services that enable any kind of meaningful work as part of starting the host. One common thing to do in the generic host is to register a hosted service which has access to the message session. The most obvious way to get access to the session is to inject the IMessageSession instance into the constructor.

class MyHostedService : IHostedService
{
    IMessageSession messageSession;

    public MyHostedService(IMessageSession messageSession)
    {
        this.messageSession = messageSession;
    }
}

Unfortunately, this caused a very cryptic exception that didn’t really tell you how to resolve it:

System.InvalidOperationException: The message session can only be used after the endpoint is started.

The only way to access the message session in a hosted service was to inject the service provider and not resolve the session until the StartAsync method was called:

class MyHostedService : IHostedService
{
    IServiceProvider serviceProvider;

    public MyHostedService(IServiceProvider serviceProvider)
    {
        this.serviceProvider = serviceProvider;
    }

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        // Resolve the session only once the endpoint has started
        var messageSession = serviceProvider.GetService<IMessageSession>();
        await messageSession.Publish(new IAmStartedEvent());
    }
}

This is not exactly intuitive, and accessing the service provider directly is not considered best practice.2 With this release, we made accessing the message session work the way you’d expect, without requiring access to the service provider. Finally, we can write the code we wanted in the first place:

class MyHostedService : IHostedService
{
    IMessageSession messageSession;

    public MyHostedService(IMessageSession messageSession)
    {
        this.messageSession = messageSession; // It works!
    }

    public async Task StartAsync(CancellationToken cancellationToken)
    {
        await messageSession.Publish(new IAmStartedEvent());
    }
}

Now if you attempt to access the message session too early (for example, in the constructor of the hosted service) or get the order of registration wrong on the host builder configuration, the exception thrown will now show you how to resolve the problem with a more meaningful message:

The message session can’t be used before NServiceBus is started. Place UseNServiceBus() on the host builder before registering any hosted service (i.ex. services.AddHostedService<HostedServiceAccessingTheSession>()) or the web host configuration (i.ex. builder.ConfigureWebHostDefaults) should hosted services or controllers require access to the session.

No more painful debugging of ordering problems, and no more going through the service provider. It will either just work or tell you how to fix the problem.
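Putting it all together, a minimal sketch of the recommended ordering (reusing the MyHostedService and Startup names from the examples above) looks like this:

var host = Host.CreateDefaultBuilder()
    // UseNServiceBus first, so the message session exists before any
    // hosted service or controller can ask for it
    .UseNServiceBus(context => new EndpointConfiguration("ASPNETCore.Sender"))
    .ConfigureServices(services => services.AddHostedService<MyHostedService>())
    .ConfigureWebHostDefaults(c => c.UseStartup<Startup>());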

Unit of encapsulation prevents multi-hosting

By design, the generic host assumes it is the unit of encapsulation for the application or service that it hosts. All application resources applied to the same host instance share a single service collection, and therefore, a single service provider. The conclusion of these design assumptions is that a generic host can only host a single service of a specific type by default.

So, for example, you can host ASP.NET Core WebAPI together with NServiceBus, but it wouldn’t be possible to host multiple ASP.NET Core WebAPIs or multiple NServiceBus instances in the same generic host without advanced trickery like overriding controller detection for WebAPI or, in the case of NServiceBus, heavy customization of assembly scanning. What both NServiceBus and WebAPI have in common is that, by default, they assume they own all the assemblies found in the application domain, for convenience and ease of use.

Still, once you have experienced the power of the .UseNServiceBus(…) method you might be tempted to use it to host multiple NServiceBus instances on the same host like this:

var host = Host.CreateDefaultBuilder()
    .UseNServiceBus(hostBuilderContext =>
    {
        var endpointConfiguration = new EndpointConfiguration("MyEndpoint");
        // ...
        return endpointConfiguration;
    })
    // Once == good, twice == bad
    .UseNServiceBus(hostBuilderContext =>
    {
        var endpointConfiguration = new EndpointConfiguration("MyOtherEndpoint");
        // ...
        return endpointConfiguration;
    })

In previous versions of the extension, this was not prevented and led to surprising runtime behavior, such as the last endpoint configuration silently overriding all previous ones.

With this release, a proper safeguard has been put into place which warns about the improper usage.

UseNServiceBus can only be used once on the same host instance because subsequent calls would override each other. […]

With the broad adoption of continuous deployment pipelines3 and service orchestration mechanisms,4 we recommend hosting only one NServiceBus endpoint per generic host. Hosting multiple endpoints in the same generic host drastically increases configuration complexity and couples those endpoints together. For example, if one endpoint has high CPU or memory consumption, the others hosted in the same process might starve.

These are the reasons that multiple usage of UseNServiceBus on the same host is prevented, and we are not planning to implement support for running multiple endpoints via this extension method.

Should you still have good reasons to host multiple endpoints in the same process, we recommend using multiple generic host instances to achieve proper isolation. Keep in mind that you still need to configure assembly scanning very carefully. Our multi-hosting sample demonstrates how to host multiple endpoints using separate generic host instances.
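If you go that route, the assembly scanner can be constrained so each endpoint only picks up its own handlers. As a rough sketch (the assembly name here is hypothetical):

var endpointConfiguration = new EndpointConfiguration("MyEndpoint");
var scanner = endpointConfiguration.AssemblyScanner();
// Keep the other endpoint's handler assembly out of this endpoint
scanner.ExcludeAssemblies("MyOtherEndpoint.Handlers.dll");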

Summary

With the changes in NServiceBus.Extensions.Hosting 1.1, the .NET Generic Host has become the easiest and best method of encapsulating and hosting an NServiceBus endpoint in a .NET Core process.

The improved logging integration ensures that your NServiceBus endpoint will defer to whatever logging is set up for the host, without any additional dependencies. The improved session management helps you ensure that NServiceBus and the web host are initialized in the correct order. Both API controllers and implementations of IHostedService can now easily access the NServiceBus IMessageSession to send messages. And finally, the safety check that prevents multi-hosting makes sure you don’t create an invalid configuration.

To see all this in action, check out our Using NServiceBus in an ASP.NET Core WebAPI Application sample.

Be sure to install our templates package to use NServiceBus.Extensions.Hosting in your own projects.


  1. In the article on the ASP.NET Core Web Host, Microsoft states "This article covers the Web Host, which remains available only for backward compatibility. The Generic Host is recommended for all app types."

  2. Microsoft's Dependency Injection recommendations state "Avoid using the service locator pattern. For example, don't invoke GetService to obtain a service instance when you can use DI instead." Also, see Mark Seemann's post Service Locator is an Anti-Pattern.

  3. Such as GitHub Actions, Octopus Deploy, TeamCity, and dozens of others…

  4. Kubernetes, Service Fabric, Docker Compose, etc.

Optimizations to scatter-gather sagas


Over the last several months we’ve been noticing that some of our customers were having trouble with the performance of some of their sagas. Invariably, the sagas that were at fault were examples of the scatter-gather pattern, and the ultimate culprit was contention at the storage level due to the optimistic concurrency strategy that was being used.

This is a fairly common pattern for sagas to implement, and we didn’t want our customers to have to change how they model their business process to get around a performance problem, so we decided to fix it.

In this article, I’ll explain what the scatter-gather pattern is, how a pure optimistic concurrency strategy created a problem, and what we’ve done to fix it for you.

What is scatter-gather?

For a simplified example, let’s say you’re running an email campaign. You need a business process to send 1,000 emails, and at the end, report back how many were sent successfully. At a fundamental level, you want two things: do a bunch of operations and aggregate the results of each operation into a total result.

Scattering is easy. A message-based system like NServiceBus naturally excels at the scatter part:

public async Task Handle(SendEmailBatch message, IMessageHandlerContext context)
{
    // Initialize the saga state before scattering
    this.Data.MessagesSent = 0;
    this.Data.TotalMessages = message.Addresses.Count;

    foreach (var emailAddress in message.Addresses)
    {
        await context.Send(new SendOneEmail
        {
            EmailAddress = emailAddress,
            Contents = message.Contents
        });
    }
}

Instead of calling out directly to the email server for each email in sequence, which runs the risk of a failure in the middle, we send a message for each email and let those messages be handled one-by-one elsewhere.

The problem comes when we want to aggregate the responses as they come in over time.

The gather portion looks like this:

public async Task Handle(SendOneEmailReply message, IMessageHandlerContext context)
{
    this.Data.MessagesSent++;

    if (this.Data.MessagesSent == this.Data.TotalMessages)
    {
        await context.Publish(new EmailBatchCompleted());
        this.MarkAsComplete();
    }
}

This works great until you have hundreds of SendOneEmailReply messages all competing to update MessagesSent at the same time.

If you have to update the saga state after each message, you need some way to make sure that multiple messages don’t update the state simultaneously. Our sagas use optimistic concurrency control for the saga instance to handle this. If you get two responses simultaneously, they will both try to update the saga, but as they try to persist the new state, only one will be allowed to do so. The other will detect a change and roll back its changes.
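To make that concrete, here is a rough sketch of the version-check idea behind optimistic concurrency, assuming a hypothetical SagaData table with a Version column, an open SqlConnection, and values loaded earlier in the handler. This is illustrative only, not how any particular persister implements it:

using var command = new SqlCommand(
    @"UPDATE SagaData
      SET MessagesSent = @messagesSent, Version = Version + 1
      WHERE Id = @id AND Version = @expectedVersion",
    connection);
command.Parameters.AddWithValue("@messagesSent", newCount);
command.Parameters.AddWithValue("@id", sagaId);
command.Parameters.AddWithValue("@expectedVersion", versionReadWhenTheSagaWasLoaded);

// Zero rows affected means another message saved first: abort and let the message retry
var rowsAffected = await command.ExecuteNonQueryAsync();
if (rowsAffected == 0)
{
    throw new InvalidOperationException("Saga data was modified by another message.");
}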

In the example of two messages, this is trivial. The first message completes, and the second rolls back. The second message immediately retries and completes successfully. All done.

An issue presents itself as the number of concurrent responses grows, though. Let us say you got 500 responses back simultaneously instead. In this high-load scenario, only one of them would succeed in updating the saga state—the other 499 would have to retry. Then another completes, forcing the additional 498 to retry again. And so on, until all of the responses got a chance to update the saga state with an incremented count.
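As a back-of-the-envelope worst case: those 500 responses could require 500 + 499 + 498 + … + 1 = 500 × 501 ÷ 2 = 125,250 processing attempts, of which only 500 actually succeed; all the rest are rollbacks that have to be retried.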

In theory, the last response could be forced to retry many times in the most unlucky scenario. In practice, the details of your retry policy would dictate the exact outcome, but any reasonable policy would result in lots of retries being scheduled and a massive number of messages moved to the error queue once their allowed retries were exhausted.

As you increase the number of scatter messages, you get more responses. More responses lead to more gather collisions. More collisions lead to more rollbacks and retries. More retries and rollbacks put pressure on the messaging and saga persistence infrastructure. The result is increased load, decreased message throughput, and lots of messages in the error queue. Not a great situation overall.

When being pessimistic is good

Where optimistic concurrency is letting everyone through the door and hoping a fight won’t break out inside, pessimistic concurrency is like a bouncer making sure only one person is allowed access until their business is concluded.

With pessimistic locking, you start by locking an entity for exclusive use until you finish. At that point, business data updates and the lock releases at the same time. Since the lock is done upfront and is exclusive, no one else can lock it and start their work until they can claim it themselves.

In our scatter-gather saga, this means that once one message locks the saga, all the other messages have to wait—they can’t even get in the front door. No more collisions, no more rollbacks, no storm of retries, and much less stress on the messaging and persistence infrastructure.
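For intuition, here is roughly what taking the lock up front might look like in raw SQL Server terms, again using the hypothetical SagaData table and open connection from the sketch above; the real persisters wrap this up for you:

using var transaction = connection.BeginTransaction();

// The UPDLOCK hint takes the row lock at read time and holds it until the
// transaction completes, so no other handler can load the same saga for update
using var select = new SqlCommand(
    "SELECT MessagesSent FROM SagaData WITH (UPDLOCK, ROWLOCK) WHERE Id = @id",
    connection, transaction);
select.Parameters.AddWithValue("@id", sagaId);
var messagesSent = (int)await select.ExecuteScalarAsync();

// ... apply the business logic, then persist the new state ...
using var update = new SqlCommand(
    "UPDATE SagaData SET MessagesSent = @count WHERE Id = @id",
    connection, transaction);
update.Parameters.AddWithValue("@count", messagesSent + 1);
update.Parameters.AddWithValue("@id", sagaId);
await update.ExecuteNonQueryAsync();

// Committing releases the lock for the next waiting message
transaction.Commit();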

The drawback of pessimistic locking is that the locks carry a cost in the database. In some database architectures, the database can decide to optimize away too many individual locks by combining them. For example, in SQL Server, a row lock can be escalated to a page lock or full table lock, effectively locking many more things than needed in the name of resource optimization.

Optimistic locking is preferable when you expect very few write collisions since the average case, a non-collision, costs very little with no up-front locking necessary. But in optimistic locking, the cost of a collision is high. This is why sagas were originally designed to use optimistic locking.

So the more contention you have, the more you should favor pessimistic locking, as the fixed cost of the upfront lock is small compared to dealing with floods of rollbacks and retries.

Since the gather part of the scatter-gather pattern is high contention by its nature, implementing it using a pessimistic locking scheme makes sense.

What we’ve done

To better support scatter-gather saga implementations, we’ve changed most of our persistence options to use pessimistic locks for saga updates.

  • SQL Persistence uses a combination of both optimistic concurrency and pessimistic locking in version 4.1.1 and higher. Because of slight differences in the lock implementations of the supported database engines, we cannot trust a pessimistic lock alone. We use optimistic concurrency for correctness, but also add a pessimistic lock to avoid the retry storms and improve performance in high-contention scenarios. SQL Persistence does not support switching to optimistic concurrency only.
  • NHibernate Persistence has been using the same “optimistic for correctness, pessimistic for performance” combination as mentioned above since version 4.1.0. We think this is a sensible default for most scenarios, but it is possible to switch to pure optimistic concurrency by adjusting the lock mode.
  • MongoDB Persistence uses pessimistic locking by default in versions 2.2.0 and above. It does not support switching to optimistic concurrency only.
  • Service Fabric Persistence uses pessimistic locking (called an exclusive lock) by default in versions 2.2 and above. It does not support switching to optimistic concurrency only.

For all of these packages, you only need to update to the most recent version to get the benefits of pessimistic locking in your scatter-gather sagas.

RavenDB Persistence also offers pessimistic locking in versions 6.4 and above. However, because RavenDB does not provide a pessimistic lock option natively, you must currently configure the persister to use pessimistic locking instead of the default optimistic concurrency control for sagas. We plan to default to pessimistic locking for RavenDB in a future version, but for now, you must enable it explicitly:

var persistence = endpointConfiguration.UsePersistence<RavenDBPersistence>();
var sagasConfig = persistence.Sagas();
sagasConfig.UsePessimisticLocking();

You also have the option of tweaking locking options for RavenDB like the lease time and acquisition timeout. See our docs section on sagas pessimistic locking for details.

Keep in mind

Whether or not a saga employs the scatter-gather pattern, and no matter what locking strategy it uses, it’s still important to design the saga to minimize locking.

A saga is a message-driven state machine, and so any saga handler should focus on using the data in the incoming message together with the stored saga state to make decisions (the business logic) and then emit messages: either by sending commands, publishing events, or requesting timeouts.

A well-designed saga should not access another database, call a web service, or do anything else external to the saga other than emitting messages. Doing so causes the lock on the saga data to remain open for a longer period of time, raising the likelihood that another message needing to access the same saga will cause contention.

If these types of things need to occur, send a message to an external message handler, where there is no lock held open for the saga data. That handler can do the work and then return the results to the saga using another message.
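As a sketch of that pattern (the message types and the IPaymentGateway dependency here are hypothetical), the slow external call lives in a plain handler, so no saga lock is held while it runs:

public class ChargeCreditCardHandler : IHandleMessages<ChargeCreditCard>
{
    IPaymentGateway paymentGateway;

    public ChargeCreditCardHandler(IPaymentGateway paymentGateway)
    {
        this.paymentGateway = paymentGateway;
    }

    public async Task Handle(ChargeCreditCard message, IMessageHandlerContext context)
    {
        // The external call happens here, outside the saga, so the saga's lock
        // was already released when it sent this command
        var succeeded = await paymentGateway.Charge(message.OrderId, message.Amount);

        // Report the result back to the saga as just another message
        await context.Reply(new CreditCardCharged
        {
            OrderId = message.OrderId,
            Succeeded = succeeded
        });
    }
}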

Summary

While it is possible to address the performance of scatter-gather sagas while still using optimistic locking (such as dividing up a large batch into smaller batches first) we wanted to provide an experience to our customers that would Just Work® without causing performance bottlenecks.

To get started with the new pessimistic locking, you probably only need to update your persister. See the links for each persistence in the section above.

If you want to get more of a general overview, the saga concurrency article in our documentation is a good starting place.

And if you would like to learn more about how to model long-running business processes using sagas, check out our saga tutorials.

What's new in ServiceInsight 2.2


A common problem with most distributed systems is that it’s hard to get a good view of how all the disconnected pieces work together—unless you’re using ServiceInsight. From the audit records of successfully processed messages, ServiceInsight shows how messages are related and gives you visualizations so you can better understand how one message results in another, and how the different parts of the system communicate.

ServiceInsight 2.2 is out now, with plenty of improvements. This release adds new insights (pun intended) to your distributed systems.

More message metadata

In the last few releases we’ve gradually added more message metadata to ServiceInsight including Correlation ID, Critical Time, Delivery Time, Processing Ended, and Processing Started. In this release we’ve continued that trend and added Retries. This is the number of times a message has been retried, regardless of whether it has yet been successfully processed.

To keep the UI uncluttered, all this new metadata is hidden by default. To show it, right-click the Messages window to open the Column Chooser:

Column Chooser

More insights

We’ve made it even easier to understand the behavior of your distributed system and to detect potential problems. Look out for the yellow warnings!

Clock drift

Clock drift is when the clocks on the machines running your endpoints are not synchronized. If the clocks have drifted significantly, you may end up seeing a message which was processed before it was sent! This results in negative Delivery Time, and usually in negative Critical Time. In these cases, yellow warning icons are shown:

Clock Drift

You can avoid this for future messages by synchronizing the clocks on all the machines running your endpoints, using a technology such as NTP.

Successful retries

In previous versions of ServiceInsight, when retries were required to successfully process specific messages, those messages were displayed with standard green success icons. In this release, those icons have a warning overlay:

Successful retry

Together with the new Retries metadata, you can now see which messages required retries before they were successfully processed, and how many retries were required. If you spot repeated patterns of these transient failures, it may be an opportunity to avoid them and optimize the performance of your system.

This does not apply to messages which were edited and retried in ServicePulse. Those messages are effectively new messages, and not retries of the original messages.

Sequence diagram

The sequence diagram shows a timeline of all messages in a conversation and how they were processed. This helps you understand the sequence of operations that led to a specific outcome, which is often a challenge in a distributed system.

Endpoints are often scaled out, which means a single logical endpoint may consist of multiple physical endpoints. For example, two machines may each be running an instance of the same endpoint, consuming messages from the same queue in a competing-consumers pattern. These two physical instances act as a single logical endpoint.

In previous versions of ServiceInsight, each physical endpoint was shown separately:

Old view of scaled-out endpoints

In the screenshot above, notice how two physical Sales endpoints are shown, even though there is only one logical Sales endpoint. In a high-level view like the sequence diagram, this level of detail is usually not required, and the disconnected nature of the diagram may even lead to confusion. In this release, only the logical endpoints are shown. If you still need to know which physical endpoint handled a message, that is shown as a tooltip:

New view of scaled-out endpoints

Note that a ServiceInsight sequence diagram is slightly different from a standard UML sequence diagram, so we’ve added the Sequence Diagram Help button to explain the differences.

But wait… there’s more!

The list of improvements goes on…

When you click Open in ServiceInsight in ServicePulse, you will switch to an existing ServiceInsight instance instead of always launching a new one. The existing instance will simply connect to the same instance of ServiceControl that ServicePulse is connected to.

We’ve also made some tweaks and enhancements to the Saga view.

And the cherry on top: we’ve switched to vector graphics, to ensure ServiceInsight looks crisp and beautiful on high-DPI monitors.

Summary

ServiceInsight 2.2 introduces a number of improvements that help ensure 20/20 vision of your system.

If you like what we’ve done, or want to suggest other improvements, we’d love to hear more. Just click the Feedback button in ServiceInsight and drop us a line:

Feedback button

Our release announcement contains the full list of the changes in this release.

As always, you can install the latest version of ServiceInsight from our website.
