
I'm trying to wrap my head around this article on OSS service discovery, and I'm having a tough time seeing the forest for the trees.

In that article, the author poses the main problem for service discovery:

The problem seems simple at first: How do clients determine the IP and port for a service that exists on multiple hosts?

But I'm not even sure what this means/implies. When talking about service discovery, what exactly are we talking about here? Is this like, if we want to connect to a database, we might need a host and port number defined somewhere, like:

host=mydatabase01.example.com
port=9300

??? Is this what the author is talking about, and is this what is implied by "service discovery"?

If so, wouldn't the (obvious) solution be to just put a web service in front of everything? That way, clients don't care about what specific host/port they're connecting to, they just make RESTful calls to, say, http://my-data-service.example.com.

It can't be this simple, as that article goes on to talk about things like ZooKeeper and Eureka, which seem to be very complex beasts. I'm clearly missing something; if so, can someone provide a specific, concrete use case of what is meant when we talk about "service discovery"?

  • I don't have the time to read through the link right now, but I imagine that with OSS anyone can set up a server and there is no central place to know where to look for services. Compare this with something like Windows Update or an MMO game server where there are hard-coded, central server IP addresses. Unfortunately only the author can definitively explain what he means. –  Aug 14 '14 at 18:28
  • Thanks @Snowman (+1) - I guess I'm just fundamentally not understanding the problem. You say "...I imagine that with OSS anyone can set up a server and there is no central place to know where to look for services." (1) Well, what kind of server would be set up? (2) What services? Can you give a concrete example here? Thanks again! – herpylderp Aug 14 '14 at 23:37
  • I had time to take a quick read through the article and have provided a better response. I actually studied this in my graduate operating systems class a few years ago and had to write a simulation of it in a program. It was interesting in the way that only a CS major could love. –  Aug 15 '14 at 01:08

2 Answers


This article is discussing distributed services. A popular example is a peer-to-peer (P2P) file sharing network. Nodes connect to a swarm to share data and may drop out at any time.

The article discusses several open-source "registries" which allow clients to connect to the network and announce their presence so they may use the services on the network. Potentially, depending on the application, the client itself may become a network resource as well. In this context, "network" means a finite number of client systems that are communicating. They may be on different physical networks: this is similar to a P2P distributed system. They may be on the same physical network: this is similar to a corporate intranet.

How do these clients know about each other? If I sit at my desk and say "I would really like to print to printer X, or share files with computer Y" how do I get from staring at my cup of coffee to actually performing those actions? In a traditional corporate environment there will be network domain servers that handle this. Maybe MIS tells me an IP address. But what if those systems are not on a corporate or home network? Somewhere out there, in the ephemeral Internet, is a computer to which I want to connect. This is the problem the author is trying to solve: how do I find and connect to those clients?

Service discovery may occur in one of several ways, and this is by no means an exhaustive list. Each protocol is different, and new protocols are invented on a regular basis (check the ACM Digital Library if it is available to you; there is a lot of information on this in there).

  • Rely on a central server to manage connections and disconnections. This is similar to a P2P tracker file or Kerberos, and is close to the example you provided in the question.
  • Broadcast on a subnet looking at a specific port. If you use SMB or CIFS for home file sharing, it is similar to this.
  • Preconfigure other client IP addresses to talk to. Those clients may give you other clients that they are attached to.
  • Rely on another protocol such as DNS to provide IP addresses (see the sketch after this list).
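
As a rough illustration of that last approach, here is a minimal Go sketch that asks DNS for the hosts and ports of a service via an SRV lookup. The service and domain names are made up for the example.

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // Look up SRV records for a hypothetical service "myapp" over TCP
        // in the example.com domain; real deployments use their own names.
        _, addrs, err := net.LookupSRV("myapp", "tcp", "example.com")
        if err != nil {
            fmt.Println("lookup failed:", err)
            return
        }
        // Each SRV record carries a target host and a port, so the client
        // never has to hard-code either one.
        for _, srv := range addrs {
            fmt.Printf("host=%s port=%d\n", srv.Target, srv.Port)
        }
    }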

Once connected to such a network, there is a wealth of follow-on questions that are beyond the scope of this question but that you might want to dig into if you find this interesting. Check out distributed computing.

References

Here are some of the papers I read that might help you understand this topic a bit more. Note that you will need access to the ACM Digital Library:

  • Managing update conflicts in Bayou, a weakly connected replicated storage system
  • Disconnected Operation in the Coda File System
  • Flexible, Wide-Area Storage for Distributed Systems with WheelFS

  • Thanks again @Snowman (+1) - I appreciate you taking the time to circle back on this. However, I think you're assuming that I understand something very fundamental that I do in fact not understand! You say that the article "...discusses several open source registries, which allow clients to connect to the network and announce their presence..." What I'm saying is: the "network" would have to be represented by some server or cluster, and in any case by a URL (http://my-data-service) to connect to. – herpylderp Aug 15 '14 at 01:16
  • So to me, I don't understand the value of these registry services because I don't even see the problem that they claim they are trying to solve! And as for services announcing their presence, this could be accomplished via the client connecting to the server (http://my-data-service) and then the server maintaining an in-memory map of all clients connected to it. – herpylderp Aug 15 '14 at 01:17
  • What if that centralized server goes down? What if DNS is offline? Distributed computing tries to solve problems such as these. Of course, you still need to know about the other clients, wherever they are. Maybe a client is in the office next to you. Maybe on the other side of the world. How do we get these clients to hook up and share data? THAT is the problem the article is discussing. –  Aug 15 '14 at 01:21
  • I updated my answer again; let me know if that helps. –  Aug 15 '14 at 01:25
  • Ahhhhh lightbulb. Thanks for the perseverance @Snowman (+1 again). For some reason I was stuck in a mental "client/server" rut and was glazing over your repeated use of "peer-to-peer". Thank you! – herpylderp Aug 15 '14 at 12:54
  • Glad I could help. Distributed computing is still not quite as mainstream as more traditional concepts, and some of the algorithms can have a steep learning curve. Not that this is necessarily a difficult concept, just an unfamiliar one to most. I admit that until my graduate operating systems class this did not make a lot of sense to me, either. Once I had to implement a simulation of a distributed file system it all made sense. –  Aug 15 '14 at 13:03

I'm the author of the article.

The context for the post is really around distributed systems, and specifically service-oriented architectures (SOA). These solutions are usually used at larger software-as-a-service (SaaS) providers that have many backend services behind their service offering.

As an example, many SaaS providers have a way to log in to their system. In a SOA, you might have an authentication service in the backend that handles login requests. It's fairly common to have a web layer in front of those backend services that actually serves up the login page and handles the login HTTP requests. That layer would delegate the login request to the authentication service, and it may also delegate other functions to other services on the backend. Those backends might provide an HTTP-based API or something else, but each is commonly a service spread across multiple hosts.
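
To make that concrete, here is a minimal Go sketch of a web layer that serves the login endpoint and delegates to a backend authentication service. The backend address is hard-coded and hypothetical, which is exactly the brittleness that service discovery is meant to remove.

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Hypothetical address of a single authentication-service instance.
        // Hard-coding it breaks down once the service moves, scales out,
        // or fails over -- hence service discovery.
        authService, err := url.Parse("http://auth01.internal.example.com:8080")
        if err != nil {
            log.Fatal(err)
        }

        // The web layer delegates login HTTP requests to the backend
        // authentication service.
        http.Handle("/login", httputil.NewSingleHostReverseProxy(authService))
        log.Fatal(http.ListenAndServe(":8000", nil))
    }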

Why can't you put a web service in front of everything?

You can and that's commonly done. The service discovery aspect comes in when trying to keep http://my-data-service.example.com pointing to the right hosts when services are down, failing, being upgraded, scaled up, etc.

If my-data-service.example.com is just a round-robin DNS entry, and you have three instances providing the service and one goes down, you'll have some failed requests while that host is brought back online. With DNS, you also have TTLs to consider, so clients that have cached those entries will continue to try the downed host until the TTLs expire and the entries refresh. If you are adding hosts, you'll have to wait for the TTLs to expire as well before the new hosts start to receive requests.
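
To see what a client is working with here, this small Go sketch resolves a round-robin entry (the hostname is the example one from above). The lookup returns one address per instance, and nothing in DNS itself tells the client which of them is currently down.

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // A round-robin DNS entry returns one A record per instance. If one
        // of three instances is down, roughly a third of lookups still hand
        // out its address until DNS is updated and cached entries expire.
        addrs, err := net.LookupHost("my-data-service.example.com")
        if err != nil {
            fmt.Println("lookup failed:", err)
            return
        }
        for _, addr := range addrs {
            fmt.Println("candidate instance:", addr)
        }
    }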

An alternative is to point my-data-service.example.com to a load balancer or have your client applications implement some load balancing themselves.

This presents a new problem:

How do you keep the backend hosts configured in your load balancer up to date?

In an environment like AWS, hosts can be brought up and down frequently and their IPs can change. If you are using Docker, IPs and ports are usually different when a new container is started. Trying to keep this configured manually is usually not possible in these kinds of environments, especially when you have multiple services and hundreds or thousands of hosts.

To keep a load balancer automatically up to date, you need some form of dynamic service discovery.

This usually entails having:

  1. A registry to keep track of what is up or down, service locations, etc.
  2. A registration process to register service locations when they come online (a rough sketch follows this list).
  3. A discovery process to discover services and keep routing information up to date.
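
As a hedged sketch of piece 2, the registration side often looks like a heartbeat: the service periodically re-announces its location to the registry with a short TTL, so a dead instance's entry expires on its own. The registry URL and payload shape below are invented for the example; real registries (etcd, Eureka, ZooKeeper) each have their own API.

    package main

    import (
        "bytes"
        "net/http"
        "time"
    )

    func main() {
        // Hypothetical registry endpoint and announcement payload.
        payload := []byte(`{"service":"my-data-service","host":"10.0.1.5","port":9300,"ttl":30}`)

        // Re-register every 10 seconds against a 30-second TTL. If this
        // process dies, the registry entry expires and the discovery side
        // stops routing traffic to the dead instance.
        for {
            resp, err := http.Post("http://registry.internal:4001/v1/register",
                "application/json", bytes.NewReader(payload))
            if err == nil {
                resp.Body.Close()
            }
            time.Sleep(10 * time.Second)
        }
    }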

The original article describes how different companies have implemented those components in different ways. There are many other ways to do it as well.

For a more concrete example, I wrote another post showing one way to do service discovery with Docker using etcd and haproxy. That might help with understanding the context of the article.
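
In the same spirit, here is a hedged Go sketch of the discovery side: a loop that pulls the live instances from a registry and rewrites a haproxy backend section. The registry query, file path, and reload command are placeholders, not the exact mechanism from that post.

    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "strings"
        "time"
    )

    // fetchBackends stands in for a registry query (for example, reading
    // keys out of etcd); the addresses are invented for the sketch.
    func fetchBackends() []string {
        return []string{"10.0.1.5:9300", "10.0.1.6:9300"}
    }

    func main() {
        for {
            // Render a minimal haproxy backend section from the registry data.
            var b strings.Builder
            b.WriteString("backend my_data_service\n")
            for i, addr := range fetchBackends() {
                fmt.Fprintf(&b, "    server node%d %s check\n", i, addr)
            }
            os.WriteFile("/etc/haproxy/backends.cfg", []byte(b.String()), 0644)

            // Reload haproxy so it picks up the new backend list; the exact
            // reload command depends on how haproxy is managed.
            exec.Command("systemctl", "reload", "haproxy").Run()

            time.Sleep(30 * time.Second)
        }
    }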