Troubleshooting Tools for Microservices Architecture

Example
Running Application
Conclusion

One of the biggest challenges when moving to a microservices architecture is troubleshooting and debugging. As the number of microservices grows, a single HTTP request can pass through dozens of applications, and when something goes wrong or performance is worse than expected, it can be hard to tell where the problem lies.

Logging and instrumentation are very important tools to understand what is going on in a microservices architecture.

Logging: shows what is happening inside your application (errors, exceptions, timeouts, HTTP status codes…).
Instrumentation: used mainly to capture the request rate (count), the error rate (count), and the duration of requests.
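
To make the three instrumentation signals concrete, here is a minimal plain-JDK sketch that counts requests, counts errors, and accumulates durations around a handler. The RequestMetrics class and its method names are hypothetical and are not part of Sleuth or Zipkin; a real application would normally rely on a metrics library rather than hand-rolled counters.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper (not a Sleuth or Zipkin API): records request count,
// error count, and total duration for one endpoint.
final class RequestMetrics {
    private final LongAdder requests = new LongAdder();
    private final LongAdder errors = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    // Runs the handler, counting the call, timing it, and counting failures.
    <T> T record(Supplier<T> handler) {
        long start = System.nanoTime();
        requests.increment();
        try {
            return handler.get();
        } catch (RuntimeException e) {
            errors.increment();
            throw e; // the caller still sees the original failure
        } finally {
            totalNanos.add(System.nanoTime() - start);
        }
    }

    long requestCount() { return requests.sum(); }

    long errorCount() { return errors.sum(); }

    // Mean duration in milliseconds across all recorded requests.
    double meanDurationMillis() {
        long count = requests.sum();
        return count == 0 ? 0.0 : totalNanos.sum() / 1_000_000.0 / count;
    }

    public static void main(String[] args) {
        RequestMetrics metrics = new RequestMetrics();
        metrics.record(() -> "Hello World!");
        try {
            metrics.record(() -> { throw new IllegalStateException("boom"); });
        } catch (IllegalStateException expected) {
            // the failure was counted and then rethrown
        }
        System.out.printf("requests=%d errors=%d meanMs=%.3f%n",
                metrics.requestCount(), metrics.errorCount(), metrics.meanDurationMillis());
    }
}
```

Counting the request before running the handler means failed calls are still included in the request rate, which keeps the error rate meaningful as a fraction of all requests.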

One of the best-known tools for capturing and processing logs is Splunk. It lets us centralize and manage application logs, and it also provides metrics, generates reports and alerts for a particular search, and offers a friendly user interface.

For instrumentation, on the other hand, we can use Zipkin, a distributed tracing system that helps troubleshoot latency issues.

In order to understand how these tools can help us, we are going to build a Spring Boot application which will be fully integrated with Zipkin through Spring Cloud Sleuth.

Example

This web application contains a single endpoint, which logs a simple string with SLF4J.

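import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@Slf4j // Lombok generates the static "log" field used below
@RestController
@SpringBootApplication
public class SpringBootZipkinSleuthSplunkApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBootZipkinSleuthSplunkApplication.class, args);
    }

    // Handles GET requests on the root path and writes one log entry per request
    @GetMapping
    public String helloWorld() {
        log.info("Hello World Endpoint");
        return "Hello World!";
    }
}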

The application.properties file will contain the configuration for Sleuth and Zipkin.

spring.application.name=spring-boot-zipkin-sleuth-splunk
spring.sleuth.sampler.probability=1.0
spring.zipkin.service.name=spring-boot-zipkin-sleuth-splunk
spring.zipkin.base-url=http://zipkin:9411

Note: To give the spans coming from your application a service name other than the application name, set spring.zipkin.service.name to the desired name.

spring.sleuth.sampler.probability is set to 1.0, so 100% of requests are sampled (the default is 0.1, or 10%).

localhost cannot be used in this example because the Spring Boot application runs inside a Docker container, so we have to use the service name given to Zipkin in our Docker Compose file (http://zipkin:9411).
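
To illustrate what a sampling probability means, the sketch below keeps a trace when its ID falls inside the configured fraction of the ID space. This is only an illustration under that assumption: the ProbabilitySamplerSketch class is hypothetical and is not how Sleuth implements its sampler.

```java
// Illustrative sampler, not Sleuth's implementation: keeps a trace when its
// ID falls inside the configured fraction of 10,000 buckets.
final class ProbabilitySamplerSketch {
    private static final long BUCKETS = 10_000;
    private final long threshold; // number of buckets that are kept

    ProbabilitySamplerSketch(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.threshold = Math.round(probability * BUCKETS);
    }

    // The decision depends only on the trace ID, so it is repeatable.
    boolean isSampled(long traceId) {
        return Long.remainderUnsigned(traceId, BUCKETS) < threshold;
    }

    public static void main(String[] args) {
        long traceId = Long.parseUnsignedLong("73d169f5b5e76599", 16);
        System.out.println(new ProbabilitySamplerSketch(1.0).isSampled(traceId)); // prints true
        System.out.println(new ProbabilitySamplerSketch(0.0).isSampled(traceId)); // prints false
    }
}
```

With a probability of 1.0 every trace is kept, which is convenient for a demo but usually too expensive for a busy production system.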

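Sleuth is added to the project with the following dependency. Sending the trace data to Zipkin also requires the Zipkin integration on the classpath (in Spring Cloud Sleuth 2.x, the spring-cloud-starter-zipkin artifact, which pulls in Sleuth as well).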

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

This library is also responsible for adding the trace IDs to each request and storing them locally so that the trace can be continued.
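
The idea of storing the IDs locally can be sketched with a per-thread map, similar in spirit to a logging MDC. This is a plain-JDK stand-in, not Sleuth's code: the TraceContextSketch class is hypothetical, and the only names taken from this article are the X-B3-TraceId and X-B3-SpanId keys read by its Logback configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-JDK stand-in for a logging MDC (hypothetical, not Sleuth's code):
// values stored here are visible to every log call made on the same thread.
final class TraceContextSketch {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    // Returns an empty string when the key is absent.
    static String get(String key) {
        return CONTEXT.get().getOrDefault(key, "");
    }

    // Must be called when the request ends so IDs do not leak to the next one.
    static void clear() {
        CONTEXT.remove();
    }

    // Builds a log line that carries the IDs stored for the current thread.
    static String format(String level, String message) {
        return String.format("%s [%s,%s] %s",
                level, get("X-B3-TraceId"), get("X-B3-SpanId"), message);
    }

    public static void main(String[] args) {
        put("X-B3-TraceId", "73d169f5b5e76599");
        put("X-B3-SpanId", "73d169f5b5e76599");
        // prints: INFO [73d169f5b5e76599,73d169f5b5e76599] Hello World Endpoint
        System.out.println(format("INFO", "Hello World Endpoint"));
        clear();
    }
}
```

Because the context is thread-local, the application code never passes IDs around explicitly; the log statement in the endpoint stays unchanged.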

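A Logback configuration file is also necessary in order to see the trace IDs in Splunk. The JSON encoder it uses comes from the logstash-logback-encoder library, which must be on the classpath.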

<configuration>
    <include resource="org/springframework/boot/logging/logback/base.xml"/>

    <springProperty scope="context" name="appName" source="spring.application.name"/>

    <appender name="logstash" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_FILE}.json</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${LOG_FILE}.json.%d{yyyy-MM-dd}.gz</fileNamePattern>
            <maxHistory>7</maxHistory>
        </rollingPolicy>
        <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
            <providers>
                <timestamp>
                    <timeZone>UTC</timeZone>
                </timestamp>
                <pattern>
                    <pattern>
                        {
                        "severity": "%level",
                        "service": "${appName:-}",
                        "trace": "%X{X-B3-TraceId:-}",
                        "span": "%X{X-B3-SpanId:-}",
                        "parent": "%X{X-B3-ParentSpanId:-}",
                        "thread": "%thread",
                        "class": "%logger{40}",
                        "message": "%message"
                        }
                    </pattern>
                </pattern>
            </providers>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="logstash"/>
    </root>
</configuration>

Zipkin defines three kinds of IDs:

Trace ID: the ID shared by every span in a trace.
Span ID: the ID of a particular span, which may be the same as the trace ID if only one service is involved in the trace.
Parent ID: the ID, present on child spans, of the span that came before. A span without a parent ID is considered the root of the trace.
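
The relationship between the three IDs can be modelled in a few lines. The SpanSketch class below is a hypothetical model written for this explanation, not a Zipkin or Sleuth type, and the child span ID used in main is an invented value.

```java
// Hypothetical model of the three IDs (not a Zipkin or Sleuth class).
final class SpanSketch {
    final String traceId;
    final String spanId;
    final String parentId; // null for the root span

    private SpanSketch(String traceId, String spanId, String parentId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
    }

    // The root span starts the trace, so its span ID doubles as the trace ID.
    static SpanSketch root(String id) {
        return new SpanSketch(id, id, null);
    }

    // A child keeps the trace ID and records this span as its parent.
    SpanSketch child(String childSpanId) {
        return new SpanSketch(traceId, childSpanId, spanId);
    }

    boolean isRoot() {
        return parentId == null;
    }

    String describe() {
        return "trace=" + traceId + " span=" + spanId
                + " parent=" + (parentId == null ? "none" : parentId);
    }

    public static void main(String[] args) {
        SpanSketch root = SpanSketch.root("73d169f5b5e76599");
        // "a1b2c3d4e5f60718" is an invented span ID for a downstream call
        SpanSketch child = root.child("a1b2c3d4e5f60718");
        System.out.println(root.describe());
        System.out.println(child.describe());
    }
}
```

Following the parent IDs upward from any span eventually reaches the root, which is how a tracing UI can rebuild the call tree for a request.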

Finally, here is the docker-compose.yml that powers the whole setup.

version: '2'
services:
  app:
    image: com.thedeveloperhive/spring-boot-zipkin-sleuth-splunk
    environment:
      - LOGGING_FILE=/logs/app.log
    ports:
      - '8080:8080'
    volumes:
      - log_volume:/logs
  splunk:
    image: splunk/splunk
    hostname: splunk
    environment:
      - SPLUNK_START_ARGS=--accept-license
      - SPLUNK_ENABLE_LISTEN=9997
      - SPLUNK_PASSWORD=password
    ports:
      - '8000:8000'
  splunkforwarder:
    image: splunk/universalforwarder:6.5.3-monitor
    hostname: splunkforwarder
    environment:
      - SPLUNK_START_ARGS=--accept-license --answer-yes
      - SPLUNK_FORWARD_SERVER=splunk:9997
      - SPLUNK_ADD=monitor /logs
      - SPLUNK_PASSWORD=password
    restart: always
    depends_on:
      - splunk
    volumes:
      - log_volume:/logs
  zipkin:
    image: openzipkin/zipkin
    ports:
      - '9411:9411'
volumes:
  log_volume:

Note: Splunk works on a client-server model. The Splunk Forwarder collects machine-generated data on the client side and forwards it to the Splunk server.

Running Application

Build the application and run Docker Compose:

mvn clean install
docker-compose up

Note: A Docker image is built during the Maven install stage, since the Spotify docker-maven-plugin was added to the pom file.

If we hit the web service at http://localhost:8080, a log entry is generated and forwarded to Splunk, and the trace data is sent to Zipkin.

app_1 | 2019-03-22 21:55:35.647  INFO [-,73d169f5b5e76599,73d169f5b5e76599,false] 1 --- [or-http-epoll-3] .SpringBootZipkinSleuthSplunkApplication : Hello World Endpoint

As we can see, the trace ID is the same as the span ID, since only one service was involved in the trace, and there is no parent ID.
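
The bracketed field in that log line follows the layout [application name, trace ID, span ID, exportable flag]. As a rough sketch, it can be pulled apart with a regular expression; the SleuthLogFieldParser class is hypothetical and the pattern only assumes the layout seen in the line above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the [appName,traceId,spanId,exportable] field.
final class SleuthLogFieldParser {
    private static final Pattern FIELD = Pattern.compile(
            "\\[([^,\\]]*),([0-9a-f]+),([0-9a-f]+),(true|false)\\]");

    // Returns {appName, traceId, spanId, exportable} from the first match.
    static String[] parse(String logLine) {
        Matcher m = FIELD.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no tracing field found in log line");
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String line = "2019-03-22 21:55:35.647  INFO "
                + "[-,73d169f5b5e76599,73d169f5b5e76599,false] 1 --- "
                + "[or-http-epoll-3] .SpringBootZipkinSleuthSplunkApplication : Hello World Endpoint";
        String[] fields = parse(line);
        System.out.println("trace=" + fields[1]);                        // prints trace=73d169f5b5e76599
        System.out.println("sameAsSpan=" + fields[1].equals(fields[2])); // prints sameAsSpan=true
    }
}
```

Extracting the trace ID this way is handy when a log line has to be matched by hand against a trace in Zipkin or a search in Splunk.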

The Zipkin interface is available at http://localhost:9411.

Now log in to the Splunk web console (http://localhost:8000) and search for “73d169f5b5e76599”.

Conclusion

Both logging and instrumentation are essential in any enterprise microservices architecture, and tools like Splunk and Zipkin are excellent allies for acting quickly and precisely when issues arise.

Source Code

Sergio Martin Rubio
