Logging the cloud with SimpleDB

by Gloria Quintanilla


In a previous article, our chief architect Marcel Panse talked you through minimizing downtime on Amazon AWS. We showed how to deploy an application to a bunch of servers without using SSH or Windows Remote Desktop. This awesome approach saves us lots of time and money, but there might be another area we have yet to explore and optimize.

Dude, where are my logs?

Let's assume you run an elastic cluster of EC2 nodes. Something bad happens, and boom, a server dies. Of course, you prepared for the worst, but I bet you would like to know what the hell happened. However, in an elastic infrastructure, you might not even know on which server the error occurred.

Now what? Normally, you would need to check the logs for those ugly stack traces. To view the server logs, you would have to log into the remote server. If you run a large number of servers, you might even have to check all of them to find the right stack trace. That is a lot of work.

A common approach is to use an SMTP appender. This appender will email you a little report for every error in the log, immediately when the bad stuff occurs. Usually, this really draws the attention of the programmers, but it lacks critical information. You would still need to log in to gather more information. To analyse the problem you would need to read the rest of the log file and find out what happened before the problem occurred.

So far, so good. However, servers on AWS are not persistent. Servers also tend to crash, even those at Amazon. When such a server instance dies or gets terminated on purpose, you will lose all of your logging from that instance from the Beginning of Time. So, you really need to store your logs somewhere more permanent.

Sweet! They are in the cloud.

The solution? Store logs in SimpleDB. Amazon SimpleDB is a highly available, flexible and scalable non-relational data store, which makes it a good fit for this situation. It is eventually consistent, write-optimized and extremely durable. It can hold large amounts of logging data, and you can query and filter the logs afterwards. Oh, and it is also really cheap.

Logging abstraction

I switched from Log4j to Slf4j. Slf4j is a simple logging facade. It serves as a simple abstraction between various logging frameworks, like Commons-logging, Log4j and Logback. To switch, you need to remove all dependencies on all logging frameworks first and add the Slf4j jars. Then, you should add the logging engine of your choice. I used Logback. Finally, add the bridges to make it all work.

To wrap it all up:

  • use Logback as logging engine;
  • use Jcl-over-Slf4j for Spring;
  • use Log4j-over-Slf4j for Cobertura.

Logging engine

Logback is intended as a successor to the popular Log4j project, picking up where Log4j stopped. It is built by the same people who built Log4j. It has some very interesting improvements over Log4j: it is faster and boasts more appenders, filters and conditional processing. We also have a SimpleDB appender for it. The SimpleDB appender logs everything to a SimpleDB table. We can also add some extra information to it, like host information, IP address and application version. Let's look at the configuration of our logback.xml.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property file="${catalina.home}/conf/platform/deployment.properties" />

  <contextName>printcloud</contextName>

  <appender name="roll" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${catalina.home}/logs/tomcat.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>tomcat.%d{yyyy-MM-dd}.%i.log</fileNamePattern>
      <maxHistory>30</maxHistory>
      <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
        <maxFileSize>100MB</maxFileSize>
      </timeBasedFileNamingAndTriggeringPolicy>
    </rollingPolicy>
    <encoder>
      <pattern>%date %contextName %level [%thread] %logger{10} [%file:%line] %msg%n</pattern>
    </encoder>
  </appender>

  <appender name="stdout" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d %contextName [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <appender name="simpledb" class="com.kikini.logging.simpledb.SimpleDBAppender">
    <domainName>peecho_logging</domainName>
    <accessId>ENTER_ACCESS_KEY</accessId>
    <secretKey>ENTER_SECRET_KEY</secretKey>
    <server>sdb.eu-west-1.amazonaws.com</server>
    <componentName>printcloud (${system.version})</componentName>
    <host>${HOSTNAME}</host>
  </appender>

  <root level="info">
    <appender-ref ref="roll" />
    <appender-ref ref="stdout" />
    <appender-ref ref="simpledb" />
  </root>
</configuration>

This configuration logs to stdout, a file and SimpleDB. To log to file, we used the RollingFileAppender, which starts a new file every day, caps each file at 100 MB and keeps 30 days of history. It automatically cleans up old log files. You can also specify a different configuration file for (JUnit) tests, which in our case only logs to stdout. The SimpleDB appender has a property 'server' which is not supported in the original SimpleDB appender project, but was needed to switch the host to our European region. You can use the original project if you are located in the US, otherwise use this custom download - or patch it yourself.
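If you want to log more host details than the host name, such as the IP address mentioned earlier, one option is to publish them as Java system properties before the first logger is created. Logback resolves `${...}` placeholders against system properties as well, so the values become available in logback.xml. The sketch below is only an illustration: the class name `HostInfo` and the property names `log.host` and `log.ip` are made up for this example and are not part of Logback or the SimpleDB appender.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper that exposes host details to the logging configuration. */
final class HostInfo {

    private HostInfo() {
    }

    /** Returns the local host name, or "unknown" if it cannot be resolved. */
    public static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    /** Returns the local IP address as text, or "unknown" if it cannot be resolved. */
    public static String ipAddress() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    /** Publishes both values as system properties (property names are assumptions). */
    public static void publish() {
        System.setProperty("log.host", hostName());
        System.setProperty("log.ip", ipAddress());
    }
}
```

Call `HostInfo.publish()` early during startup, before any logger is requested. After that you could reference `${log.ip}` in the configuration, for example inside the componentName element.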

Querying SimpleDB

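// Imports needed by the enclosing class (AWS SDK for Java):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.amazonaws.services.simpledb.model.Attribute;
import com.amazonaws.services.simpledb.model.Item;
import com.amazonaws.services.simpledb.model.SelectRequest;

@Override
public List<LogRow> getLogging(String hostname) {
  List<LogRow> result = new ArrayList<LogRow>();
  // Double any single quotes so the host name cannot break out of the quoted value.
  String safeHost = hostname.replace("'", "''");
  String findExpression = "select * from `" + domain + "` where `host` = '" + safeHost
      + "' and `time` != '' order by time desc limit 500";
  // The second argument is consistentRead; an eventually consistent read is fine for logs.
  SelectRequest selectRequest = new SelectRequest(findExpression, false);
  for (Item item : simpleDb.select(selectRequest).getItems()) {
    // Flatten the attributes of each item into a name-to-value map.
    Map<String, String> attributes = new HashMap<String, String>();
    for (Attribute attr : item.getAttributes()) {
      attributes.put(attr.getName(), attr.getValue());
    }
    result.add(new LogRow(attributes));
  }
  LOGGER.info("Found {} log records.", result.size());
  return result;
}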

LogRow.java:

import java.util.Map;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class LogRow {
  private String msg;
  private String host;
  private String component;
  private String level;
  private String time;

  // Copies the attributes we care about out of a SimpleDB item.
  public LogRow(Map<String, String> attributes) {
    this.msg = attributes.get("msg");
    this.host = attributes.get("host");
    this.component = attributes.get("component");
    this.level = attributes.get("level");
    this.time = attributes.get("time");
  }

  // Returns one printable line; new DateTime(String) expects an ISO-8601 timestamp.
  public String getFull() {
    DateTime dateTime = new DateTime(time);
    DateTimeFormatter fmt = DateTimeFormat.forPattern("dd-MM-yyyy HH:mm:ss");
    return String.format("%s %s %s - %s", dateTime.toString(fmt), level, component, msg);
  }

  // getters and setters
}

This piece of code queries the log table by host name. You can use the LogRow objects to print the logs on your HTML page. Another fancy thing you can do is to add filtering by level (info/warn/error) to the query. You can also show the logs aggregated across all hosts instead of filtered by host name. You can query SimpleDB as you like to search the logs.
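The level filter and the aggregated view could be sketched as a small helper that builds the select expression. This is an illustration, not our production code: the class `LogQuery` and its methods are hypothetical, and it assumes the `level` attribute holds values such as ERROR in exactly the form you pass in. Passing null for the host or the level simply leaves that filter out, so a null host gives you the aggregated view over all servers.

```java
/** Hypothetical helper that builds SimpleDB select expressions for the log domain. */
final class LogQuery {

    private LogQuery() {
    }

    /** Wraps a value in single quotes, doubling any single quotes inside it. */
    public static String quoteValue(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Wraps a domain or attribute name in backticks, doubling any backticks inside it. */
    public static String quoteName(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    /**
     * Builds a select expression that returns the newest rows first.
     * A null host or level means "do not filter on this attribute".
     */
    public static String build(String domain, String host, String level, int limit) {
        StringBuilder sql = new StringBuilder("select * from ").append(quoteName(domain));
        // The sort attribute has to appear in the where clause as well.
        sql.append(" where `time` != ''");
        if (host != null) {
            sql.append(" and `host` = ").append(quoteValue(host));
        }
        if (level != null) {
            sql.append(" and `level` = ").append(quoteValue(level));
        }
        sql.append(" order by `time` desc limit ").append(limit);
        return sql.toString();
    }
}
```

For example, `LogQuery.build("peecho_logging", null, "ERROR", 100)` produces an expression for the latest 100 errors across all hosts, which you can pass to a SelectRequest in the same way as the query above.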

Happy logging!

So, log to SimpleDB. At all times, you can view all logs from all instances without the need to log in, even after an instance has been terminated. Spend less time on managing your infrastructure - and more time on the stuff that really matters.