The case of slowly dropping recharge rate

[Chart 1: recharge rate on Nov 1 and Nov 2, annotated]

This shows recharge rates on Nov 1 and Nov 2.

From about 1:30pm on Nov 2, we noticed a slow decrease in the recharge rate. The pattern we see almost daily is a steady drop in the afternoon, before recharges pick up pace sharply in the early evening. This looked different: the decline continued through 5:30pm, which was odd. It was not a sudden drop, so a system change seemed an unlikely cause. The HTTP connections graph told the same story: actual traffic to the site had been dropping slowly since 1:30pm. Maybe it was because of Diwali. Still, we were not satisfied without a single clear culprit.

I could not find anything obviously wrong with our systems and got in touch with Chandrashekhar (our devops guru). He also agreed that the shape of the chart looked suspicious.

After ho-humming about it for a while, CNB asked me to turn on the TV and watch the India-Australia cricket match; I acquiesced, and watched the last two overs of the superb Indian innings. Around 10 minutes after the Indian innings ended, CNB told me to look at the chart again. A noticeable spike, in recharge rate and in traffic as well!

Take a look at the second chart.

[Chart 2: recharge rate on Nov 2, 11am to 6pm, annotated]

The correlation was just too much to ignore. When India bats, we experience lower recharge rates. Mind = blown!

Call alerts with KooKoo

Making sure all systems are working fine round the clock is very important for us. We use the popular monitoring solution Nagios to alert us when things are not quite OK. Configuring Nagios with email alerts is pretty simple, and that is what we set up first.

But sometimes email alerts are simply not good enough, say when a server is running low on memory in the middle of the night. The solution is to have Nagios call a telephone number for critical alerts. This is where KooKoo comes in.

KooKoo has a web-based API for call control. Although most of their services are aimed at incoming calls, they have a simple outgoing call feature as well. We wrote a quick shell script, “kookoo_call.sh”, which takes the phone number and the message to be delivered. All it really does is make an HTTP request:


wget --quiet --timeout=10 -t 1 -O /tmp/kookoo_call.$$.out "http://www.kookoo.in/outbound/outbound.php?phone_no=$PHONENUM&api_key=XXX&extra_data=$MESSAGE ... repeating message ... $MESSAGE"

KooKoo uses a decent Text-to-Speech engine which generates the message on the phone call. Still, repeating the message does not hurt – helps you to rub your eyes and become sane enough to understand what is being said :-)
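One thing the one-liner glosses over is that the message contains spaces, so it should be URL-encoded before it goes into the query string. Here is a minimal Java sketch of building the same URL. The class name, method name and the phone number are made up for illustration; only the endpoint and the parameter names come from the command above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of KooKoo's API: builds the outbound-call URL
// used by kookoo_call.sh, with the message repeated and URL-encoded.
class KooKooUrlBuilder {
    static final String ENDPOINT = "http://www.kookoo.in/outbound/outbound.php";

    static String build(String phoneNumber, String apiKey, String message, int repeats) {
        // Repeat the message so a sleepy listener gets more than one chance to catch it.
        StringBuilder spoken = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            if (i > 0) {
                spoken.append(". ");
            }
            spoken.append(message);
        }
        return ENDPOINT
                + "?phone_no=" + encode(phoneNumber)
                + "&api_key=" + encode(apiKey)
                + "&extra_data=" + encode(spoken.toString());
    }

    private static String encode(String value) {
        // Spaces become '+', other unsafe characters become %XX escapes.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder number and key, for illustration only.
        System.out.println(build("09800000000", "XXX", "Host web1 is DOWN", 2));
    }
}
```

In the shell script itself, the same effect needs some encoding step before the message is handed to wget.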

Next, use this script as a Nagios alert command:

/path/to/kookoo_call.sh -p $CONTACTNUMBER$
-m "Hello Nagios alert $NOTIFICATIONTYPE$ Host $HOSTALIAS$ Service $SERVICEDESC$ is $SERVICESTATE$"

Voila! We now get a phone call on critical system alerts. Of course, we still have to make sure our on-call mobile phone is charged – but that’s another story :-P

Exploiting Spring MVC interceptors

Interceptors are a pretty nifty feature in Spring MVC and we mix them with annotations to pull off some cool things in our app that keep our code neat and tidy.

Let us take the case of protecting your actions with CSRF tokens. One approach is to make CSRF token matching the first thing every controller does, sending a forbidden response if the tokens do not match. Being the technically savvy reader that you are, I am sure you see the problem with this approach: your CSRF-related code is littered all over the place and duplicated in each and every controller action. So how can we stick to the DRY principle and keep our code sane? How about moving that logic to a Spring MVC interceptor and configuring every request that hits the app to be intercepted by it?

Now the whole CSRF processing lives in a single place, but there is a catch. Say you do not want some controller action to be checked for a CSRF token; how do you achieve this? Annotations to the rescue. Let us define a custom annotation called Csrf with an attribute called exclude. In the interceptor, we get the target controller method and check whether it carries the annotation. If it does and exclude is true, we skip the CSRF token check; if it is not annotated, we check POST requests and let GET requests through.

The interceptor code (HandlerMethod wraps the target method of the controller):

public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
    //To make sure this is a spring controller request
    if (handler instanceof HandlerMethod) {
        if (shouldCheck(request, (HandlerMethod)handler)) {
            String sessionToken = CSRFTokenManager.getTokenFromSession();
            String requestToken = csrfTokenManager.getCSRFToken(request);
            // Null-safe: a session without a token must not pass the check or throw
            if (sessionToken != null && sessionToken.equals(requestToken)) {
                return true;
            } else {
                response.sendError(HttpServletResponse.SC_FORBIDDEN, "Bad or missing CSRF value");
                return false;
            }
        }
    }

    return true;
}
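A side note on the token comparison: a slightly more defensive variant treats a missing token on either side as a mismatch and compares in constant time. The sketch below is an assumption about how one might do that with the JDK alone; it is not what our interceptor does, and CsrfTokens is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper: null-safe, constant-time comparison of two CSRF tokens.
class CsrfTokens {
    static boolean matches(String sessionToken, String requestToken) {
        // A missing token on either side is always a mismatch.
        if (sessionToken == null || requestToken == null) {
            return false;
        }
        // MessageDigest.isEqual compares byte arrays without bailing out at the
        // first differing byte, which avoids leaking timing information.
        return MessageDigest.isEqual(
                sessionToken.getBytes(StandardCharsets.UTF_8),
                requestToken.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(matches("abc123", "abc123"));
        System.out.println(matches("abc123", null));
    }
}
```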

private boolean shouldCheck(HttpServletRequest httpServletRequest, HandlerMethod handlerMethod) {
    Csrf csrf = handlerMethod.getMethodAnnotation(Csrf.class);

    if (csrf != null) {
        return !csrf.exclude();
    } else {
        if ("POST".equals(httpServletRequest.getMethod())) {
            return true;
        }

        if ("GET".equals(httpServletRequest.getMethod())) {
            return false;
        }
    }

    return false;
}

Csrf annotation class:

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Csrf {
    boolean exclude() default false;
}

An example controller with annotation usage:

	@Csrf(exclude=true)
	@RequestMapping(value = "foo", method = RequestMethod.POST)
	public ModelAndView foo(Model model,
			HttpServletRequest request, HttpServletResponse response) {
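If you want to play with the decision logic without booting Spring, here is a small self-contained sketch that reads the same kind of annotation through plain reflection. DemoController and CsrfDecision are hypothetical names, and java.lang.reflect.Method stands in for Spring's HandlerMethod, so treat it as an illustration of the idea rather than our production code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME) // must be RUNTIME, or reflection cannot see it
@interface Csrf {
    boolean exclude() default false;
}

// Hypothetical controller: one action opts out of the check, one does not.
class DemoController {
    @Csrf(exclude = true)
    public void callback() { }

    public void submit() { }
}

class CsrfDecision {
    // Same rule as shouldCheck in the interceptor: the annotation wins,
    // otherwise only POST requests are checked.
    static boolean shouldCheck(String httpMethod, Method handler) {
        Csrf csrf = handler.getAnnotation(Csrf.class);
        if (csrf != null) {
            return !csrf.exclude();
        }
        return "POST".equals(httpMethod);
    }

    // Convenience overload that looks the method up by name.
    static boolean shouldCheck(String httpMethod, Class<?> controller, String methodName) {
        try {
            return shouldCheck(httpMethod, controller.getDeclaredMethod(methodName));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such action: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldCheck("POST", DemoController.class, "callback"));
        System.out.println(shouldCheck("POST", DemoController.class, "submit"));
    }
}
```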

We use a similar pattern for our login validation as well, and will expound on it in a separate post. Are you looking forward to hacking on cool things like this? Get in touch with us at devgigs@freecharge.com; we are always looking for curious people who want to build beautiful products.

Consumer framework

We do our recharges through third-party APIs. The way it used to work was this: whenever we got a recharge request, we invoked the recharge API, waited for the response and, based on that response, let the end user know whether the recharge was successful. Any technically astute reader will spot the problem with this approach: we were making blocking third-party requests within an HTTP request-response cycle. One major consequence is that our site was held hostage to the response time of the third-party APIs. To alleviate this, we recently moved to a queue-based system, where we put the request into a queue and a consumer picks it up and does the recharge.
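To make the shape of that change concrete, here is a toy sketch with an in-memory BlockingQueue standing in for the real queue. All names (RechargeQueueSketch, acceptRecharge, consumeOne, callThirdParty) and the status strings are invented for illustration; the point is only that the request path enqueues and returns, while the slow call happens elsewhere.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class RechargeQueueSketch {
    // In-memory stand-in for the real queue.
    static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>();

    // Runs inside the HTTP request: enqueue and return immediately.
    static String acceptRecharge(String orderId) {
        QUEUE.add(orderId);
        return "PENDING";
    }

    // Runs in a background consumer: wait for work, then make the slow call.
    // Returns null if nothing arrived within the wait period.
    static String consumeOne(long waitMillis) {
        try {
            String orderId = QUEUE.poll(waitMillis, TimeUnit.MILLISECONDS);
            return orderId == null ? null : callThirdParty(orderId);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Placeholder for the blocking third-party recharge API.
    static String callThirdParty(String orderId) {
        return orderId + ":SUCCESS";
    }

    public static void main(String[] args) {
        System.out.println(acceptRecharge("order-1"));
        System.out.println(consumeOne(100));
    }
}
```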

We have been using Kestrel for quite some time for other asynchronous processes and it has worked very well for us, so we moved our recharges into a Kestrel queue as well. On the consumer side, we were using Camel, which is an EAI framework. We ran into some rough weather with Camel:

  • If there was an exception while queuing, Camel would silently eat it. We tried the different mechanisms described in the Camel documentation, but none worked for us.
  • We could not figure out a way to gracefully shut down a Camel consumer. This was critical for us, given the number of deployments we do in a day.
  • We found no way to monitor consumers so that we could plug our Nagios alerting system into them.
  • Kestrel has a nice mechanism whereby we can claim ownership of an item once we take it out of the queue. Camel did not expose this functionality.

We tried to fix this in Camel itself, but it was starting to look like an effort with diminishing returns. We wanted a simple consumer and were not after any of the enterprise capabilities of Camel. So we gave it a good thought and, instead of spending time and energy making Camel work for us, decided to build a simple consumer framework.

All our consumers implement an interface as follows:


public interface Consumer {
  public void setConsumerWaitPeriod(long consumerWaitPeriod);
  public long getConsumerWaitPeriod();
  public void setNoOfConsumers(int noOfConsumers);
  public int getNoOfConsumers();
  public void setQueueName(String queueName);
  public String getQueueName();
  public void consume(Object object);
}
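For a feel of what a bean behind this interface might look like, here is a bare-bones sketch. LoggingConsumer is a made-up example that only records what it receives; a real consumer would call the recharge API in consume(). The default values are arbitrary, and the interface is repeated only so that the snippet compiles on its own.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Copy of the interface from the post so that the sketch compiles on its own.
interface Consumer {
    void setConsumerWaitPeriod(long consumerWaitPeriod);
    long getConsumerWaitPeriod();
    void setNoOfConsumers(int noOfConsumers);
    int getNoOfConsumers();
    void setQueueName(String queueName);
    String getQueueName();
    void consume(Object object);
}

// Hypothetical consumer bean: remembers every payload it is handed.
class LoggingConsumer implements Consumer {
    private long consumerWaitPeriod = 1000; // arbitrary default
    private int noOfConsumers = 1;          // arbitrary default
    private String queueName;
    // consume() is called from several worker threads, so use a thread-safe list.
    private final List<Object> seen = new CopyOnWriteArrayList<>();

    @Override public void setConsumerWaitPeriod(long consumerWaitPeriod) { this.consumerWaitPeriod = consumerWaitPeriod; }
    @Override public long getConsumerWaitPeriod() { return consumerWaitPeriod; }
    @Override public void setNoOfConsumers(int noOfConsumers) { this.noOfConsumers = noOfConsumers; }
    @Override public int getNoOfConsumers() { return noOfConsumers; }
    @Override public void setQueueName(String queueName) { this.queueName = queueName; }
    @Override public String getQueueName() { return queueName; }

    @Override
    public void consume(Object object) {
        seen.add(object);
        System.out.println(queueName + " received " + object);
    }

    List<Object> getSeen() {
        return seen;
    }

    public static void main(String[] args) {
        LoggingConsumer consumer = new LoggingConsumer();
        consumer.setQueueName("recharge");
        consumer.consume("order-42");
    }
}
```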

We run our code inside a Spring container, so as the first step in the consumer framework, we pick up all the beans that implement the Consumer interface.

ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext("applicationContext.xml");
Map<String, Consumer> map = ctx.getBeansOfType(Consumer.class);

The backbone of the whole consumer framework is the class below:

public class KestrelConsumer implements Runnable {
    @Autowired
    private MemcachedClient memcachedClient;

    private Consumer consumer;

    private ConsumerFramework.StopSwitch stopSwitch;

    private ExecutorService executorService;

    public Consumer getConsumer() {
        return consumer;
    }

    public void setConsumer(Consumer consumer) {
        this.consumer = consumer;
    }

    public ConsumerFramework.StopSwitch getStopSwitch() {
        return stopSwitch;
    }

    public void setStopSwitch(ConsumerFramework.StopSwitch stopSwitch) {
        this.stopSwitch = stopSwitch;
    }

    @Override
    public void run() {
        //We use semaphores to block until a thread is available in the executor pool
        final Semaphore semaphore = new Semaphore(consumer.getNoOfConsumers());
        this.executorService = Executors.newFixedThreadPool(consumer.getNoOfConsumers());

        while (!stopSwitch.isStop()) { //Check for the global stop command
            try {
                semaphore.acquire();
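            } catch (InterruptedException e) {
                // Restore the flag and leave the loop; retrying acquire() with the
                // flag still set would throw again immediately and spin
                Thread.currentThread().interrupt();
                logger.error(e);
                break;
            }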

            final Object payLoad;
            try {
                payLoad = memcachedClient.get(KestrelUtil.getKestrelBlockingReadCommand(consumer.getQueueName(),
                   consumer.getConsumerWaitPeriod()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                semaphore.release();
                continue;
            } catch (Exception e) {
                semaphore.release();
                continue;
            }

            if (payLoad == null) {
                semaphore.release();
                continue;
            }

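            executorService.submit(
                    // A plain Runnable is enough; the pool supplies the thread
                    new Runnable() {
                        @Override
                        public void run() {
                            try {
                                consumer.consume(payLoad);
                            } catch (Exception e) {
                                // Error logging removed for this post
                            } finally {
                                // Always free the slot so the next item can be picked up
                                semaphore.release();
                            }
                        }
                    }
            );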
        }
    }

    public void shutDown() {
        executorService.shutdown();
    }

    public void awaitShutdown(long timeOut, TimeUnit timeUnit) throws InterruptedException {
        executorService.awaitTermination(timeOut, timeUnit);
    }
}

The above code has been sanitized (error logging etc. removed) for this post.

Let me walk you through the code above. Kestrel speaks the memcached protocol, and we use the xmemcached library, which has native support for Kestrel, to talk to it.
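KestrelUtil is not shown in this post. As a rough idea of what a blocking read command looks like, Kestrel's memcached protocol lets you append options to the queue name in the key of a GET, such as a timeout in milliseconds and the open/close/abort markers used for claiming ownership of an item. The helper below is a guess along those lines; the class and method names are hypothetical and it is not our actual utility.

```java
// Hypothetical helper: builds Kestrel GET keys of the form "<queue>/<option>...".
class KestrelKeys {
    // Blocking read: wait up to waitMillis for an item before returning nothing.
    static String blockingRead(String queue, long waitMillis) {
        return queue + "/t=" + waitMillis;
    }

    // Reliable read: the item stays owned by this client until confirmed or aborted.
    static String reliableRead(String queue, long waitMillis) {
        return queue + "/t=" + waitMillis + "/open";
    }

    // Confirm the previously opened item so that Kestrel can drop it.
    static String confirm(String queue) {
        return queue + "/close";
    }

    // Give the previously opened item back to the queue.
    static String abort(String queue) {
        return queue + "/abort";
    }

    public static void main(String[] args) {
        System.out.println(blockingRead("recharge", 500));
        System.out.println(reliableRead("recharge", 500));
    }
}
```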

For every consumer bean found in the Spring context, we create a new KestrelConsumer object, set the consumer bean on it and spawn it off. We need a blocking executor service, as we do not want to pick up items from the queue when all the executors are busy. This is where the semaphore comes in: we instantiate it with the concurrency count configured in the consumer bean.
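Here is the semaphore trick in isolation, as a small sketch with sleeping tasks instead of recharges. BoundedSubmitSketch and peakInFlight are names made up for this example. The method reports the largest number of items that were ever taken but not yet finished, which should never exceed the number of permits.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedSubmitSketch {
    // Submits `tasks` short jobs while holding at most `permits` of them in flight.
    // Returns the peak number of jobs that were picked up but not yet finished.
    static int peakInFlight(int permits, int tasks) {
        final Semaphore semaphore = new Semaphore(permits);
        final ExecutorService pool = Executors.newFixedThreadPool(permits);
        final AtomicInteger inFlight = new AtomicInteger();
        int peak = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                semaphore.acquire(); // blocks here while every worker is busy
                peak = Math.max(peak, inFlight.incrementAndGet());
                pool.submit(() -> {
                    try {
                        Thread.sleep(5); // stand-in for the real work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        semaphore.release(); // free the slot for the next item
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + peakInFlight(3, 20));
    }
}
```

Without the semaphore, the loop would hand all the items to the executor's internal queue straight away, which for us would mean draining Kestrel faster than we can process.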

The StopSwitch object gives us the ability to do a graceful shutdown. Once a shutdown signal is received, we set the StopSwitch to true and once this is true, KestrelConsumer stops processing more items.

StopSwitch class looks as below:

public class StopSwitch {
    private boolean stop = false;

    public synchronized boolean isStop() {
        return stop;
    }

    public synchronized void setStop(boolean stop) {
        this.stop = stop;
    }
}

It is very important for the getters and setters to be synchronized as this object is shared between multiple threads.
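For a single flag like this, an AtomicBoolean (or a volatile field) gives the same cross-thread visibility without locking. The sketch below is an alternative under that assumption, not the class we run; AtomicStopSwitch and workerStopsWhenSwitched are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical lock-free variant of StopSwitch.
class AtomicStopSwitch {
    private final AtomicBoolean stop = new AtomicBoolean(false);

    boolean isStop() {
        return stop.get();
    }

    void setStop(boolean value) {
        stop.set(value);
    }

    // Demo: a worker spins until another thread flips the switch.
    // Returns true if the worker noticed the change and exited.
    static boolean workerStopsWhenSwitched() {
        final AtomicStopSwitch stopSwitch = new AtomicStopSwitch();
        Thread worker = new Thread(() -> {
            while (!stopSwitch.isStop()) {
                Thread.yield();
            }
        });
        worker.start();
        stopSwitch.setStop(true);
        try {
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsWhenSwitched());
    }
}
```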

Since we want a neat shutdown, when the whole consumer framework is bootstrapped, we add a shutdown hook as below:

Runtime.getRuntime().addShutdownHook(new Thread() {
    @Override
    public void run() {
        final CountDownLatch countDownLatch = new CountDownLatch(consumers.size());

        stopSwitch.setStop(true);

        for (final KestrelConsumer consumer : consumers) {
            new Thread() {
                public void run() {
                    consumer.shutDown();
                    try {
                        consumer.awaitShutdown(5, TimeUnit.MINUTES);
                    } catch (InterruptedException e) {
                        // Preserve the interrupt instead of swallowing it
                        Thread.currentThread().interrupt();
                    } finally {
                        countDownLatch.countDown();
                    }
                }
            }.start();
        }

        try {
            countDownLatch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
});

The shutdown hook is invoked when the JVM receives a termination signal such as SIGTERM (a plain kill; kill -9 does not give the JVM a chance to run hooks). As soon as it is invoked, we set the StopSwitch to true so that all running KestrelConsumer threads stop picking up new items. We then iterate over the KestrelConsumer objects and shut each of them down. The CountDownLatch makes sure that all the consumers are shut down before the JVM exits.
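The same latch pattern, boiled down to plain executor services so that it can be run on its own, might look like the sketch below. GracefulShutdownSketch and shutDownAll are hypothetical names; in our framework the equivalent work is done through KestrelConsumer.shutDown() and awaitShutdown().

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class GracefulShutdownSketch {
    // Shuts all pools down in parallel and waits until each one has either
    // drained its running tasks or hit the timeout. Returns true if every
    // pool terminated cleanly.
    static boolean shutDownAll(List<ExecutorService> pools, long timeout, TimeUnit unit) {
        final CountDownLatch latch = new CountDownLatch(pools.size());
        for (final ExecutorService pool : pools) {
            new Thread(() -> {
                try {
                    pool.shutdown(); // accept no new work, let running tasks finish
                    pool.awaitTermination(timeout, unit);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    latch.countDown(); // count down even on failure, or await() hangs
                }
            }).start();
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (ExecutorService pool : pools) {
            if (!pool.isTerminated()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ExecutorService a = Executors.newFixedThreadPool(2);
        ExecutorService b = Executors.newFixedThreadPool(2);
        a.execute(() -> System.out.println("task on a"));
        b.execute(() -> System.out.println("task on b"));
        System.out.println("clean shutdown: " + shutDownAll(Arrays.asList(a, b), 5, TimeUnit.SECONDS));
    }
}
```

To use it the way the hook above does, the call to shutDownAll would go inside the Thread passed to Runtime.getRuntime().addShutdownHook.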

Along with this, the consumer framework understands the telnet protocol: when queried with a consumer name, it tells the user whether that consumer is running. We used Netty to implement this.

We have had this running in production for around two weeks now and things are looking good. If you too are interested in hacking on things like this, let us know at devgigs@freecharge.com. We are always looking for curious people who want to build beautiful products.
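To give an idea of the status check, here is a stripped-down line-based version. It deliberately uses plain java.net sockets instead of Netty so that it runs with the JDK alone, and the reply format, class name and method names are all invented for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status server: send a consumer name, get back one status line.
class StatusServerSketch implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final Map<String, Boolean> running;

    private StatusServerSketch(ServerSocket serverSocket, Map<String, Boolean> running) {
        this.serverSocket = serverSocket;
        this.running = running;
    }

    static StatusServerSketch start(Map<String, Boolean> running) {
        try {
            // Port 0 asks the OS for any free port; bind to loopback only.
            ServerSocket socket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            StatusServerSketch server = new StatusServerSketch(socket, running);
            Thread acceptor = new Thread(server::acceptLoop, "status-server");
            acceptor.setDaemon(true);
            acceptor.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    private void acceptLoop() {
        while (!serverSocket.isClosed()) {
            try (Socket client = serverSocket.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                String name = in.readLine();
                if (name != null) {
                    boolean up = Boolean.TRUE.equals(running.get(name.trim()));
                    out.println(name.trim() + (up ? " RUNNING" : " NOT RUNNING"));
                }
            } catch (IOException e) {
                // Server closed or client went away; the loop condition decides what next.
            }
        }
    }

    @Override
    public void close() {
        try {
            serverSocket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tiny client: what you would otherwise type into a telnet session.
    static String query(int port, String consumerName) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(consumerName);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> state = new ConcurrentHashMap<>();
        state.put("rechargeConsumer", true);
        try (StatusServerSketch server = start(state)) {
            System.out.println(query(server.port(), "rechargeConsumer"));
        }
    }
}
```

A one-line reply like this is easy for a Nagios check to parse.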

Quartz to Jenkins

At FreeCharge, we initially used Quartz to run all our background jobs. These jobs were configured to run at some frequency, say every 15 minutes.

Problems that we faced with this setup:

  •  No GUI to see which job is currently executing or when its next run is scheduled.
  •  Say I want to see the logs for the job foo that ran yesterday at 8:30 in the night; Quartz did not provide this out of the box.
  •  How do I monitor whether the jobs are running fine?
  •  No way to cancel a currently running job.
  •  Say I want to fire a job right now but its scheduled run is 15 minutes later; how do I do this?
  •  If there is an error while running a job, how do I get to know of this?
  •  Say I do not want a job to run for some time; how do I achieve this?
  •  If a job is stuck for some time, how do I time it out and rerun it?

Most of you might know Jenkins as a build/continuous integration server, but it solves all of the problems listed above.

We used to run our Quartz jobs through the Spring Quartz integration. As the first step, we removed this and created a main method for each job. Each of these main methods was configured as a new job in Jenkins. We rigged up our Jenkins in such a way that as soon as the build is done, these jobs are triggered. Once triggered, the jobs re-trigger themselves at a predetermined interval.
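One detail that matters when a job becomes a plain main method: a Jenkins build step treats a non-zero exit code as a failure, so the main method should translate exceptions into an exit code. A minimal sketch of such a wrapper follows; JobRunnerSketch, the Job interface and the job name are hypothetical and only illustrate the idea.

```java
// Hypothetical wrapper that turns a job's outcome into a process exit code.
class JobRunnerSketch {
    interface Job {
        void execute() throws Exception;
    }

    // Returns 0 on success and 1 on failure, printing enough for the build log.
    static int run(String name, Job job) {
        long start = System.currentTimeMillis();
        try {
            job.execute();
            System.out.println(name + " finished in " + (System.currentTimeMillis() - start) + " ms");
            return 0;
        } catch (Exception e) {
            System.err.println(name + " failed");
            e.printStackTrace();
            return 1;
        }
    }

    public static void main(String[] args) {
        // "refund-sweep" is a made-up job name for illustration.
        int exitCode = run("refund-sweep", () -> System.out.println("sweeping refunds"));
        if (exitCode != 0) {
            System.exit(exitCode); // non-zero exit marks the run as failed
        }
    }
}
```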

With this configuration, we get a lot of things out of the box:

  • A nice GUI to monitor which jobs are currently running, which jobs are scheduled to run later and the health of each run.
  • Through the GUI, we can run a job out of turn, suspend a job for some time and re-enable it, or completely nuke a job.
  • We can configure a job to time out if it gets stuck and then re-trigger it.
  • We get a complete (configurable) history of each job’s runs, along with the status of and logs for each run.

If you too are interested in doing geeky things like this, let us know at devgigs@freecharge.com. We are always looking for curious people who want to build beautiful products.