Monday, July 07, 2014

Dealing with HHVM 3.1.0 memory leak

A week ago, I switched my site to HHVM (3.1.0), replacing PHP5. The throughput increase was significant, and the response time dropped from 400ms to 300ms as measured by Pingdom. I was very happy with the result.

After 5 days, my site went down. HHVM had shut itself down, and since it was serving FastCGI on port 9000, Nginx could no longer reach it. The fix was rather simple:

      sudo /etc/init.d/hhvm restart

However, I noticed that my old php-fpm setup went through a Unix socket rather than a port. From what I read, FastCGI over a Unix socket is more efficient than over a port, because a port goes through the TCP/IP stack even on localhost. So in order for nginx to read from an HHVM socket, I opened /etc/nginx/hhvm.conf and replaced the fastcgi_pass with:

    fastcgi_pass   unix:/var/run/hhvm/hhvm.sock;

I needed to tell HHVM to serve on the socket, too. So in /etc/hhvm/server.ini, I commented out the server port and added file_socket:

    hhvm.server.file_socket=/var/run/hhvm/hhvm.sock
    ;hhvm.server.port = 9000

After restarting both nginx and hhvm, the site ran peacefully again. There was no significant response time improvement, but I hoped the change would stop HHVM from shutting itself down.
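If the two files name different socket paths, nginx cannot reach HHVM and answers with 502 Bad Gateway, so it can be worth checking that both sides agree before restarting. A rough sketch: the function names are made up for illustration, and the sed patterns only handle the simple one-line forms shown above.

```shell
#!/bin/sh
# Print the Unix socket path that an nginx config passes FastCGI requests to.
nginx_socket() {
    sed -n 's/^[[:space:]]*fastcgi_pass[[:space:]]*unix:\([^;]*\);.*$/\1/p' "$1" | head -n 1
}

# Print the socket path HHVM is configured to listen on.
# Commented-out lines (starting with ';') do not match.
hhvm_socket() {
    sed -n 's/^[[:space:]]*hhvm\.server\.file_socket[[:space:]]*=[[:space:]]*\(.*\)$/\1/p' "$1" | head -n 1
}

# Succeed only if both files name the same, non-empty socket path.
socket_paths_match() {
    n=$(nginx_socket "$1")
    h=$(hhvm_socket "$2")
    [ -n "$n" ] && [ "$n" = "$h" ]
}
```

For example: `socket_paths_match /etc/nginx/hhvm.conf /etc/hhvm/server.ini && echo "socket paths agree"`.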

5 days later, the response time of the site started to creep up.. :(

I ssh-ed into the node and found multiple HHVM processes eating up the memory. That's not good: it is a memory leak, and most likely the reason HHVM crashed a few days ago. Serving through the socket, it only filled up the memory this time.

Reading up more on HHVM, I found that people had similar issues. Laravel + HHVM can cause a memory leak: https://github.com/laravel/framework/issues/4757
It became my problem because my site is still on Laravel 3.

I guess I could switch back to PHP5, but I love the performance gain and the lambda expressions that come with HHVM, and would love to keep it running by working around the bug.

I ran httperf against my site's home page on localhost, which consists of a couple of MySQL queries and a Blade template render, while watching the hhvm processes in htop. The memory grew fast with the requests. To rule out the db connection as the cause (or at least as the only cause), I ran httperf against another page that only renders a static Blade template, like the following:

    public function action_about()
    {
        return View::make('home.about');
    }

The memory consumption of hhvm grew just as fast. A third test with a plain hello world, <?php echo 'hello' ?>, did not cause any memory growth in hhvm.
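The memory watching can also be scripted instead of eyeballed in htop, which makes the three tests easier to compare. A minimal sketch, assuming Linux with the procps version of ps. The function names, the 5-second default interval, and the httperf numbers in the comment are illustrative picks, not the values used for the tests above.

```shell
#!/bin/sh
# Sum a column of RSS values (kB, one per line) read from stdin.
sum_rss_kb() {
    awk '{ s += $1 } END { print s + 0 }'
}

# Total resident memory of all processes named "hhvm", in kB.
hhvm_rss_kb() {
    ps -C hhvm -o rss= | sum_rss_kb
}

# Print "<unix time> <total kB>" every N seconds (default 5) until interrupted.
watch_hhvm_rss() {
    while :; do
        echo "$(date +%s) $(hhvm_rss_kb)"
        sleep "${1:-5}"
    done
}

# In one terminal:  watch_hhvm_rss 5
# In another, generate load against the page under test (example numbers only):
#   httperf --server localhost --port 80 --uri / --num-conns 1000 --rate 50
```

A total that keeps climbing across runs and never comes back down points to a leak rather than normal warm-up.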

Quite possibly the Blade template engine runs into some corner case of hhvm. I didn't dig deeper and will leave that part for the future.

So my guesses were that hhvm does not limit the maximum number of threads it creates, or that connections which never time out accumulate in memory, or that a background process should periodically expire the cache, or that a memory cap needs to be defined, etc. I had to try things out and experiment. There wasn't much documentation; I could only find this information about server.ini settings:

https://github.com/facebook/hhvm/wiki/INI-Settings
https://github.com/facebook/hhvm/blob/1cd8ceb06adf9a40039ebac40c489abb811ab599/hphp/runtime/base/runtime-option.cpp#L761

I tweaked several of the parameters, but none of them stopped the memory accumulation! :( Here are the settings that I tried:

    hhvm.resource_limit.max_rss = 268435456
    hhvm.resource_limit.drop_cache_cycle = 3600
    hhvm.resource_limit.max_rsspolling_cycle = 3600
    ;hhvm.resource_limit.max_socket = 10
    hhvm.resource_limit.socket_default_timeout = 30
    hhvm.server.thread_count = 3
    hhvm.pagelet_server.thread_drop_cache_timeout_seconds = 60


Finally, I found people saying that the JIT is the cause of the memory leak, and the setting that turns the JIT off seems to work:

    hhvm.jit = false

Now the memory of the hhvm processes has stopped growing while httperf adds load. The service is usable, and although I lost some performance, it still beats PHP5. The underlying problem is still unsolved, though. The remaining question is: why does the Blade template engine cause the JIT to leak memory?
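Until the root cause is found, a guard that restarts HHVM before memory runs out could serve as a safety net for anyone who wants to keep the JIT on. This is only a sketch and not part of the setup described here: the 512 MB limit, the function names, and the idea of calling it from root's crontab are all assumptions, and it relies on the Linux procps version of ps.

```shell
#!/bin/sh
# Hypothetical limit: restart once all hhvm processes together exceed this many kB.
LIMIT_KB=${LIMIT_KB:-524288}

# Succeed (exit 0) when the measured RSS is above the limit.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Total resident memory of all hhvm processes, in kB (0 if none are running).
hhvm_total_rss_kb() {
    ps -C hhvm -o rss= | awk '{ s += $1 } END { print s + 0 }'
}

# Restart hhvm only when it is over the limit; otherwise do nothing.
guard_hhvm() {
    rss=$(hhvm_total_rss_kb)
    if over_limit "$rss" "$LIMIT_KB"; then
        /etc/init.d/hhvm restart
    fi
}
```

Sourcing this file and calling guard_hhvm every few minutes would trade a short restart hiccup for not running out of memory; it hides the leak rather than fixing it.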


1 comment:

Chibueze said...

Interesting discovery. Pablo also discovered something similar while trying to run composer install on HHVM. Apparently this is due to the way HHVM's JIT works, but in your case, it could be due to the regex differences with HHVM.

On a different note though, considering that the blade templating engine uses regex to convert code, it might be advisable to look up how this is currently done with blade and maybe cross-check it with HHVM regex iterator