Vim command to delete the current line and all lines above - dgg
Vim command to delete the current line and all lines below - dG
Wednesday, July 15, 2015
Saturday, June 06, 2015
UTF8 Encoding Problem in Laravel Blade Template
I have a Chinese site written three years ago. It is still on Laravel 3, but I wanted to migrate it to a new server. After the migration, all Chinese characters rendered by the application appeared garbled, and it was not something a <meta> charset tag could fix, since the wrong characters were already in the page source.
My first step was to render some Chinese characters from a plain .php file. That worked, which ruled out the Nginx/HHVM configuration. Next, I started placing print_r() calls with some UTF-8 characters to trace where the output got mangled. Tracing down to /laravel/core.php, it appeared that the mb_output_handler output callback was using the wrong default encoding. I made the following changes:
mb_http_output("UTF-8"); // to make blade template output in UTF-8
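// pass the buffered output through mb_output_handler so it is re-encoded to the UTF-8 output encoding set above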
ob_start('mb_output_handler');
The UTF-8 characters appeared normal again after the change. Great! :)
Saturday, May 02, 2015
List of Python Unicode / UTF-8 String Encoding Snippets
\u to \x conversion
>>> print(u'\u0420\u0443\u0441\u0441\u043a\u0438\u0439')
Русский
>>> a = u'\u0420\u0443\u0441\u0441\u043a\u0438\u0439'.encode('utf8')
>>> a
'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9'
>>> print(a)
Русский
\x to \u conversion
First, convert the unicode object into a byte string by encoding it as iso-8859-1; latin-1 maps each code point below 256 back to the same byte value, so this recovers the raw bytes.
Then you need to know the real encoding of those bytes. In this example, the string is gb2312-encoded:
>>> x
u'\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'
>>> x.encode('iso-8859-1')
'\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'
>>> '\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'.decode('gb2312')
u'\u7279\u8b66\u529b\u91cf'
>>> print(u'\u7279\u8b66\u529b\u91cf')
特警力量
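These sessions are from a Python 2 shell, where encode() returns a plain byte str and print writes the raw bytes straight to a UTF-8 terminal. A rough Python 3 equivalent of both conversions, using the same sample strings, would be:
s = '\u0420\u0443\u0441\u0441\u043a\u0438\u0439'
b = s.encode('utf8')            # \u -> \x: text to UTF-8 bytes
print(b)                        # b'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9'
print(b.decode('utf8'))         # Русский

# \x -> \u: the "unicode" string is really gb2312 bytes read one byte per character,
# so iso-8859-1 recovers the raw bytes, and gb2312 decodes them into the real text
x = '\xcc\xd8\xbe\xaf\xc1\xa6\xc1\xbf'
print(x.encode('iso-8859-1').decode('gb2312'))   # 特警力量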
Tuesday, January 06, 2015
How to Make a Slow Mongo Query on a Compound Index with $gte and $lte Fast
I have a mongo collection of IP ranges, each with a start_address and an end_address, and I'd like to find the range that contains a given IP. Even with a compound index on {start_address: 1, end_address: 1}, the following query is still very slow:
db.ip_blocks.find({start_address: {$lte: 1665637698}, end_address: {$gte: 1665637698}})
With .explain(), it appeared that this query does a lot of scanning: the index only narrows the range for the $lte condition on start_address, and the $gte condition on end_address is then checked by scanning everything that remains from that first bound.
Here is a trick to speed this up if you are only trying to get the first match:
db.ip_blocks.find({start_address: {$lte: 1665637698}, end_address: {$gte: 1665637698}}).limit(1)
This avoids the long scan, because the server can stop as soon as the first matching document is found.
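The same first-match trick through PyMongo looks roughly like this (a sketch; the client setup and the database name mydb are placeholders, not from the original post):
from pymongo import MongoClient

ip = 1665637698
db = MongoClient().mydb  # placeholder database name

# limit(1) lets the server stop at the first matching range instead of scanning on
cursor = db.ip_blocks.find({"start_address": {"$lte": ip},
                            "end_address": {"$gte": ip}}).limit(1)
block = next(cursor, None)  # find_one() with the same filter is the idiomatic shortcut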
However, if you are expecting more than one match, limit(2) or above will cause a full scan.
So to make it work for multiple matches, here is the second trick:
Add this index, with start_address descending:
db.ip_blocks.ensureIndex({start_address: -1, end_address: 1})
Query with a $maxScan parameter:
db.ip_blocks.find({start_address:{$lte: 1665637698}, end_address:{$gte: 1665637698}})._addSpecial("$maxScan", 100)
Because the index now walks start_address in descending order, the matching ranges sit at the front of the scan, so the correct records are found within the first few documents examined and $maxScan keeps the scan bounded. With both in place the query becomes fast.
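For completeness, the multi-match version with the reverse-sorted index, again as a rough PyMongo sketch (placeholders as above; note that $maxScan and the driver's max_scan() were later deprecated and removed in newer MongoDB/PyMongo releases, so this only applies to the older versions this post targets):
from pymongo import MongoClient, ASCENDING, DESCENDING

ip = 1665637698
db = MongoClient().mydb  # placeholder database name

# start_address descending puts the candidate ranges at the front of the index scan
db.ip_blocks.create_index([("start_address", DESCENDING), ("end_address", ASCENDING)])

# cap the number of documents examined, mirroring the $maxScan trick above
blocks = list(db.ip_blocks.find({"start_address": {"$lte": ip},
                                 "end_address": {"$gte": ip}}).max_scan(100))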