Sunday, January 28, 2007

How to add a button to your Blogger account?

I'm using the new Blogger classic template, so YMMV, but the basic idea should be the same.

  • Go to "Edit Template" -> "Edit HTML".
  • Expand all the widgets' HTML code by checking "Expand Widget Templates".
  • Scroll down to near the bottom and replace <div class="'post'">
    <a name="''/"></a>
    <div class="'post'"><a name="''/">
    </a><a name="''/">
    <script language="javascript" src=""></script></a>

JavaScript tags generator: a must-have for serious Ajax developers using vim

The following Python script builds a tags file for your JavaScript files. It's still version 0.0.1, but it does support extracting most method and class definitions. Here is a list of the syntax it supports for now. If you find that this script misses something, please let me know and I'll add it for you.

Supported Syntax

# - functions
# * function info(msg) {
# * info: function(msg) {
# - classes
# * var logger = {

# logger.js
# ls *.js |
# find ./ -name *.js |

Source code

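#! /usr/bin/env python
# Create a tags file for JavaScript programs, usable with vi.
# Benefit:
# Just in case you haven't used tags in vim, you can jump
# to a class/function definition using Ctrl-] and jump back
# using Ctrl-T. Once you start using tags, you can no longer
# live without them.

import re
import sys

tags = []  # Modified global variable! parse_file() appends to it.


def main():
    # sys.stdin is always truthy, so test the arguments instead: file names
    # come from the command line, or from stdin when none are given.
    if len(sys.argv) > 1: files = sys.argv[1:]
    else: files = [s.strip() for s in sys.stdin.readlines() if s.strip()]
    for filename in files: parse_file(filename)

    fp = open('tags', 'w')
    fp.write("!_TAG_PROGRAM_AUTHOR\tAlex Dong\n")
    tags.sort()  # sorted so that vim can binary-search the file
    for s in tags: fp.write(s)
    fp.close()


# Maps each compiled pattern to the tag kind it produces.
# Only the 'c' (class) entry survives from the original listing. The two
# function patterns and the 'f' kind are assumptions, reconstructed from
# the "Supported Syntax" list.
patterns = {
    re.compile(r'\s*var\s*(\w+)\s*=\s*{'): 'c',
    re.compile(r'\s*function\s+(\w+)\s*\('): 'f',
    re.compile(r'\s*(\w+)\s*:\s*function\s*\('): 'f',
}


def parse_file(filename):
    fp = open(filename, 'r')
    while 1:
        line = fp.readline()
        if not line: break

        for pattern in patterns.keys():
            m = pattern.match(line)
            if m:
                c = m.group(0)  # matched text, used as the search pattern (assumed)
                n = m.group(1)  # captured identifier, used as the tag name (assumed)
                s = "%s\t%s\t/^%s/;\"\t%s\n" % (n, filename, c, patterns[pattern])
                tags.append(s)
    fp.close()


if __name__ == '__main__':
    main()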
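
To make the output format concrete, here is a small, self-contained sketch of how one JavaScript line turns into one tags entry. It is not the script above: the helper name make_tag_line and the two function patterns are assumptions made for illustration (only the var name = { pattern appears in the listing), and the 'f' kind letter is a guess. It follows the tab-separated layout of the s = ... line: name, file, /^search text/;" and kind.

```python
import re

# Hypothetical patterns for illustration; the 'f' kind letter is an assumption.
PATTERNS = [
    (re.compile(r'\s*var\s*(\w+)\s*=\s*\{'), 'c'),       # var logger = {
    (re.compile(r'\s*function\s+(\w+)\s*\('), 'f'),      # function info(msg) {
    (re.compile(r'\s*(\w+)\s*:\s*function\s*\('), 'f'),  # info: function(msg) {
]


def make_tag_line(line, filename):
    """Return a tags entry for `line`, or None if no pattern matches."""
    for pattern, kind in PATTERNS:
        m = pattern.match(line)
        if m:
            # name <TAB> file <TAB> /^matched text/;" <TAB> kind
            return '%s\t%s\t/^%s/;"\t%s\n' % (m.group(1), filename, m.group(0), kind)
    return None


class_tag = make_tag_line('var logger = {', 'logger.js')
method_tag = make_tag_line('  info: function(msg) {', 'logger.js')
no_tag = make_tag_line('logger.info("hi");', 'logger.js')
```

With an entry like class_tag in the tags file, placing the cursor on logger in vim and pressing Ctrl-] should jump to the line in logger.js that starts with var logger = {. A plain call such as logger.info("hi"); produces no entry.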

Monday, January 22, 2007

Facts about Unicode which you might not know

  1. There is no such thing as "plain text". All text and strings are byte data stored in memory and on disk. So a better name for what used to be called an "ASCII string" might be "binary string" or "byte string".
  2. The operating system needs a way to transform meaningless in-memory binary data into meaningful text to display on a monitor or print on paper. In the old days before Unicode, this semi-standard byte-to-character mapping table was called a "code page".
  3. A byte can take 256 values. The low 128 (0-127) are commonly agreed to be the "ASCII" character set, but for the remaining 128, different countries and cultures chose different assignments. Thus the byte 0xc5 stands for a different character in a Hebrew code page than in a Russian one.
  4. One widely used assignment for the 128-255 byte range is the "latin-1" character set, or "iso-8859-1". (The original IBM PC shipped with its own code page, 437.) This character set is so common that many modern software packages, like MySQL, still make it the default encoding for database connections, which causes tons of headaches. Some people even call it "Character Set Hell".
  5. Chinese and other East Asian characters require at least two bytes each. Thus the need for multi-byte strings. On Windows, this is where the notorious MBCS comes into play, plus the tons of methods in ATL/MFC to convert from one string type to another.
  6. A more precise term for a "UTF-8 string" is "a Unicode string encoded using the UTF-8 encoding rules". UTF-8 is a byte encoding used to store text or to transmit it from one machine to another; inside a program, those bytes are usually decoded back into Unicode characters. UTF-8 is not Unicode.
  7. Unicode is a standard that assigns a number (a code point) to any given character. It has nothing to do with how the character is rendered on the monitor or on paper. It's only a worldwide way to ensure that any character, whether it's Thai, Chinese, Russian, Hebrew or French, has one single agreed representation.
  8. Unicode, as an internal representation, looks like U+0048 U+0065 U+006C. When it's encoded using the ASCII encoding and saved, it becomes the bytes 48 65 6C. The ASCII string can, understandably, be decoded back into the Unicode representation.
  9. The decoding process requires an explicit statement of which encoding the input bytes are encoded in. Otherwise, the system-level default encoding will be used.
  10. The major modern database systems (Oracle, MSSQL, MySQL, PostgreSQL, SAP DB, SQLite) all support storing Unicode strings as content.
  11. UTF-8 is a preferable way to transmit and store data because it has two useful characteristics: 1) UTF-8 is fully compatible with the ASCII character set, since it uses one byte for characters 0-127 and two to four bytes for the rest. Since many network protocols, HTTP headers in particular, were designed around ASCII, UTF-8 is the Unicode encoding that fits most naturally with them. 2) UTF-8 can represent the same character set as defined in Unicode. By contrast, encoding Unicode into ASCII is a downcast, because it shrinks the set of characters that can be represented.
  12. To make your web pages display correctly, it's recommended to include a META tag in your HTML code: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">. If you're using Apache, you can add AddDefaultCharset utf-8 to your httpd.conf to have the server send this charset in the Content-Type header of the pages your website serves.
  13. If you need to make your web forms have multi-lingual support, like Google's search textbox, you'll need to
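
A few of the points above can be checked in a Python 3 session. This is only a sketch under stated assumptions: the sample strings and variable names are made up for illustration, and the Windows code pages 1251 (Cyrillic) and 1255 (Hebrew) stand in for the "Russian" and "Hebrew" code pages of point 3.

```python
# Point 8: Unicode code points versus the bytes produced by an encoding.
text = "Hel"
code_points = ["U+%04X" % ord(ch) for ch in text]
ascii_bytes = text.encode("ascii")         # encode: characters -> bytes
hex_dump = ascii_bytes.hex()
round_trip = ascii_bytes.decode("ascii")   # decode: bytes -> characters

# Points 3 and 9: the same byte decodes differently depending on the
# encoding you name explicitly (cp1251 and cp1255 are assumed stand-ins).
russian = b"\xc5".decode("cp1251")
hebrew = b"\xc5".decode("cp1255")

# Point 11: UTF-8 leaves ASCII bytes unchanged and spends more than one
# byte on characters outside ASCII.
utf8_bytes = text.encode("utf-8")
utf8_lengths = [len(ch.encode("utf-8")) for ch in ("H", "\u00e9", "\u4e2d")]

# Point 11 again: ASCII cannot represent every Unicode character, so
# encoding to it can fail.
try:
    "\u4e2d".encode("ascii")
    downcast_failed = False
except UnicodeEncodeError:
    downcast_failed = True
```

Here code_points holds the U+ notation from point 8, hex_dump holds the saved bytes as hex, and utf8_lengths shows how many bytes UTF-8 needs for an ASCII letter, an accented Latin letter and a Chinese character.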
I've done quite extensive reading on this topic. Basically, I found all the articles I could and kept reading until I could no longer find any concept that was new. Here are a few links I'd recommend:

Tuesday, January 09, 2007

Bash Alias for Darcs users

I've been using darcs for quite a while. I really love its simplicity and distributed nature. I found myself typing the same basic darcs commands again and again after adopting the techniques described in this excellent blog. Here are a few additions I made to my ~/.bash_profile to minimize the typing I need to do. Explanations are inline.

# Set up the EDITOR environment variable to let darcs know which editor to use.

# My typical patch naming convention is {year}{month}{day}{hour}{min}{sec}.
# Instead of manually making one up for every commit, I'd like to automate it.
# The date is computed inside the alias, so every commit gets a fresh timestamp
# (a PATCHNAME variable set in ~/.bash_profile would be frozen at login time).
# %H is the zero-padded hour; %k pads with a space before 10:00.

# 'commit' will tell darcs to record all (-a) changes using a patch name in the format mentioned above.
# We also tell the darcs command to launch our default EDITOR to let us add a long comment.
# Last, we'd like to sign the patch by associating it with our email address.
alias commit='darcs record -a --patch-name="$(date +%Y%m%d%H%M%S)" --edit-long-comment'
alias diff='darcs whatsnew | less'

Friday, January 05, 2007

A Tale of Two Documents

This afternoon, I spent three hours figuring out the difference between window.document and content.document from a Mozilla or Firefox extension's point of view. Figuring this out answered lots of my "why the heck doesn't this work" questions. This article explains what the differences are and when to use which, and ends with some very practical kludges that I've found quite handy.

Document? Which document?
My previous Ajax experience told me that document is the HTML document, aka the DOM, in which I can modify a node's attributes, change an element's layout, or remove an element. When I write document.getElementById('sidebar'), I mean "find the element whose id is sidebar". This is the document. But in Mozilla's world, this assumption can cause you big trouble, as it did me.

From Mozilla's point of view, the whole browser is a rendered DOM tree. This means that the menu, the toolbar, and the status bar are all elements of a DOM tree. When you add a toolbar, you're inserting a toolbar element into the DOM tree. When you see the status bar, you're actually looking at a label. This is the beauty of XUL and Mozilla: an extensible architecture that works together with other wonderful technologies like RDF, XBL, and overlays. Now when you write document.getElementById, the document is not referring to the HTMLDocument. It refers to the XULDocument, which represents the current browser window, aka the ChromeWindow.

Here is a concrete example. Run this and you'll see "hello" show up in your status bar:

window.document.getElementById('statusbar-display').label = "hello";

If you want to try this out, paste it into the JavaScript Environment that comes with the Firefox Extension Development add-on. If you don't have it, get it now.

The statusbar-display is the leftmost status pane in Firefox, where you see messages like "Transferring data from ...". So here window.document points to the current ChromeWindow's document, which is the XULDocument. Your HTML document doesn't, and probably won't, have an element with an id like that.

Then how can I get a pointer to the HTML document? Use window.content.document. If you see _content, it's an obsolete name for the same thing. As a result, in order to find the sidebar in the current HTML document, you need to write code like this: window.content.document.getElementById("sidebar");, where content means the real page content instead of the user interface.

This script explains it much better:

Here is the output:
[object ChromeWindow]
[object Window]
[object HTMLDocument]
[object XULDocument]

Shorthand for window.content vs window.document

  • window.content can be written as content when there is only one window. The window is the current window or frame in which the document is contained. If you are working on a multi-frame document, you'll have to write code like framelist[i].document.
  • window.document can be written as document for the same reason.

Event Handling
In Ajax, if we want to install an event handler for the keypress event, here is what we'd do: document.onkeypress = function(event) { ... }. But in the Mozilla world, you can't hook the event up to the HTMLDocument directly. Doing so works fine in the JavaScript Environment, but it just doesn't work in your extension's JavaScript code. (My environment is Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20061204 Firefox/). To get it working, you have to hook the event handler up to window.document.

One side effect of this is that the event will now fire for both the content of the HTML page and the controls in the browser's toolbar and status bar. If your event handler deletes whatever element is clicked, you could delete the toolbar buttons one by one by clicking through them!

I didn't find a good or official solution to this problem, so if you know a better way to do this, please let me know. Here is my kludge, which just works.
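function onclick(event) {
  // Ignore clicks on browser chrome. The left-hand side of this test was
  // lost from the original listing; using event.target is an assumption.
  if (String(event.target).indexOf('XULElement') >= 0) return;
  // now, do whatever you want.
}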