I'm an indecisive person; I like all the perks of running my own installation of WordPress, but still want to host my posts on WordPress.com.

I've been copying posts from the blog on my private server (tak.atso-net.jp/blog) to my WordPress.com blog by hand, using the WordPress eXtended RSS (WXR) "Export" and "Import" tools in the WordPress dashboard. This works, but it's not ideal, especially if, like me, you'd rather spend more time scripting to automate something than the time it takes to do the thing manually...

...so I've written a little Python script that uses the XML-RPC library (xmlrpclib) to fetch all the posts from my private server and post or update them on my WordPress.com blog (it was raining and I was stuck indoors anyway).

Currently the script only mirrors from source to target, overwriting any existing post with the same title. I'm sure it could be improved to update rather than mirror, by comparing the last-modified dates and deciding which way to copy, but that, as with everything else on this blog, is left as an exercise for the diligent reader...
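For the curious, here is a rough sketch of how that date comparison might go. It is not part of the script below. It assumes each post struct carries a `date_modified` field in the compact ISO 8601 form that XML-RPC uses (that field name is an assumption; check what your server actually returns), and `sync_direction` is just a name I made up for illustration.

```python
from datetime import datetime

# Compact ISO 8601 layout used by XML-RPC dateTime values, e.g. 20110529T10:00:00
ISO8601_COMPACT = "%Y%m%dT%H:%M:%S"

def parse_modified(post):
    # 'date_modified' is an assumed field name, not something the script relies on.
    # str() turns an XML-RPC DateTime object into its raw string value.
    return datetime.strptime(str(post["date_modified"]), ISO8601_COMPACT)

def sync_direction(source_post, target_post):
    """Return 'source->target', 'target->source', or 'in-sync'."""
    source_time = parse_modified(source_post)
    target_time = parse_modified(target_post)
    if source_time > target_time:
        return "source->target"
    if target_time > source_time:
        return "target->source"
    return "in-sync"
```

If the two servers sit in different time zones, comparing a GMT variant of the field (if your server offers one) would probably be safer. The script as it stands, without any of that cleverness: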

#!/usr/bin/env python

import logging
import time
import re
import xmlrpclib
from optparse import OptionParser

MAX_POSTS = 10000   # big enough for ya?

def decode_iso8601(date):
    # Translate an ISO8601 date to the tuple format used in Python's time
    # module.
    regex = r'^(\d{4})(\d{2})(\d{2})T(\d{2}):(\d{2}):(\d{2})'
    match = re.search(regex, str(date))
    if not match:
        raise ValueError('"%s" is not a correct ISO8601 date format' % date)
    else:
        result = match.group(1, 2, 3, 4, 5, 6)
        result = map(int, result)
        result += [0, 1, -1]
        return tuple(result)

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s %(levelname)-8s %(message)s',
                        datefmt='%H:%M:%S')

    parser = OptionParser()

    parser.set_description("Description:\n     Mirror posts from one wordpress blog to another")
    parser.set_usage("%prog ")

    parser.add_option("-s", "--source-url", dest="sourceUrl",
                      help="XML-RPC URL of the source blog",
                      metavar="url")

    parser.add_option("-u", "--source-username", dest="sourceUsername",
                      help="Username for source blog",
                      metavar="username")

    parser.add_option("-p", "--source-password", dest="sourcePassword",
                      help="Password for source blog",
                      metavar="password")

    parser.add_option("-t", "--target-url", dest="targetUrl",
                      help="XML-RPC URL of the target blog",
                      metavar="url")

    parser.add_option("-U", "--target-username", dest="targetUsername",
                      help="Username for target blog",
                      metavar="username")

    parser.add_option("-P", "--target-password", dest="targetPassword",
                      help="Password for target blog",
                      metavar="password")

    (options, args) = parser.parse_args()

    logging.info("Source server: %s" % options.sourceUrl)
    source = xmlrpclib.ServerProxy(options.sourceUrl)
    sourcePosts = source.metaWeblog.getRecentPosts(
        1, options.sourceUsername, options.sourcePassword, MAX_POSTS)   # assumes it's the first blog you have
    logging.info("Fetched %d posts" % len(sourcePosts))

    logging.info("Target server: %s" % options.targetUrl)
    target = xmlrpclib.ServerProxy(options.targetUrl)
    targetPosts = target.metaWeblog.getRecentPosts(
        1, options.targetUsername, options.targetPassword, MAX_POSTS)    # assumes it's the first blog you have
    logging.info("Fetched %d posts" % len(targetPosts))

    for sp in sourcePosts:
        logging.info("TITLE: %s" % sp['title'])
        logging.debug("DATE:  %s" % time.strftime('%m/%d/%Y %H:%M:%S',
            decode_iso8601(sp['dateCreated'].value)))
        postId = -1
        for tp in targetPosts:
            if sp['title'] == tp['title']:
                # Edit an existing post - might be better to check the date and decide which way to sync...
                postId = tp['postid']
                logging.debug("    Already exists. Overwriting")
                break
        isPublished = (sp['post_status'] == "publish")
        if postId == -1:
            target.metaWeblog.newPost(
                1,
                options.targetUsername,
                options.targetPassword,
                sp,
                isPublished)
        else:
            target.metaWeblog.editPost(
                postId,
                options.targetUsername,
                options.targetPassword,
                sp,
                isPublished)
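
The nested loop above rescans every target post for every source post, which is fine for a small blog. As a sketch of one alternative (not what the script does), the title matching could be pulled out into a helper that builds a title lookup once. `plan_mirror` is a hypothetical name, and it only needs the `title` and `postid` keys the script already uses.

```python
def plan_mirror(source_posts, target_posts):
    """Split source posts into ones to create and ones to overwrite."""
    # Map each target title to a post id. setdefault keeps the first id seen,
    # matching the "first match wins" behaviour of the loop in the script.
    target_ids = {}
    for post in target_posts:
        target_ids.setdefault(post["title"], post["postid"])

    to_create = []
    to_edit = []   # (target post id, source post) pairs
    for post in source_posts:
        if post["title"] in target_ids:
            to_edit.append((target_ids[post["title"]], post))
        else:
            to_create.append(post)
    return to_create, to_edit
```

The two lists could then be fed to `metaWeblog.newPost` and `metaWeblog.editPost` respectively, and having the plan separate from the network calls makes a dry run easy.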
On 29 May 2011
