Archive | January, 2009


From Wordpress to Django – Part 2

Posted on 27 January 2009 by BeardyGeek

View Part 1
Beardy Geek Git Hub Repository

Welcome back. In part 2 I’ll be getting to the meat of the issue, which is retrieving the data from an existing Wordpress blog, and feeding the data into my own models.

Models

I’ve decided to start the models from scratch, rather than trying to copy the way Wordpress has laid them out. It should keep things simpler that way. I’ll be using the contrib comments system for the post comments, and I’ll also be creating 3 other models, Category, Tag, and Post. Here’s the code for the models:


from django.db import models
from django.contrib.auth.models import User
from datetime import datetime

POST_STATUS = (
               ('P', 'Published'),
               ('U', 'Unpublished'),
)

class Tag(models.Model):
    text = models.CharField(max_length=75)
    slug = models.CharField(max_length=75)

    def __unicode__(self):
        return self.text

class Category(models.Model):
    text = models.CharField(max_length=75)
    slug = models.CharField(max_length=75)

    def __unicode__(self):
        return self.text

    class Meta:
        verbose_name_plural = "categories"

class Post(models.Model):
    title = models.CharField(max_length=75)
    slug = models.CharField(max_length=75)
    content = models.TextField()
    author = models.ForeignKey(User)
    post_date = models.DateTimeField(default=datetime.now)
    status = models.CharField(max_length=1, choices=POST_STATUS)
    categories = models.ManyToManyField(Category)
    tags = models.ManyToManyField(Tag)

    def __unicode__(self):
        return self.title

Wordpress Export

Just a quick note: I am using Wordpress Version 2.6.1. Things may be different in other versions, but the code here works on this version. I’ll go through the code so you should see where the problem is if it doesn’t work with your version.

OK, if you didn’t already know, you can export all your Wordpress data into an xml file. Go to your dashboard, click ‘Manage’, and then ‘Export’. Select which authors to restrict (if any) and hit the download button. You should now have the data in an xml file.

XML File Editing

When I first tried to parse this file, I got an error because one of the namespace prefixes used in the document is undeclared. To fix this, open the file in a text editor (not an XML editor, which may fail with the same parse error). Near the top of the file, you should see something like this:


<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.0/"
>

The undeclared prefix is ‘excerpt’, so we need to declare it. The parser only needs the prefix to be bound to something, so it doesn’t matter what value you give it. I did this:


<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.0/"
	xmlns:excerpt="mysite"
>

Save the file and we’re ready to start parsing the data.
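If you'd rather not edit the file by hand, the same fix could be scripted. This is only a sketch and not part of the original import script: the function name declare_excerpt_namespace and the sample string are made up for illustration, and it assumes the opening rss tag looks like the one above.

```python
import re
from xml.etree import ElementTree

def declare_excerpt_namespace(xml_text, uri="mysite"):
    """Add an xmlns:excerpt declaration to the opening <rss> tag if it is missing."""
    if "xmlns:excerpt=" in xml_text:
        return xml_text  # already declared, nothing to do
    # Insert the declaration just before the '>' that closes the opening <rss ...> tag.
    return re.sub(
        r"(<rss\b[^>]*?)(\s*>)",
        lambda m: m.group(1) + '\n\txmlns:excerpt="%s"' % uri + m.group(2),
        xml_text,
        count=1,
    )

# Hypothetical, cut-down export file that uses the undeclared prefix.
sample = '''<rss version="2.0"
    xmlns:wp="http://wordpress.org/export/1.0/"
>
<channel><item><excerpt:encoded>summary</excerpt:encoded></item></channel>
</rss>'''

patched = declare_excerpt_namespace(sample)
root = ElementTree.fromstring(patched)  # parses now that the prefix is bound
```

For a real export you would read the file into a string, run it through the function, and write the result back out before parsing.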

XML Parsing

To parse the XML I have used ElementTree. This is included in the standard library from Python 2.5 onwards (as xml.etree.ElementTree), but if you’re using an earlier version, you can get ElementTree from Effbot.org.

I won’t include all the code here, just some snippets as examples. To view the full source, please check out the BeardyGeek Github Repository.

First we need to load our xml file.


from xml.etree import ElementTree  # standard library from Python 2.5
tree = ElementTree.parse('c:/wordpress.xml')

Then find the top level under which our data resides, which is the ‘channel’ tag.


chan = tree.find('channel')

Now I want to create some shortcut variables for the namespaces that we will use when finding tags.


wp_ns = '{http://wordpress.org/export/1.0/}'
content_ns = '{http://purl.org/rss/1.0/modules/content/}'

Now we can get all the category, tag and item (post) entries, using the wp_ns shortcut for the namespaced tags:


cats = chan.findall(wp_ns + 'category')
tags = chan.findall(wp_ns + 'tag')
items = chan.findall('item')

This gives us three lists of elements: one each for the categories, tags and items.
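To try the lookups without a full export to hand, here is a small self-contained sketch. The SAMPLE document is invented and far smaller than a real export, and the child element names inside wp:tag are my assumption about the export format, so check them against your own file.

```python
from xml.etree import ElementTree

# Invented, cut-down stand-in for a Wordpress export file.
SAMPLE = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <wp:category>
      <wp:category_nicename>django</wp:category_nicename>
      <wp:cat_name>Django</wp:cat_name>
    </wp:category>
    <wp:tag>
      <wp:tag_slug>python</wp:tag_slug>
      <wp:tag_name>Python</wp:tag_name>
    </wp:tag>
    <item><title>Hello</title></item>
    <item><title>World</title></item>
  </channel>
</rss>"""

wp_ns = '{http://wordpress.org/export/1.0/}'

root = ElementTree.fromstring(SAMPLE)  # root is the <rss> element
chan = root.find('channel')

# Namespaced tags need the '{uri}' prefix; plain tags such as 'item' do not.
cats = chan.findall(wp_ns + 'category')
tags = chan.findall(wp_ns + 'tag')
items = chan.findall('item')

titles = [item.find('title').text for item in items]
```

Note that ElementTree matches on the namespace URI, not on the 'wp:' prefix that appears in the file, which is why the shortcut variables hold the full URI in braces.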

Finding and Saving Data

I’ll give an example of saving the data using the category tag.


for cat in cats:
    c = Category(text=cat.find(wp_ns + 'cat_name').text,
                 slug=cat.find(wp_ns + 'category_nicename').text)
    c.save()

The tag data is handled in the same way.
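As a rough sketch of the tag version: I'm assuming here that each wp:tag element holds wp:tag_name and wp:tag_slug children, so check your own export if the lookups return None. To keep the example runnable without Django, it collects the field values in dictionaries rather than saving Tag objects; the helper name tag_fields is made up.

```python
from xml.etree import ElementTree

wp_ns = '{http://wordpress.org/export/1.0/}'

# Tiny invented stand-in for the <channel> element of an export file.
chan = ElementTree.fromstring(
    '<channel xmlns:wp="http://wordpress.org/export/1.0/">'
    '<wp:tag><wp:tag_slug>django</wp:tag_slug><wp:tag_name>Django</wp:tag_name></wp:tag>'
    '<wp:tag><wp:tag_slug>python</wp:tag_slug><wp:tag_name>Python</wp:tag_name></wp:tag>'
    '</channel>')

def tag_fields(tag_element):
    """Return the keyword arguments for one Tag model instance."""
    return {
        'text': tag_element.find(wp_ns + 'tag_name').text,
        'slug': tag_element.find(wp_ns + 'tag_slug').text,
    }

tag_rows = [tag_fields(t) for t in chan.findall(wp_ns + 'tag')]
# In the import script each row would then become: Tag(**row).save()
```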

The item data is a bit more complex. If you look at the xml you’ve exported, you’ll see that the item data includes both posts and pages. It also includes all previous revisions of each post, including any drafts saved whilst writing it. So we need to find all those with a status of ‘publish’ and a post type of ‘post’. We’ll deal with the ‘page’ data another time, using Flatpages.


for item in items:
    # Skip pages, drafts and revisions: keep only published posts.
    if (item.find(wp_ns + 'status').text == 'publish' and
            item.find(wp_ns + 'post_type').text == 'post'):
        i = Post(title=item.find('title').text,
                 slug=item.find(wp_ns + 'post_name').text,
                 content=item.find(content_ns + 'encoded').text,
                 author=u,
                 post_date=item.find(wp_ns + 'post_date').text,
                 status='P')
        i.save()

The ‘u’ (value for author) is a User object I create earlier in the code that I’ve used as the default author of each post (see source).
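The filter and the date handling can be tried on their own, away from Django. This is a sketch rather than the code from the repository: the helper names are made up, the sample items are invented, and it assumes wp:post_date is laid out as 'YYYY-MM-DD HH:MM:SS'. It converts the date text into a real datetime, which you could pass to Post instead of the raw string.

```python
from datetime import datetime
from xml.etree import ElementTree

wp_ns = '{http://wordpress.org/export/1.0/}'
WP_DATE_FORMAT = '%Y-%m-%d %H:%M:%S'  # assumed layout of wp:post_date

def is_published_post(item):
    """True only for items that are published and are posts (not pages)."""
    return (item.findtext(wp_ns + 'status') == 'publish'
            and item.findtext(wp_ns + 'post_type') == 'post')

def parse_wp_date(text):
    """Turn a wp:post_date string into a datetime object."""
    return datetime.strptime(text.strip(), WP_DATE_FORMAT)

def make_item(status, post_type):
    """Build an invented <item> element for the demonstration."""
    xml = ('<item xmlns:wp="http://wordpress.org/export/1.0/">'
           '<wp:status>%s</wp:status><wp:post_type>%s</wp:post_type>'
           '<wp:post_date>2009-01-27 09:30:00</wp:post_date></item>')
    return ElementTree.fromstring(xml % (status, post_type))

items = [make_item('publish', 'post'),   # kept
         make_item('draft', 'post'),     # dropped: not published
         make_item('publish', 'page')]   # dropped: a page, not a post

posts = [item for item in items if is_published_post(item)]
post_dates = [parse_wp_date(item.findtext(wp_ns + 'post_date')) for item in posts]
```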

Post Categories

Now we have to find out which categories and tags this post has. Within each ‘item’ we have ‘category’ data. A bit confusingly this ‘category’ data also includes the tags, and to discover that you need to look at the ‘domain’ attribute to see which it is. Plus we only need the category with the ‘nicename’ in it (slugified).


post_cats = item.findall('category')

for pc in post_cats:
    # check for attributes
    if pc.get('nicename'):
        if pc.attrib['domain'] == 'category':
            c2 = Category.objects.get(slug=pc.attrib['nicename'])
            i.categories.add(c2)
        elif pc.attrib['domain'] == 'tag':
            t2 = Tag.objects.get(slug=pc.attrib['nicename'])
            i.tags.add(t2)
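To see the ‘domain’ and ‘nicename’ logic in isolation, here is a sketch that sorts the slugs into two plain lists instead of looking up model objects. The sample item is invented, modelled on the description above where only some of the category elements carry a nicename.

```python
from xml.etree import ElementTree

# Invented <item> with the kinds of <category> entries described above.
item = ElementTree.fromstring(
    '<item>'
    '<category><![CDATA[Django]]></category>'
    '<category domain="category" nicename="django"><![CDATA[Django]]></category>'
    '<category domain="tag"><![CDATA[Python]]></category>'
    '<category domain="tag" nicename="python"><![CDATA[Python]]></category>'
    '</item>')

category_slugs = []
tag_slugs = []

for pc in item.findall('category'):
    slug = pc.get('nicename')
    if not slug:
        continue  # entries without a nicename carry no slug to match on
    domain = pc.get('domain')  # .get() returns None rather than raising if absent
    if domain == 'category':
        category_slugs.append(slug)
    elif domain == 'tag':
        tag_slugs.append(slug)
```

Using pc.get('domain') rather than pc.attrib['domain'] is a small defensive choice: an element with a nicename but no domain is skipped instead of raising a KeyError.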

Comments

The last section deals with comments. I am using the django.contrib.comments module for this.


comments = item.findall(wp_ns + 'comment')

# ct and site are defined earlier in the full script (see source); they
# supply the content type and site that the contrib Comment model requires.
for comm in comments:
    if not comm.find(wp_ns + 'comment_author_email').text:
        comm_email = ''
    else:
        comm_email = comm.find(wp_ns + 'comment_author_email').text

    if not comm.find(wp_ns + 'comment_author_url').text:
        comm_url = ''
    else:
        comm_url = comm.find(wp_ns + 'comment_author_url').text

    db_comm = Comment(comment=comm.find(wp_ns + 'comment_content').text,
                      ip_address=comm.find(wp_ns + 'comment_author_IP').text,
                      object_pk=i.id,
                      submit_date=comm.find(wp_ns + 'comment_date').text,
                      user_email=comm_email,
                      user_name=comm.find(wp_ns + 'comment_author').text[:50],
                      user_url=comm_url,
                      content_type=ct, site=site)
    db_comm.save()
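The repeated "empty or not" checks could be folded into one small helper. This is just a sketch of that idea, not what the repository does: child_text is a made-up name and the sample comment element is invented.

```python
from xml.etree import ElementTree

wp_ns = '{http://wordpress.org/export/1.0/}'

def child_text(element, tag, default=''):
    """Return a child's text, or default if the child is missing or empty."""
    text = element.findtext(wp_ns + tag)
    return text if text else default

# Invented comment with an empty email and no URL element at all.
comm = ElementTree.fromstring(
    '<wp:comment xmlns:wp="http://wordpress.org/export/1.0/">'
    '<wp:comment_author>Alice</wp:comment_author>'
    '<wp:comment_author_email></wp:comment_author_email>'
    '</wp:comment>')

comm_name = child_text(comm, 'comment_author')[:50]
comm_email = child_text(comm, 'comment_author_email')
comm_url = child_text(comm, 'comment_author_url')  # element absent in this sample
```

Unlike calling .find(...).text directly, this also copes with an element that is missing altogether rather than just empty.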

Conclusion

Well that wraps it up for this post. I'll cover extracting the data for Flatpages in the next post, but this should give you enough to get started. You can see how to extract the required data from the xml document, so if you want to extend the models beyond what I have, you shouldn't have any problems. Again, check out the full code, plus the other file changes (urls.py etc) at the Beardy Geek Git Hub Repository. Have fun.

 



From Wordpress to Django – Part 1

Posted on 14 January 2009 by BeardyGeek

Now don’t get me wrong, there’s nothing wrong with Wordpress. It’s just that I like to play with stuff, so I thought it would be fun to create a blog in Django, copy all my Wordpress posts across, and add at least some of the functionality that Wordpress has built in.

Open Source

Throughout this process, so that you can follow along, I will be storing the code in a GitHub repository. Click here to view the BeardyGeek repository.

A Few Considerations

There are a few things to think about before getting started:

  1. Data Conversion – getting the data from Wordpress into my Django models. Do I want to keep the same database structure as Wordpress, or create my own? If I create my own, how will I get the data across?
  2. Services – how much do I try and implement? The main ones are comments, tags, pings, anti-spam.

I’m sure more will come to mind as I go through this.

Conclusion

Well, I’d better stop jabbering and get on with it then! Keep an eye on the blog, or watch the Git repo for updates. I’ll get done whatever my current workload allows.



How to Call a .Net Webservice using Python

Posted on 01 January 2009 by BeardyGeek

My day job involves working with software that automatically creates webservices on the .Net platform. Up until now I have used C# to create web applications to use these webservices.

But it would be nice to have the flexibility to create solutions in another programming language, and consume the webservices from there.

So, here’s a short tutorial on calling your .net webservice using Python.

Pre-requisites

  1. I’m currently using Python 2.5, so I can’t speak for other versions
  2. You will need the ElementSoap package from effbot.org. You can get it from here
  3. You’re also going to need a .Net webservice with which to test this out. If you’re reading this tutorial, you probably already have one.

SOAP

Firstly we need to look at the SOAP examples on your .Net webservice. Go to your webservice’s asmx page, and then click on one of your webservices to view the details. You will see some example SOAP requests and responses. The one we’re interested in here is SOAP 1.1. Mine looks like this:


POST /lookserver/webservices.asmx HTTP/1.1
Host: localhost
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://tempuri.org/GetVersion"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetVersion xmlns="http://tempuri.org/" />
  </soap:Body>
</soap:Envelope>

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetVersionResponse xmlns="http://tempuri.org/">
      <GetVersionResult>string</GetVersionResult>
    </GetVersionResponse>
  </soap:Body>
</soap:Envelope>

As we can see here, I have a webservice called GetVersion, the namespace is http://tempuri.org/ (note the trailing slash), and the SOAPAction is http://tempuri.org/GetVersion. Note these down for your webservice.

Into the Python

Fire up an interactive session, and we’ll go through calling this service.

Firstly we need to import the ElementSOAP library:

from elementsoap import ElementSOAP as ES

If the import fails with an error, then ElementSoap isn’t installed properly.

Next we create a SoapRequest:

sr = ES.SoapRequest("{http://tempuri.org/}GetVersion")

And now we initialize a SoapService object:

serv = ES.SoapService("http://localhost/lookserver/webservices.asmx")

Obviously put in your own url for your webservice.

Now we call the webservice, using the SOAPAction value:

result = serv.call("http://tempuri.org/GetVersion", sr)
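If you're curious what ends up on the wire, the request can be approximated with nothing but the standard library. This is a sketch of a SOAP 1.1 request like the example at the top of the article, not a description of how ElementSOAP builds it internally; build_envelope is a made-up helper name.

```python
from xml.etree import ElementTree as ET

SOAP_NS = 'http://schemas.xmlsoap.org/soap/envelope/'
SERVICE_NS = 'http://tempuri.org/'  # namespace noted from the asmx page

def build_envelope(method):
    """Wrap an empty method element in a SOAP 1.1 Envelope/Body."""
    envelope = ET.Element('{%s}Envelope' % SOAP_NS)
    body = ET.SubElement(envelope, '{%s}Body' % SOAP_NS)
    ET.SubElement(body, '{%s}%s' % (SERVICE_NS, method))
    return envelope

envelope = build_envelope('GetVersion')
payload = ET.tostring(envelope)  # the XML that would be POSTed

# HTTP headers matching the example request; SOAPAction is namespace + method.
headers = {
    'Content-Type': 'text/xml; charset=utf-8',
    'SOAPAction': '"%s%s"' % (SERVICE_NS, 'GetVersion'),
}
```

The serialised prefixes will differ from the 'soap:' shown in the example, but the namespace URIs are what matter to the receiving service.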

The result is an xml element which should contain the response from your webservice. When I type result I get:

<Element '{http://tempuri.org/}GetVersionResponse' at 00A1A8A8>

If you look at the xml at the top of the article, you’ll see that the result returns inside a GetVersionResponse tag. To get the value of the result we type:

result.find("{http://tempuri.org/}GetVersionResult").text

This finds the appropriate tag, and returns the text.
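The same lookup can be tried without a live webservice by parsing a canned response. In this sketch the RESPONSE text is modelled on the example at the top of the article, and the version string '1.2.3' is invented.

```python
from xml.etree import ElementTree as ET

# Canned response modelled on the example; '1.2.3' is a made-up value.
RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetVersionResponse xmlns="http://tempuri.org/">
      <GetVersionResult>1.2.3</GetVersionResult>
    </GetVersionResponse>
  </soap:Body>
</soap:Envelope>"""

SOAP_NS = '{http://schemas.xmlsoap.org/soap/envelope/}'
SERVICE_NS = '{http://tempuri.org/}'

envelope = ET.fromstring(RESPONSE)
body = envelope.find(SOAP_NS + 'Body')

# This is the element that serv.call() hands back as 'result' above.
result = body.find(SERVICE_NS + 'GetVersionResponse')
version = result.find(SERVICE_NS + 'GetVersionResult').text
```

The child GetVersionResult inherits the default namespace declared on its parent, which is why it also needs the '{http://tempuri.org/}' prefix in the find call.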

 

Conclusion

I hope this has helped. When I searched, I found that the other tutorials were either confusing or out of date. Take a look at the various libraries at effbot.org, especially ElementTree, which is very useful for parsing XML. Enjoy!
