Python's Mechanize not recognizing a form or HTML? Try BeautifulSoup
If you don't already have it installed, download and install BeautifulSoup 3.2.1 from http://www.crummy.com/software/BeautifulSoup/
Don't use BeautifulSoup4 for this. I couldn't get it working when I tried it, and searches at the time suggested that many other people had been having the same problem for a while.
If you have easy_install you can just type "easy_install -Z beautifulsoup". After it installs, you might have to go to site-packages and move the BeautifulSoup.py file from its own directory into site-packages itself. It wouldn't import for me until I moved it.
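A quick way to see whether the install (or the file move) worked is to try the import from Python before touching mechanize. This is only a minimal sketch: `module_available` is a hypothetical helper name of mine, not part of mechanize or BeautifulSoup, and it uses only the standard library.

```python
import importlib


def module_available(name):
    """Return True if the module called `name` can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Both names are the ones imported later in this post.
beautifulsoup_ok = module_available("BeautifulSoup")
mechanize_ok = module_available("mechanize")
```

If `beautifulsoup_ok` is False after installing, that matches the problem described above, and moving BeautifulSoup.py into site-packages is the thing to try.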
Import mechanize and BeautifulSoup
import mechanize
from BeautifulSoup import BeautifulSoup
Put the following class into your Python script:
class PrettifyHandler(mechanize.BaseHandler):
    def http_response(self, request, response):
        # wrap the response so its body can be read and replaced
        if not hasattr(response, "seek"):
            response = mechanize.response_seek_wrapper(response)
        # only use BeautifulSoup if response is html
        content_type = response.info().dict.get('content-type', '')
        if 'html' in content_type:
            soup = BeautifulSoup(response.get_data())
            response.set_data(soup.prettify())
        return response
When you create a browser with mechanize, add the following handler.
br = mechanize.Browser()
br.add_handler(PrettifyHandler())
Now just use mechanize as normal. The browser will run every response whose content type (MIME type) contains html, such as text/html, through BeautifulSoup before parsing it. This helped me get mechanize to work with forms.
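The content-type test that decides which responses get cleaned up can be tried on its own, without mechanize or a network connection. The sketch below is an illustration under my own assumptions: `is_html_response` is a hypothetical helper, and it takes a plain dict of headers rather than a real mechanize response object. Unlike the handler, it also ignores upper/lower case in the header name and value.

```python
def is_html_response(headers):
    """Return True when the Content-Type header mentions html.

    `headers` is a plain dict of header names to values (an assumption
    made for this sketch; the handler reads them from the response).
    """
    content_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            content_type = value
            break
    return "html" in content_type.lower()


html_page = {"Content-Type": "text/html; charset=utf-8"}
json_reply = {"Content-Type": "application/json"}

page_is_html = is_html_response(html_page)    # would be prettified
json_is_html = is_html_response(json_reply)   # would be left alone
```

Responses such as images or JSON fail the check, so the handler passes them through untouched.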