I l@ve RuBoard Previous Section Next Section

10.1 "Tune in, Log on, and Drop out"

In the last few years, the Internet has virtually exploded onto the mainstream stage. It has rapidly grown from a simple communication device used primarily by academics and researchers into a medium that is now nearly as pervasive as the television and telephone. Social observers have likened the Internet's cultural impact to that of the printing press, and technical observers have suggested that all new software development of interest occurs only on the Internet. Naturally, time will be the final arbiter for such claims, but there is little doubt that the Internet is a major force in society, and one of the main application contexts for modern software systems.

The Internet also happens to be one of the primary application domains for the Python programming language. Given Python and a computer with a socket-based Internet connection, we can write Python scripts to read and send email around the world, fetch web pages from remote sites, transfer files by FTP, program interactive web sites, parse HTML and XML files, and much more, simply by using the Internet modules that ship with Python as standard tools.

In fact, companies all over the world do: Yahoo, Infoseek, Hewlett-Packard, and many others rely on Python's standard tools to power their commercial web sites. Many also build and manage their sites with the Zope web application server, which is itself written and customizable in Python. Others use Python to script Java web applications with JPython (a.k.a. "Jython") -- a system that compiles Python programs to Java bytecode, and exports Java libraries for use in Python scripts.

As the Internet has grown, so too has Python's role as an Internet tool. Python has proven to be well-suited to Internet scripting for some of the very same reasons that make it ideal in other domains. Its modular design and rapid turnaround mix well with the intense demands of Internet development. In this part of the book, we'll find that Python does more than simply support Internet scripts; it also fosters qualities such as productivity and maintainability that are essential to Internet projects of all shapes and sizes.

10.1.1 Internet Scripting Topics

Internet programming entails many topics, so to make the presentation easier to digest, I've split this subject over the next six chapters of this book. This chapter introduces Internet fundamentals and explores sockets, the underlying communications mechanism of the Internet. From there, later chapters move on to discuss the client, the server, web site construction, and more advanced topics.

Each chapter assumes you've read the previous one, but you can generally skip around, especially if you have any experience in the Internet domain. Since these chapters represent a big portion (about a third) of this book at large, the following sections go into a few more details about what we'll be studying.

10.1.1.1 What we will cover

In conceptual terms, the Internet can roughly be thought of as being composed of multiple functional layers:

Low-level networking layers

Mechanisms such as the TCP/IP transport mechanism, which deal with transferring bytes between machines, but don't care what they mean

Sockets

The programmer's interface to the network, which runs on top of physical networking layers like TCP/IP

Higher-level protocols

Structured communication schemes such as FTP and email, which run on top of sockets and define message formats and standard addresses

Server-side web scripting (CGI)

Higher-level client/server communication protocols between web browsers and web servers, which also run on top of sockets

Higher-level frameworks and tools

Third-party systems such as Zope and JPython, which address much larger problem domains

In this chapter and Chapter 11, our main focus is on programming the second and third layers: sockets and higher-level protocols. We'll start this chapter at the bottom, learning about the socket model of network programming. Sockets aren't strictly tied to Internet scripting, but they are presented here because this is their primary role. As we'll see, most of what happens on the Internet happens through sockets, whether you notice or not.

After introducing sockets, the next chapter makes its way up to Python's client-side interfaces to higher-level protocols -- things like email and FTP transfers, which run on top of sockets. It turns out that a lot can be done with Python on the client alone, and Chapter 11 will sample the flavor of Python client-side scripting. The next three chapters then go on to present server-side scripting (programs that run on a server computer and are usually invoked by a web browser). Finally, the last chapter in this part, Chapter 15, briefly introduces even higher-level tools such as JPython and Zope.

Along the way, we will also put to work some of the operating-system and GUI interfaces we've studied earlier (e.g., processes, threads, signals, and Tkinter), and investigate some of the design choices and challenges that the Internet presents.

That last statement merits a few more words. Internet scripting, like GUIs, is one of the sexier application domains for Python. As in GUI work, there is an intangible but instant gratification in seeing a Python Internet program ship information all over the world. On the other hand, by its very nature, network programming imposes speed overheads and user interface limitations. Though it may not be a fashionable stance these days, some applications are still better off not being deployed on the Net. In this part of the book, we will take an honest look at the Net's trade-offs as they arise.

The Internet is also considered by many to be something of an ultimate proof of concept for open source tools. Indeed, much of the Net runs on top of a large number of tools, such as Python, Perl, the Apache web server, the sendmail program, and Linux. Moreover, new tools and technologies for programming the Web sometimes seem to appear faster than developers can absorb.

The good news is that Python's integration focus makes it a natural in such a heterogeneous world. Today, Python programs can be installed as client-side and server-side tools, embedded within HTML code, used as applets and servlets in Java applications, mixed into distributed object systems like CORBA and DCOM, integrated with XML-coded objects, and more. In more general terms, the rationale for using Python in the Internet domain is exactly the same as in any other: Python's emphasis on productivity, portability, and integration make it ideal for writing Internet programs that are open, maintainable, and delivered according to the ever-shrinking schedules in this field.

10.1.1.2 What we won't cover

Now that I've told you what we will cover in this book, I should also mention what we won't cover. Like Tkinter, the Internet is a large domain, and this part of the book is mostly an introduction to core concepts and representative tasks, not an exhaustive reference. There are simply too many Python Internet modules to include each in this text, but the examples here should help you understand the library manual entries for modules we don't have time to cover.

I also want to point out that higher-level tools like JPython and Zope are large systems in their own right, and they are best dealt with in more dedicated documents. Because books on both topics are likely to appear soon, we'll merely scratch their surfaces here. Moreover, this book says almost nothing about lower-level networking layers such as TCP/IP. If you're curious about what happens on the Internet at the bit-and-wire level, consult a good networking text for more details.

10.1.1.3 Running examples in this part of the book

Internet scripts generally imply execution contexts that earlier examples in this book have not. That is, it usually takes a bit more to run programs that talk over networks. Here are a few pragmatic notes about this part's examples up front:

  • You don't need to download extra packages to run examples in this part of the book. Except in Chapter 15, all of the examples we'll see are based on the standard set of Internet-related modules that come with Python (they are installed in Python's library directory).

  • You don't need a state-of-the-art network link or an account on a web server to run most of the examples in this and the following chapters; a PC and dial-up Internet account will usually suffice. We'll study configuration details along the way, but client-side programs are fairly simple to run.

  • You don't need an account on a web server machine to run the server-side scripts in later chapters (they can be run by any web browser connected to the Net), but you will need such an account to change these scripts.

When a Python script opens an Internet connection (with the socket or protocol modules), Python will happily use whatever Internet link exists on your machine, be that a dedicated T1 line, a DSL line, or a simple modem. For instance, opening a socket on a Windows PC automatically initiates processing to create a dial-up connection to your Internet Service Provider if needed (on my laptop, a Windows modem connection dialog automatically pops up). In other words, if you have a way to connect to the Net, you likely can run programs in this chapter.

Moreover, as long as your machine supports sockets, you probably can run many of the examples here even if you have no Internet connection at all. As we'll see, a machine name "localhost" or "" usually means the local computer itself. This allows you to test both the client and server sides of a dialog on the same computer without connecting to the Net. For example, you can run both socket-based clients and servers locally on a Windows PC without ever going out to the Net.

Some later examples assume that a particular kind of server is running on a server machine (e.g., FTP, POP, SMTP), but client-side scripts work on any Internet-aware machine with Python installed. Server-side examples in Chapter 12, Chapter 13, and Chapter 14 require more: you'll need a web server account to code CGI scripts, and you must download advanced third-party systems like JPython and Zope separately (or find them by viewing http://examples.oreilly.com/python2).

In the Beginning There Was Grail

Besides creating the Python language, Guido van Rossum also wrote a World Wide Web browser in Python, named (appropriately enough) Grail. Grail was partly developed as a demonstration of Python's capabilities. It allows users to browse the Web much like Netscape or Internet Explorer, but can also be programmed with Grail applets -- Python/Tkinter programs downloaded from a server when accessed and run on the client by the browser. Grail applets work much like Java applets in more widespread browsers (more on applets in Chapter 15).

Grail is no longer under development and is mostly used for research purposes today. But Python still reaps the benefits of the Grail project, in the form of a rich set of Internet tools. To write a full-featured web browser, you need to support a wide variety of Internet protocols, and Guido packaged support for all of these as standard library modules that are now shipped with the Python language.

Because of this legacy, Python now includes standard support for Usenet news (NNTP), email processing (POP, SMTP, IMAP), file transfers (FTP), web pages and interactions (HTTP, URLs, HTML, CGI), and other commonly used protocols (Telnet, Gopher, etc.). Python scripts can connect to all of these Internet components by simply importing the associated library module.

Since Grail, additional tools have been added to Python's library for parsing XML files, OpenSSL secure sockets, and more. But much of Python's Internet support can be traced back to the Grail browser -- another example of Python's support for code reuse at work. At this writing, you can still find the Grail at http://www.python.org.

    I l@ve RuBoard Previous Section Next Section