Java and the Internet

Bruce Eckel’s Thinking in Java

If
Java is, in fact, yet another computer programming language, you may question
why it is so important and why it is being promoted as a revolutionary step in
computer programming. The answer isn’t immediately obvious if
you’re coming from a traditional programming perspective. Although Java
will solve traditional stand-alone programming problems, the reason it is
important is that it will also solve programming problems on the World Wide Web.

What is the Web?

The
Web can seem a bit of a mystery at first, with all this talk of
“surfing,” “presence” and “home pages.”
There has even been a growing reaction against “Internet-mania,”
questioning the economic value and outcome of such a sweeping movement.
It’s helpful to step back and see what it really is, but to do this you
must understand client/server systems, another aspect of computing that’s
full of confusing issues.


Client/Server computing

The
primary idea of a client/server system is that you have a central repository of
information – some kind of data, typically in a database – that you
want to distribute on demand to some set of people or machines. A key to the
client/server concept is that the repository of information is centrally
located so that it can be changed and so that those changes will propagate out
to the information consumers. Taken together, the information repository, the
software that distributes the information and the machine(s) where the
information and software reside is called the server. The software that resides
on the remote machine, and that communicates with the server, fetches the
information, processes it, and displays it on the remote machine is called the
client.

The
basic concept of client/server computing, then, is not so complicated. The
problems arise because you have a single server trying to serve many clients at
once. Generally a database management system is involved so the designer
“balances” the layout of data into tables for optimal use. In
addition, systems often allow a client to insert new information into a server.
This means you must ensure that one client’s new data doesn’t walk
over another client’s new data, or that data isn’t lost in the
process of adding it to the database. (This is called transaction processing.)
As client software changes, it must be built, debugged and installed on the
client machines, which turns out to be more complicated and expensive than you
might think. It’s especially problematic to support multiple types of
computers and operating systems. Finally, there’s the all-important
performance issue: you might have hundreds of clients making requests of your
server at any one time, and so any small delay is crucial. To minimize latency,
programmers work hard to offload processing tasks, often to the client machine
but sometimes to other machines at the server site using so-called middleware.
(Middleware is also used to improve maintainability.)
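
The worry about one client’s new data walking over another’s can be made
concrete with a small sketch. It assumes a hypothetical in-memory OrderLog
class standing in for the database, with threads standing in for clients. A
real system would rely on the transaction support of its database management
system, so treat this only as an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a server-side data store; not a real database.
class OrderLog {
    private final List<String> orders = new ArrayList<>();

    // synchronized: only one client thread at a time may add an entry,
    // so concurrent inserts cannot overwrite or lose each other's data.
    public synchronized void add(String order) {
        orders.add(order);
    }

    public synchronized int size() {
        return orders.size();
    }

    // Simulates several clients inserting at once; returns the final count.
    static int simulate(int clients, int ordersPerClient) {
        OrderLog log = new OrderLog();
        List<Thread> threads = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            final int id = c;
            Thread t = new Thread(() -> {
                for (int i = 0; i < ordersPerClient; i++) {
                    log.add("client " + id + " order " + i);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for every simulated client to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return log.size();
    }

    public static void main(String[] args) {
        System.out.println(simulate(8, 1000));
    }
}
```

With the synchronized keyword in place, every insert from every simulated
client is kept. Remove it and the unsynchronized ArrayList may lose entries or
fail outright, which is the kind of problem transaction processing exists to
prevent.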

So
the simple idea of distributing information to people has so many layers of
complexity in implementing it that the whole problem can seem hopelessly
enigmatic. And yet it’s crucial: client/server computing accounts for
roughly half of all programming activities. It’s responsible for
everything from taking orders and credit-card transactions to the distribution
of any kind of data – stock market, scientific, government – you
name it. What we’ve come up with in the past is individual solutions to
individual problems, inventing a new solution each time. These were hard to
create and hard to use and the user had to learn a new interface for each one.
The entire client/server problem needs to be solved in a big way.


The Web as a giant server

The
Web is actually one giant client-server system. It’s a bit worse than
that, since you have all the servers and clients coexisting on a single network
at once. You don’t need to know that, since all you care about is
connecting to and interacting with one server at a time (even though you might
be hopping around the world in your search for the correct server).

Initially
it was a simple one-way process. You made a request of a server and it handed
you a file, which your machine’s browser software (i.e., the client) would
interpret and format for display on your local machine. But in short order people
began wanting to do more than just deliver pages from a server. They wanted
full client/server capability so that the client could feed information back to
the server, for example, to do database lookups on the server, to add new
information to the server or to place an order (which required more security
than the original systems offered). These are the changes we’ve been
seeing in the development of the Web.

The
Web browser was a big step forward: the concept that one piece of information
could be displayed on any type of computer without change. However, browsers
were still rather primitive and rapidly bogged down by the demands placed on
them. They weren’t particularly interactive and tended to clog up both
the server and the Internet because any time you needed to do something that
required programming you had to send information back to the server to be
processed. It could take many seconds or minutes to find out you had misspelled
something in your request. Since the browser was just a viewer it
couldn’t perform even the simplest computing tasks. (On the other hand,
it was safe, since it couldn’t execute any programs on your local machine
that contained bugs or viruses.)

Client-side programming [8]

The
Web’s initial server-browser design provided for interactive content, but
the interactivity was completely provided by the server. The server produced
static pages for the client browser, which would simply interpret and display
them. Basic HTML contains simple mechanisms for data gathering: text-entry
boxes, check boxes, radio buttons, lists and drop-down lists, as well as a button
that can only be programmed to reset the data on the form or
“submit” the data on the form back to the server. This submission
passes through the Common Gateway Interface (CGI) provided on all Web servers.
The text within the submission tells CGI
what to do with it. The most common action is to run a program located on the
server in a directory that’s typically called “cgi-bin.” (If
you watch the address window at the top of your browser when you push a button
on a Web page, you can sometimes see “cgi-bin” within all the
gobbledygook there.) These programs can be written in most languages. Perl is a
common choice because it is designed for text manipulation and is interpreted,
so it can be installed on any server regardless of processor or operating system.
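
To give a feel for what such a program does with the text of a submission,
here is a minimal sketch in Java rather than Perl. It assumes the form data
arrives in the usual URL-encoded "name=value&name=value" form. The FormData
class and its method names are hypothetical, invented for this illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the class and method names are illustrative only.
class FormData {
    // Splits "name=value&name=value" text into decoded name/value pairs.
    static Map<String, String> parse(String query) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return fields;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            // A field with no '=' is treated as having an empty value.
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(decode(name), decode(value));
        }
        return fields;
    }

    private static String decode(String s) {
        // URLDecoder turns '+' into a space and %XX into the encoded byte.
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A CGI program started for a GET request would read this text from
        // the QUERY_STRING environment variable; a literal is used here.
        System.out.println(parse("name=Ada+Lovelace&item=book%201"));
    }
}
```

Everything after this step (looking up data, building the reply page) still
happens on the server, which is why response time becomes the issue.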

Many
powerful Web sites today are built strictly on CGI, and you can in fact do
nearly anything with it. The problem is response time. The response of a CGI
program depends on how much data must be sent as well as the load on both the
server and the Internet. (On top of this, starting a CGI program tends to be
slow.) The initial designers of the Web did not foresee how rapidly this
bandwidth would be exhausted for the kinds of applications people developed.
For example, any sort of dynamic graphing is nearly impossible to perform with
consistency because a GIF file must be created and moved from the server to the
client for each version of the graph. And you’ve no doubt had direct
experience with something as simple as validating the data on an input form.
You press the submit button on a page; the data is shipped back to the server;
the server starts a CGI program that discovers an error, formats an HTML page
informing you of the error and sends the page back to you; you must then back
up a page and try again. Not only is this slow, it’s not elegant.

The
solution is client-side programming. Most machines that run Web browsers are
powerful engines capable of doing vast work, and with the original static HTML
approach they are sitting there, just idly waiting for the server to dish up
the next page. Client-side programming means that the Web browser is harnessed
to do whatever work it can, and the result for the user is a much speedier and
more interactive experience at your Web site.

The
problem with discussions of client-side programming is that they aren’t
very different from discussions of programming in general. The parameters are
almost the same, but the platform is different: a Web browser is like a limited
operating system. In the end, it’s still programming and this accounts
for the dizzying array of problems and solutions produced by client-side
programming. The rest of this section provides an overview of the issues and
approaches in client-side programming.


Plug-ins

One
of the most significant steps forward in client-side programming is the
development of the plug-in. This is a way for a programmer to add new
functionality to the browser by downloading a piece of code that plugs itself
into the appropriate spot in the browser. It tells the browser “from now
on you can perform this new activity.” (You need to download the plug-in
only once.) Some fast and powerful behavior is added to browsers via plug-ins,
but writing a plug-in is not a trivial task and isn’t something
you’d want to do as part of the process of building a particular site.
The value of the plug-in for client-side programming is that it allows an
expert programmer to develop a new language and add that language to a browser
without the permission of the browser manufacturer.
Thus, plug-ins provide the back door that allows the creation of new
client-side programming languages (although not all languages are implemented
as plug-ins).


Scripting languages

Plug-ins
resulted in an explosion of scripting languages. With a scripting language you
embed the source code for your client-side program directly into the HTML page
and the plug-in that interprets that language is automatically activated while
the HTML page is being displayed. Scripting languages tend to be reasonably
simple to understand, and because they are simply text that is part of an HTML
page they load very quickly as part of the single server hit required to
procure that page. The trade-off is that your code is exposed for everyone to
see (and steal) but generally you aren’t doing amazingly sophisticated
things with scripting languages so it’s not too much of a hardship.

This
points out that scripting languages are really intended to solve specific types
of problems, primarily the creation of richer and more interactive graphical
user interfaces (GUIs). However, a scripting language might solve 80 percent of
the problems encountered in client-side programming. Your problems might very
well fit completely within that 80 percent, and since scripting languages tend
to be easier and faster to develop, you should probably consider a scripting
language before looking at a more involved solution such as Java or ActiveX
programming.

The
most commonly discussed scripting languages are JavaScript (which has nothing
to do with Java; it’s named that way just to grab some of Java’s
marketing momentum), VBScript (which looks like Visual Basic) and Tcl/Tk, which
comes from the popular cross-platform GUI-building language. There are others
out there and no doubt more in development.

JavaScript
is probably the most commonly supported. It comes built into both Netscape
Navigator and the Microsoft Internet Explorer (IE). In addition, there are
probably more JavaScript books out than for the other languages, and some tools
automatically create pages using JavaScript. However, if you’re already
fluent in Visual Basic or Tcl/Tk, you’ll be more productive using those
scripting languages rather than learning a new one. (You’ll have your
hands full dealing with the Web issues already.)


Java

If
a scripting language can solve 80 percent of the client-side programming
problems, what about the other 20 percent – the “really hard
stuff?” The most popular solution today is Java. Not only is it a
powerful programming language built to be secure, cross-platform and
international, but Java is being continuously extended to provide language
features and libraries that elegantly handle problems that are difficult in
traditional programming languages, such as multithreading, database access,
network programming and distributed computing. Java allows client-side
programming via the applet.

An
applet is a mini-program that will run only under a Web browser. The applet is
downloaded automatically as part of a Web page (just as, for example, a graphic
is automatically downloaded). When the applet is activated it executes a
program. This is part of its beauty – it provides you with a way to
automatically distribute the client software from the server at the time the
user needs the client software, and no sooner. The user gets the latest version of
the client software without fail and without difficult re-installation. Because
of the way Java is designed, the programmer needs to create only a single
program, and that program automatically works with all computers that have
browsers with built-in Java interpreters. (This safely includes the vast
majority of machines.) Since Java is a full-fledged programming language, you
can do as much work as possible on the client before and after making requests
of the server. For example, you won’t need to send a request form across
the Internet to discover that you’ve gotten a date or some other
parameter wrong, and your client computer can quickly do the work of plotting
data instead of waiting for the server to make a plot and ship a graphic image
back to you. Not only do you get the immediate win of speed and responsiveness,
but the general network traffic and load upon servers can be reduced,
preventing the entire Internet from slowing down.
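
The date example can be sketched in a few lines. This is only an illustration
of checking input on the client before anything is sent. The DateField class is
hypothetical, the yyyy-MM-dd format is an assumption, and the java.time library
it uses arrived in Java long after the versions discussed in this book.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical client-side check; the yyyy-MM-dd format is an assumption.
class DateField {
    // Returns true only if the text is a real calendar date in ISO form.
    static boolean isValid(String text) {
        if (text == null) {
            return false;
        }
        try {
            // LocalDate.parse is strict: it rejects dates such as February 30.
            LocalDate.parse(text.trim());
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String entered = args.length > 0 ? args[0] : "1998-02-30";
        if (isValid(entered)) {
            System.out.println("OK to send the form to the server");
        } else {
            System.out.println("Please correct the date: " + entered);
        }
    }
}
```

The user learns about the mistake immediately, and the server never sees the
bad request at all.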

One
advantage a Java applet has over a scripted program is that it’s in
compiled form, so the source code isn’t available to the client. On the
other hand, a Java applet can be decompiled without too much trouble, and
hiding your code is often not an important issue anyway. Two other factors can
be important. As you will see later in the book, a compiled Java applet can
comprise many modules and take multiple server “hits” (accesses) to
download. (In Java 1.1 this is minimized by Java archives, called JAR files,
that allow all the
required modules to be packaged together for a single download.) A scripted
program will just be integrated into the Web page as part of its text (and will
generally be smaller and reduce server hits). This could be important to the
responsiveness of your Web site. Another factor is the all-important learning
curve. Regardless of what you’ve heard, Java is not a trivial language to
learn. If you’re a Visual Basic programmer, moving to VBScript will be
your fastest solution and since it will probably solve most typical
client/server problems you might be hard pressed to justify learning Java. If
you’re experienced with a scripting language you will certainly benefit
from looking at JavaScript or VBScript before committing to Java, since they
might fit your needs handily and you’ll be more productive sooner.


ActiveX

To
some degree, the competitor to Java is Microsoft’s ActiveX, although it
takes a completely different approach. ActiveX was originally a Windows-only
solution, although it is now being developed via an independent consortium to
become cross-platform. Effectively, ActiveX says “if your program
connects to its environment just so, it can be dropped into a Web page and run
under a browser that supports ActiveX.” (IE directly supports ActiveX and
Netscape does so using a plug-in.) Thus, ActiveX does not constrain you to a
particular language. If, for example, you’re already an experienced
Windows programmer using a language such as C++, Visual Basic, or
Borland’s Delphi, you can create ActiveX components with almost no
changes to your programming knowledge. ActiveX also provides a path for the use
of legacy code in your Web pages.


Security

Automatically
downloading and running programs across the Internet can sound like a
virus-builder’s dream. ActiveX especially brings up the thorny issue of
security in client-side programming. If you click on a Web site, you might
automatically download any number of things along with the HTML page: GIF
files, script code, compiled Java code, and ActiveX components. Some of these
are benign; GIF files can’t do any harm, and scripting languages are
generally limited in what they can do. Java was also designed to run its
applets within a “sandbox” of safety, which prevents them from
writing to disk or accessing memory outside the sandbox.

ActiveX
is at the opposite end of the spectrum. Programming with ActiveX is like
programming Windows – you can do anything you want. So if you click on a
page that downloads an ActiveX component, that component might cause damage to
the files on your disk. Of course, programs that you load onto your computer
that are not restricted to running inside a Web browser can do the same thing.
Viruses downloaded from Bulletin-Board Systems (BBSs) have long been a problem,
but the speed of the Internet amplifies the difficulty.

The
solution seems to be “digital signatures,” whereby code is verified
to show who the author is. This is based on the idea that a virus works because
its creator can be anonymous, so if you remove the anonymity individuals will
be forced to be responsible for their actions. This seems like a good plan
because it allows programs to be much more functional, and I suspect it will
eliminate malicious mischief. If, however, a program has an unintentional bug
that’s destructive it will still cause problems.

The
Java approach is to prevent these problems from occurring, via the sandbox. The
Java interpreter that lives on your local Web browser examines the applet for
any untoward instructions as the applet is being loaded. In particular, the
applet cannot write files to disk or erase files (one of the mainstays of the
virus). Applets are generally considered to be safe, and since this is
essential for reliable client-server systems, any bugs that allow viruses are
rapidly repaired. (It’s worth noting that the browser software actually
enforces these security restrictions, and some browsers allow you to select
different security levels to provide varying degrees of access to your system.)

You
might be skeptical of this rather draconian restriction against writing files
to your local disk. For example, you may want to build a local database or save
data for later use offline. The initial vision seemed to be that eventually
everyone would be online to do anything important, but that was soon seen to be
impractical (although low-cost “Internet appliances” might someday
satisfy the needs of a significant segment of users). The solution is the
“signed applet” that uses public-key encryption to verify that an
applet does indeed come from where it claims it does. A signed applet can then
go ahead and trash your disk, but the theory is that since you can now hold the
applet creator accountable they won’t do vicious things. Java 1.1
provides a framework for digital signatures so that you will eventually be able
to allow an applet to step outside the sandbox if necessary.
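
The public-key idea behind a signature can be sketched with the standard
java.security classes. This is not the mechanism a browser uses to check a
signed applet. The SignedCode class, its method names, and the choice of RSA
with SHA-256 are assumptions made for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Illustrative only: not how a browser or the JDK verifies signed code.
class SignedCode {
    // Creates the author's key pair (RSA is an assumption of this sketch).
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The author signs the code bytes with the private key.
    static byte[] sign(byte[] code, PrivateKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(code);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone holding the public key can check that the bytes are unmodified
    // and were signed by the holder of the matching private key.
    static boolean verify(byte[] code, byte[] signature, PublicKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(code);
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note what verification does and does not tell you: it establishes who signed
the bytes and that they have not been altered since. It says nothing about
whether the code is free of bugs.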

Digital signatures have missed an important issue, which is the speed at which
people move around on the Internet. If you download a buggy program and it does
something untoward, how long will it be before you discover the damage? It could
be days or even weeks. And by then, how will you track down the program that’s
done it (and what good will it do at that point)?


Internet vs. Intranet

The
Web is the most general solution to the client/server problem, so it makes
sense that you can use the same technology to solve a subset of the problem, in
particular the classic client/server problem within a company. With traditional
client/server approaches you have the problem of multiple different types of
client computers, as well as the difficulty of installing new client software,
both of which are handily solved with Web browsers and client-side programming.
When Web technology is used for an information network that is restricted to a
particular company, it is referred to as an Intranet.
Intranets provide much greater security than the Internet, since you can
physically control access to the servers within your company. In terms of
training, it seems that once people understand the general concept of a browser
it’s much easier for them to deal with differences in the way pages and
applets look, so the learning curve for new kinds of systems seems to be reduced.

The
security problem brings us to one of the divisions that seem to be
automatically forming in the world of client-side programming. If your program
is running on the Internet, you don’t know what platform it will be
working under and you want to be extra careful that you don’t disseminate
buggy code. You need something cross-platform and secure, like a scripting
language or Java.

If
you’re running on an Intranet, you might have a different set of
constraints. It’s not uncommon that your machines could all be
Intel/Windows platforms. On an Intranet, you’re responsible for the
quality of your own code and can repair bugs when they’re discovered. In
addition, you might already have a body of legacy code that you’ve been
using in a more traditional client/server approach, whereby you must physically
install client programs every time you do an upgrade. The time wasted in
installing upgrades is the most compelling reason to move to browsers because
upgrades are invisible and automatic. If you are involved in such an Intranet,
the most sensible approach to take is ActiveX rather than trying to recode your
programs in a new language.

Server-side programming

This
whole discussion has ignored the issue of server-side programming. What happens
when you make a request of a server? Most of the time the request is simply
“send me this file.” Your browser then interprets the file in some
appropriate fashion: as an HTML page, a graphic image, a Java applet, a script
program, etc. A more complicated request to a server generally involves a
database transaction. A common scenario involves a request for a complex
database search, which the server then formats into an HTML page and sends to
you as the result. (Of course, if the client has more intelligence via Java or
a scripting language, the raw data can be sent and formatted at the client end,
which will be faster and less load on the server.) Or you might want to
register your name in a database when you join a group or place an order, which
will involve changes to that database. These database requests must be
processed via some code on the server side, which is generally referred to as
server-side programming.
Traditionally, server-side programming has been performed using Perl and CGI
scripts, but more sophisticated systems have been appearing. These include
Java-based Web servers that allow you to perform all your server-side
programming in Java by writing what are called servlets.
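
The search scenario described above might look roughly like this on the server
side. The sketch deliberately avoids the servlet API, which is not part of the
standard Java library, and uses a plain list in place of a database. The
SearchPage class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical server-side code; a List stands in for the database.
class SearchPage {
    // Returns the records containing the search term (case-insensitive).
    static List<String> search(List<String> records, String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        for (String record : records) {
            if (record.toLowerCase().contains(needle)) {
                hits.add(record);
            }
        }
        return hits;
    }

    // Escapes characters that have special meaning in HTML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Formats the hits as the HTML page the server would send back.
    static String toHtml(String term, List<String> hits) {
        StringBuilder page = new StringBuilder();
        page.append("<html><body><h1>Results for ")
            .append(escape(term))
            .append("</h1><ul>");
        for (String hit : hits) {
            page.append("<li>").append(escape(hit)).append("</li>");
        }
        page.append("</ul></body></html>");
        return page.toString();
    }
}
```

If the client is more capable, the server could stop after search() and send
the raw hits, leaving the formatting to the client as mentioned earlier.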

A separate arena: applications

Most
of the brouhaha over Java has been about applets. Java is actually a
general-purpose programming language that can solve any type of problem, at
least in theory. And as pointed out previously, there might be more effective
ways to solve most client/server problems. When you move out of the applet
arena (and simultaneously release the restrictions, such as the one against
writing to disk) you enter the world of general-purpose applications that run
standalone, without a Web browser, just like any ordinary program does. Here,
Java’s strength is not only in its portability, but also its
programmability. As you’ll see throughout this book, Java has many
features that allow you to create robust programs in a shorter period than with
previous programming languages.


[8] The material in this section is adapted from an article by the author that
originally appeared on Mainspring, at www.mainspring.com. Used with permission.
