Damodar's Musings

web development and miscellany

Help is at hand for everyone who has ever cursed at having to add entries to their web.xml just because they wanted to use a third party framework.

With JSR 315, web.xml is no longer a single monolithic entity, but instead can be assembled from multiple pieces (called web fragments) into a single whole.

For instance, here’s a project I rolled earlier, that is packaged as jSwengsol.jar. This external JAR file would be deployed into your web application’s WEB-INF/lib folder, and as usual, is able to contribute web components (such as servlets, filters, or listeners). In my JAR, these components live within the com.swengsol package.

Note the presence of two special entries within this JAR file: the META-INF/web-fragment.xml file and the META-INF/resources directory.

The web fragment entry allows an external JAR to package the configuration of its contained web components. This fragment will be merged into your own application’s web.xml when your application is deployed into the servlet container.

The optional META-INF/resources folder can contain static resources (such as HTML, CSS, and JS files) as well as dynamic resources (JSPs). Resources provided within this folder are publicly available. Note that these resources live inside a JAR within your web application’s WEB-INF folder, and yet are visible to the external world. This means that frameworks no longer have to implement workarounds (such as filters mapped to URL paths) to serve up resources that are required by that framework’s components.
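To make the layout concrete, here is a minimal sketch that assembles a tiny JAR in memory with the two special entries and then lists what ended up inside. The class name FragmentJarLayout and the placeholder file contents are purely illustrative, and the entry paths assume the Servlet 3.0 convention that shared resources sit under META-INF/resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative only: mimics the layout of a fragment JAR such as jSwengsol.jar.
class FragmentJarLayout {

    // Writes one text entry into the JAR being assembled.
    private static void addEntry(JarOutputStream jar, String name, String text)
            throws IOException {
        jar.putNextEntry(new JarEntry(name));
        jar.write(text.getBytes(StandardCharsets.UTF_8));
        jar.closeEntry();
    }

    // Builds the JAR in memory and returns the names of its entries.
    static List<String> buildAndList() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (JarOutputStream jar = new JarOutputStream(bytes)) {
                // Placeholder contents; a real fragment would declare components here.
                addEntry(jar, "META-INF/web-fragment.xml", "<web-fragment/>");
                addEntry(jar, "META-INF/resources/styles.css", "body {}");
                addEntry(jar, "META-INF/resources/jSwengsol.html", "<html></html>");
            }
            List<String> names = new ArrayList<>();
            try (ZipInputStream in = new ZipInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                    names.add(e.getName());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : buildAndList()) {
            System.out.println(name);
        }
    }
}
```

Running main prints the three entry names, which is roughly what you would expect to see if you listed the contents of a fragment JAR built this way.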

Note that jSwengsol.jar is deployed into our main web application’s WEB-INF/lib.

This JAR file will now be described.

The IDEA configuration for generating the JAR file is as shown:

To build this artifact, I simply use Build > Build ‘jSwengsol’ artifact:

Squawker.java is a standard helper class that is used by our main web application. It is a very simple helper that takes a message and generates the HTML scaffolding around it. An important point to note is that it retrieves the stylesheet straight out of the web application’s WEB-INF/lib/jSwengsol.jar!META-INF/resources/styles.css.

package com.swengsol;
/**
* User: Damodar Chetty
* Date: Jul 23, 2010
* Time: 8:13:26 AM
* An introduction to Servlet 3.0 and Java 6 (TC JUG)
* (c) Software Engineering Solutions, Inc.
*/
public class Squawker {
	private String title;
	private String body;
	public Squawker(String body) {
		this("Introducing Servlet 3.0", body);
	}
	public Squawker(String title, String body) {
		this.title = title;
		this.body = body;
	}
	public String getHTML() {
		String html = "<html><head>";
		if (null == title || title.length() ==0)
			title = "JSR315: Servlet 3.0";
		html += "<title>" + title + "</title>";
		html +=
		"<link rel=stylesheet type=\"text/css\"  href=\"styles.css\">";
		html += "</head> <body><p>" ;
		html += body;
		html += "</p>";
		html += "<p class=\"copyright\">Damodar Chetty,";
		html += " Software Engineering Solutions, Inc.</p>";
		html += "</body></html>";
		return html;
	}
}

The stylesheet, styles.css, is also a very straightforward affair:

.mainpara {
	font-size: 15pt;
	font-family:'trebuchet ms';
	padding:20px;
	border-color:#736AFF;
	position:absolute;
	top:50px;
	left: 200px;
	border-style: dotted;
}
.copyright {
	width:375px;
	font-size:12pt;
	font-weight:bold;
	font-family:serif;
	background-color:black;
	color:white;
	position: absolute;
	top:300px;
	left:500px;
	padding:5px;
}
body {background-color:#E0FFFF; color:#7F5217;}

In addition, you can also directly request the static resource, jSwengsol.html (listed below) as if it were directly within the root of our web application context, http://localhost:8008/jsr315/jSwengsol.html, where jsr315 is the context path for the web application that includes this JAR file.

<html>
<head>
<title>Introducing Servlet 3.0</title>
<link rel=stylesheet type="text/css"  href="styles.css">
</head>
<body>
<p>This is a static resource file served up from jSwengsol.jar</p>
<p>Damodar Chetty, Software Engineering Solutions, Inc.</p>
</body>
</html>

Likewise, you can access the JSP file as http://localhost:8008/jsr315/jSwengsol.jsp:

<%@ page import="java.util.Date" %>
<%--
  Created by IntelliJ IDEA.
  User: Monsty
  Date: Jul 22, 2010
  Time: 9:24:48 PM
--%>
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
  <head><title>Introducing Servlet 3.0</title>
  <link rel=stylesheet type="text/css"  href="styles.css">
  </head>
  <body>
    <p class="mainpara">
      jSwengsol.jar!jSwengsol.jsp says the time is now <%=new Date()%>
   </p>
    <p class="copyright">
      Damodar Chetty, Software Engineering Solutions, Inc.
    </p>
  </body>
</html>

Now, let’s take a look at the web fragment, META-INF/web-fragment.xml:

<web-fragment>
  <name>jSwengsolFragment</name>
  <listener>
    <listener-class>com.swengsol.listeners.MyServletRequestListener</listener-class>
  </listener>
</web-fragment>

First, note that a fragment can be named. This comes in handy when you are trying to specify ordering rules that define how multiple fragments are combined into the application’s web.xml. Second, note that, as its name suggests, this fragment is not a complete descriptor. Here it contains only a listener declaration, which is automatically rolled into the main web application’s web.xml.

package com.swengsol.listeners;

import javax.servlet.ServletRequestEvent;
import javax.servlet.ServletRequestListener;
import javax.servlet.http.HttpServletRequest;
import java.util.Date;
import java.util.logging.Logger;

//1. No annotation - uses web-fragment.xml
public class MyServletRequestListener implements ServletRequestListener {
    private static final Logger logger = Logger.getLogger("com.swengsol");
    // Public constructor is required by servlet spec
    public MyServletRequestListener() {
    }

    public void requestDestroyed(ServletRequestEvent servletRequestEvent) {
        logger.info("RequestListener > request destroyed - " + new Date());
    }

    public void requestInitialized(ServletRequestEvent servletRequestEvent) {
        HttpServletRequest req =
		(HttpServletRequest)servletRequestEvent.getServletRequest();
        String uri = req.getRequestURL().toString();
        logger.info("RequestListener > request initialized - " + uri + " - "
		 + new Date());
    }
}

There’s nothing noteworthy about this listener, except to point out that I specifically avoided using an annotation here to show how a web fragment functions.

To show how annotations could have been used instead, check out the JARServlet servlet:

package com.swengsol.servlets;

import com.swengsol.Squawker;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Date;
import java.util.logging.Logger;

/**
 * User: Damodar Chetty
 * Date: Jul 23, 2010
 * Time: 1:12:21 PM
 * An introduction to Servlet 3.0 and Java 6 (TC JUG)
 * (c) Software Engineering Solutions, Inc.
 */
@WebServlet("/JarServlet")
public class JARServlet extends HttpServlet {
    private static final Logger logger = Logger.getLogger("com.swengsol");

    protected void doPost(HttpServletRequest request,
			HttpServletResponse response)
        throws ServletException, IOException {
        doGet(request, response);
    }

    protected void doGet(HttpServletRequest request,
			HttpServletResponse response)
        throws ServletException, IOException {
        logger.info("Within JARServlet request>>>" + new Date());

        Squawker squawker =
		new Squawker("Hello from jSwengsol.jar!JARServlet: " + new Date());
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println(squawker.getHTML());
    }
}

In this case, the servlet annotation is sufficient to register this servlet with the main web application, so no fragment entry is necessary.

That’s about it for now. As you can tell, this story has legs, and I can imagine it being very useful as we head off into the future.

Whoa! I’ve just cracked the 600 ceiling on the Amazon ranks of reviewers.  Still a way to go – but I’m heading in the right direction it seems :)

http://www.amazon.com/gp/pdp/profile/A3FEGTOLCWXSV4/ref=cm_cr_thx_pdp

Next up is the idea of programmatic definition of components. One might argue that this is not a real benefit, and I might even agree.

For a while now, it has been in vogue to decry the explicit instantiation of components, and the new operator has been driven out of its usual home and into factory methods and abstract factories. Then there was the XML movement, which decreed that everything configurable should happen within configuration files, and which gave rise to a slew of deployment descriptors. The idea was sound: why should a configuration change need a compile/build and deploy? Well, it seems we’ve come full circle. With programmatic definition, we can once again instantiate components programmatically and then configure them.

In this example, we’ll see how you might programmatically instantiate servlets, filters, and listeners.

First, let’s revisit our servlet context listener from the last post. For brevity, I’m only showing the parts that are new.

@WebListener
public class MyServletContextListener implements ServletContextListener {
		…
    public void contextInitialized(ServletContextEvent sce) {
        logger.info("Context Listener > Initialized");
        doProgrammaticRegistration(sce.getServletContext());
    }

    private void doProgrammaticRegistration(ServletContext sc) {
        ServletRegistration.Dynamic dynamic = sc.addServlet(
					"ProgrammaticServlet",
					"com.swengsol.servlets.MyProgrammaticServlet");
        dynamic.addMapping("/ProgrammaticServlet");
        FilterRegistration.Dynamic filter = sc.addFilter(
					"MyProgrammaticFilter",
					"com.swengsol.filters.MyProgrammaticFilter");
        EnumSet<DispatcherType> disps = EnumSet.of(
					DispatcherType.REQUEST, DispatcherType.FORWARD);
        filter.addMappingForServletNames(disps, true,
					"ProgrammaticServlet");
    }
}

As shown, during context initialization, we instantiate a new servlet, and use its returned ServletRegistration.Dynamic instance to configure the appropriate mapping. We take similar steps with a filter. These steps are similar to what would have happened when a servlet container encountered the <servlet>, <servlet-mapping>, <filter>, and <filter-mapping> elements in a web deployment descriptor.

As can be seen in the next listings, the servlet and filter are rather straightforward. Since the configuration occurs programmatically, they do not even need any annotations to be used.

package com.swengsol.filters;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import java.io.IOException;
import java.util.logging.Logger;

/**
 * User: Monsty
 * Date: Jul 24, 2010
 * Time: 8:45:10 AM
 * An introduction to Servlet 3.0 and Java 6 (TC JUG)
 * (c) Software Engineering Solutions, Inc.
 */

//1. Not even an annotation is required
public class MyProgrammaticFilter implements Filter {
    private static final Logger logger = Logger.getLogger("com.swengsol");

    public void destroy() {
    }

    public void doFilter(ServletRequest req, ServletResponse resp,
	FilterChain chain) throws ServletException, IOException {
        logger.info("> Filtering: " +
			((HttpServletRequest)req).getRequestURI());
        chain.doFilter(req, resp);
    }

    public void init(FilterConfig config) throws ServletException {
        logger.info("> Filter: " + "Initializing ProgrammaticFilter");
	}
}

The servlet class is similarly simple. We will see the Squawker class in the next post. For now, all you need to know is that it is used to generate the HTML scaffolding around the text being displayed.

package com.swengsol.servlets;

import com.swengsol.Squawker;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Date;
import java.util.logging.Logger;

/**
 * User: Damodar Chetty
 * Date: Jul 24, 2010
 * Time: 8:46:17 AM
 * An introduction to Servlet 3.0 and Java 6 (TC JUG)
 * (c) Software Engineering Solutions, Inc.
 */

//1. Note not even an annotation is required.
public class MyProgrammaticServlet extends HttpServlet {
    private static final Logger logger =
		Logger.getLogger("com.swengsol");

    protected void doPost(HttpServletRequest request,
	HttpServletResponse response) throws ServletException, IOException {
        doGet(request, response);
    }

    protected void doGet(HttpServletRequest request,
		HttpServletResponse response) throws ServletException, IOException {
        logger.info("Within request>>>" + getServletName());
        Squawker squawker = new Squawker("Added programmatically " +
				"(no annotation/web.xml entries): " + new Date());
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println(squawker.getHTML());
    }
}

Simple enough?

In this series of articles, I’m going to describe elements of my talk at the TC JUG.

I chose to present the topics in JSR 315 in ascending order of importance. So let’s begin with what I think is the least important of the lot. I’m sure others will argue with my ordering, but since this is my blog, I get to make the calls ;)

So, let’s begin with annotations.

With JSR 315, the web.xml deployment descriptor is finally a completely optional artifact. That’s a bit of an oversimplification, but it is certainly true for the most part. The price is that the servlet container must introspect all the classes in WEB-INF/classes as well as in WEB-INF/lib/*.jar looking for annotations. You can speed up servlet container startup by preventing this scanning of classes: set the metadata-complete attribute to true on the web-app element of your application’s web.xml.
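To get a feel for what this introspection involves, here is a rough sketch in plain Java. It is not how any particular container does it (real containers may well inspect class files rather than load every class). The @Route annotation, the HelloHandler and PlainHelper classes, and the scan() method are all hypothetical stand-ins that I made up so the example runs without the servlet API on the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

class AnnotationScanSketch {

    // Hypothetical stand-in for @WebServlet; not part of the servlet API.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Route {
        String[] value();
    }

    // An annotated component, analogous to a servlet mapped to two patterns.
    @Route({"/foo", "/bar"})
    static class HelloHandler {
    }

    // No annotation, so the scan ignores this class.
    static class PlainHelper {
    }

    // Maps each declared URL pattern to the class that declared it.
    static Map<String, Class<?>> scan(Class<?>... candidates) {
        Map<String, Class<?>> mappings = new LinkedHashMap<>();
        for (Class<?> candidate : candidates) {
            Route route = candidate.getAnnotation(Route.class);
            if (route == null) {
                continue; // not a component we care about
            }
            for (String pattern : route.value()) {
                mappings.put(pattern, candidate);
            }
        }
        return mappings;
    }

    public static void main(String[] args) {
        // prints [/foo, /bar]
        System.out.println(scan(HelloHandler.class, PlainHelper.class).keySet());
    }
}
```

Every candidate class has to be examined even though only some carry the annotation, which is why skipping the scan with metadata-complete can shorten startup.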

Let’s begin by taking a look at a filter in the new world.

First, notice that the web.xml file is simply a shadow of its former self – and could have been omitted.

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
	version="3.0">
</web-app>

The filter is declared using the @WebFilter annotation and is registered as a filter for the /foo URL pattern.

package com.swengsol.filters;
import javax.servlet.*;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;
import java.io.IOException;
import java.util.logging.Logger;

/**
 * User: Damodar Chetty
 * Date: Jul 24, 2010
 * Time: 8:39:20 AM
 * An introduction to Servlet 3.0 and Java 6 (TC JUG)
 * (c) Software Engineering Solutions, Inc.
 */

// 1. Annotations can map to either url patterns or servlet names
@WebFilter(urlPatterns = "/foo", filterName = "MyFilter")
public class MyFilter implements Filter {
    private static final Logger logger =
		Logger.getLogger("com.swengsol");
    public void destroy() {
    }

    public void doFilter(ServletRequest req, ServletResponse resp,
		FilterChain chain) throws ServletException, IOException {
        logger.info("> Filtering: " +
			((HttpServletRequest)req).getRequestURI());
        chain.doFilter(req, resp);
    }
    public void init(FilterConfig config) throws ServletException {
        logger.info("> Filter: " + "Initializing filter");
    }
}

Next comes the servlet. The key element to notice here is the @WebServlet annotation, which maps the URL patterns /foo and /bar to our servlet. The filter we defined earlier will be invoked only when this servlet is requested using the /foo pattern.

package com.swengsol.servlets;

import com.swengsol.Squawker;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Date;
import java.util.logging.Logger;

/**
* User: Damodar Chetty
* Date: Jul 24, 2010
* Time: 8:26:34 AM
* An introduction to Servlet 3.0 and Java 6 (TC JUG)
* (c) Software Engineering Solutions, Inc.
*/

@WebServlet(name="HelloWorldServlet", urlPatterns={"/foo", "/bar"})
public class HelloWorldServlet extends HttpServlet {
	private static final Logger logger =
		Logger.getLogger("com.swengsol");
	protected void doGet(HttpServletRequest request,
	  HttpServletResponse response)
	  throws ServletException, IOException {
		logger.info("Within request>>>" + getServletName());
		Squawker squawker =
		new Squawker("HelloWorldServlet: annotations map to " +
			"url patterns /foo and /bar: " + new Date());
		response.setContentType("text/html");
		PrintWriter out = response.getWriter();
		out.println(squawker.getHTML());
	}
}

The final annotation of interest is @WebListener, which we demonstrate using a trivial context listener. Registering a different listener is just as simple. You add this annotation to any class that implements a listener interface, and you are in business.

package com.swengsol.listeners;
/**
 * User: Damodar Chetty
 * Date: Jul 23, 2010
 * Time: 10:48:14 PM
 * An introduction to Servlet 3.0 and Java 6 (TC JUG)
 * (c) Software Engineering Solutions, Inc.
 */

import javax.servlet.*;
import javax.servlet.annotation.WebListener;
import java.util.EnumSet;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

@WebListener
public class MyServletContextListener implements ServletContextListener {
    private static final Logger logger =
			Logger.getLogger("com.swengsol");

    // Public constructor is required by servlet spec
    public MyServletContextListener() {
    }
    public void contextInitialized(ServletContextEvent sce) {
        logger.info("Context Listener > Initialized");
    }
    public void contextDestroyed(ServletContextEvent sce) {
    }
}

So is the web.xml completely redundant? Not quite.

You can still use the web.xml to specify the ordering of components such as filters.

Likewise, you would use it when you wanted to override any configuration settings that may have been specified via annotations.

In just a bit, we’ll look at the programmatic definition of servlets and filters.

I’ll be speaking at the Twin Cities Java Users Group tomorrow.

I hope to meet a number of people from my former team, and I do hope some of my new teammates will come out as well.

In case you aren’t able to make it, but are interested in learning what I’m going to speak about … here’s a sneak preview.

JSR-315-ServletSpec3.0-Damodar-Chetty-swengsol.com

See you there!

While working on a presentation for JSR 315 – servlet specification 3.0, I realized that a key aspect to understanding asynchronous servlets was to understand how asynchronous processing worked in Java in the first place.

One thing led to another, and soon I was neck deep in executors and executor services, the key building blocks of asynchronous processing in Java.

In this blog post, I summarize my learnings on this topic.

Concepts

A task is defined as a small independent activity that represents some unit of work that starts at some point, requires some activity or computation, and then terminates. In a web server, each individual incoming request meets this definition. In Java, these are represented by instances of Runnable or Callable.

A thread can be considered to be a running instance of a task. If a task represents some unit of work that needs to be done, then a thread represents the actual performance of that task. In Java, these are represented by instances of Thread.

Synchronous processing occurs when a task must be done in the main thread of execution. In other words, the main program must wait until the current task is done, before it can continue on with its processing.

Asynchronous processing is when the main thread delegates the processing of a task to a separate independent thread. That thread is then responsible for the processing associated with the task, while the main thread returns to doing whatever main programs do.

A thread pool represents one or more threads sitting around waiting for work to be assigned to them. A pool of threads brings a number of advantages to the party. First, it limits the cost of setting up and tearing down threads, since threads in the pool are reused rather than created from scratch each time. Second, it can serve to limit the total number of active threads in the system, which reduces the memory and computing burdens on the server. Finally, it lets you delegate the problem of managing threads to the pool, simplifying your application.

At this point, it is important to note that there are three critical mechanisms at work here – there’s the arrival of tasks to be processed (someone is requesting some units of work to be done), there is the submission of tasks to some holding tank, and then there’s the actual processing of each task. The Executor framework in Java separates the latter two mechanisms – submission and processing.

The arrival of requests is generally out of the control of the program – and may be driven by requests from clients. The submission of a request is typically made by requesting that the task be added to a queue of incoming tasks, while the processing is implemented using a pool of threads that sit idle waiting to be assigned an incoming task to process.

Java 5.0 and Thread Pools

Java 5.0 comes with its own thread pool implementation – within the Executor and ExecutorService interfaces. This makes it easier for you to use thread pools within your own programs.

An Executor provides application programs with a convenient abstraction for thinking about tasks. Rather than thinking in terms of threads, an application now deals simply with instances of Runnable, which it then passes to an Executor to process.
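The separation between submitting a task and deciding how it runs can be sketched with two tiny Executor implementations. The names ExecutorPolicies, DIRECT, THREAD_PER_TASK, and threadUsedBy are my own inventions for illustration; only the Executor interface itself comes from java.util.concurrent.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class ExecutorPolicies {

    // Synchronous policy: the task runs on whichever thread submitted it.
    static final Executor DIRECT = Runnable::run;

    // Asynchronous policy: every task gets a brand-new thread.
    static final Executor THREAD_PER_TASK =
            task -> new Thread(task, "per-task").start();

    // Submits one task and reports the name of the thread that ran it.
    // The submitting code is identical whichever policy is plugged in.
    static String threadUsedBy(Executor executor) {
        AtomicReference<String> ranOn = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            ranOn.set(Thread.currentThread().getName());
            done.countDown();
        });
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("task did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ranOn.get();
    }

    public static void main(String[] args) {
        System.out.println("direct ran on: " + threadUsedBy(DIRECT));
        System.out.println("per-task ran on: " + threadUsedBy(THREAD_PER_TASK));
    }
}
```

The first call reports the caller’s own thread and the second reports the freshly created per-task thread, yet threadUsedBy never mentions Thread at the point of submission.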

The ExecutorService interface extends the simplistic Executor interface by adding lifecycle methods to manage the threads in the pool. For instance, you can shut down the threads in the pool.

In addition, while the Executor lets you submit only a single task for execution by a thread in the pool, the ExecutorService lets you submit a collection of tasks for execution, or obtain a Future object that you can use to track the progress of a task.
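As a small sketch of submitting a collection of tasks, the following uses invokeAll() and then reads each result from its Future. The class name ExecutorServiceSketch, the squareAll() method, and the pool size of 2 are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorServiceSketch {

    // Squares each input on a small pool and returns the results in input order.
    static List<Integer> squareAll(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int n : inputs) {
                tasks.add(() -> n * n);
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll() blocks until every task is done and returns the
            // futures in the same order as the submitted tasks.
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // lifecycle method that Executor alone lacks
        }
    }

    public static void main(String[] args) {
        // prints [1, 4, 9]
        System.out.println(squareAll(Arrays.asList(1, 2, 3)));
    }
}
```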

Runnable and Callable

The Executor framework represents tasks using instances of either Runnable or Callable. Runnable’s run() method is limiting in that it can neither return a value nor throw a checked exception. Callable is a more functional version: it defines a call() method that can return a computed value and throw an exception if necessary.
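The difference shows up clearly when both kinds of task go through submit(). In this sketch (the class and method names are mine), the Runnable’s Future yields null because there is nothing to return, while the Callable’s checked exception comes back wrapped in an ExecutionException.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RunnableVsCallable {

    // A Runnable has nothing to return, so its Future completes with null.
    static Object resultOfRunnable() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable quiet = () -> { };
            Future<?> future = pool.submit(quiet);
            return future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // A Callable may throw a checked exception; Future.get() reports it
    // wrapped in an ExecutionException whose cause is the original.
    static String failureReportedByCallable() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> failing = () -> {
                throw new IOException("disk unavailable");
            };
            pool.submit(failing).get();
            return "no failure";
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(resultOfRunnable());          // prints null
        System.out.println(failureReportedByCallable()); // prints IOException: disk unavailable
    }
}
```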

Controlling your Tasks

You can get detailed information about your tasks using the FutureTask class, an instance of which can wrap either a Callable or a Runnable. You can get an instance of this as the return value of the submit() method of an ExecutorService, or you can manually wrap your task in a FutureTask before calling the execute() method.

The FutureTask instance, which implements the Future interface, gives you the ability to monitor a running task, cancel it, and retrieve its result (the return value of a Callable’s call() method).
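Here is a minimal sketch of the manual route: wrapping a Callable in a FutureTask, handing it to execute(), and then using the same object to fetch the result or to cancel. FutureTaskSketch, ANSWER, runAndGet(), and cancelBeforeRun() are names I picked for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

class FutureTaskSketch {

    private static final Callable<Integer> ANSWER = () -> 6 * 7;

    // Wraps the Callable by hand, runs it via execute(), then reads the result.
    static int runAndGet() {
        FutureTask<Integer> task = new FutureTask<>(ANSWER);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.execute(task); // a FutureTask is itself a Runnable
            return task.get();  // blocks until call() has returned
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Cancels a task before any thread has picked it up.
    static boolean cancelBeforeRun() {
        FutureTask<Integer> task = new FutureTask<>(ANSWER);
        boolean cancelled = task.cancel(false);
        task.run(); // has no effect on a task that is already cancelled
        return cancelled && task.isCancelled() && task.isDone();
    }

    public static void main(String[] args) {
        System.out.println(runAndGet());       // prints 42
        System.out.println(cancelBeforeRun()); // prints true
    }
}
```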

ThreadPoolExecutor

The most common implementation of ExecutorService that we will encounter is the ThreadPoolExecutor.

Tasks are submitted to a ThreadPoolExecutor as instances of Runnable. The executor is then responsible for the actual processing, and your application no longer needs to care about what happens behind that abstraction.

This executor is defined in terms of:

  1. a pool of threads (with a configured number of minimum and maximum threads),
  2. a work queue.
    This queue holds the submitted tasks which are still to be assigned a Thread from the pool. There are two main types of queue: bounded and unbounded. Adding tasks to an unbounded queue always succeeds. A bounded queue (such as a LinkedBlockingQueue with a fixed capacity) rejects tasks once the number of pending tasks reaches its maximum capacity.
  3. a handler that defines how rejections should be handled (the saturation policy).
    When a task cannot be added to a queue, the thread pool will call its registered rejection handler to determine what should happen. The default rejection policy is to simply throw a RejectedExecutionException runtime exception, and it is up to the program to catch the exception and process it. Other policies exist, such as DiscardPolicy, which silently discards the task without any notifications.
  4. a thread factory.
    By default, new threads constructed by the executor will have certain properties, such as a priority of Thread.NORM_PRIORITY, and a thread name that is based on the pool number and thread number within the pool. You can use a custom thread factory to override these defaults.
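The four parts can be wired together by hand through one of the ThreadPoolExecutor constructors. The sketch below is only an illustration: the thread name prefix squawk-worker, the counting rejection handler, and the one-thread, one-slot sizing are assumptions chosen so that the third task is guaranteed to be rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class PoolPartsSketch {

    // Returns a one-line summary of what happened to three submitted tasks.
    static String run() {
        AtomicInteger threadIds = new AtomicInteger();
        AtomicInteger rejected = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        AtomicReference<String> workerName = new AtomicReference<>();

        // Part 4: a thread factory that overrides the default thread names.
        ThreadFactory factory =
                r -> new Thread(r, "squawk-worker-" + threadIds.incrementAndGet());
        // Part 3: a saturation policy that counts rejections instead of throwing.
        RejectedExecutionHandler countRejections =
                (task, executor) -> rejected.incrementAndGet();

        // Parts 1 and 2: exactly one thread, and a bounded queue with one slot.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1), factory, countRejections);

        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                release.await(); // keep the worker busy until released
                workerName.set(Thread.currentThread().getName());
                completed.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        pool.execute(blocked); // taken by the single worker thread
        pool.execute(blocked); // waits in the queue's only slot
        pool.execute(blocked); // no thread, no slot: goes to the handler

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "rejected=" + rejected.get() + " completed=" + completed.get()
                + " thread=" + workerName.get();
    }

    public static void main(String[] args) {
        // prints rejected=1 completed=2 thread=squawk-worker-1
        System.out.println(run());
    }
}
```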

Algorithm for using an Executor

1.  Create an Executor

You first create an instance of an Executor or ExecutorService in some global context (such as the application context for a servlet container).

The Executors class has a number of convenience static factory methods that create an ExecutorService. For instance, newFixedThreadPool() returns a ThreadPoolExecutor instance which is initialized with an unbounded queue and a fixed number of threads, while newCachedThreadPool() returns a ThreadPoolExecutor instance initialized with a SynchronousQueue (a hand-off queue that holds no pending tasks) and an effectively unbounded number of threads. In the latter case, existing threads are reused if available, and if no free thread is available, a new one is created and added to the pool. Threads that have been idle for longer than a timeout period will be removed from the pool.


private static final Executor executor = Executors.newFixedThreadPool(10);
 

Rather than use these convenience methods, you might find it more appropriate to instantiate your own fully customized version of ThreadPoolExecutor – using one of its many constructors.


private static final Executor executor = new ThreadPoolExecutor(10, 10, 50000L,   TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(100));
 

This creates a bounded queue of size 100, with a thread pool of fixed size 10.

2. Create one or more tasks

You are required to have one or more tasks to be performed as instances of either Runnable or Callable.

3. Submit the task to the Executor

Once you have an ExecutorService, you can submit a task to it using either the submit() or execute() methods, and a free thread from the pool will automatically dequeue the task and execute it.

4. Execute the task

The Executor is then responsible for managing the task’s execution as well as the thread pool and queue. Exactly what happens here depends on the thread pool size limits, the number of idle threads, and the bounds of the queue.

In general, if the pool has fewer than its configured number of minimum threads, new threads will be created to handle queued tasks until that limit is reached.

If the number of threads is higher than the configured minimum, then the pool is reluctant to start any more threads. Instead, the task is queued until a thread is freed up to process it. If the queue is full, then a new thread must be started to handle it.

If the number of threads is at the maximum, the pool is unable to start new threads, and hence the task will either be added to the queue, or will be rejected if the queue is full.

The threads in the pool will continually monitor the queue, for tasks to run. Threads that are higher than the configured minimum become ripe for termination if they have been idle for longer than the configured timeout period.
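These rules can be watched in action with a deliberately tiny pool. The sketch below assumes a minimum of 1 thread, a maximum of 2, and a queue with a single slot; the class name PoolGrowthSketch and the log format are my own. Every task blocks until released, so nothing finishes while we are observing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowthSketch {

    // Submits four blocking tasks and records the pool's state after each one.
    static List<String> observe() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 30L, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        List<String> log = new ArrayList<>();
        try {
            for (int i = 1; i <= 4; i++) {
                try {
                    pool.execute(blocked);
                    log.add("task " + i + ": threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    log.add("task " + i + ": rejected");
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : observe()) {
            System.out.println(line);
        }
    }
}
```

With these settings, the first task starts a thread, the second is queued, the third finds the queue full and so forces a second thread, and the fourth is rejected because both the pool and the queue are at capacity.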

5. Shutdown the Executor

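At application shutdown, we terminate the executor. You can choose to terminate it gracefully by invoking its shutdown() method, which lets already submitted tasks finish, or abruptly by invoking shutdownNow(), which attempts to stop the running tasks and hands back the ones that never started.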
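The two styles of termination can be compared side by side. In this sketch (class and method names are mine), the graceful path uses shutdown() followed by awaitTermination(), while the abrupt path uses shutdownNow(), which interrupts the running task and returns whatever was still waiting in the queue.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownSketch {

    // Waits for the pool to wind down, restoring the interrupt flag if needed.
    private static void awaitQuietly(ExecutorService pool) {
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Graceful: tasks submitted before shutdown() still run to completion.
    static int graceful() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 5; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();
        awaitQuietly(pool);
        return completed.get();
    }

    // Abrupt: the running task is interrupted and queued tasks are handed back.
    static int abrupt() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch never = new CountDownLatch(1);
        Runnable stuck = () -> {
            try {
                never.await(); // only an interrupt gets us out of here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        pool.execute(stuck); // occupies the only thread
        pool.execute(stuck); // queued
        pool.execute(stuck); // queued
        List<Runnable> neverStarted = pool.shutdownNow();
        awaitQuietly(pool);
        return neverStarted.size();
    }

    public static void main(String[] args) {
        System.out.println("graceful completed: " + graceful()); // 5
        System.out.println("abrupt never started: " + abrupt()); // 2
    }
}
```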

C’est la fin!

Bibliography:

Java Concurrency in Practice, Goetz et al

Java Threads, Oaks and Wong

Today I got a chance to play around with building the brand new Tomcat 7.

If you’ve read my book, you know the steps.

To summarize:

1) Get the latest JDK, which right now is at version 6u21. Add {JAVA_HOME}/bin to your path.

2) Get winmd5sum – those of you who know me are aware of how paranoid I am :)

3) Get Apache Ant, which is currently at 1.8.1. Add your {ANT_HOME}/bin folder to your path.

4) Get the Subversion command line client, which is at 1.6.12.

That’s all you need.

Now, download the latest source to an appropriate folder on your workstation:

svn co http://svn.apache.org/repos/asf/tomcat/tc7.0.x/tags/TOMCAT_7_0_0

Finally, change over to the  TOMCAT_7_0_0 folder that was just created. You should find build.xml in there.

Simply type “ant” and sit back as your shiny new Tomcat installation is built.

To start up your new Tomcat build, change directory to the output/build/bin and run startup.bat.

In your browser, mosey on over to http://localhost:8080 to access your Tomcat installation.

Compared to building Tomcat 6, I was pleasantly surprised with the ease with which  Tomcat was built from its source.  So far so good with this new iteration.

Eclipse Helios

To get Tomcat to run within Eclipse Helios, I downloaded the 64-bit version of Eclipse 3.6 (eclipse-SDK-3.6-win32-x86_64.zip).

I set up the Eclipse Classpath Variables ANT_HOME and TOMCAT_LIBS_BASE to point to the Ant install folder and the path to where the Ant build downloaded its JARs (c:\usr\share\java, by default).

A couple of warnings are in order here:
1. run the “ant extras” target to download the webservices JARs that are required to build the project in Helios.
2. Rename the “eclipse.project” and “eclipse.classpath” to drop the “eclipse” part of the file name.

Now, add a Run configuration for the org.apache.catalina.startup.Bootstrap class as a Java Application. Run up the application, and go on over to http://localhost:8080 as before.

That’s it!

I’ve been playing with the best way to display code snippets, and came across an amazing little plugin called SyntaxHighlighter Evolved by Viper007Bond.

This is a sample usage of the syntax highlighter:
wrap your code within
[xml]
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:C="http://mynamespace.swengsol.com/contacts">
<head><title>Contacts sorted by organization</title></head>
<body>
<C:organization>

[/xml]
which displays:

<html xmlns="http://www.w3.org/1999/xhtml"
 xmlns:C="http://mynamespace.swengsol.com/contacts">
 <head><title>Contacts sorted by organization</title></head>
 <body>
 <C:organization>
 ...

In particular, note the JavaScript magic when you hover over the pretty syntax layout.

I’m getting more and more impressed with the WordPress architecture the more time I spend with it!

And, to think that all this is being done with PHP!

I’m definitely looking forward to teaching PHP and MySQL in my course on Internet Application Development over at Metro State in the fall.

For more information on this plugin – see http://en.support.wordpress.com/code/posting-source-code/.

Quick tip:

In WordPress, do any pasting of source code within the HTML view. Pasting code into the Visual editor does not seem to work well. Once you have the code pasted as

[sourcecode language="java"]… source code here …[/sourcecode]

you can then switch to the Visual editor to complete your edits.

The advantages of XML are already well articulated – it provides a way to organize data in a self describing manner (where the meaning of the data is clearly indicated by the element to which it belongs); it allows its data to be manipulated by standard libraries that are readily available on most platforms;  it is readable by humans; and it can be validated for well formedness as well as for semantic validity using standard tools. All of which contribute to making it easily the single most popular data interchange format in use today.

While the basic ideas behind XML are fairly easy to grasp, advanced XML usage can quickly get rather confusing. For instance, simply browsing through a WSDL document places you slam bang in the middle of the territory of schemas or namespaces.

So, in this post, I am going to review the key concepts that underlie the advanced use of XML. If you’re already well familiar with this area, I’ll meet you on the other side. Else, buckle in, and enjoy the ride.

Document Types

The documents you’ll see in the XML world fall into 2 broad categories – schemas and instance documents. The former category is used to define the legal structure of your XML documents, while the latter contains the documents that your applications will largely generate and/or consume.

In other words, if a schema is considered to be equivalent to a Java class definition, then the instance document is equivalent to an instance of that class.

XML Application

An XML application is a set of rules that define the structure of an instance document. For instance, SVG, MathML, and XHTML define a set of elements, as well as their attributes, relationships, and type rules.

In other words, an XML application is defined by one or more schemas that define what constitutes a valid XML document.

Namespaces

A key challenge with building an XML application is that, since most applications are designed and developed independently of one another, it is not uncommon for a tag name used by one application (e.g., <head>, <title>, or <msub>) to also be inadvertently used by another.

This can get hairy when a single XML instance document combines more than one XML application, resulting in a condition where a given element, such as <title>, may have a different meaning within the context of each application. In such a case, when the element <title> shows up in an instance document, its actual meaning may be quite unclear.

Namespaces and Qualified Names

So how does one prevent naming collisions?

The traditional answer is “using namespaces” and that’s the answer here too.

A namespace works by providing a unique space within which an application’s names can be defined. As the developer of the application it is your responsibility to ensure that you have acquired a globally unique namespace identifier. Then, you must ensure that the names of your components (elements, types, attributes, etc.) are unique within the application itself.

In other words, a component’s name is globally unique only when its namespace identifier and its local name within the application are taken together. This composite name is termed its “qualified name”.

This can be expressed as:

Qualified_Name = Unique_Name_Of_the_Applications_Namespace : Local_Name_of_the_Component.

But, how does one guarantee uniqueness of namespace identifiers?

The easiest option is to simply use the absolute URI of a domain that is owned by your organization.

For instance, I own the domain www.swengsol.com, and so I could choose to use the namespace, http://mynamespace.swengsol.com/contacts for my sample XML application that deals with contacts.

It is important to note that a URL (or even more generally a URI) is simply a convenient way to establish a globally unique namespace. The convenience comes from the fact that you have previously registered your domain with a central registration authority.

While a namespace might resemble a URL, it is not expected that this URL will resolve to a particular document, or even that the virtual host mynamespace.swengsol.com actually exists anywhere on the Net. Instead, this namespace is simply a logical identifier used to ensure that any component names I define within it are globally unique. To make sure you understand this concept, ask yourself why you should not use namespaces based on time.com or cnn.com for your own application.

So, if I were to define an element <organization> in my contacts XML application, the qualified name for that element would be: http://mynamespace.swengsol.com/contacts:organization, where http://mynamespace.swengsol.com/contacts is my unique namespace, and where organization is a local name that I guarantee is unique within my own namespace. Taken together, the qualified name is therefore guaranteed to be unique across the entire universe of XML component names.

Namespace Prefixes

Unfortunately, using a URI directly as part of a name is problematic because characters such as “/” are legal within a URI but illegal within an XML element or attribute name. The solution here is to restrict the URI to a safe location within the document (such as an attribute value), to map the URI to a legal logical name for that namespace (e.g., “xsd”), and then to use that logical name wherever the actual namespace would have been used.

So how do we associate the actual namespace name with its logical equivalent?

We use a namespace declaration.

A namespace declaration uses an xmlns:prefix attribute on any XML element to bind a given namespace to its logical equivalent (the prefix). That binding is visible for that element and any of its descendants. If used on the root element (the usual case), it is available throughout that document.

<HTML:html xmlns:HTML="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"
 xmlns:C="http://mynamespace.swengsol.com/contacts">
 <HTML:head>
   <HTML:title>Contacts sorted by organization</HTML:title>
 </HTML:head>
 <HTML:body>
   <C:organization name="Software Engineering Solutions, Inc">
     <C:contact>
       <C:title>Director of Marketing</C:title>
       ...
       </C:contact>
       ...
   </C:organization>
 </HTML:body>
</HTML:html>

As you can see, elements from different namespaces (HTML:title and C:title) can coexist quite well within a single instance document – even if they share the same local name (title).
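
To see the two title elements being told apart in practice, here is a minimal sketch using the JDK's built-in DOM parser. The class name QualifiedNameDemo and the trimmed-down document are my own, purely for illustration; the important bit is that the parser must be made namespace aware, after which lookups are done by namespace URI plus local name rather than by prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QualifiedNameDemo {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String CONTACTS_NS = "http://mynamespace.swengsol.com/contacts";

    // A trimmed-down version of the instance document shown above (illustrative only).
    static final String XML =
        "<HTML:html xmlns:HTML=\"" + XHTML_NS + "\" xmlns:C=\"" + CONTACTS_NS + "\">"
      + "<HTML:head><HTML:title>Contacts sorted by organization</HTML:title></HTML:head>"
      + "<HTML:body><C:organization name=\"Software Engineering Solutions, Inc\">"
      + "<C:contact><C:title>Director of Marketing</C:title></C:contact>"
      + "</C:organization></HTML:body></HTML:html>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is off by default; without it prefixes are not resolved.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of the first <title> element that lives in the given namespace.
    static String titleText(Document doc, String namespaceUri) {
        NodeList matches = doc.getElementsByTagNameNS(namespaceUri, "title");
        return matches.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println("XHTML title:    " + titleText(doc, XHTML_NS));
        System.out.println("Contacts title: " + titleText(doc, CONTACTS_NS));
    }
}
```

Note that the prefixes HTML and C never appear in the lookup code. They are just local shorthand within the document; the namespace URI is what actually identifies the element.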

Default Namespace

The default namespace is a special case used to reduce the verbosity of an instance document. If a large percentage of your elements come from a single namespace, you can declare the mapping of that namespace to an empty logical namespace name. Any elements that do not belong to a named namespace now belong to this default namespace.

<html xmlns="http://www.w3.org/1999/xhtml"
       xmlns:C="http://mynamespace.swengsol.com/contacts">
  <head><title>Contacts sorted by organization</title></head>
  <body>
    <C:organization>
    ...

You could define the default namespace (or any other namespace) at a lower level within the document tree. In which case, the default namespace (or the other namespace) is overridden at that element and any of its descendants.

The default namespace does not apply to attributes. An attribute that is not prefixed does not exist in any namespace.
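
The attribute rule is easy to check for yourself. The following is a small sketch (the class name DefaultNamespaceDemo and the sample document are assumptions of mine) that parses a document with a default namespace and prints the namespace of the root element, a child element, and an unprefixed attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // The lang attribute is unprefixed, so the default namespace should not reach it.
    static final String XML =
        "<html xmlns=\"" + XHTML_NS + "\" lang=\"en\">"
      + "<head><title>Contacts sorted by organization</title></head>"
      + "</html>";

    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element html = parseRoot(XML);
        Element head = (Element) html.getFirstChild(); // no whitespace precedes <head>
        Attr lang = html.getAttributeNode("lang");

        // Unprefixed elements pick up the default namespace ...
        System.out.println("html -> " + html.getNamespaceURI());
        System.out.println("head -> " + head.getNamespaceURI());
        // ... but the unprefixed attribute is in no namespace at all.
        System.out.println("lang -> " + lang.getNamespaceURI());
    }
}
```

The last line reports null for the attribute, while both elements report the XHTML namespace.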

Target Namespace

This concept has meaning only when we discuss XML Schemas – so while I introduce it here, we won’t actually see it in more detail until a bit later.

A target namespace is declared within an XML Schema to indicate the namespace to which any types defined in that schema belong.

XML Schemas

A schema provides you with a way of defining what is a valid and legal XML instance document for your XML application. It lets you define the structure of your document (which elements and attributes are permissible, and in what combinations), as well as the legal data types for your elements and attributes.

A schema’s root element is called schema, and is in the http://www.w3.org/2001/XMLSchema namespace (xsd prefix).

Content Model

The content model describes the content of an XML element. An element has a “simple” content model when it only contains a text node; a “complex” model when it can only take subelements; “mixed” when both can be present; and “empty” when no content is allowed.

A simple content model:

<name>Damodar Chetty</name>

A complex content model:

<name>
 <first>Damodar</first>
 <last>Chetty</last>
</name>

An element that can only take a simple content model and has no attributes is considered a simple type, while all others are considered complex types.

XML Simple Type

The XML Schema specification defines 44 simple data types in 4 main categories. This includes numeric data types such as integers (xsd:int, xsd:short, and xsd:long) and real numbers (xsd:float, xsd:double, and xsd:decimal); timestamp data types such as a specific date (xsd:date), time (xsd:time) or length of time (xsd:duration); XML types such as an XML ID (xsd:ID) or an ID reference (xsd:IDREF); strings (xsd:string); booleans (xsd:boolean), a URI (xsd:anyURI), and so on.

For a detailed reference, visit: http://www.w3.org/TR/xmlschema-2/.

In addition, you can extend these built-in simple types to derive new custom simple types. For instance, you can use a regular expression to restrict the values that are legal for a given type. You do this using the simpleType element and its restriction child which takes one or more facets that let you restrict the allowable values.

Allowable facets for the string data type include xsd:pattern which can be used to define a regex pattern that defines legal values; as well as xsd:minLength and xsd:maxLength which define the minimum and maximum length of the content.

<xsd:simpleType name="us-zipcode">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\p{Nd}{5}"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="quantity">
  <xsd:restriction base="xsd:int">
    <xsd:minInclusive value="0"/>
    <xsd:maxExclusive value="10000" />
  </xsd:restriction>
</xsd:simpleType>

Once these new types have been defined, you can use them to declare other elements.

<xsd:element name="quantity" type="inv:quantity"/>
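
To watch a facet at work, here is a minimal sketch that validates a few candidate values against the us-zipcode type using the JDK's javax.xml.validation API. The class name ZipCodeFacetDemo, the zip element, and the sample values are assumptions made for this illustration; the schema is kept in a string so the example stands alone.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ZipCodeFacetDemo {

    // The us-zipcode type from above, plus a hypothetical <zip> element that uses it.
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:simpleType name=\"us-zipcode\">"
      + "<xsd:restriction base=\"xsd:string\">"
      + "<xsd:pattern value=\"\\p{Nd}{5}\"/>"
      + "</xsd:restriction>"
      + "</xsd:simpleType>"
      + "<xsd:element name=\"zip\" type=\"us-zipcode\"/>"
      + "</xsd:schema>";

    // Returns true when the instance document conforms to the schema.
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no custom ErrorHandler installed, validation errors are thrown.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        for (String candidate : new String[] {"55106", "5510", "abcde"}) {
            String xml = "<zip>" + candidate + "</zip>";
            System.out.println(candidate + " valid? " + isValid(XSD, xml));
        }
    }
}
```

Only the five-digit value passes. Unlike most regex engines, a schema pattern always has to match the entire value, so there is no need for anchors.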

XML Complex Data Types

In addition to simple types, you can also define your own complex data types that are composed of one or more simple types, or even from other complex types.

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 xmlns="http://www.swengsol.com/contacts"
 targetNamespace="http://www.swengsol.com/contacts" >

 <xsd:element name="organization" type="organizationType" />

 <xsd:simpleType name="us-zipcode">
   <xsd:restriction base="xsd:string">
     <xsd:pattern value="\p{Nd}{5}"/>
   </xsd:restriction>
 </xsd:simpleType>
 <xsd:complexType name="organizationType">
   <xsd:sequence>
     <!-- contactName is a local element with an anonymous (unnamed) type -->
     <xsd:element name="contactName">
       <xsd:complexType>
         <xsd:sequence>
           <xsd:element name="firstName"/>
           <xsd:element name="lastName"/>
         </xsd:sequence>
       </xsd:complexType>
     </xsd:element>
     <xsd:element name="phone"   type="xsd:string" />
     <xsd:element name="address" type="address" />
   </xsd:sequence>
   <xsd:attribute name="name" type="xsd:string"/>
 </xsd:complexType>
 <xsd:complexType name="address">
   <xsd:sequence>
     <xsd:element name="street"  type="xsd:string" />
     <xsd:element name="city"    type="xsd:string" />
     <xsd:element name="state"   type="xsd:string" />
     <xsd:element name="zip"     type="us-zipcode" />
   </xsd:sequence>
 </xsd:complexType>
</xsd:schema>

There are a few things to note with this schema:

1. The targetNamespace identifies the namespace into which the local components defined in this schema will be placed.

2. Unprefixed type references (such as organizationType and us-zipcode) are resolved through the default namespace, which this schema binds to the same URI as its targetNamespace.

3. Components that are named (using the name attribute) and that are defined directly under the xsd:schema document element within a schema, are called “global” components. In this schema, the organization, organizationType, address, and us-zipcode components are global components. The visibility of global components is schema-wide. Global elements (organization) can be root elements for instance documents that conform to this schema.

4. The contactName element’s type definition is an anonymous component since the definition is local to its containing element. The use of anonymous types limits reuse: because the type definition is not named, it cannot be used by any element other than contactName. The contactName element is not a direct child of the schema element, hence it cannot be used as a document’s root element.

The benefit here is that you could define different content models for the same element. For instance <contactName> could be locally defined within one containing element to have first, last and middle sub elements; while when used within another element it could simply have a text node.

Global components have special properties:

  • they can be referenced from anywhere within the schema, as well as from another schema that may include or import this schema.  For instance, global types can be referenced in any element definition using the type attribute. New complex types can also be derived from these global definitions.
  • their names must be unique within a schema, hence you are limited to a single global element with a given name. If you need to reuse a name, you must define the component locally under the appropriate parent.
  • a global element definition can not only be referenced anywhere within the schema, but can also be used as the root element of instance documents based on this schema.
  • Complex types used as building blocks must appear as top level complexType elements  in the schema.

Complex Data Type Composition

In general, complex types are comprised of simple types arranged in some compositional manner – either using an ordered sequence of elements (xsd:sequence), or unordered combinations (xsd:all or xsd:choice).

The sequence is the simplest construct, where you specify the order in which elements must appear, and for each element you specify its type as well as the number of times it is allowed to appear (using occurrence bounds). You can also add occurrence attributes to the sequence as a whole.

xsd:all defines an unordered grouping of one or more individual element declarations, where each element may only occur either 0 or 1 times.

xsd:choice allows an unordered grouping of one or more individual element declarations, where only one element from that group may appear in an instance document. The xsd:choice element itself can be bounded.
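
The difference between the three compositors is easiest to see by validating the same instance documents against each of them. The sketch below is an illustration only: the class name CompositorDemo and the helper schemaUsing are assumptions of mine, and the name/first/last elements are borrowed from the content model example earlier in this post.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CompositorDemo {

    static final String IN_ORDER  = "<name><first>Damodar</first><last>Chetty</last></name>";
    static final String REVERSED  = "<name><last>Chetty</last><first>Damodar</first></name>";
    static final String ONLY_ONE  = "<name><first>Damodar</first></name>";

    // Builds a schema whose <name> element groups its children with the given
    // compositor: "sequence", "all", or "choice".
    static String schemaUsing(String compositor) {
        return "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xsd:element name=\"name\"><xsd:complexType>"
             + "<xsd:" + compositor + ">"
             + "<xsd:element name=\"first\" type=\"xsd:string\"/>"
             + "<xsd:element name=\"last\" type=\"xsd:string\"/>"
             + "</xsd:" + compositor + ">"
             + "</xsd:complexType></xsd:element>"
             + "</xsd:schema>";
    }

    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator =
            factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation errors are thrown by the default error handling
        }
    }

    public static void main(String[] args) throws Exception {
        for (String compositor : new String[] {"sequence", "all", "choice"}) {
            String xsd = schemaUsing(compositor);
            System.out.println(compositor
                + ": in order=" + isValid(xsd, IN_ORDER)
                + ", reversed=" + isValid(xsd, REVERSED)
                + ", only one=" + isValid(xsd, ONLY_ONE));
        }
    }
}
```

With the default occurrence bounds, xsd:sequence accepts only the in-order document, xsd:all accepts both orderings (but not the document with a missing element), and xsd:choice accepts only the document with a single child.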

Attaching Schemas to Instance Documents

Most parsers can validate an XML instance document against a given schema to ensure that the instance document conforms to that given markup language. There are two ways in which to link an instance document to its defining schema. In both cases, the instance document has an embedded pointer that references its schema.

In the first mechanism, the instance document uses the xsi:noNamespaceSchemaLocation attribute on its root element to reference its schema document.

<?xml version="1.0"?>
 <organization xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="contacts.xsd">

In this case, the path to the XSD can be an absolute URL to somewhere on the Internet, or it can be a path relative to the instance document on the local hard drive. Note that you do not need to specify the location of the schema that defines the xsi namespace since the XMLSchema-Instance namespace is supported natively by any XML schema validating parser.

In the second mechanism, the xsi:schemaLocation attribute is used on the document root element to map a namespace to its associated schema document. Whitespace separates each namespace from its schema, and also separates namespace/schema pairs from each other. Whether you use spaces or line breaks is purely a matter of readability.

<organization
 xmlns="http://www.swengsol.com/contacts"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://www.swengsol.com/contacts http://www.swengsol.com/contacts.xsd">

For a referenced schema such as contacts.xsd to be useful, that schema’s targetNamespace must match a namespace that is used within this instance document.
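
Here is a rough sketch of why the namespaces have to line up. Rather than relying on the location hints, it hands the schema to the validator programmatically, so the only thing that varies is the namespace of the instance document's root element. The class name TargetNamespaceDemo and the one-element schema are assumptions made to keep the example short.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class TargetNamespaceDemo {

    static final String CONTACTS_NS = "http://www.swengsol.com/contacts";

    // A deliberately tiny schema: one global element in the contacts namespace.
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\""
      + " targetNamespace=\"" + CONTACTS_NS + "\">"
      + "<xsd:element name=\"organization\" type=\"xsd:string\"/>"
      + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator =
            factory.newSchema(new StreamSource(new StringReader(XSD))).newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. no declaration found for the root element
        }
    }

    public static void main(String[] args) throws Exception {
        String matching = "<organization xmlns=\"" + CONTACTS_NS + "\">SwEngSol</organization>";
        String noNamespace = "<organization>SwEngSol</organization>";
        String otherNamespace =
            "<organization xmlns=\"http://example.com/other\">SwEngSol</organization>";

        System.out.println("matching namespace: " + isValid(matching));
        System.out.println("no namespace:       " + isValid(noNamespace));
        System.out.println("other namespace:    " + isValid(otherNamespace));
    }
}
```

Only the first document validates. The other two use the same local name, but as far as the validator is concerned they are entirely different elements for which the schema has no declaration.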

Combining Schemas

Schemas support reuse of defined types and structures using standard mechanisms of imports and includes. An import lets you combine schemas from different namespaces, while an include lets you combine schemas from the same namespace.

In both cases, you need at least two schema definitions – where one of the schemas (the dependent) is being imported into or included by the other (the independent).

An include is the simpler operation, and is used simply as a composition mechanism to construct an overall schema out of individual portions. In this case, the target namespace advertised by the independent and dependent schemas must match exactly. This is appropriate since even though they are physically distinct, the schemas being composed are logically part of a single namespace.

<include schemaLocation="http://www.swengsol.com/contacts.xsd" />

The include mechanism is very straightforward, and can be considered a direct copy and paste into the independent schema.

While including is a way of composing parts of a single grammar into a single complete set, importing is a way of composing independent grammars into a single family of rules. In other words, each grammar in an import can stand by itself quite comfortably, and is only being imported in order to be used in a synergistic manner with another cooperating grammar. As a result, the namespaces of each grammar will be quite different, and there is no requirement that relates the targetNamespace of the two schemas. As a result, the importing element must specify not only the location of the schema being imported, but also the namespace to which it will be mapped – which should match the targetNamespace within the dependent schema.

<import namespace="http://www.swengsol.com/contacts"
        schemaLocation="http://www.swengsol.com/contacts.xsd" />

The imported schema’s namespace must also be bound to a prefix, usually using an xmlns attribute on the independent schema’s root element, before its components can be referenced.

That’s it! This much XML knowledge is generally sufficient for most usage scenarios. I’ll follow this post with another that takes a closer look at the WSDL definition of web service contracts.

Updated July 16: I spoke too soon … I was informed that there’s one more concept that I should have covered in the above article.

Qualified and unqualified elements and attributes.

Global elements used in your instance document must always be namespace qualified (either explicitly by using a namespace prefix, or implicitly by specifying a default namespace for that instance document). Whether local elements must also be qualified is controlled by the elementFormDefault attribute of the schema root element, which is set to “unqualified” by default, meaning that local elements appear without a namespace.

You can explicitly set this attribute to “qualified” to indicate that even local elements must now be qualified (by a namespace prefix or by the default namespace).

In a similar fashion, the schema root element’s attributeFormDefault attribute (also “unqualified” by default) controls local attributes. If set to “qualified”, local attributes must carry a namespace prefix, just as global attributes always must. Note that the default namespace does not apply to attributes, and so there is no implicit qualification that occurs as with elements.

Note that you can override the defaults using the form attribute of the element and attribute schema elements.
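
The effect of elementFormDefault is much easier to see than to describe. In the sketch below (the class name ElementFormDemo, the helper schema, and the sample phone number are all assumptions for illustration), the same two instance documents are validated against a schema with and without elementFormDefault="qualified". In both documents the global organization element is prefixed; they differ only in whether the local phone element is.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ElementFormDemo {

    static final String CONTACTS_NS = "http://www.swengsol.com/contacts";

    // Local <phone> element left unqualified vs. qualified with the c prefix.
    static final String LOCAL_UNQUALIFIED =
        "<c:organization xmlns:c=\"" + CONTACTS_NS + "\"><phone>555-0100</phone></c:organization>";
    static final String LOCAL_QUALIFIED =
        "<c:organization xmlns:c=\"" + CONTACTS_NS + "\"><c:phone>555-0100</c:phone></c:organization>";

    // Builds a schema with one global element (organization) and one local element (phone).
    static String schema(boolean qualified) {
        return "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\""
             + " targetNamespace=\"" + CONTACTS_NS + "\""
             + (qualified ? " elementFormDefault=\"qualified\"" : "") + ">"
             + "<xsd:element name=\"organization\"><xsd:complexType><xsd:sequence>"
             + "<xsd:element name=\"phone\" type=\"xsd:string\"/>"
             + "</xsd:sequence></xsd:complexType></xsd:element>"
             + "</xsd:schema>";
    }

    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator =
            factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        for (boolean qualified : new boolean[] {false, true}) {
            String xsd = schema(qualified);
            System.out.println((qualified ? "qualified" : "unqualified (default)")
                + ": plain <phone>=" + isValid(xsd, LOCAL_UNQUALIFIED)
                + ", <c:phone>=" + isValid(xsd, LOCAL_QUALIFIED));
        }
    }
}
```

Under the default setting only the document with the plain phone element validates; once the schema says “qualified”, the situation flips and only the prefixed version is accepted.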

Just noticed this review over at DZone – http://java.dzone.com/articles/tomcat-6-developer’s-guide