Monday, February 21, 2011

Chapter 39: Getting Started with Collections

In the previous chapter, we talked about the equals(), toString() and hashCode() methods and said that they would be very useful with collections, but we never saw what these collections are. Well, here we are. This chapter and the subsequent ones will dive deep into the world of collections.

So, let’s get started!!!

Collections

The Collections Framework in Java, which took shape with the release of JDK 1.2 and was expanded in 1.4 and again in Java 5, and yet again in Java 6, gives you lists, sets, maps, and queues to satisfy most of your coding needs. They’ve been tried, tested, and tweaked to offer the best performance. Pick the best one for your job and you’ll get—at the least—reasonable performance. And when you need something a little more custom, the Collections Framework in the java.util package is loaded with interfaces and utilities.

What Do You Do with a Collection?

There are a few basic operations you’ll normally use with collections:
• Add objects to the collection.
• Remove objects from the collection.
• Find out if an object is in the collection.
• Retrieve an object from the collection
• Iterate through the collection, looking at each object one by one.

Now that we know what we can do with a collection, let’s look at more details about them.

Key Classes & Interfaces of the Collections Framework

For the exam you’ll need to know which collection to choose based on a stated requirement. The collections API begins with a group of interfaces, but also gives you a truckload of concrete classes. The core interfaces you need to know for the exam are the following nine:
1. Collection
2. List
3. Queue
4. Set
5. Map
6. NavigableSet
7. SortedSet
8. SortedMap
9. NavigableMap

The core concrete implementation classes you need to know for the exam are the following 13:
1. Maps
     a. HashMap
     b. HashTable
     c. TreeMap
     d. LinkedHashMap
2. Sets
     a. HashSet
     b. LinkedHashSet
     c. TreeSet
3. Lists
     a. ArrayList
     b. Vector
     c. LinkedList
4. Queues
     a. PriorityQueue
5. Utilities
     a. Collections
     b. Arrays

Not all collections in the Collections Framework actually implement the Collection interface. In other words, not all collections pass the IS-A test for Collection. Specifically, none of the Map-related classes and interfaces extend from Collection. So while SortedMap, Hashtable, HashMap, TreeMap, and LinkedHashMap are all thought of as collections, none are actually extended from Collection-with-a-capital-C. To make things a little more confusing, there are really three overloaded uses of the word “collection”:
• collection (lowercase c), which represents any of the data structures in which objects are stored and iterated over.
• Collection (capital C), which is actually the java.util.Collection interface from which Set, List, and Queue extend.
• Collections (capital C and ends with s) is the java.util.Collections class that holds a pile of static utility methods for use with collections.

Exam Tip: You can very easily mistake “Collections” for “Collection”—be careful. Keep in mind that Collections is a class, with static utility methods, while Collection is an interface with declarations of the methods common to most collections including add(), remove(), contains(), size(), and iterator().

Collections come in four basic flavors:

• Lists Lists of things (classes that implement List).
• Sets Unique things (classes that implement Set).
• Maps Things with a unique ID (classes that implement Map).
• Queues Things arranged by the order in which they are to be processed.

But there are sub-flavors within those four flavors of collections:
• Sorted
• Unsorted
• Ordered
• Unordered

An implementation class can be unsorted and unordered, ordered but unsorted, or both ordered and sorted. But an implementation can never be sorted but unordered, because sorting is a specific type of ordering. For example, a HashSet is an unordered, unsorted set, while a LinkedHashSet is an ordered (but not sorted) set that maintains the order in which objects were inserted.

Maybe we should be explicit about the difference between sorted and ordered, but first we have to discuss the idea of iteration. When you think of iteration, you may think of iterating over an array using, say, a for loop to access each element in the array in order ([0], [1], [2], and so on). Iterating through a collection usually means walking through the elements one after another starting from the first element. Sometimes, though, even the concept of first is a little strange—in a Hashtable there really isn’t a notion of first, second, third, and so on. In a Hashtable, the elements are placed in a random order based on the hashcode of the key. But something has to go first when you iterate; thus, when you iterate over a Hashtable, there will indeed be an order. But as far as you can tell, it’s completely arbitrary and can change in apparently random ways as the collection changes.

Ordered

When a collection is ordered, it means you can iterate through the collection in a specific (not-random) order. A Hashtable collection is not ordered. Although the Hashtable itself has internal logic to determine the order (based on hashcodes and the implementation of the collection itself), you won’t find any order when you iterate through the Hashtable. An ArrayList, however, keeps the order established by the elements’ index position (just like an array). LinkedHashSet keeps the order established by insertion, so the last element inserted is the last element in the LinkedHashSet (as opposed to an ArrayList, where you can insert an element at a specific index position). Finally, there are some collections that keep an order referred to as the natural order of the elements, and those collections are then not just ordered, but also sorted. Let’s look at how natural order works for sorted collections.

Sorted

A sorted collection means that the order in the collection is determined according to some rule or rules, known as the sort order. A sort order has nothing to do with when an object was added to the collection, or when was the last time it was accessed, or what “position” it was added at. Sorting is done based on properties of the objects themselves. You put objects into the collection, and the collection will figure out what order to put them in, based on the sort order. A collection that keeps an order (such as any List, which uses insertion order) is not really considered sorted unless it sorts using some kind of sort order. Most commonly, the sort order used is something called the natural order. What does that mean?

You know how to sort alphabetically—A comes before B, X comes before Y, and so on. For a collection of String objects, then, the natural order is alphabetical. For Integer objects, the natural order is by numeric value—1 before 2, and so on. There is no natural order for Objects (Customs ones created by programmers) unless and until the developer who coded the class provides one, through an interface (Comparable) that defines how instances of a class can be compared to one another (does instance a come before b, or does instance b come before a?). If the developer decides that objects should be compared using the value of some instance variable, then a sorted collection will order the objects according to the rules in the class for how to use the instance variable to determine the order. Of course, the class might also inherit a natural order from a superclass rather than define its own order, in some cases.

Aside from natural order as specified by the Comparable interface, it’s also possible to define other, different sort orders using another interface: Comparator. We will discuss how to use both Comparable and Comparator to define sort orders later. But for now, just keep in mind that sort order (including natural order) is not the same as ordering by insertion, access, or index.
Now that we know about ordering and sorting, we’ll look at each of the four interfaces, and we’ll dive into the concrete implementations of those interfaces.

List Interface

A List cares about the index. The one thing that List has that non-lists don’t have is a set of methods related to the index. Those key methods include things like get(int index), indexOf(Object o), add(int index, Object obj), and so on. All three List implementations are ordered by index position—a position that you determine either by setting an object at a specific index or by adding it without specifying position, in which case the object is added to the end. The three List implementations are described in the following sections.

ArrayList

Think of this as a growable array. It gives you fast iteration and fast random access. To state the obvious: it is an ordered collection (by index), but not sorted. You might want to know that as of version 1.4, ArrayList now implements the new RandomAccess interface—a marker interface (meaning it has no methods) that says, “this list supports fast (generally constant time) random access.” Choose this over a LinkedList when you need fast iteration but aren’t as likely to be doing a lot of insertion and deletion.

Vector

Vector is a holdover from the earliest days of Java; Vector and Hashtable were the two original collections, the rest were added with Java 2 versions 1.2 and 1.4. A Vector is basically the same as an ArrayList, but Vector methods are synchronized for thread safety. You’ll normally want to use ArrayList instead of Vector because the synchronized methods add a performance hit you might not need. And if you do need thread safety, there are utility methods in class Collections that can help. Vector is the only class other than ArrayList to implement RandomAccess.

LinkedList

A LinkedList is ordered by index position, like ArrayList, except that the elements are doubly-linked to one another. This linkage gives you new methods (beyond what you get from the List interface) for adding and removing from the beginning or end, which makes it an easy choice for implementing a stack or queue. Keep in mind that a LinkedList may iterate more slowly than an ArrayList, but it’s a good choice when you need fast insertion and deletion. As of Java 5, the LinkedList class has been enhanced to implement the java.util.Queue interface. As such, it now supports the common queue methods: peek(), poll(), and offer().

Set Interface

A Set cares about uniqueness i.e., it doesn’t allow duplicates. Our good friend the equals() method determines whether two objects are identical (in which case only one can be in the set). The three Set implementations are described in the following sections.

HashSet

A HashSet is an unsorted, unordered Set. It uses the hashcode of the object being inserted, so the more efficient your hashCode() implementation the better access performance you’ll get. Use this class when you want a collection with no duplicates and you don’t care about order when you iterate through it.

LinkedHashSet

A LinkedHashSet is an ordered version of HashSet that maintains a doubly-linked List across all elements. Use this class instead of HashSet when you care about the iteration order. When you iterate through a HashSet the order is unpredictable, while a LinkedHashSet lets you iterate through the elements in the order in which they were inserted.

TreeSet

The TreeSet is one of two sorted collections (the other being TreeMap). It uses a Red-Black tree structure (but you knew that), and guarantees that the elements will be in ascending order, according to natural order. Optionally, you can construct a TreeSet with a constructor that lets you give the collection your own rules for what the order should be (rather than relying on the ordering defined by the elements’ class) by using a Comparable or Comparator. As of Java 6, TreeSet implements NavigableSet.

Exam Tip:
When using HashSet or LinkedHashSet, the objects you add to them must override hashCode(). If they don’t override hashCode(), the default Object. hashCode() method will allow multiple objects that you might consider “meaningfully equal” to be added to your “no duplicates allowed” set.

Map Interface

A Map cares about unique identifiers. You map a unique key (the ID) to a specific value, where both the key and the value are, of course, objects. You’re probably quite familiar with Maps since many languages support data structures that use a key/value or name/value pair. The Map implementations let you do things like search for a value based on the key, ask for a collection of just the values, or ask for a collection of just the keys. Like Sets, Maps rely on the equals() method to determine whether two keys are the same or different.

HashMap

The HashMap gives you an unsorted, unordered Map. When you need a Map and you don’t care about the order (when you iterate through it), then HashMap is the way to go; the other maps add a little more overhead. Where the keys land in the Map is based on the key’s hashcode, so, like HashSet, the more efficient your hashCode() implementation, the better access performance you’ll get. HashMap allows one null key and multiple null values in a collection.

HashTable

Like Vector, Hashtable has existed from prehistoric Java times. For fun, don’t forget to note the naming inconsistency: HashMap vs. Hashtable. Where’s the capitalization of t? Oh well, you won’t be expected to spell it. Anyway, just as Vector is a synchronized counterpart to the sleeker, more modern ArrayList, Hashtable is the synchronized counterpart to HashMap. Remember that you don’t synchronize a class, so when we say that Vector and Hashtable are synchronized, we just mean that the key methods of the class are synchronized. Another difference, though, is that while HashMap lets you have null values as well as one null key, a Hashtable doesn’t let you have anything that’s null.

LinkedHashMap

Like its Set counterpart, LinkedHashSet, the LinkedHash-Map collection maintains insertion order (or, optionally, access order). Although it will be somewhat slower than HashMap for adding and removing elements, you can expect faster iteration with a LinkedHashMap.

TreeMap

You can probably guess by now that a TreeMap is a sorted Map. And you already know that by default, this means “sorted by the natural order of the elements.” Like TreeSet, TreeMap lets you define a custom sort order (via a Comparable or Comparator) when you construct a TreeMap, that specifies how the elements should be compared to one another when they’re being ordered. As of Java 6, TreeMap implements NavigableMap.

Queue Interface

A Queue is designed to hold a list of “to-dos,” or things to be processed in some way. Although other orders are possible, queues are typically thought of as FIFO (first-in, first-out). Queues support all of the standard Collection methods and they also add methods to add and subtract elements and review queue elements.

PriorityQueue

This class is new with Java 5. Since the LinkedList class has been enhanced to implement the Queue interface, basic queues can be handled with a LinkedList. The purpose of a PriorityQueue is to create a “priority-in, priority out” queue as opposed to a typical FIFO queue. A PriorityQueue’s elements are ordered either by natural ordering (in which case the elements that are sorted first will be accessed first) or according to a Comparator. In either case, the elements’ ordering represents their relative priority.

Exam Tip: You can easily eliminate some answers right away if you recognize that, for example, a Map can’t be the class to choose when you need a name/value pair collection, since Map is an interface and not a concrete implementation class. The wording on the exam is explicit when it matters, so if you’re asked to choose an interface, choose an interface rather than a class that implements that interface. The reverse is also true—if you’re asked to choose a class, don’t choose an interface type.

Exam Tip: For the exam, you might be expected to choose a collection based on a particular requirement, where that need is expressed as a scenario. For example, which collection would you use if you needed to maintain and search on a list of parts, identified by their unique alphanumeric serial number where the part would be of type Part? Would you change your answer at all if we modified the requirement such that you also need to be able to print out the parts in order, by their serial number? For the first question, you can see that since you have a Part class, but need to search for the objects based on a serial number, you need a Map. The key will be the serial number as a String, and the value will be the Part instance. The default choice should be HashMap, the quickest Map for access. But now when we amend the requirement to include getting the parts in order of their serial number, then we need a TreeMap—which maintains the natural order of the keys. Since the key is a String, the natural order for a String will be a standard alphabetical sort. If the requirement had been to keep track of which part was last accessed, then we’d probably need a LinkedHashMap. But since a LinkedHashMap loses the natural order (replacing it with last-accessed order), if we need to list the parts by serial number, we’ll have to explicitly sort the collection, using a utility method

Previous Chapter: Chapter 38 - Overriding toString(), hashCode() and equals() Methods

Next Chapter: ArrayList Basics

1 comment:

© 2013 by www.inheritingjava.blogspot.com. All rights reserved. No part of this blog or its contents may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the Author.

ShareThis

Google+ Followers

Followers