Monday, October 31, 2011

Building And Using Static And Shared "C" Libraries



One of the problems with developed programs is that they tend to grow larger and larger, driving up overall compilation and linking time, and cluttering our makefile and the directory where we placed the source files. The first time a program we write reaches this state is normally when we look for a different way to manage our projects.
It is at this point that we start thinking about combining our source code into small units of related files, each of which can be managed with a separate makefile, possibly by a different programmer (for a multi-programmer project).


What Is A "C" Library? What Is It Good For?


One of the tools that compilers supply us with is libraries. A library is a file containing several object files, which can be used as a single entity in the linking phase of a program. Normally the library is indexed, so it is easy to find symbols (functions, variables and so on) in it. For this reason, linking a program whose object files are organized in libraries is faster than linking a program whose object files are separate on the disk. Also, when using a library, we have fewer files to look for and open, which speeds up linking even further.
Unix systems (as well as most other modern systems) allow us to create and use two kinds of libraries - static libraries and shared (or dynamic) libraries.
Static libraries are just collections of object files that are linked into the program during the linking phase of compilation, and are not relevant during runtime. This last comment seems obvious, as we already know that object files are also used only during the linking phase, and are not required during runtime - only the program's executable file is needed in order to run the program.
Shared libraries (also called dynamic libraries) are linked into the program in two stages. First, during compile time, the linker verifies that all the symbols (again, functions, variables and the like) required by the program are either linked into the program, or found in one of its shared libraries. However, the object files from the dynamic library are not inserted into the executable file. Instead, when the program is started, a program in the system (called a dynamic loader) checks which shared libraries were linked with the program, loads them into memory, and attaches them to the copy of the program in memory.
The complex phase of dynamic loading makes launching the program slightly slower, but this is a very insignificant drawback that is outweighed by a great advantage - if a second program linked with the same shared library is executed, it can use the same copy of the shared library, thus saving a lot of memory. For example, the standard "C" library is normally a shared library, and is used by all C programs. Yet, only one copy of the library's code is stored in memory at any given time. This means we can use far less memory to run our programs, and the executable files are much smaller, thus saving a lot of disk space as well.
However, there is one drawback to this arrangement. If we re-compile the dynamic library and try to run a second copy of our program with the new library, we'll soon get stuck - the dynamic loader will find that a copy of the library is already stored in memory, and thus will attach it to our program, and not load the new (modified) version from disk. There are ways around this too, as we'll see in the last section of our discussion.

Creating A Static "C" Library Using "ar" and "ranlib"

The basic tool used to create static libraries is a program called 'ar', for 'archiver'. This program can be used to create static libraries (which are actually archive files), modify object files in the static library, list the names of object files in the library, and so on. In order to create a static library, we can use a command like this:

ar rc libutil.a util_file.o util_net.o util_math.o

This command creates a static library named 'libutil.a' and puts copies of the object files "util_file.o", "util_net.o" and "util_math.o" in it. If the library file already exists, it has the object files added to it, or replaced, if they are newer than those inside the library. The 'c' flag tells ar to create the library if it doesn't already exist. The 'r' flag tells it to replace older object files in the library, with the new object files.
After an archive is created, or modified, there is a need to index it. This index is later used by the compiler to speed up symbol-lookup inside the library, and to make sure that the order of the symbols in the library won't matter during compilation (this will be better understood when we take a deeper look at the link process at the end of this tutorial). The command used to create or update the index is called 'ranlib', and is invoked as follows:

ranlib libutil.a

On some systems, the archiver (which is not always ar) already takes care of the index, so ranlib is not needed (for example, when Sun's C compiler creates an archive, it is already indexed). However, because 'ar' and 'ranlib' are used by many makefiles for many packages, such platforms tend to supply a ranlib command that does nothing. This makes it possible to use the same makefile on both types of platforms.
Note: when an archive file's index generation date (stored inside the archive file) is older than the file's last modification date (stored in the file system), a compiler trying to use this library will complain its index is out of date, and abort. There are two ways to overcome the problem:
  1. Use 'ranlib' to re-generate the index.
  2. When copying the archive file to another location, use 'cp -p', instead of only 'cp'. The '-p' flag tells 'cp' to keep all attributes of the file, including its access permissions, owner (if "cp" is invoked by a superuser) and its last modification date. This will cause the compiler to think the index inside the file is still up to date. This method is useful for makefiles that need to copy the library to another directory for some reason.

Using A "C" Library In A Program

After we have created our archive, we want to use it in a program. This is done by adding the library's name to the list of object file names given to the linker, using a special flag, normally '-l'. Here is an example:

cc main.o -L. -lutil -o prog

This will create a program using object file "main.o", and any symbols it requires from the "util" static library. Note that we omitted the "lib" prefix and the ".a" suffix when mentioning the library on the link command. The linker attaches these parts back to the name of the library to create the name of a file to look for. Note also the usage of the '-L' flag - this flag tells the linker that libraries might be found in the given directory ('.', referring to the current directory), in addition to the standard locations where the compiler looks for system libraries.
For an example of a program that uses a static library, try looking at our static library example directory.
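To make the division of labor concrete, here is a minimal sketch of what the source side of such a library might look like. The function names (util_clamp, clamp_percentage) and the header name util_math.h are hypothetical, chosen only for illustration; the three parts are shown in one listing, but the comments mark which file each part would live in.

```c
/* ---- util_math.h: hypothetical header shipped alongside libutil.a ---- */
#ifndef UTIL_MATH_H
#define UTIL_MATH_H

/* Limit 'value' to the inclusive range [low, high]. */
int util_clamp(int value, int low, int high);

#endif /* UTIL_MATH_H */

/* ---- util_math.c: compiled to util_math.o, then archived with 'ar' ---- */
int util_clamp(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

/* ---- main.c side: includes "util_math.h" and calls into the library ---- */
int clamp_percentage(int raw)
{
    /* The linker pulls util_math.o out of libutil.a to satisfy this call. */
    return util_clamp(raw, 0, 100);
}
```

With this layout, main.o only contains a reference to util_clamp; the definition is copied out of the archive when we run the 'cc main.o -L. -lutil -o prog' command shown above.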

Creating A Shared "C" Library Using "ld"

The creation of a shared library is rather similar to the creation of a static library. Compile a list of object files, then insert them all into a shared library file. However, there are two major differences:
  1. Compile for "Position Independent Code" (PIC) - When the object files are generated, we have no idea where in memory they will be inserted in a program that will use them. Many different programs may use the same library, and each may load it at a different memory address. Thus, we need all jumps ("goto", in assembly speak) and subroutine calls to use relative addresses, and not absolute addresses. To get that, we need to use a compiler flag that will cause this type of code to be generated.
    In most compilers, this is done by specifying '-fPIC' or '-fpic' on the compilation command.
  2. Library File Creation - unlike a static library, a shared library is not an archive file. It has a format that is specific to the architecture for which it is being created. Thus, we need to use the compiler (either the compiler's driver, or its linker) to generate the library, and tell it that it should create a shared library, not a final program file.
    This is done by using the '-G' flag with some compilers, or the '-shared' flag with other compilers.
Thus, the set of commands we will use to create a shared library, would be something like this:


cc -fPIC -c util_file.c
cc -fPIC -c util_net.c
cc -fPIC -c util_math.c
cc -shared -o libutil.so util_file.o util_net.o util_math.o

The first three commands compile the source files with the PIC option, so they will be suitable for use in a shared library (they may still be used in a program directly, even though they were compiled with PIC). The last command asks the compiler to generate a shared library named 'libutil.so' (the '-o' flag names the output file).

Using A Shared "C" Library - Quirks And Solutions

Using a shared library is done in two steps:
  1. Compile Time - here we need to tell the linker to scan the shared library while building the executable program, so it will be convinced that no symbols are missing. It will not really take the object files from the shared library and insert them into the program.
  2. Run Time - when we run the program, we need to tell the system's dynamic loader (the process in charge of automatically loading and linking shared libraries into the running process) where to find our shared library.
The compilation part is easy. It is done almost the same as when linking with static libraries:

cc main.o -L. -lutil -o prog

The linker will look for the file 'libutil.so' (-lutil) in the current directory (-L.), and link it to the program, but will not place its object files inside the resulting executable file, 'prog'.
The run-time part is a little trickier. Normally, the system's dynamic loader looks for shared libraries in some system specified directories (such as /lib, /usr/lib, /usr/X11/lib and so on). When we build a new shared library that is not part of the system, we can use the 'LD_LIBRARY_PATH' environment variable to tell the dynamic loader to look in other directories. The way to do that depends on the type of shell we use ('tcsh' and 'csh', versus 'sh', 'bash', 'ksh' and similar shells), as well as on whether or not 'LD_LIBRARY_PATH' is already defined. To check if you have this variable defined, try:

echo $LD_LIBRARY_PATH

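In 'tcsh' or 'csh', if you get a message such as 'LD_LIBRARY_PATH: Undefined variable.', then it is not defined. In 'sh', 'bash' and similar shells, an empty line means the variable is either not defined or empty.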
Here is how to define this variable, in all four cases:
  1. 'tcsh' or 'csh', LD_LIBRARY_PATH is not defined:
    
        setenv LD_LIBRARY_PATH /full/path/to/library/directory
        

  2. 'tcsh' or 'csh', LD_LIBRARY_PATH already defined:
    
        setenv LD_LIBRARY_PATH /full/path/to/library/directory:${LD_LIBRARY_PATH}
        

  3. 'sh', 'bash' and similar, LD_LIBRARY_PATH is not defined:
    
        LD_LIBRARY_PATH=/full/path/to/library/directory
        export LD_LIBRARY_PATH
        

  4. 'sh', 'bash' and similar, LD_LIBRARY_PATH already defined:
    
        LD_LIBRARY_PATH=/full/path/to/library/directory:${LD_LIBRARY_PATH}
        export LD_LIBRARY_PATH
        

After you've defined LD_LIBRARY_PATH, you can check if the system locates the library properly for a given program linked with this library:

ldd prog

You will get a few lines, each listing a library name on the left, and a full path to the library on the right. If a library is not found in any of the system default directories, or the directories mentioned in 'LD_LIBRARY_PATH', you will get a 'library not found' message. In such a case, verify that you properly defined the path to the directory inside 'LD_LIBRARY_PATH', and fix it, if necessary. If all goes well, you can now run your program like any other program, and see it roll...
For an example of a program that uses a shared library, try looking at our shared library example directory.

Using A Shared "C" Library Dynamically - Programming Interface

One of the less commonly used features of shared libraries is the ability to link them to a process at any time during its life. The linking method we showed earlier makes the shared library automatically loaded by the dynamic loader of the system. Yet, it is possible to perform a linking operation at any other time, using the 'dl' library. This library provides us with a means to load a shared library, reference any of its symbols, call any of its functions, and finally detach it from the process when no longer needed.
Here is a scenario where this might be appealing: suppose that we wrote an application that needs to be able to read files created by different word processors. Normally, our program might need to be able to read tens of different file formats, but in a single run, it is likely that only one or two such document formats will be needed. We could write one shared library for each such format, all having the same interface (readfile and writefile for example), and one piece of code that determines the file format. Thus, when our program is asked to open such a file, it will first determine its format, then load the relevant shared library that can read and translate that format, and call its readfile function to read the document. We might have tens of such libraries, but only one of them will be placed in memory at any given time, making our application use less system resources. It will also allow us to ship the application with a small set of supported file formats, and add new file formats without the need to replace the whole application, by simply sending the client an additional set of shared libraries.

Loading A Shared Library Using dlopen()

In order to open and load the shared library, one should use the dlopen() function. It is used this way:


#include <dlfcn.h>      /* defines dlopen(), etc.       */
#include <stdio.h>      /* defines fprintf()            */
#include <stdlib.h>     /* defines exit()               */
.
.
void* lib_handle;       /* handle of the opened library */

lib_handle = dlopen("/full/path/to/library", RTLD_LAZY);
if (!lib_handle) {
    fprintf(stderr, "Error during dlopen(): %s\n", dlerror());
    exit(1);
}

The dlopen() function gets two parameters. One is the full path to the shared library. The other is a flag defining whether all symbols referred to by the library need to be checked immediately, or only when used. In our case, we may use the lazy approach (RTLD_LAZY) of checking only when used. The function returns an opaque handle to the loaded library, which may later be used to reference symbols in the library. It will return NULL in case an error occurred. In that case, we may use the dlerror() function to print out a human-readable error message, as we did here.

Calling Functions Dynamically Using dlsym()

After we have a handle to a loaded shared library, we can find symbols in it, of both functions and variables. We need to define their types properly, and we need to make sure we made no mistakes. The compiler won't be able to check those declarations, so we should be extra careful when typing them. Here is how to find the address of a function named 'readfile' that gets one string parameter, and returns a pointer to a 'struct local_file' structure:


/* first define a function pointer variable to hold the function's address */
struct local_file* (*readfile)(const char* file_path);
/* then define a pointer to a possible error string */
const char* error_msg;
/* finally, define a pointer to the returned file */
struct local_file* a_file;

/* clear any stale error, then locate the 'readfile' function in the library */
dlerror();
readfile = dlsym(lib_handle, "readfile");

/* check that no error occurred */
error_msg = dlerror();
if (error_msg) {
    fprintf(stderr, "Error locating 'readfile' - %s\n", error_msg);
    exit(1);
}

/* finally, call the function, with a given file path */
a_file = (*readfile)("hello.txt");

As you can see, errors might occur anywhere along the code, so we should be careful to perform thorough error checking. Surely, you'll also check that 'a_file' is not NULL, after you call your function.
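The open, look up, call and close steps can be put together in one small function. The sketch below is only an illustration under a few assumptions: since we have no 'readfile' library at hand, it passes NULL to dlopen() to get a handle for the main program itself, and looks up the standard 'atoi' function as a stand-in. The function name call_atoi_dynamically is made up for this example. On some systems (older versions of glibc, for instance) the program must also be linked with '-ldl'.

```c
#include <dlfcn.h>      /* dlopen(), dlsym(), dlerror(), dlclose() */
#include <stdio.h>      /* fprintf()                               */

/* Look up 'atoi' at run time and call it on 'text'.
 * Returns 0 on success (with the value stored in *result), -1 on error. */
int call_atoi_dynamically(const char *text, int *result)
{
    int (*atoi_ptr)(const char *);   /* will hold the function's address */
    const char *error_msg;
    void *lib_handle;

    /* NULL asks for a handle to the main program and the libraries
     * that were loaded with it at startup (assumption of this sketch). */
    lib_handle = dlopen(NULL, RTLD_LAZY);
    if (!lib_handle) {
        fprintf(stderr, "Error during dlopen(): %s\n", dlerror());
        return -1;
    }

    dlerror();                       /* clear any stale error message */
    /* POSIX idiom: assign through a void** to avoid casting an object
     * pointer to a function pointer. */
    *(void **)(&atoi_ptr) = dlsym(lib_handle, "atoi");
    error_msg = dlerror();
    if (error_msg) {
        fprintf(stderr, "Error locating 'atoi' - %s\n", error_msg);
        dlclose(lib_handle);
        return -1;
    }

    *result = (*atoi_ptr)(text);     /* call through the pointer */
    dlclose(lib_handle);
    return 0;
}
```

A real plug-in loader would follow the same shape, with the library path and the symbol name taken from the file format that was detected.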

Unloading A Shared Library Using dlclose()

The final step is to close down the library, to free the memory it occupies. This should only be done if we are not intending to use it soon. If we do - it is better to leave it open, since library loading takes time. To close down the library, we use something like this:

dlclose(lib_handle);

This will free all resources taken by the library (in particular, the memory its executable code takes up).

Automatic Startup And Cleanup Functions

Finally, the dynamic loading library gives us the option of defining two special functions in each library, namely _init and _fini. The _init function, if found, is invoked automatically when the library is opened, and before dlopen() returns. It may be used to invoke some startup code needed to initialize data structures used by the library, read configuration files, and so on.
The _fini function is called when the library is closed using dlclose(). It may be used to make cleanup operations required by the library (freeing data structures, closing files, etc.).
For an example of a program that uses the 'dl' interface, try looking at our dynamic-shared library example directory.
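A note for readers on current toolchains: the Linux dlopen(3) manual page describes _init and _fini as obsolete, and recommends the constructor and destructor function attributes instead. These attributes are a GCC/Clang extension, not standard C. Here is a minimal sketch of that alternative; the names lib_ready, lib_startup, lib_cleanup and lib_is_ready are made up for this example.

```c
/* Set by the startup routine, cleared by the cleanup routine. */
static int lib_ready = 0;

/* Runs automatically when the library is loaded (before dlopen() returns),
 * or before main() if this code is linked directly into a program. */
__attribute__((constructor))
static void lib_startup(void)
{
    lib_ready = 1;      /* initialize data structures, read config, ... */
}

/* Runs automatically when the library is unloaded with dlclose(),
 * or at normal program exit. */
__attribute__((destructor))
static void lib_cleanup(void)
{
    lib_ready = 0;      /* free data structures, close files, ... */
}

/* Lets callers verify that initialization has taken place. */
int lib_is_ready(void)
{
    return lib_ready;
}
```

Unlike _init and _fini, several such routines may coexist in one library, which avoids clashes with the startup code that the compiler itself supplies.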

Getting a Deeper Understanding - The Complete Linking Story



The Importance Of Linking Order

In order to fully understand the way linking is done, and be able to overcome linking problems, we should bear in mind that the order in which we present the object files and the libraries to the linker is the order in which the linker links them into the resulting binary file.
The linker checks each file in turn. If it is an object file, it is placed fully into the executable file. If it is a library, the linker checks to see if any symbols referenced (i.e. used) in the previous object files but not defined (i.e. contained) in them, are in the library. If such a symbol is found, the whole object file from the library that contains the symbol is added to the executable file. This process continues until all object files and libraries on the command line have been processed.
This process means that if library 'A' uses symbols in library 'B', then library 'A' has to appear on the link command before library 'B'. Otherwise, symbols might be missing - the linker never turns back to libraries it has already processed. If library 'B' also uses symbols found in library 'A' - then the only way to assure successful linking is to mention library 'A' on the link command again after library 'B', like this:

$(LD) ....... -lA -lB -lA

This means that linking will be slower (library 'A' will be processed twice). This also hints that one should try not to have such mutual dependencies between two libraries. If you have such dependencies - then either re-design your libraries' contents, or combine the two libraries into one larger library.
Note that object files found on the command line are always fully included in the executable file, so the order of mentioning them does not really matter. Thus, a good rule is to always mention the libraries after all object files.
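The single left-to-right pass described above can be modeled in a few lines of C. This is only a toy simulation, not how any real linker is implemented: symbols are represented as bits in a mask, and each library is treated as one unit rather than as a collection of separate object files. The names link_item and simulate_link are invented for this sketch.

```c
#include <stddef.h>

typedef struct {
    int      is_library;   /* 0 = object file, 1 = library               */
    unsigned defines;      /* bitmask of symbols this item defines       */
    unsigned needs;        /* bitmask of symbols this item references    */
} link_item;

/* Process the items in command-line order and return the bitmask of
 * symbols that are still undefined at the end (0 means success). */
unsigned simulate_link(const link_item *items, size_t count)
{
    unsigned defined = 0;      /* symbols already placed in the output   */
    unsigned undefined = 0;    /* symbols referenced but not yet defined */
    size_t i;

    for (i = 0; i < count; i++) {
        const link_item *item = &items[i];

        /* A library is used only if it defines something we currently
         * miss; otherwise it is skipped, and never looked at again. */
        if (item->is_library && (item->defines & undefined) == 0)
            continue;

        defined |= item->defines;
        undefined = (undefined | item->needs) & ~defined;
    }
    return undefined;
}
```

If bit 1 stands for a function in library 'A' that main.o calls, and bit 2 for a function in library 'B' that library 'A' calls, then the order main.o, A, B leaves nothing undefined, while the order main.o, B, A skips 'B' (nothing is needed from it yet) and ends with bit 2 unresolved.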

Static Linking Vs. Dynamic Linking

When we discussed static libraries we said that the linker will try to look for a file named 'libutil.a'. We lied. Before looking for such a file, it will look for a file named 'libutil.so' - as a shared library. Only if it cannot find a shared library, will it look for 'libutil.a' as a static library. Thus, if we have created two copies of the library, one static and one shared, the shared will be preferred. This can be overridden using some linker flags (for example '-static' or '-Wl,-Bstatic' when linking through gcc, or '-Bstatic' with some other linkers. Refer to the compiler's or the linker's manual for info about these flags).

Tuesday, October 18, 2011

samsung nexus

The new Samsung Nexus S (also called Google Nexus S) is a mobile phone with a large 4-inch high-resolution touchscreen display, running the latest Google Android OS v2.3. It is the first mobile phone to run the Android v2.3 OS (Gingerbread). It is powered by a 1GHz processor and features high-speed connectivity options like 3G HSDPA and Wireless LAN Wi-Fi n.
Samsung Nexus S is designed with Samsung’s brilliant Super AMOLED touch screen technology providing a premium viewing experience. The 4-inch Contour Display features a curved design for a more ergonomic style and feel when held to the user’s face. Samsung Nexus S also features Near Field Communication (NFC) technology which allows you to read information off of everyday objects like stickers and posters that are embedded with NFC chips. Powered by a 1 GHz Samsung application processor, Samsung Nexus S produces rich 3D graphics, faster upload and download times and supports HD-like multimedia content.
Samsung Nexus S is equipped with a 5 megapixel rear facing camera and camcorder, as well as a VGA front facing camera. In addition, Samsung Nexus S features a gyroscope sensor to provide a smooth, fluid gaming experience when the user is tilting the device up or down or panning the phone to the left or right. Samsung Nexus S also comes with 16 GB of internal memory.
It does not have an FM radio, the main camera is only 5 MP, and the Bluetooth is v2.1. It has long battery life, powered by the 1500 mAh Li-Ion battery.
Samsung Nexus S mobile phone key features:
  • 4-inch high resolution touchscreen display
  • 1GHz processor
  • Android OS v2.3 (GingerBread)
  • 3G,WiFi
  • 16GB internal memory
Samsung Nexus S mobile phone specifications:
  • Network : Quad band GSM , 3G HSDPA
  • Dimensions : 123.9 x 63 x 10.9 mm
  • Weight : 129 g
  • DISPLAY : 4.0 inch Super AMOLED capacitive touchscreen, 16M colors ,480 x 800 pixels resolution
  • Oleophobic surface
  • Contour Display with curved glass screen
  • Multi-touch input method
  • Accelerometer sensor for UI auto-rotate
  • Touch-sensitive controls
  • Proximity sensor for auto turn-off
  • Three-axis gyro sensor
  • 3.5 mm audio jack
  • Memory Internal : 16GB storage, 512 MB RAM
  • Expandable memory : microSD, up to 32GB
  • 3G HSDPA, 7.2 Mbps; HSUPA, 5.76 Mbps
  • Wireless LAN : Wi-Fi 802.11 b/g/n , DLNA
  • Bluetooth v2.1 with A2DP
  • USB v2.0 microUSB
  • CAMERA : 5 mega-pixel, 2560 x 1920 pixels, autofocus, LED flash ,Geo-tagging, touch focus
  • Secondary VGA camera
  • OS : Android OS, v2.3 Gingerbread
  • CPU : ARM Cortex A8 1GHz processor
  • GPS with A-GPS support
  • Java Via third party application
  • Social networking integration
  • Digital compass
  • MP4/DivX/WMV/H.264/H.263 player
  • MP3/WAV/eAAC+/AC3/FLAC player
  • Organizer
  • Image/video editor
  • Document editor (Word, Excel, PowerPoint, PDF)
  • Google Search, Maps, Gmail, YouTube, Calendar, Google Talk, Picasa integration
  • Flash Player v10.1
  • Near Field Communications (NFC)
  • Battery : Li-Ion 1500 mAh
    • Stand-by Up to 713 hours (2G) / Up to 428 hours (3G)
    • Talk time Up to 14 hours (2G) / Up to 7 hours (3G)

Monday, October 17, 2011

HTC Amaze


Size & weight
  • Height: 5.12 inches (130 mm)
  • Width: 2.58 inches (65.6 mm)
  • Depth: 0.46 inches (11.8 mm)
  • Weight: 6.1 ounces
Processor & OS

  • Qualcomm® Snapdragon™ S3 Processor with 1.5 GHz dual-core CPUs
  • Android™ 2.3.4 (Gingerbread)
  • HTC Sense 3.0
Connectivity


  • 4G: Theoretical peak download speeds up to 42 Mbps
  • Wi-Fi: IEEE 802.11 a/b/g/n compliant
  • Bluetooth® 3.0
  • Near Field Communications
  • GPS
  • FM radio
Multimedia
  • HDMI output (with MHL Adapter)
  • Share photos/videos on Facebook™, Flickr™, Twitter™, or YouTube™
  • Facebook™ and Twitter™ for HTC Sense™
  • Friend Stream™
  • Gallery, Music, and FM Radio
  • SRS virtual surround sound wired for headphones


Friday, October 14, 2011

Motorola Atrix 2

For AT&T
General
  • 2G Network: GSM 850 / 900 / 1800 / 1900
  • 3G Network: HSDPA 850 / 1900 / 2100
  • Announced: 2011, October
  • Status: Coming soon. Exp. release 2011, October
Size
  • Dimensions: -
  • Weight: -
Display
  • Type: TFT capacitive touchscreen, 16M colors
  • Size: 540 x 960 pixels, 4.3 inches (~256 ppi pixel density)
  • Gorilla Glass display
  • Touch sensitive controls
  • MOTOBLUR UI with Live Widgets
  • Multi-touch input method
  • Accelerometer sensor for UI auto-rotate
  • Proximity sensor for auto turn-off
Sound
  • Alert types: Vibration; MP3, WAV ringtones
  • Loudspeaker: Yes
  • 3.5mm jack: Yes
Memory
  • Phonebook: Practically unlimited entries and fields, Photo call
  • Call records: Practically unlimited
  • Internal: 8 GB storage, 1 GB RAM
  • Card slot: microSD, up to 32GB, 2 GB included
Data
  • GPRS: Yes
  • EDGE: Yes
  • 3G: HSDPA, 21 Mbps; HSUPA, 5.76 Mbps
  • WLAN: Wi-Fi 802.11 a/b/g/n, DLNA, Wi-Fi hotspot
  • Bluetooth: Yes, v2.1 with A2DP, EDR
  • Infrared port: No
  • USB: Yes, microUSB v2.0
Camera
  • Primary: 8 MP, 3264x2448 pixels, autofocus, LED flash
  • Features: Geo-tagging, image stabilization
  • Video: Yes, 1080p
  • Secondary: Yes
Features
  • OS: Android OS, v2.3 (Gingerbread)
  • CPU: Dual-core 1GHz ARM Cortex-A9 processor, ULP GeForce GPU, Tegra 2 AP20H chipset
  • Messaging: SMS (threaded view), MMS, Email, IM, Push Email
  • Browser: HTML
  • Games: Yes + downloadable
  • Colors: Black
  • GPS: Yes, with A-GPS support
  • Java: Yes, via Java MIDP emulator
  • Active noise cancellation with dedicated mic
  • HDMI port
  • Digital compass
  • MP3/WAV/WMA/eAAC+ player
  • 1080p MP4/H.263/H.264/WMV/Xvid/DivX @ 30 fps playback
  • Google Search, Maps, Gmail, YouTube, Google Talk
  • Facebook, Twitter, MySpace integration
  • Photo viewer/editor
  • Organizer
  • Quickoffice document editor
  • Adobe Flash player
  • Voice memo/dial/commands
  • Predictive text input
Battery
  • Standard battery, Li-Ion
  • Stand-by: -
  • Talk time: -

galaxy ace

Thursday, October 13, 2011

HCL Me - X1 tablet

The HCL Me Tab X1 sports a 7-inch Large Touchscreen display with 800 x 480 pixels screen resolution, and a 2 MP Front Camera for quality video recording and playback.
The Android tablet includes a music player, FM radio, 0.5 watt speakers, a 3.5 mm jack, support for up to 32GB of external memory, and 1GB ROM + 512MB RAM. The tablet supports social networking, email and instant messaging.
The Me Tab X1 supports 3G data through an external USB dongle, and has a USB port, Wi-Fi and Bluetooth.
HCL Me X1 Android Tablet Features and Specifications:
Network: 2G
3G USB dongle support
OS: Android 2.3 OS
Processor: 1GHz ARM Cortex A8 processor
Display: 7-inch Large Touchscreen Display
800 x 480 pixels screen resolution
4GB ROM + 512MB RAM
up to 32GB External memory support
2MP Front camera for video calls
HD video recording
Music Player
FM Radio
3.5 mm jack
GPS
USB Port
Wi-Fi, Bluetooth data connectivity
Social Networking Integration
IM and Email Support
Pre-installed Software and Applications
Battery: 3500 mAh Standard

HCL Me X1 Price

The HCL Me Tab X1 Android Tablet is priced at Rs 10,490 in India.

Monday, October 10, 2011

Differences between DOM and SAX.


When comparing two entities, we tend to see them as competitors and compare them in order to find a winner. This of course is not applicable in every case - at least not in the case of SAX and DOM. Both have their own pros and cons, and they are certainly not in direct competition with each other.


SAX v/s DOM

The main differences between SAX and DOM, which are the two most popular APIs for processing XML documents in Java, are:
  • Read v/s Read/Write: SAX can be used only for reading XML documents and not for the manipulation of the underlying XML data whereas DOM can be used for both read and write of the data in an XML document.
  • Sequential Access v/s Random Access: SAX can be used only for sequential processing of an XML document whereas DOM can be used for random access to XML docs. So what do you do if you want random access to the underlying XML data while using SAX? You have to store and manage that information yourself so that you can retrieve it when you need it.
  • Call back v/s Tree: SAX uses call back mechanism and uses event-streams to read chunks of XML data into the memory in a sequential manner whereas DOM uses a tree representation of the underlying XML document and facilitates random access/manipulation of the underlying XML data.
  • XML-Dev mailing list v/s W3C: SAX was developed by the XML-Dev mailing list whereas DOM was developed by W3C (World Wide Web Consortium).
  • Information Set: SAX doesn't retain all the info of the underlying XML document such as comments whereas DOM retains almost all the info. New versions of SAX are trying to extend their coverage of information.
Usual Misconceptions
  • SAX is always faster: this is a very common misunderstanding. SAX may not always be faster: depending on the particular situation in which it is used, the cost of the callbacks can outweigh its storage-size advantage.
  • DOM always keeps the whole XML doc in memory: this is not always true. DOM implementations vary not only in their code size and performance, but also in their memory requirements, and a few of them don't keep the entire XML doc in memory all the time. Otherwise, processing/manipulation of very large XML docs would become virtually impossible using DOM, which is of course not the case.

Choosing between SAX and DOM

The single biggest factor in deciding whether to code your programs with SAX or DOM is programmer preference. SAX and DOM are very different APIs. Where SAX models the parser, DOM models the XML document. Most programmers find the DOM approach more to their taste, at least initially. Its pull model (the client program extracts the information it wants from a document by invoking various methods on that document) is much more familiar than SAX’s push model (the parser tells you what it reads when it reads it, whether you’re ready for that information or not).
However, SAX’s push model, unfamiliar as it is, can be much more efficient. SAX programs can be much faster than their DOM equivalents, and almost always use far less memory. In particular, SAX works extremely well when documents are streamed, and the individual parts of each document can be processed in isolation from other parts. If complicated processes can be broken down into serial filters, then SAX is hard-to-beat. SAX lends itself to assembly-line like automation where different stations perform small operations on just the parts of the document they have at hand right at that moment. By contrast, DOM is more like a factory where each worker operates only on an entire car. Every time the worker receives a new car off the line, they have to take the entire car apart to find the piece they need to work with, do their job, then put the car back together again before moving it along to the next worker. This system is not very efficient if there’s more than one station. DOM lends itself to monolithic applications where one program does everything. SAX works better when the program can be divided into small bits of independent work.
In particular the following characteristics indicate that a program should probably be using a streaming API such as SAX, XNI, or XMLPULL:
  • Documents will not fit into available memory. This is the only rule that really mandates one or the other. If your documents are too big for available memory, then you must use a streaming API such as SAX, painful though it may be. You really have no other choice.
  • You can process the document in small contiguous chunks of input. The entire document does not need to be available before you can do useful work.
    A slightly weaker variant of this is if the decisions you make only depend on preceding parts of the document, never on what comes later.
  • Processing can be divided up into a chain of successive operations.
However, if the problem matches this next set of characteristics, the program should probably be using DOM or perhaps another of the tree-based APIs such as JDOM:
  • The program needs to access widely separated parts of the document at the same time. Even more so, it needs access to multiple documents at the same time.
  • The internal data structures are almost as complicated as the document itself.
  • The program must modify the document repeatedly.
  • The program must store the document for a significant amount of time through many method calls, not just process it once and forget it.
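To make the first and third of these characteristics concrete, here is a small sketch using the standard JAXP DocumentBuilder. The document vocabulary (ref, note, the to and id attributes) and the class name DomCrossReference are made up for this example. Each ref near the top of the document is resolved against a note that appears much later, and the tree is modified in place, which would be awkward in a single streaming pass.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomCrossReference {

    /** Copies each note's text onto the ref elements that point to it. */
    static Document resolveRefs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList refs = doc.getElementsByTagName("ref");
        NodeList notes = doc.getElementsByTagName("note");
        for (int i = 0; i < refs.getLength(); i++) {
            Element ref = (Element) refs.item(i);
            for (int j = 0; j < notes.getLength(); j++) {
                Element note = (Element) notes.item(j);
                // The note may sit anywhere in the tree, before or after the ref
                if (note.getAttribute("id").equals(ref.getAttribute("to"))) {
                    ref.setAttribute("title", note.getTextContent());
                }
            }
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>See <ref to='n1'/>.</p>"
                + "<notes><note id='n1'>Details</note></notes></doc>";
        Document doc = resolveRefs(xml);
        Element ref = (Element) doc.getElementsByTagName("ref").item(0);
        System.out.println(ref.getAttribute("title")); // prints "Details"
    }
}
```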
On occasion, it’s possible to use both SAX and DOM. In particular, you can parse the document using a SAX XMLReader attached to a series of SAX filters, then use the final output from that process to construct a DOM Document. Working in reverse, you can traverse a DOM tree while firing off SAX events to a SAX ContentHandler.
The approach is the same one Example 9.14 used earlier to serialize a DOM Document onto a stream. Use JAXP to perform an identity transform from a source to a result. JAXP supports SAX, DOM, and streams as both sources and results. For example, this code fragment reads an XML document from the InputStream in and parses it with the SAX XMLReader named saxParser. Then it transforms this input into the equivalent DOMResult from which the DOM Document is extracted.
XMLReader saxParser = XMLReaderFactory.createXMLReader();
// SAXSource takes an org.xml.sax.InputSource, so wrap the InputStream
Source input = new SAXSource(saxParser, new InputSource(in));
DOMResult output = new DOMResult();
TransformerFactory xformFactory 
 = TransformerFactory.newInstance();
Transformer idTransform = xformFactory.newTransformer();
idTransform.transform(input, output);
// The DOMResult, not the Transformer, holds the tree that was built
Node document = output.getNode();
To go in the other direction, from DOM to SAX, just use a DOMSource and a SAXResult. The DOMSource is constructed from a DOM Document object, and the SAXResult is configured with a ContentHandler:
Source input = new DOMSource(document);
ContentHandler handler = new MyContentHandler();
Result output = new SAXResult(handler);
TransformerFactory xformFactory 
 = TransformerFactory.newInstance();
Transformer idTransform = xformFactory.newTransformer();
idTransform.transform(input, output);
The transform will walk the DOM tree firing off events to the SAX ContentHandler.
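For readers who want to try both directions end to end, here is a self-contained sketch that goes from SAX to DOM and back again. It is only one possible arrangement: the class and method names are invented for this example, the XMLReader is obtained through SAXParserFactory rather than XMLReaderFactory, and an anonymous DefaultHandler that records element names stands in for MyContentHandler.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDomRoundTrip {

    /** SAX to DOM: parse the stream with a SAX reader and collect a DOM tree. */
    static Node saxToDom(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader saxParser = factory.newSAXParser().getXMLReader();
        DOMResult output = new DOMResult();
        Transformer idTransform = TransformerFactory.newInstance().newTransformer();
        idTransform.transform(new SAXSource(saxParser, new InputSource(in)), output);
        return output.getNode();
    }

    /** DOM to SAX: walk the tree and record the element names the handler sees. */
    static String domToSax(Node document) throws Exception {
        StringBuilder names = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                // Fall back to the local name if no qualified name is reported
                String name = qName.isEmpty() ? local : qName;
                names.append(name).append(' ');
            }
        };
        Transformer idTransform = TransformerFactory.newInstance().newTransformer();
        idTransform.transform(new DOMSource(document), new SAXResult(handler));
        return names.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>one</p><p>two</p></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Node document = saxToDom(in);
        System.out.println(domToSax(document));
    }
}
```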
Although TrAX is the most standard, parser-independent means of passing documents back and forth between SAX and DOM, many implementations of these APIs also provide their own utility classes for crossing the border between the APIs. For example, GNU JAXP has the gnu.xml.pipeline.DomConsumer class for building DOM Document objects from SAX event streams and the gnu.xml.util.DomParser class for feeding a DOM Document into a SAX program. The Oracle XML Parser for Java provides the oracle.xml.parser.v2.DocumentBuilder class, a SAX ContentHandler/LexicalHandler/DeclHandler that builds a DOM Document from a SAX XMLReader.

Monday, October 3, 2011

Android’s HTTP Clients

Most network-connected Android apps will use HTTP to send and receive data. Android includes two HTTP clients: HttpURLConnection and Apache HTTP Client. Both support HTTPS, streaming uploads and downloads, configurable timeouts, IPv6 and connection pooling.

Apache HTTP Client

DefaultHttpClient and its sibling AndroidHttpClient are extensible HTTP clients suitable for web browsers. They have large and flexible APIs. Their implementation is stable and they have few bugs.
But the large size of this API makes it difficult for us to improve it without breaking compatibility. The Android team is not actively working on Apache HTTP Client.

HttpURLConnection

HttpURLConnection is a general-purpose, lightweight HTTP client suitable for most applications. This class has humble beginnings, but its focused API has made it easy for us to improve steadily.
Prior to Froyo, HttpURLConnection had some frustrating bugs. In particular, calling close() on a readable InputStream could poison the connection pool. Work around this by disabling connection pooling:
private void disableConnectionReuseIfNecessary() {
    // HTTP connection reuse which was buggy pre-froyo
    if (Integer.parseInt(Build.VERSION.SDK) < Build.VERSION_CODES.FROYO) {
        System.setProperty("http.keepAlive", "false");
    }
}
In Gingerbread, we added transparent response compression. HttpURLConnection will automatically add this header to outgoing requests, and handle the corresponding response:
Accept-Encoding: gzip
Take advantage of this by configuring your Web server to compress responses for clients that can support it. If response compression is problematic, the class documentation shows how to disable it.
Since HTTP’s Content-Length header returns the compressed size, it is an error to use getContentLength() to size buffers for the uncompressed data. Instead, read bytes from the response until InputStream.read() returns -1.
We also made several improvements to HTTPS in Gingerbread. HttpsURLConnection attempts to connect with Server Name Indication (SNI), which allows multiple HTTPS hosts to share an IP address. It also enables compression and session tickets. Should the connection fail, it is automatically retried without these features. This makes HttpsURLConnection efficient when connecting to up-to-date servers, without breaking compatibility with older ones.
In Ice Cream Sandwich, we are adding a response cache. With the cache installed, HTTP requests will be satisfied in one of three ways:
  • Fully cached responses are served directly from local storage. Because no network connection needs to be made, such responses are available immediately.
  • Conditionally cached responses must have their freshness validated by the webserver. The client sends a request like “Give me /foo.png if it changed since yesterday” and the server replies with either the updated content or a 304 Not Modified status. If the content is unchanged it will not be downloaded!
  • Uncached responses are served from the web. These responses will get stored in the response cache for later.
Use reflection to enable HTTP response caching on devices that support it. This sample code will turn on the response cache on Ice Cream Sandwich without affecting earlier releases:
private void enableHttpResponseCache() {
    try {
        long httpCacheSize = 10 * 1024 * 1024; // 10 MiB
        File httpCacheDir = new File(getCacheDir(), "http");
        Class.forName("android.net.http.HttpResponseCache")
            .getMethod("install", File.class, long.class)
            .invoke(null, httpCacheDir, httpCacheSize);
    } catch (Exception httpResponseCacheNotAvailable) {
    }
}
You should also configure your Web server to set cache headers on its HTTP responses.

Which client is best?

Apache HTTP client has fewer bugs on Eclair and Froyo. It is the best choice for these releases.
For Gingerbread and better, HttpURLConnection is the best choice. Its simple API and small size make it a great fit for Android. Transparent compression and response caching reduce network use, improve speed and save battery. New applications should use HttpURLConnection; it is where we will be spending our energy going forward.

from android-developers.blogspot.com
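
The post's advice about not sizing buffers from getContentLength() can be sketched in plain Java as follows. This is an illustration rather than code from the post: the ResponseReader class and its readFully and fetch helpers are hypothetical names, and the caller supplies the URL. The point is that the loop reads until InputStream.read() returns -1 instead of trusting a declared length, so it works whether or not the response was transparently decompressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseReader {

    /** Reads the stream to exhaustion; never relies on a declared length. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) { // -1 signals end of stream
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Fetches the body of the given URL using HttpURLConnection. */
    static byte[] fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readFully(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a response body, so the demo needs no network
        byte[] body = readFully(new ByteArrayInputStream("hello".getBytes("UTF-8")));
        System.out.println(body.length); // prints 5
    }
}
```

On the pre-Froyo releases the post mentions, the keep-alive workaround would still need to be applied before any connection is opened.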