Wednesday, 11 December 2013

Requirements for Automation Framework Design

Introduction

A framework defines a set of guidelines for all phases of test automation: Requirement Analysis, Script Design, Execution, Reporting and Maintenance. A framework can be a wrapper around a complex internal architecture that makes it easy for the end user to work with. It also enforces a set of standards for implementation.


Problem Statement

There is no standard set of guidelines on how to develop a framework and what considerations need to be taken into account during its development. There are various white papers that go over the types of frameworks and how they work, but none of them defines what factors go into the design. This post covers the different aspects of a framework and the key features it needs to have, based on the requirements.

The Automation Framework Design Challenge is to balance:
  • Quality
  • Time
  • Resources
The challenge is to build a fit-for-purpose automation framework that is capable of keeping up with quickly changing automation testing technologies and changes in the system under test. The challenge is accentuated by the various combinations that are possible using the wide gamut of available automation tools. Making the right choices in the preliminary design stage is the most critical step of the process, since this can be the differentiator between a successful framework and a failed investment.

As if this were not tough enough, add to this the even more formidable challenge of balancing the quality of the framework against the desired utility and the need to develop the framework within a stipulated timeframe using available resources to ensure the economic viability of the solution. Therefore, it is very important to benchmark the framework, the associated development time, and the required resources to ensure the framework's quality justifies the use of the framework.

  1. Selection of a framework

    Different types of frameworks that exist are:

    • Data Driven framework – Used when the flow of the application remains constant and only the data changes. The data is provided by an external medium, e.g. an Excel sheet, XML, etc.
    • Keyword Driven framework – This framework provides generic keywords that can be used with any type of application. It also provides abstraction from the type of automation tool used and the type of application being tested, e.g. it can test a similar Web and Windows application with the same test case.
    • Hybrid framework – A hybrid framework takes advantages from both the Data Driven and Keyword Driven frameworks. It does not implement generic keywords, but implements business-logic keywords based on the application being tested, e.g. Login and Logout could be application-specific keywords.
     
  2. Don't reinvent the wheel – A framework should use the power of the automation tool rather than re-define the whole language by implementing new keywords. Developing a keyword driven framework is time consuming and costly; a hybrid framework can be developed in a shorter time and at less cost.
     
  3. Re-usability – The framework should allow the highest possible reusability. Combining individual actions into business logic provides re-usability, e.g. combining actions like "Enter username", "Enter password" and "Click Login" into one re-usable component, "Login" (see the sketch after this list).
     
  4. Support of different application versions – A framework should allow re-use of baseline scripts when different versions/flavors of an application are to be tested. There are two ways to support different versions:
    • Copy and Modify – This method involves creating copies of the baseline scripts and modifying them for a specific application version.
    • Re-use and Upgrade – This method involves re-using the baseline script and providing upgrade code for a specific version of the application. This ensures maximum re-usability and should be preferred.
       
  5. Support of script versioning – Scripts should be stored in a version control system like Git, BitKeeper, ClearCase, etc. This ensures recovery from any disaster.
     
  6. Different environments for development and production – Automation should be treated like any other development project. Test scripts should be created and debugged in a test environment, and deployed to the production environment only once tested. This holds true for emergency releases as well.
     
  7. Externally configurable – Configurable items of a script should be kept in an external file containing configuration such as the application URL, version, paths, etc. This allows running the same script against different environments. Ensure that the location of the configuration file is not hard coded: a hard-coded location allows running on any environment, but only one at a time. Keeping the configuration file relative to the current test path overcomes this limitation (see the sketch after this list).
     
  8. Self-configurable – Ideally a framework should be self-configurable: once deployed to a system, no manual configuration changes should be required, and scripts should automatically configure the required settings.
     
  9. Minimal changes required for object changes – The most common issue faced during automation is a change in object identification. The framework should be able to patch such changes easily. This can be achieved by storing all object identification settings at a shared location: an external XML file, Excel file, database, or the automation tool's proprietary format. There are two possible ways to load this object identification configuration:
    • Static – All object definitions are loaded into memory at the start of the test. Any change made to an object definition can only be picked up by stopping and re-running the test.
    • Dynamic – The object definition is pulled on request. This approach is a bit slower than the static one, but for huge scripts where a fix needs to be made at run time, it is suitable.
       
  10. Execution – The framework might need to cater to the below requirements (on a need basis):
    • Execution of an individual test case
    • Execution of a test batch (combination of tests)
    • Re-execution of only failed test cases
    • Execution of a test case/test batch based on result of another test case/test batch
  11. There could be many other needs based on the project requirements. A framework might not implement all of them, but it should be flexible enough to accommodate such requirements in the future.
  12. Status monitoring – A framework should allow monitoring of the execution status in real time and should be capable of sending alerts in case of failure. This ensures a quick turnaround in the event of a failure.
     
  13. Reporting – Different applications have different reporting needs. Some require combined results for a test batch, and some require an individual report for each test case in the batch. The framework should be flexible enough to generate the required reports.
     
  14. Minimum dependency on the automation tool for changes – Some fixes can only be made by opening the script in the automation tool and then saving it. Scripts should be developed in such a way that modification is possible even when the automation tool is unavailable. This reduces cost by cutting the number of licenses required, and it allows anyone to change a script without having to set up the tool.
     
  15. Easy debugging – Debugging takes a lot of time during automation, so special care needs to be taken here. Keyword driven frameworks that use an external data source (like an Excel spreadsheet) to read script keywords and process them are difficult to debug.
     
  16. Logging – Log generation is an important part of execution. It is very important to generate debug information at various points in a test case. This information helps find the problem area quickly and reduces the time needed to make a fix.
     
  17. Easy to Use – The framework should be easy to learn and use. It is time consuming and costly to train a resource on a framework. A well documented framework is easier to understand and implement
     
  18. Flexible – Framework should be flexible enough to accommodate any enhancements without impacting existing test cases
     
  19. Performance impacts – A framework should also consider the performance impact of its implementation. A complex framework that increases the load or execution time of scripts is never desirable. Techniques like caching, or compiling all code into a single library for execution, should be used to improve performance wherever possible.
     
  20. Framework Support Tools – External Tools can be developed to perform tasks that help in framework design. Some example tasks would be
    • Uploading scripts from local folder to HP Quality Center
    • Associating library files to currently open scripts
    • Synchronizing local files with HP Quality Center.
       
  21. Coding standards – Coding standards ensure scripts that are consistent, readable and easily maintainable. The coding standard should define all of the below:
    • Naming conventions for variables, subs, functions, file names, script names, etc., e.g. iVarName for an integer variable, prFuncName/meFuncName for a function returning an integer
    • Comment headers for libraries, subs and functions, including information like version history, created by, last modified by, last modified date, description, parameters and an example
    • Object naming conventions, e.g. txt_FieldName for a text box
       
  22. Portability - The framework must be easily deployable in any environment and technology stack for regression testing and certification testing of the product suite.
     
  23. Scalability - When expansion is required, the framework must allow the addition of multiple scripts and assets, per the testing requirements, through the organized structuring of resources. This need is inevitable in the case of ERP products, which undergo multiple releases for new builds or require different scripts for parallel testing or testing on multiple technology stacks.
     
  24. Reliability - The framework must ensure that the test results are an accurate depiction of system conditions at the time of testing. It also must ensure that automated testing is carried out by efficiently utilizing system resources.
     
  25. Retestability - This feature ensures that regression tests can be submitted multiple times without any change to the data or any other component associated with the tests. It also ensures that the functionality being tested is thoroughly verified.
     
  26. Rerunability - The framework must provide the ability to resubmit the core setup to the same environment, and each time this happens, the framework must ensure that a new setup is created that is used for subsequent testing. All this should happen without any manual change of data in individual scripts.
     
  27. Remote execution - Provision must be made to create and submit automation jobs, which then must be scheduled automatically according to available resources and preset parameters.
     
  28. Load balancing - All scheduled jobs must be executed on the first-available machine, thereby making optimum use of the system. The executions should be based on business rules.
     
  29. Parallel execution in the same environment - Automated scripts conforming to the framework must be able to run simultaneously even though they form part of multiple test cycles in the same environment.
     
  30. Parallel execution in different environments - Automated scripts conforming to the framework must be able to run simultaneously in different environments.
     
  31. Script development efficiency - The development time for scripts conforming to the automation framework should not be more than eight man-hours per script for scripts of high complexity.
     
  32. Addition of functionality – Flexibility should be provided, without limitations posed by the framework, for adding new functionality as requirements evolve.
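
Several of the guidelines above lend themselves to a concrete illustration. Below is a minimal Python sketch, not any specific tool's API, showing external configuration (guideline 7), a shared object repository loaded statically (guideline 9) and a business-logic keyword built from atomic actions (guideline 3). All names, locators and URLs are invented for the example, and the inline strings stand in for the external config.ini and objects.xml files a real framework would keep relative to the test path.

import configparser
import xml.etree.ElementTree as ET

# Stand-ins for external files kept relative to the current test path
# (inlined here so the sketch is self-contained).
CONFIG_INI = """
[test]
url = https://test.example.com

[prod]
url = https://www.example.com
"""

OBJECTS_XML = """
<objects>
  <object name="username_box" locator="id=user"/>
  <object name="password_box" locator="id=pass"/>
  <object name="login_button" locator="id=submit"/>
</objects>
"""

def load_config(env):
    # One script, many environments: pick a section by environment name.
    cfg = configparser.ConfigParser()
    cfg.read_string(CONFIG_INI)
    return dict(cfg[env])

def load_object_map():
    # Shared object repository: a locator change is patched in one place,
    # loaded once at start-up (the "static" approach from guideline 9).
    root = ET.fromstring(OBJECTS_XML)
    return {o.get("name"): o.get("locator") for o in root.iter("object")}

OBJECTS = load_object_map()

def enter_text(name, value):
    print(f"type '{value}' into element [{OBJECTS[name]}]")  # stand-in for a real tool call

def click(name):
    print(f"click element [{OBJECTS[name]}]")                # stand-in for a real tool call

def login(user, password):
    # Business-logic keyword: atomic actions combined into one re-usable component.
    enter_text("username_box", user)
    enter_text("password_box", password)
    click("login_button")

if __name__ == "__main__":
    cfg = load_config("test")   # switch to "prod" without touching the script
    print("running against", cfg["url"])
    login("demo_user", "demo_pass")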
     
Summary

Automation should be treated as a development project, and not just record-and-playback of events. Starting automated testing with a good framework ensures low maintenance. The guidelines discussed in this post can be used as input for developing the requirements for a framework. The benefits of test automation are improved time to market through efficiency, better software quality, and reduced cost; however, careful planning and sound design are required to reap these benefits. A framework is like a blueprint: it provides direction during the automation effort and helps realize the value of the automation investment.

Wednesday, 20 November 2013

Linux DHCP Configuration

Dynamic Host Configuration Protocol (DHCP) automatically assigns IP addresses and other network configuration information (subnet mask, broadcast address, etc.) to computers on a network. A client configured for DHCP sends out a broadcast request asking the DHCP server for an address. The DHCP server then issues a "lease" and assigns it to that client. The time period of a valid lease can be specified on the server. DHCP reduces the time required to configure clients, and allows a computer to be moved to various networks and be configured with the appropriate IP address, gateway and subnet mask. For ISPs it conserves the limited number of IP addresses they may use. DHCP servers may also assign a "static" IP address to specified hardware. Microsoft NetBIOS information is often included in the network information sent by the DHCP server.

DHCP assignment:
  1. Lease Request: The client broadcasts a request to the DHCP server with a source address of 0.0.0.0 and a destination address of 255.255.255.255. The request includes the MAC address, which is used to direct the reply.
  2. IP lease offer: The DHCP server replies with an IP address, subnet mask, network gateway, domain name, name servers, duration of the lease and the IP address of the DHCP server.
  3. Lease Selection: The client receives the offer and broadcasts to all DHCP servers which offer it will accept, so that the other DHCP servers need not hold their offers.
  4. The DHCP server then sends an ACK to the client, and the client is configured to use TCP/IP.
  5. Lease Renewal: When half of the lease time has expired, the client will issue a new request to the DHCP server.
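
You can watch this exchange (Discover, Offer, Request, Ack) on the wire by sniffing the DHCP server and client ports with tcpdump; the interface name here is illustrative:

tcpdump -n -i eth0 'port 67 or port 68'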

DHCP server installation:
  • Red Hat/CentOS/Fedora: rpm -ivh dhcp-x.xxx.elx.i386.rpm
  • Ubuntu/Debian 8: apt-get install dhcp3-server
    ( Later releases of Ubuntu (11.04) used the busybox release known as udhcpd and the configuration is NOT shown here)

Starting DHCP server:
  • Red Hat/CentOS/Fedora: service dhcpd start
    (or /etc/rc.d/init.d/dhcpd start for Red Hat, Fedora and CentOS Linux distributions)
  • Ubuntu/Debian: /etc/init.d/networking restart

Sample DHCP server config file: (DHCP v3.0.1)
  • Red Hat/CentOS/Fedora: /etc/dhcpd.conf
    (See /usr/share/doc/dhcp-3.X/dhcp.conf.sample)
    [Potential Pitfall]: It's /etc/dhcpd.conf NOT /etc/dhcp.conf !!
  • Ubuntu/Debian: /etc/default/dhcp3-server
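
A minimal example of such a dhcpd.conf is sketched below; the subnet, address range and lease times are illustrative values, and the host entry reuses the ns2 name server and MAC address referenced in the note further down:

ddns-update-style interim;
ignore client-updates;

subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers             192.168.1.1;
    option subnet-mask         255.255.255.0;
    option domain-name         "your-domain.com";
    option domain-name-servers 192.168.1.2;
    range dynamic-bootp        192.168.1.128 192.168.1.254;
    default-lease-time         21600;
    max-lease-time             43200;

    # Static lease for the name server
    host ns2 {
        hardware ethernet 00:02:C3:D0:E5:83;
        fixed-address     ns2.your-domain.com;
    }
}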

     


    Test configuration file for errors with the following command: /etc/rc.d/init.d/dhcpd configtest
    (Other distributions may use: /usr/sbin/dhcpd -f)
Note: The MAC address for the static-address name server (ns2.your-domain.com) can be obtained with the command /sbin/ifconfig:
eth0      Link encap:Ethernet  HWaddr 00:02:C3:D0:E5:83
          inet addr:40.175.42.254  Bcast:40.175.42.255  Mask:255.255.255.0
          inet6 addr: fe80::202:b3ff:fef0:e484/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4070 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3878 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3406445 (3.2 MiB)  TX bytes:439612 (429.3 KiB)
                    
When dhcpd is running it will generate entries in the file: /var/lib/dhcp/dhcpd.leases
lease 192.168.1.128 {
 starts 2 2004/12/01 20:07:05;
 ends 3 2004/12/02 08:07:05;
 hardware ethernet 00:00:e8:4a:2c:5c;
 uid 01:00:00:e8:4c:5d:31;
 client-hostname "Node1";
}

Options:
  • ddns-update-style:
    • interim: allows your DHCP server to update a DNS server whenever it hands out a lease. Allows your DNS server to know which IP addresses are associated with which computers in your network. Requires that your DNS server support DDNS (Dynamic DNS).
    • none: disables dynamic DNS updates; use this when your DNS server does not support DDNS.
    • ad-hoc: has been deprecated and shouldn't be used.
  • Default options (Red Hat/CentOS/Fedora) are set in /etc/sysconfig/dhcpd
LANs separated by routers: In order to have your DHCP broadcast pass through a router on to the next network, you must configure the router to allow DHCP relay (Cisco: ip helper-address, Juniper: dhcp-relay). The local LAN subnet { } configuration must come before the configuration directives of the remote LANs.
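
For example, on a Cisco router the relay is enabled on the interface facing the DHCP clients (the interface name and server address here are illustrative):

interface GigabitEthernet0/1
 ip helper-address 192.168.1.2
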
Look for errors in /var/log/messages

Tuesday, 1 October 2013

How to debug Tcl scripts?

On occasion, I write Tcl programs that don't work right the first time, and thus need to be “debugged”. The easiest way to debug a Tcl program is with the puts command.
 
puts stderr "Some useful information to print"
 
A few carefully placed puts statements can be used to ferret out most bugs. Unfortunately, it often seems the bugs have a habit of returning as soon as the puts statements are removed :-) .

The solution to the recurring bug problem is to wrap puts in a procedure, called dputs, so we can turn debug printing on or off without changing the code:

proc dputs {args} {
  global Debug
  if {[info exists Debug]} {
     puts stderr $args
  }
}
 
This first version of dputs checks to see if the global variable Debug is set (to anything) before printing the arguments passed to dputs. As a side benefit, dputs lets us specify what to print as multiple arguments. The args parameter, which is special in Tcl, automatically gathers all of the arguments of dputs into a single list.

Although dputs is an improvement over puts, it doesn't take too long to discover the limitation of this version. You have the choice of either too little output or too much. What we would like to do is turn debug printing on or off selectively, in different sections of the program.

We can use the introspective capabilities of Tcl to determine which procedure each dputs is being called from, and turn debug printing on or off for each procedure. We'll use the info level command to look into the current procedure stack and figure out the name of the procedure that dputs is being called from. We can set Debug to a glob-style pattern that will cause only those dputs statements in procedures that match that pattern to print. As a bonus, we'll print the calling procedure name as part of our output, so it doesn't have to be included as an argument to dputs.

proc dputs {args} {
  global Debug
  if {![info exists Debug]} return
  set current [expr [info level] - 1]
  set caller toplevel
  catch {
    set caller [lindex [info level $current] 0]
  }
  if {[string match $Debug $caller]} {
    puts stderr "$caller: $args"
  }
}
 
In this version of dputs, as before, if Debug is not set, no debugging output is produced. The info level command returns the current nesting level of the procedure call stack; inside dputs this is the level of dputs itself, so one less than that ($current) is the stack level of dputs's caller. The info level $current command returns a list of information about the procedure stack at level $current, whose first element is the name of the procedure. If dputs is called at the global scope, the lookup of the caller's name will fail, thus the catch around info level, which leaves $caller with the pre-initialized value of toplevel.

Now that we have the name of the procedure that dputs was called from, it is a simple matter for string match to compare the procedure name in $caller with the pattern in Debug, and only emit debugging output for the desired procedures. The pattern in Debug can be changed interactively at the command prompt, or automatically under program control.

Although this version of dputs is better, it requires that the programmer know in advance what information to pass as arguments to dputs for the debug output to help locate the bug. Typically, half the battle of debugging is determining what information needs to be printed to find the bug, and what dputs already prints is probably not the right information.

We can easily overcome this limitation by remembering that Tcl is an interpreted language. Instead of simply printing canned values that are passed as arguments to dputs, we can stop the program at any dputs call and allow the programmer to enter arbitrary Tcl commands to elicit information about the current execution state of the program.

The next procedure, breakpoint, may be inserted anywhere in a Tcl program to cause it to stop and allow interactive execution of commands. For example, the Tcl moral equivalent of the C assert command is implemented by calling breakpoint any time an invalid condition is detected. Alternately, breakpoint can be inserted into dputs so breakpoints can be turned on or off selectively using the Debug variable.

The breakpoint procedure implements four built-in commands: +, -, ? and C. The + and - commands allow the user to move up and down the call stack. The ? command prints useful information about the current stack frame, and C returns from breakpoint, resuming execution of the program. Any other command is passed to uplevel to be evaluated at the appropriate stack level.

proc breakpoint {} {
  set max [expr [info level] - 2]
  set current $max
  show $current
  while {1} {
    puts -nonewline stderr "#$current: "
    gets stdin line
    while {![info complete $line]} {
      puts -nonewline stderr "? "
      append line \n[gets stdin]
    }
    switch -- $line {
      + {if {$current < $max} {show [incr current]}}
      - {if {$current > 0} {show [incr current -1]}}
      C {puts stderr "Resuming execution";return}
      ? {show $current}
      default {
        catch { uplevel #$current $line } result
        puts stderr $result
      }
    }
  }
}
 
The procedure breakpoint demonstrates the use of the Tcl commands info level and uplevel to examine the state of a running Tcl program, and the info complete command to read and evaluate Tcl commands entered interactively.
First, info level computes the depth of the procedure call stack (in $max). We need to subtract two from info level, one for the breakpoint procedure, and one for dputs. We then loop (while {1}) getting Tcl commands and running them. The variable $current contains the current stack level, which we'll print as part of the prompt to the user.

Getting a Tcl command from the console is a little tricky, as a single command might span multiple input lines. We'll use the info complete and append commands in the inner while loop to gather up enough lines of input to form a complete Tcl command. Once we have the entire command, the switch statement selects either one of the built-in commands, or it calls uplevel to run the command at the current stack level, which may have been modified previously by + or - commands. The catch around uplevel ensures that an errant command typed by the user doesn't terminate the program with an error. We then print the result of the command (or the error message if it failed), and loop back to get the next command from the user.

The built-in commands + and - are used to change the stack level that the commands we enter will be evaluated in. They simply change the value of $current. The ? command calls show, and C returns, resuming execution of the program. The procedure show, which we'll write next, displays useful information about the current stack level.

proc show {current} {
  if {$current > 0} {
    set info [info level $current]
    set proc [lindex $info 0]
    puts stderr "$current: Procedure $proc \
                {[info args $proc]}"
    set index 0
    foreach arg [info args $proc] {
      puts stderr \
           "\t$arg = [lindex $info [incr index]]"
    }
  } else {
    puts stderr "Top level"
  }
}
 
The procedure show is a shortcut for printing application-specific information while debugging, since the user could type in the Tcl commands to achieve the same result. This version of show, which gets passed the stack level $current as an argument, prints the procedure name, its arguments, and their values at the time the procedure was called. In dputs we used the first element of info level $current as the name of the procedure in stack frame $current. The remaining elements contain the values of the arguments passed to the procedure. The call to info args returns the names of the arguments, which we pair with their values in info level $current, using the variable index to step through the list of argument values. Here is some sample output from show, taken from a debugging session of HMtag_img, part of a Tcl HTML library package.

4: Procedure HMtag_img {win param text}
        win = .clone1.text
        param = src=green_ball.gif
        text = text
#4: info vars
var text param win
#4: set var(font)
font:courier:14:medium:r
#4: -
3: Procedure HMrender {win tag not param text}
        win = .clone1.text
        tag = img
        not =
        param = src=green_ball.gif
        text =  This is a good point
#3: C
Resuming Execution
 
In conclusion, we started with a simple puts for program debugging, and in less than 50 lines of Tcl code, created a powerful debugging environment that can be easily tailored to meet the debugging needs of most Tcl applications.

Courtesy: www.linuxjournal.com

Thursday, 15 August 2013

TCP Connection Establishment

An application on one host/computer (IP, port) establishes a TCP connection with an application on another host/computer (IP, port) by a 3-way handshake. State is maintained at both ends, i.e.:

(1) The connection initiator sends a connection-establishment message to the other host (the receiver),
(2) the receiver then sends an acknowledgement of the connection-establishment message,
(3) the connection initiator then sends an acknowledgement that it has received the receiver's acknowledgement.

All these steps are elaborated below.

Let's see, step by step, what exactly happens in the simplest case of a connection, putting aside complexities such as local policies and retries, and assuming Host-A acts as the CLIENT and Host-B (which is in the LISTEN state) as the SERVER.

a) An application on Host-B is up and in the LISTEN state. {LISTEN} implies the application is ready to accept a TCP connection and is waiting for incoming connection requests.

b) An application on Host-A, currently in the CLOSED state, initiates a TCP message to establish a connection with Host-B. The first message sent by Host-A (the client) is a SYN (synchronize) message. {CLOSED} means no state.

c) In the SYN message Host-A sets a sequence number; this number is random and unique for the given time, and is generated by the ISN (Initial Sequence Number) generator. The SYN flag among the control flags in the header is also set.

d) As soon as the SYN message is sent, Host-A goes to the SYN-SENT state. {SYN-SENT} implies a SYN has been sent to Host-B (the remote end) and Host-A is waiting for the acknowledgement of that SYN.

e) As soon as the SYN message is received by Host-B, it changes its state from {LISTEN} to {SYN-RECEIVED} and sends the acknowledgement of the SYN in a SYN-ACK message to Host-A. {SYN-RECEIVED} implies that a SYN was received, a SYN-ACK was sent, and the host is waiting for acknowledgement of the SYN-ACK.

f) The SYN-ACK message contains a random, unique sequence number generated by Host-B, and acknowledges the SYN by setting the acknowledgement field of the header to the received sequence number + 1, i.e. Host-B expects the sequence number of the next incoming message from Host-A to be the same as the acknowledgement number Host-B sent. Host-B also sets the SYN and ACK control flags.

g) Host-A receives the SYN-ACK and checks the acknowledgement number; if it is valid, Host-A sends an ACK and goes into the {ESTABLISHED} state, otherwise it closes the connection. {ESTABLISHED} means an open connection: data can be delivered and data transfer can take place.

h) Host-A's ACK contains a sequence number equal to the acknowledgement number received in the SYN-ACK message, and sets the acknowledgement field to the sequence number received in the SYN-ACK + 1. The ACK control flag is also set.

i) Host-B receives the ACK and checks the acknowledgement and sequence numbers; if they are valid, it goes to the {ESTABLISHED} state, otherwise it closes the connection. This is how the 3-way handshake takes place.

j) Once Host-B is in {ESTABLISHED}, data transfer can take place from both ends. In practice this whole exchange is hidden from the application, as the sketch below shows.
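
An application never builds these SYN/ACK messages itself: the operating system performs the entire handshake inside the connect() and accept() socket calls. The following minimal, self-contained Python sketch makes that visible (the loopback address and port are arbitrary choices):

import socket
import threading

HOST, PORT = "127.0.0.1", 9000          # illustrative values

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind((HOST, PORT))
srv.listen(1)                           # Host-B is now in the LISTEN state

def accept_one():
    conn, addr = srv.accept()           # returns once the 3-way handshake completes
    print("server side: ESTABLISHED with", addr)
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect((HOST, PORT))               # kernel sends SYN, receives SYN-ACK, sends ACK
print("client side: ESTABLISHED")
cli.close()
t.join()
srv.close()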

Note: Once a host enters the ESTABLISHED state, it remains there until a connection-close request is received. This means that if either node reboots/crashes while in the ESTABLISHED state, the other node will not be aware of the reboot/crash of the remote host, because by default TCP does not run any health check to verify that an established connection is still up (the optional keepalive mechanism, sketched below, exists for exactly this).
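
If an application does want dead-peer detection, it can opt in to TCP's keepalive probes on a per-socket basis; a minimal Python sketch:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask the kernel to send periodic keepalive probes on an otherwise idle
# connection; probe timing is controlled by OS-level settings.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)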

Now we shall move on to some complex situations that occur during the TCP connection establishment process. Before we proceed to the next topic, it would be best to go back to the previous topic, TCP Message Format, for a better understanding of the coming topics.

Thursday, 9 May 2013

How does a Traceroute work?

If you are working as a network administrator, a system administrator, or in any system operations team, then you might have already heard about the tool named TRACEROUTE. It's a very handy tool, available in most operating systems by default.

Network administrators and system administrators use this tool most commonly in their day-to-day activities. It's basically a network diagnostic tool. There are three main objectives of the traceroute tool, and together they give an insight into your network problem.

  1. The entire path that a packet travels through
  2. Names and identity of routers and devices in your path
  3. Network latency, or more specifically, the time taken to send and receive data to each device on the path

It's a tool that can be used to verify the path that your data will take to reach its destination, without actually sending your data.

As I always say in my articles, it's always good to understand how a particular tool works, because it's not the usage that helps you understand and troubleshoot an issue, but the concept behind the tool that gives you an insight into the problem. How to use a command can always be found online, or even inside the Linux man and info pages.

In this article I will explain the working of traceroute and the types of traceroute tools, along with their differences. We will also look at the different switches available to the traceroute command in Linux.

Basics First

Each IP packet that you send on the internet has a field called TTL. TTL stands for Time To Live. Although it's called Time To Live, it's not actually a time in seconds; it's something else.

TTL is measured not in seconds but in hops. It's the maximum number of hops that a packet can travel across the internet before being discarded.

Hops are the computers, routers, or other devices that come in between the source and the destination.

What if there were no TTL at all? If there were no TTL in an IP packet, the packet would flow endlessly from one router to another, on and on, forever searching for the destination. The TTL value is set by the sender inside the IP packet (the person using the system, or sending the packet, is unaware of these things going on under the hood; they are handled automatically by the operating system).

If the destination is not found after traveling through too many routers in between (hops), and the TTL value becomes 0 (which means no further travel), the receiving router drops the packet and informs the original sender.

The original sender is informed that the TTL value was exceeded and that the packet could not be forwarded further.

Let's say I need to reach the IP address 10.1.136.23, and my default TTL value is 30 hops. That means the packet can travel a maximum of 30 hops to reach the destination, after which it is dropped.

But how will the routers in between determine that the TTL limit has been reached? Each router that comes in between the source and the destination reduces the TTL value before sending the packet to the next router. This means that if I have a default TTL value of 30, my first router will reduce it to 29 and then send it to the next router along the path.

The receiving router will make it 28 and send it to the next, and so on. If a router receives a packet with a TTL of 1 (which means no more traveling, and no forwarding), the packet is discarded. But the router that discards the packet informs the original sender that the TTL value has been exceeded!

The message sent back to the original sender by the router that received a packet with a TTL of 1 is called an "ICMP TTL exceeded message". Of course, on the internet, when you send something to a receiver, the receiver comes to know the address of the sender.

Hence when an ICMP TTL exceeded message is sent by a router, the original sender comes to know the address of that router.

Traceroute makes use of these TTL exceeded messages to find the routers that come across your path to the destination (because the exceeded messages sent by the routers contain their addresses).


But how does Traceroute use the TTL exceeded message to find the routers/hops in between?

You might be thinking that TTL exceeded messages are only sent by the router that receives a packet with a TTL of 1. That's correct: not every router between you and your receiver will send a TTL exceeded message. Then how will you find the addresses of all the routers/hops between you and your destination? Remember, the main purpose of Traceroute is to identify the hops between you and your destination.

But you can exploit this behavior of routers/hops sending TTL exceeded messages by purposely sending IP packets with a TTL value of 1.


[Diagram: Traceroute Working Explained. A sender does a traceroute towards a server at a remote location.]


So let's say I want to do a traceroute to Google's publicly available DNS server (8.8.8.8). My traceroute command and its result will look something like the below.


root@workstation:~# traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  192.168.0.1  6.768 ms  6.462 ms  6.223 ms
 2  183.83.192.1  5.842 ms  5.543 ms  5.288 ms
 3  183.82.14.5  5.078 ms  6.755 ms  6.468 ms
 4  183.82.14.57  20.789 ms  27.609 ms  27.931 ms
 5  72.14.194.18  17.821 ms  17.652 ms  17.465 ms
 6  66.249.94.170  19.378 ms  15.975 ms  23.017 ms
 7  209.85.241.21  16.633 ms  16.607 ms  17.428 ms
 8  8.8.8.8  17.144 ms  17.662 ms  17.228 ms


Let's see what's happening under the hood. When I fire the command traceroute -n 8.8.8.8, what my computer does is make a UDP packet (yeah, it's UDP; don't worry, we will be discussing this in detail). This UDP packet will contain the following things.

  • My Source Address (Which is my IP address)
  • Destination address (Which is 8.8.8.8)
  • And a destination UDP port number which is invalid, meaning the traceroute utility sends the packet to a UDP port in the range 33434 to 33534, which is normally unused.

So let's see how this thing works.

Step 1: My machine makes a packet with the destination IP address of 8.8.8.8 and a destination port number between 33434 and 33534. And the important thing it does is to set the TTL value to 1.

Step 2: Of course my packet will reach my gateway server. On receiving the packet, my gateway server will reduce the TTL by 1 (all routers/hops in between do this job of reducing the TTL value by 1). Once the TTL is reduced by 1 (1-1 = 0), the TTL value becomes zero, so my gateway server will send me back a TTL time exceeded message. Please remember that when my gateway server sends a TTL exceeded message back to me, it includes the first 28 bytes of the initial packet I sent.

Step 3: On receiving this TTL time exceeded message, my traceroute program comes to know the source address and other details about the first hop (which is my gateway server).

Step 4: Now the traceroute program again sends the same kind of UDP packet, with the destination of 8.8.8.8 and a random UDP destination port between 33434 and 33534. But this time the initial TTL is 2, because my gateway router will reduce it by 1 and then forward the packet to the next hop/router (so the packet sent by my gateway to its next hop will have a TTL value of 1).

Step 5: On receiving the UDP packet, the next hop after my gateway server will once again reduce the TTL by 1, which means the TTL once again becomes 0. Hence it will send me back an ICMP time exceeded message with its source address, along with the first 28 bytes of the packet I sent.

Step 6: On receiving that TTL time exceeded message, my traceroute program comes to know that hop/router's IP address and shows it on my screen.

Step 7: Now again my traceroute program makes a similar UDP packet, again with a random UDP port and the destination address of 8.8.8.8, but this time the TTL value is set to 3, so that the TTL becomes 0 when the packet reaches the third hop/router (remember that my gateway and the next hop after it will each reduce it by 1). That router then replies with a TTL time exceeded message, and my traceroute program comes to know that hop/router's IP address.

Step 8: On receiving that reply, the traceroute program once again makes a UDP packet, with a TTL value of 4 this time. If it gets a TTL time exceeded for that also, then the traceroute program sends a UDP packet with a TTL of 5, and so on.

But how will my traceroute program come to know that the final destination, 8.8.8.8, has been reached? It will know because, when the original receiver of the packet, 8.8.8.8 (remember that all the UDP packets had a destination address of 8.8.8.8), gets the request, it will send back a message that is completely different from all those "TTL time exceeded" messages.

When the original receiver (8.8.8.8) gets my UDP packet, it will send me an "ICMP destination/port unreachable" message. This is bound to happen because we are always sending to a random UDP port between 33434 and 33534. Hence my traceroute program will come to know that we have reached the final destination, and will stop sending any further packets.
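
Putting the steps above together, here is a minimal Python sketch of a UDP traceroute. It is not the real utility's implementation: it sends one probe per hop instead of three, assumes a 20-byte IP header when reading the ICMP type, and needs root privileges for the raw ICMP socket:

import socket
import time

def traceroute(dest_name, max_hops=30, base_port=33434, timeout=2.0):
    dest_ip = socket.gethostbyname(dest_name)
    for ttl in range(1, max_hops + 1):
        # Raw ICMP socket to catch "time exceeded" / "port unreachable" replies.
        recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
        recv.settimeout(timeout)
        send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)  # the whole trick
        start = time.time()
        send.sendto(b"", (dest_ip, base_port + ttl))  # unlikely-to-be-open UDP port
        try:
            data, (addr, _) = recv.recvfrom(512)
            rtt = (time.time() - start) * 1000.0
            icmp_type = data[20]      # type byte sits just past the 20-byte IP header
            print(f"{ttl:2d}  {addr}  {rtt:.3f} ms")
            if addr == dest_ip and icmp_type == 3:   # port unreachable: destination
                break                                # (type 11 is "time exceeded")
        except socket.timeout:
            print(f"{ttl:2d}  *")
        finally:
            send.close()
            recv.close()

traceroute("8.8.8.8")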

Now, anything described in words is only a theory. We need to confirm it by running tcpdump while doing a traceroute. Let's see the tcpdump output. Please note that I will not show you the entire output of tcpdump, because it is too long.

Run traceroute on one terminal of your Linux machine, and on another terminal run the below tcpdump command to see what happens.

root@workstation:~# tcpdump -n '(icmp or udp)' -vvv
12:13:06.585187 IP (tos 0x0, ttl 1, id 37285, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.43143 > 8.8.8.8.33434: [bad udp cksum 0xd157 -> 0x0e59!] UDP, length 32
12:13:06.585218 IP (tos 0x0, ttl 1, id 37286, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.38682 > 8.8.8.8.33435: [bad udp cksum 0xd157 -> 0x1fc5!] UDP, length 32
12:13:06.585228 IP (tos 0x0, ttl 1, id 37287, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.48381 > 8.8.8.8.33436: [bad udp cksum 0xd157 -> 0xf9e0!] UDP, length 32
12:13:06.585237 IP (tos 0x0, ttl 2, id 37288, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.57602 > 8.8.8.8.33437: [bad udp cksum 0xd157 -> 0xd5da!] UDP, length 32
12:13:06.585247 IP (tos 0x0, ttl 2, id 37289, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.39195 > 8.8.8.8.33438: [bad udp cksum 0xd157 -> 0x1dc1!] UDP, length 32
12:13:06.585256 IP (tos 0x0, ttl 2, id 37290, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.47823 > 8.8.8.8.33439: [bad udp cksum 0xd157 -> 0xfc0b!] UDP, length 32
12:13:06.585264 IP (tos 0x0, ttl 3, id 37291, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.52815 > 8.8.8.8.33440: [bad udp cksum 0xd157 -> 0xe88a!] UDP, length 32
12:13:06.585273 IP (tos 0x0, ttl 3, id 37292, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.51780 > 8.8.8.8.33441: [bad udp cksum 0xd157 -> 0xec94!] UDP, length 32
12:13:06.585281 IP (tos 0x0, ttl 3, id 37293, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.34782 > 8.8.8.8.33442: [bad udp cksum 0xd157 -> 0x2efa!] UDP, length 32
12:13:06.585290 IP (tos 0x0, ttl 4, id 37294, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.53015 > 8.8.8.8.33443: [bad udp cksum 0xd157 -> 0xe7bf!] UDP, length 32
12:13:06.585299 IP (tos 0x0, ttl 4, id 37295, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.58417 > 8.8.8.8.33444: [bad udp cksum 0xd157 -> 0xd2a4!] UDP, length 32
12:13:06.585308 IP (tos 0x0, ttl 4, id 37296, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.55943 > 8.8.8.8.33445: [bad udp cksum 0xd157 -> 0xdc4d!] UDP, length 32
12:13:06.585318 IP (tos 0x0, ttl 5, id 37297, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.33265 > 8.8.8.8.33446: [bad udp cksum 0xd157 -> 0x34e3!] UDP, length 32
12:13:06.585327 IP (tos 0x0, ttl 5, id 37298, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.53485 > 8.8.8.8.33447: [bad udp cksum 0xd157 -> 0xe5e5!] UDP, length 32
12:13:06.585335 IP (tos 0x0, ttl 5, id 37299, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.40992 > 8.8.8.8.33448: [bad udp cksum 0xd157 -> 0x16b2!] UDP, length 32
12:13:06.585344 IP (tos 0x0, ttl 6, id 37300, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.102.41538 > 8.8.8.8.33449: [bad udp cksum 0xd157 -> 0x148f!] UDP, length 32

The above output only shows the UDP packets my machine sent. I will show the reply messages separately to make this more clear.

Notice the TTL value on each line. It starts from a TTL of 1, then 2, then 3, up to TTL 6. But you might be wondering why my machine is sending 3 UDP messages with a TTL value of 1, and then 2, and then 3.

The reason behind this is to calculate an average round trip time. The traceroute program sends three UDP packets to each hop to measure the average round trip time. Round trip time is nothing but the time it took to send and then receive the reply, in milliseconds. I purposely didn't mention this in the beginning, to avoid confusion.

So the bottom line is that my traceroute program sends three UDP packets to each hop simply to calculate the round trip average, and the traceroute output shows those three values. Please look at the traceroute output more closely: it shows three millisecond values for each hop, to give a clear idea of the round trip time.


Now let's see the replies we got from all the hops, through tcpdump. Please note that the reply messages I am showing below are part of the same tcpdump I did above, but shown separately to make this more clear.

One more interesting thing to note is that each time, my traceroute program sends from a different random UDP port number. This is to identify which packet a reply belongs to. As told before, the reply messages sent by the hops and the destination contain the header of the original packet we sent; hence the traceroute program can accurately calculate the round trip time (for each of the three UDP packets sent to each hop), as it can easily identify and correlate the reply. The random port numbers are a sort of identifier for the replies.

The reply messages look like the below.

192.168.0.1 > 192.168.0.102: ICMP time exceeded in-transit, length 68
        IP (tos 0x0, ttl 1, id 37285, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.1 > 192.168.0.102: ICMP time exceeded in-transit, length 68
        IP (tos 0x0, ttl 1, id 37286, offset 0, flags [none], proto UDP (17), length 60)
    183.83.192.1 > 192.168.0.102: ICMP time exceeded in-transit, length 60
        IP (tos 0x0, id 37288, offset 0, flags [none], proto UDP (17), length 60)
    192.168.0.1 > 192.168.0.102: ICMP time exceeded in-transit, length 68
        IP (tos 0x0, ttl 1, id 37287, offset 0, flags [none], proto UDP (17), length 60)


Please note the ICMP time exceeded messages in the reply shown above (I have not shown all reply messages).

Now let me show the final message, which is different from the ICMP time exceeded messages. This message is a destination port unreachable, as told before, and from it my traceroute program comes to know that the destination has been reached.

8.8.8.8 > 192.168.0.102: ICMP 8.8.8.8 udp port 33458 unreachable, length 68
        IP (tos 0x80, ttl 2, id 37309, offset 0, flags [none], proto UDP (17), length 60)
    8.8.8.8 > 192.168.0.102: ICMP 8.8.8.8 udp port 33457 unreachable, length 68
        IP (tos 0x80, ttl 1, id 37308, offset 0, flags [none], proto UDP (17), length 60)
    8.8.8.8 > 192.168.0.102: ICMP 8.8.8.8 udp port 33459 unreachable, length 68
        IP (tos 0x80, ttl 2, id 37310, offset 0, flags [none], proto UDP (17), length 60)


Note that there are three replies from 8.8.8.8 to my traceroute program. As told before, traceroute sends three similar UDP packets with different ports, simply to calculate the round trip time. The final destination is no different.

Different types of Traceroute program

There are different types of traceroute programs. Each of them works slightly differently, but the overall concept behind each of them is the same: all of them use the TTL value.

Why different implementations? Because you can use the one that works in your environment. If, for example, a firewall blocks UDP traffic, you can use another traceroute for the purpose. The different types are mentioned below.

  • UDP Traceroute
  • ICMP traceroute
  • TCP Traceroute

The one we used previously is UDP traceroute. It's the default protocol used by the Linux traceroute program. However, you can ask the traceroute utility in Linux to use ICMP instead of UDP with the below command.



root@workstation:~# traceroute -I -n 8.8.8.8

ICMP traceroute works the same way as UDP traceroute. The traceroute program sends ICMP echo request messages, and the hops in between reply with ICMP time exceeded messages. But the final destination replies with an ICMP echo reply.

The tracert command, available in the Windows operating system by default, uses the ICMP traceroute method.

Now the last one is the most interesting. It's called TCP traceroute. It's used because almost all firewalls and routers in between allow you to send TCP traffic, and if the packet is towards port 80, which is web traffic, then most routers allow the packet through. TCPTRACEROUTE by default sends TCP SYN requests to port 80.


All routers between the source and the destination will send a TTL time exceeded message, and the destination will send either an RST packet if port 80 is closed, or a SYN/ACK packet (but tcptraceroute does not make a TCP connection: on receiving the SYN/ACK packet, the traceroute program sends an RST packet to close the connection). Hence the traceroute program comes to know that the destination has been reached.
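
For example (tcptraceroute is a separate package on most distributions, and newer versions of the stock Linux traceroute can do the same with the -T flag; both typically need root):

root@workstation:~# tcptraceroute -n 8.8.8.8 80
root@workstation:~# traceroute -T -p 80 -n 8.8.8.8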

Please note that the -n option I used in the traceroute commands shown previously disables DNS name resolution; otherwise traceroute would send DNS queries for all the hops it finds on the way.


Now the main question is: which one should I use of the ICMP, UDP, and TCP traceroutes?

It all depends on the environment. If the routers in between block a particular protocol, you must try using one of the others.