Automation with the Java Robot Class


I’ve recently been experimenting with the Java Robot class, which allows a program to control the mouse pointer’s position and clicks, and to emulate keyboard input. This makes the Robot class perfect for implementing automation in systems which have no accessible API.
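To make this concrete, here is a minimal sketch of the basic Robot calls involved: moving the pointer, clicking, and sending a key combination such as CTRL+O. The class and method names (`RobotBasics`, `leftClick`, `pressCombo`) and the coordinates in `demo()` are my own illustrations, not part of any library; `Robot`, `InputEvent` and `KeyEvent` are the standard `java.awt` classes. Note that `new Robot()` throws an `AWTException` in a headless environment, so this needs a real desktop session.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;
import java.util.function.IntConsumer;

/** Small helpers around java.awt.Robot (a sketch, not a complete toolkit). */
final class RobotBasics {

    /** Moves the pointer to (x, y) and performs a left click there. */
    static void leftClick(Robot robot, int x, int y) {
        robot.mouseMove(x, y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }

    /**
     * Presses the keys in order, then releases them in reverse order,
     * e.g. CTRL down, O down, O up, CTRL up for CTRL+O.
     * Taking the press/release actions as parameters lets the ordering
     * be checked without a real display.
     */
    static void pressCombo(IntConsumer press, IntConsumer release, int... keyCodes) {
        for (int code : keyCodes) {
            press.accept(code);
        }
        for (int i = keyCodes.length - 1; i >= 0; i--) {
            release.accept(keyCodes[i]);
        }
    }

    /** Same as above, driving a real Robot. */
    static void pressCombo(Robot robot, int... keyCodes) {
        pressCombo(robot::keyPress, robot::keyRelease, keyCodes);
    }

    /** Example: click at an arbitrary point, then send CTRL+O. Needs a desktop session. */
    static void demo() throws AWTException {
        Robot robot = new Robot();
        robot.setAutoDelay(50); // short pause after every generated event
        leftClick(robot, 200, 200);
        pressCombo(robot, KeyEvent.VK_CONTROL, KeyEvent.VK_O);
    }
}
```

Passing the press and release actions in as `IntConsumer`s is only a convenience for testing; with a real `Robot` you would simply call `pressCombo(robot, KeyEvent.VK_CONTROL, KeyEvent.VK_O)`.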

The Robot class can also be used to take screenshots of all or part of the screen, allowing automated tasks to be visually verified. For example, a program designed to open a specific file in a basic image editor could take the following steps:

  1. Press CTRL+O, the keyboard shortcut typically used to access the ‘Open File’ window.
  2. Wait 1 second for the ‘Open File’ window to appear.
  3. Type the file name ‘test.jpg’.
  4. Press Enter.
  5. Wait 2 seconds for the file to open.
  6. etc.
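The timed steps above could be sketched roughly as follows, assuming the image editor already has keyboard focus. The `keyCodeFor` helper is a hypothetical, deliberately naive mapping that assumes a US-style keyboard layout and only handles lowercase letters, digits and '.', which is enough for ‘test.jpg’. The fixed `robot.delay(...)` calls are exactly the weakness discussed next: nothing checks that the dialog actually opened.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;

/** Sketch of the timed 'open a file' sequence: fixed delays, no checking. */
final class TimedOpenFile {

    /**
     * Maps a character to a virtual key code. Assumes a US-style layout and
     * handles only lowercase letters, digits and '.'; other characters would
     * need SHIFT or a layout-aware mapping.
     */
    static int keyCodeFor(char c) {
        if (c >= 'a' && c <= 'z') return KeyEvent.VK_A + (c - 'a'); // VK_A..VK_Z are contiguous
        if (c >= '0' && c <= '9') return KeyEvent.VK_0 + (c - '0'); // VK_0..VK_9 are contiguous
        if (c == '.') return KeyEvent.VK_PERIOD;
        throw new IllegalArgumentException("No simple key code for '" + c + "'");
    }

    /** Presses and releases a single key. */
    static void tap(Robot robot, int keyCode) {
        robot.keyPress(keyCode);
        robot.keyRelease(keyCode);
    }

    static void openFile(Robot robot, String fileName) {
        robot.keyPress(KeyEvent.VK_CONTROL);      // 1. CTRL+O
        tap(robot, KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);
        robot.delay(1000);                        // 2. hope the dialog is open after 1 s
        for (char c : fileName.toCharArray()) {   // 3. type the file name
            tap(robot, keyCodeFor(c));
        }
        tap(robot, KeyEvent.VK_ENTER);            // 4. Enter
        robot.delay(2000);                        // 5. hope the file has loaded after 2 s
    }
}
```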

In the above example, delays are used between keyboard input actions, in an attempt to make sure the application the robot is interacting with is ‘keeping up’.

This works, but could be unreliable if the application we are interacting with is delayed for any reason. There are many reasons why this may be the case, such as background tasks running (antivirus, operating system updates, etc.). As well as this, the application we are interacting with may behave differently than expected. For example, it could show a dialog box that was not expected, and with the current approach the robot would continue typing regardless. This is the equivalent of a human attempting to use a computer normally whilst blindfolded.

To get around this, we can use the screenshot ability of the Robot class to visually verify that the results of our keyboard or mouse actions are as expected. The example below shows how we could re-work the previous set of instructions, to utilise visual verification.

  1. Press CTRL+O, the keyboard shortcut typically used to access the ‘Open File’ window.
  2. Take a screenshot and search the screen for the title bar of the expected ‘Open File’ window. Repeat until found.
  3. Type the file name ‘test.jpg’.
  4. Press Enter.
  5. Take a screenshot and search the screen for the expected image that we just asked the image editor to open. Repeat until found.
  6. etc.
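A minimal sketch of the ‘search the screen for an image’ part, assuming an exact pixel match is good enough: it scans a screenshot for the first position where every pixel of a smaller reference image matches (ignoring alpha). `ScreenSearch`, `findImage` and `findOnScreen` are hypothetical names. Real screens may differ slightly because of anti-aliasing, themes or display scaling, in which case a per-channel tolerance would be needed; the brute-force scan is also slow for large images.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

final class ScreenSearch {

    /**
     * Returns the top-left corner of the first exact match of target inside
     * screen (scanning row by row, left to right), or null if there is none.
     */
    static Point findImage(BufferedImage screen, BufferedImage target) {
        int tw = target.getWidth();
        int th = target.getHeight();
        for (int y = 0; y + th <= screen.getHeight(); y++) {
            for (int x = 0; x + tw <= screen.getWidth(); x++) {
                if (matchesAt(screen, target, x, y)) {
                    return new Point(x, y);
                }
            }
        }
        return null;
    }

    /** Compares RGB only (alpha masked off), stopping at the first mismatch. */
    private static boolean matchesAt(BufferedImage screen, BufferedImage target, int ox, int oy) {
        for (int ty = 0; ty < target.getHeight(); ty++) {
            for (int tx = 0; tx < target.getWidth(); tx++) {
                int a = screen.getRGB(ox + tx, oy + ty) & 0xFFFFFF;
                int b = target.getRGB(tx, ty) & 0xFFFFFF;
                if (a != b) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Takes one full-screen capture and searches it. Needs a desktop session. */
    static Point findOnScreen(Robot robot, BufferedImage target) {
        Rectangle screenRect = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        return findImage(robot.createScreenCapture(screenRect), target);
    }
}
```

The reference image (for example, a cropped screenshot of the ‘Open File’ title bar) could be loaded with `javax.imageio.ImageIO.read(...)`.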

This method not only visually verifies that what we expected actually happened, but also speeds up the program. If the image took less than the ‘2 seconds’ in the first example to load, the program would still wait regardless. In the second example, the robot would continue with its sequence of events as soon as it determined the image was loaded (and being shown in the image editor’s window).
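‘Repeat until found’ can be expressed as a small polling helper. This sketch adds a timeout, which the steps above do not mention; I’m assuming it is useful so that an unexpected dialog makes the robot stop and report a failure rather than wait forever. The `check` supplier would typically take a screenshot and search it, returning `null` when nothing is found. `Poller` and `waitFor` are hypothetical names.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class Poller {

    /**
     * Calls check repeatedly until it returns a non-null result or the
     * timeout elapses, sleeping pollMillis between attempts.
     */
    static <T> Optional<T> waitFor(Supplier<T> check, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T result = check.get();
            if (result != null) {
                return Optional.of(result); // continue as soon as it appears
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();    // gave up: something unexpected may be on screen
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Optional.empty();
            }
        }
    }
}
```

The loop returns as soon as the check succeeds, which is where the speed-up described above comes from; the poll interval trades CPU use (a screenshot per attempt) against responsiveness.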

I’ve recently become quite interested in automation via this class and have written a number of applications to do various tasks, from playing basic games to automating the operation of a complex, industry standard application.

Some of what I have done with these ideas is listed below.

Automated batch processing of files using an industry standard application

I am not currently able to discuss the precise details of the application this robot uses and thus this description is very vague, but I can say that this application of the Robot class does the following.

  1. Retrieves task information from a central, database-driven API that holds a queue of tasks to complete.
  2. Downloads the prerequisite file required to complete the task
  3. Emulates the pressing of keyboard shortcuts to make the application it interacts with complete the task requested, and save out the required results.
  4. Submits the results to the central API, which then marks that specific task as complete.

In this implementation, I wrote both the server-side code and the client-side robot, so that they work directly with one another to automate the batch production of these ‘results’. The results are therefore generated much faster than any human could produce them by operating the application, thanks to the rapid, automated use of keyboard shortcuts.
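The details of this system can’t be shared, so the following is only a generic sketch of the four-step loop above, with every name invented for illustration: `TaskApi` stands in for the central task API and `DesktopProcessor` for the Robot-driven keyboard-shortcut step. It simply works through the queue until it is empty.

```java
import java.util.Optional;

/** Generic worker loop; all interfaces here are hypothetical stand-ins. */
final class BatchWorker {

    /** Hypothetical view of the central task API. */
    interface TaskApi {
        Optional<String> nextTaskId();                    // step 1: next queued task, if any
        byte[] downloadInput(String taskId);              // step 2: prerequisite file
        void submitResult(String taskId, byte[] result);  // step 4: server marks task complete
    }

    /** Hypothetical wrapper around the Robot-driven desktop application (step 3). */
    interface DesktopProcessor {
        byte[] process(byte[] input);
    }

    /** Processes queued tasks until none remain; returns how many were completed. */
    static int drainQueue(TaskApi api, DesktopProcessor app) {
        int completed = 0;
        Optional<String> next;
        while ((next = api.nextTaskId()).isPresent()) {
            String id = next.get();
            byte[] input = api.downloadInput(id);
            byte[] result = app.process(input);
            api.submitResult(id, result);
            completed++;
        }
        return completed;
    }
}
```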

I’ll post again about this when I’m permitted to add additional details.

Defeating basic 2D games

Using the ‘Are you Human?’ game-based CAPTCHAs as a test bed, I have successfully created a robot that can defeat the majority of the small games available on the demo page at

The program scans the screen for instructions to determine if a game it is capable of playing is shown and when found, will click the start button and follow a sequence of steps to play and win that particular game type.

The majority of these games revolve around dragging moving objects to a destination object, such as dragging images of food to a fridge. This is completed in two steps.

The first step is scanning the screen for a known image that needs dragging, gradually moving the mouse cursor to it and pressing the left mouse button. In the second step, the program scans the screen for the image of the destination area and, when it is found, gradually moves the mouse cursor to it and releases the left mouse button. These two steps, repeated for every known image we need to drag, result in easily winning the game.
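The two steps could be sketched as below. `path` produces evenly spaced points between the current pointer position and a target, so the cursor glides there rather than jumping; the step counts and delays are arbitrary values I’ve picked, not the original program’s. Finding the object and destination positions by scanning the screen is left to the caller, and `GradualDrag`, `grab` and `dropAt` are hypothetical names. `MouseInfo.getPointerInfo()` is the standard AWT way to read the current pointer location.

```java
import java.awt.MouseInfo;
import java.awt.Point;
import java.awt.Robot;
import java.awt.event.InputEvent;
import java.util.ArrayList;
import java.util.List;

final class GradualDrag {

    /** Evenly spaced points from 'from' (exclusive) to 'to' (inclusive). */
    static List<Point> path(Point from, Point to, int steps) {
        if (steps < 1) {
            throw new IllegalArgumentException("steps must be >= 1");
        }
        List<Point> points = new ArrayList<>(steps);
        for (int i = 1; i <= steps; i++) {
            int x = from.x + (to.x - from.x) * i / steps;
            int y = from.y + (to.y - from.y) * i / steps;
            points.add(new Point(x, y));
        }
        return points;
    }

    /** Glides the pointer from wherever it is now to 'to'. */
    static void moveGradually(Robot robot, Point to, int steps, int stepDelayMs) {
        Point from = MouseInfo.getPointerInfo().getLocation();
        for (Point p : path(from, to, steps)) {
            robot.mouseMove(p.x, p.y);
            robot.delay(stepDelayMs);
        }
    }

    /** Step 1: glide to the object that needs dragging and hold the left button. */
    static void grab(Robot robot, Point object) {
        moveGradually(robot, object, 20, 10);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
    }

    /** Step 2: glide to the destination (found by a fresh scan) and let go. */
    static void dropAt(Robot robot, Point destination) {
        moveGradually(robot, destination, 30, 10);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }
}
```

Splitting the drag into `grab` and `dropAt` leaves room to rescan the screen for the destination while the button is held, matching the two separate steps described above.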

Because these games are used as CAPTCHAs, I will not be releasing this code to anyone, so please do not ask. The one exception is the team at ‘Are you Human’: if they wish to use it to help improve their system, I will be happy to provide them with the working program.

It is worth noting that after working on this as a programming challenge for myself, I discovered someone else also had the same idea. An article at Spamtech discusses cracking the Are you human CAPTCHAs, this time in Python using existing computer vision libraries.

Tracking and following targets in a 3D world

This was another challenge I set myself. Being a fan of Minecraft (put very simply, a first-person 3D block-building game), I thought it might be interesting to see how easy it would be to make an automated bot that would identify an object, aim the centre-screen cross hair at it and then walk towards it.

I chose the common ‘yellow flower’ of Minecraft as my target object and proceeded to write the code to identify this object on the screen.

Due to the nature of 3D perspective, it was not possible to simply scan the screen for previously captured images of the yellow flower, as its size and orientation differed depending on how far away the object was from the player and which way the player was facing. To work around this issue, I chose to identify the yellow flower simply by searching for pixels on the screen within a certain RGB range that matched the colour of the yellow flower throughout the game’s day and night cycle.
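A sketch of colour-based detection: `isFlowerYellow` tests whether a pixel falls within an RGB range, and `findCentroid` averages the coordinates of all matching pixels to give a single on-screen position. The thresholds (red at least 200, green at least 180, blue at most 80) are guesses for illustration, not the values used in the original bot, and taking the centroid is my own choice of how to turn matching pixels into one location. The `step` parameter samples every n-th pixel to keep the scan fast.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.function.IntPredicate;

final class ColourFinder {

    /** Hypothetical 'flower yellow' range; thresholds are illustrative guesses. */
    static boolean isFlowerYellow(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return r >= 200 && g >= 180 && b <= 80;
    }

    /**
     * Returns the average position of all pixels accepted by 'matches',
     * or null if none match. step > 1 samples every step-th pixel.
     */
    static Point findCentroid(BufferedImage image, IntPredicate matches, int step) {
        long sumX = 0;
        long sumY = 0;
        long count = 0;
        for (int y = 0; y < image.getHeight(); y += step) {
            for (int x = 0; x < image.getWidth(); x += step) {
                if (matches.test(image.getRGB(x, y))) {
                    sumX += x;
                    sumY += y;
                    count++;
                }
            }
        }
        if (count == 0) {
            return null; // no suitable yellow on screen
        }
        return new Point((int) (sumX / count), (int) (sumY / count));
    }
}
```

Because the centroid averages every match, stray yellow pixels elsewhere on screen (such as the torch flames mentioned below) will pull it off target.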

After identifying the flower, the robot needed to aim at it. The 3D nature of the game again complicates things here, as even a slight re-aim changes the entire view of the world from the first-person perspective. I coded this such that when the flower was identified, its location on screen would be compared to the location of the cross hair. The mouse would then be moved a small number of pixels towards this location, moving the 3D view slightly closer to the target. Because this slight movement changes the view, the flower’s location then needs to be identified again.

The process of identifying the flower and moving the mouse cursor slightly towards it is repeated continually, until the yellow flower is identified as being within an acceptable range of the cross hair. The flower can then be said to be directly in front of the player. At this point, ‘W’ is held for 1 second (to move forward) and released, and then the entire process repeats to keep tracking and moving towards the flower.

The entire process looks like this:

  1. Identify position of the yellow flower on the screen based on appropriate colour.
  2. If the yellow flower is close to the cross hair, and thus is directly in front of the player, move forward for 1 second.
  3. If the yellow flower is not close to the cross hair, move the mouse pointer slightly in the direction of the flower to adjust the view point.
  4. Go to step 1.

The only other part of this little program was a small piece of code to handle a situation where no appropriate yellow was detected. I handled this simply by having the program move the mouse right slowly, by a small number of pixels in order to turn the player’s view right, in the hope of finding something yellow.
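The loop above, including the ‘turn right when nothing yellow is visible’ fallback, could be sketched like this. `decide` is pure logic: no flower means turn right slowly, a flower within `tolerance` pixels of the cross hair means walk forward, and anything else means a mouse movement towards it, clamped to `maxStep` pixels. `apply` carries out the decision with the Robot, and locating the flower is passed in as a function (for example, a colour search of the kind described above). All names and the numbers in `run` (tolerance 15, step 10, search step 5) are hypothetical. The mouse handling is the shakiest assumption: it presumes the game runs full screen and re-centres a captured pointer each frame, so moving the pointer away from the centre by (dx, dy) is read as a turn.

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.event.KeyEvent;
import java.awt.image.BufferedImage;
import java.util.function.Function;

final class FlowerTracker {

    enum Kind { SEARCH, AIM, FORWARD }

    /** One decision: what to do next, plus a mouse offset for SEARCH and AIM. */
    static final class Action {
        final Kind kind;
        final int dx;
        final int dy;

        Action(Kind kind, int dx, int dy) {
            this.kind = kind;
            this.dx = dx;
            this.dy = dy;
        }
    }

    /** flower is null when no suitable yellow was found on screen. */
    static Action decide(Point flower, Point crossHair, int tolerance, int maxStep, int searchStep) {
        if (flower == null) {
            return new Action(Kind.SEARCH, searchStep, 0); // turn right slowly, hoping to find yellow
        }
        int dx = flower.x - crossHair.x;
        int dy = flower.y - crossHair.y;
        if (Math.abs(dx) <= tolerance && Math.abs(dy) <= tolerance) {
            return new Action(Kind.FORWARD, 0, 0);         // step 2: flower is in front of the player
        }
        return new Action(Kind.AIM, clamp(dx, maxStep), clamp(dy, maxStep)); // step 3: nudge the view
    }

    private static int clamp(int value, int limit) {
        return Math.max(-limit, Math.min(limit, value));
    }

    static void apply(Robot robot, Action action, Point crossHair) {
        if (action.kind == Kind.FORWARD) {
            robot.keyPress(KeyEvent.VK_W);                 // hold W for 1 second
            robot.delay(1000);
            robot.keyRelease(KeyEvent.VK_W);
        } else {
            // Assumption: the game re-centres a captured pointer, so this offset acts as a turn.
            robot.mouseMove(crossHair.x + action.dx, crossHair.y + action.dy);
        }
    }

    /** Runs the loop for a fixed number of iterations (step 4: go to step 1). */
    static void run(Robot robot, Function<BufferedImage, Point> locateFlower, int iterations) {
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        Point crossHair = new Point(screen.width / 2, screen.height / 2); // assumes full screen
        for (int i = 0; i < iterations; i++) {
            Point flower = locateFlower.apply(robot.createScreenCapture(screen)); // step 1
            apply(robot, decide(flower, crossHair, 15, 10, 5), crossHair);
        }
    }
}
```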

It is worth noting that recognising the yellow flower purely by an RGB range similar to its colour turned out to be insufficient on its own. Left to its own devices in a standard Minecraft world, the robot not only chased down yellow flowers, but was also drawn to the orange-yellow of torch flames.

In addition, it attempted to commit suicide several times by jumping into pits of glowing red/yellow lava!

The Future of IBM’s Watson

After defeating the two greatest Jeopardy! champions of all time, the technology behind Watson will now be applied to some of the world’s most enticing challenges. Watch a breakdown of the match from Ken Jennings, Brad Rutter and the IBM team members as they look toward the future.

Want more information? IBM’s Watson AI software

IBM’s Watson AI

Watson is an artificial intelligence program developed by IBM designed to answer questions posed in natural language. Named after IBM’s founder, Thomas J. Watson, Watson is being developed as part of the DeepQA research project. The program runs on POWER7 processor-based systems.

In 2011, Watson competed on the television quiz show Jeopardy! as a test of its abilities. In a two-game, combined-point match aired in three Jeopardy! episodes running from February 14–16, Watson bested Brad Rutter, the biggest all-time money winner on Jeopardy!, and Ken Jennings, the record holder for the longest championship streak. Watson received first prize of $1 million, while Ken Jennings and Brad Rutter received $300,000 and $200,000, respectively. Jennings and Rutter pledged to donate half their winnings to charity, while IBM divided Watson’s winnings among two charities. This was the first man-versus-machine competition in Jeopardy!’s history.


How do you define intelligence?

Intelligence is a term describing a property of the mind including related abilities, such as the capacities for abstract thought, understanding, communication, reasoning, learning, learning from past experiences, planning, and problem solving.

The Study of Intelligence

Intelligence is most widely studied in humans, but is also observed in animals and plants. Artificial intelligence is the intelligence of machines or the simulation of intelligence in machines.

Numerous definitions of and hypotheses about intelligence have been proposed since before the twentieth century, with no consensus yet reached by scholars. Within the discipline of psychology, various approaches to human intelligence have been adopted, with the psychometric approach being especially familiar to the general public. Influenced by his cousin Charles Darwin, Francis Galton was the first scientist to propose a theory of general intelligence; that intelligence is a true, biologically-based mental faculty that can be studied by measuring a person’s reaction times to cognitive tasks. Galton’s research in measuring the head sizes of British scientists and laymen led to the conclusion that head-size is unrelated to a person’s intelligence.

Alfred Binet, and the French school of intelligence, believed intelligence was an aggregate of dissimilar abilities, not a unitary entity with specific, identifiable properties.


Dennis Hong’s Awesome Robots

Dennis Hong, the founder and director of RoMeLa, a technology and robotics lab based in Virginia, gets straight into discussion and video demonstrations of his lab’s awesome robots in his talk below. The robots focus on various forms of robotic motion, such as walking, climbing and humanoid-style movement.

I really think some of these robots are fantastic. They really demonstrate some of the advanced strides being made in robot motion. Yep, strides… motions… I made an awful pun.

More so than the physical engineering behind these robots, I really admire and am interested in the software engineering that controls them. The programming and artificial intelligence structures needed to control this motion accurately are fantastic. The artificial intelligence elements that predict the necessary adjustments for the robotic leg motors must be very quick and accurate to deal with unstable surfaces, such as the walker on ice in the video above.

I admire the guys who engineered and programmed these robots. In my opinion, legged robotic motion is still one of the biggest challenges facing modern robotics, alongside computer vision and object recognition, which are the most problematic of all.

Does anyone else interested in robotics or artificial intelligence have any other examples of robotic motion? Or how about a great demo of computer vision in action?

Henry Markram discusses computer simulated brains

I previously discussed how Henry Markram claimed to be able to build an artificial brain within 10 years’ time, so I thought it would be apt to show footage of a lecture he gave at TED on this topic.

I still find this idea fascinating, along with all the moral, scientific and philosophical arguments that arise from such a concept. Parts of small rodent brains have already been successfully modelled in computer systems. Henry Markram states that many of the mysteries of the human brain can be solved by computer modelling. Mental illnesses, memory and perception all arise from the neurons and electrical signals within the human brain; this is a given. Markram plans to find all of these links via a supercomputer software simulation of all ~100,000 million neurons within the human brain.

What do you think about this topic? A full simulation of the human brain and that of many other mammals – will this put an end to animal brain testing and will it assist in finding cures for various mental illnesses and memory problems in human beings?

AI Brain – Possible within ten years’ time according to Henry Markram

If you know me much at all, you’ll know I have a small hobby centred around artificial intelligence. Fact or fiction, anything containing traces of AI tends to pique my interest, so naturally, when a leading scientist in the field, Henry Markram, made this announcement, I was most interested.

Henry Markram, the director of the Blue Brain Project, has already simulated many parts of the brains of lower life forms, such as the rat. He announced at this year’s TED Global conference that “It is not impossible to build a human brain and we can do it in 10 years,” and comically added that “… if we do succeed, we will send a hologram to TED to talk.”

One of the primary uses of this artificial brain, it is stated, will be research into, and hopefully cures for, various mental illnesses. This is an admirable goal indeed and shows how projects that seem to have been set in motion purely for research purposes can be directed to a humanitarian cause. This is fantastic in my opinion, but it was certainly not the first thing that crossed my mind regarding the idea of creating a functional human brain.

If you are like me, and believe that we (as humans) are the sum of our parts, you most likely do not believe in a ‘soul’ or spiritual presence that defines who we are. In that case, you believe our personality, memories, will and those things often referred to as being part of a ‘soul’ are all, in fact, stored within our brains. All our actions, decisions and motivations as a species have therefore been dictated by the neural impulses within our brains, setting aside overriding environmental factors. Assuming this is indeed the case, if we were to create a fully functional human brain (albeit an artificial one), then what would we actually be creating?

If you believe what was just assumed, we would be creating an ‘entity’ capable of actions, decisions and motivations, with a personality, memories and will. In that case, could it be deemed to have a ‘soul’, if such things are to be believed in? Regardless of spiritual meaning, it would have all the attributes we commonly refer to as being part of a ‘soul’. What does this mean? If fully functional, the brain would be conscious and capable of thought, decisions and learning right from wrong. Would this give the ‘entity’ rights?

I anticipate the argument that this ‘entity’ is only artificial, but then again, many humans who have electronic or mechanical implants are, in part, artificial. It could also be said that the ‘entity’ is merely a computer. The obvious counter-argument here is that so are we. We may be biological in nature, but the human brain is merely that: a biological computer, one that even uses electrical impulses sent from neuron to synapse to neuron, producing all of our own personal processing and data storage in the form of thoughts and memories.

Please feel free to give your opinions. What do you think of the idea of an artificial brain? Should it be seen as a life form, of sorts? Or do you simply not believe that the creation of an artificial human brain will be possible within the next 10 years, or indeed ever?