Automation with the Java Robot Class

Posted on 9th August 2017 - Takes 8 minutes to read


I've recently been experimenting with the Java Robot class, which allows one to control the mouse pointer position, simulate clicks and emulate keyboard input. This makes the Java Robot class perfect for implementing automation in systems which have no accessible API.
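
At its simplest, that control amounts to a few calls on a Robot instance. The sketch below simply moves the pointer to an arbitrary coordinate and clicks; it is only intended to show the shape of the API.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.InputEvent;

public class RobotBasics {
    public static void main(String[] args) throws AWTException {
        Robot robot = new Robot();

        // Move the pointer to an absolute screen coordinate and left-click.
        // (On JDKs older than 9, use InputEvent.BUTTON1_MASK instead.)
        robot.mouseMove(400, 300);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
    }
}
```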

The Robot class can also be used to take screenshots of all or part of the screen, allowing automated tasks to be visually verified. For example, a program designed to open a specific file in a basic image editor could take the following steps:

  1. Press CTRL+O, the keyboard shortcut typically used to access the 'Open File' window.
  2. Wait 1 second for the file open window to appear.
  3. Type the file name 'test.jpg'.
  4. Press Enter.
  5. Wait 2 seconds for the file to open.
  6. etc.

In the above example, delays are used between keyboard input actions, in an attempt to make sure the application the robot is interacting with is 'keeping up'.
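
A minimal sketch of this delay-driven approach, using nothing but the standard Robot API, might look something like the following. The one and two second pauses correspond to steps 2 and 5 above, and only simple characters in the file name are handled.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;

public class DelayBasedOpen {
    public static void main(String[] args) throws AWTException {
        Robot robot = new Robot();
        robot.setAutoDelay(50); // small pause after every generated event

        // Step 1: CTRL+O to bring up the 'Open File' window.
        robot.keyPress(KeyEvent.VK_CONTROL);
        robot.keyPress(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_O);
        robot.keyRelease(KeyEvent.VK_CONTROL);

        // Step 2: hope the window has appeared after one second.
        robot.delay(1000);

        // Step 3: type the file name, character by character.
        for (char c : "test.jpg".toCharArray()) {
            int keyCode = KeyEvent.getExtendedKeyCodeForChar(c);
            robot.keyPress(keyCode);
            robot.keyRelease(keyCode);
        }

        // Steps 4 and 5: press Enter, then wait and hope the file has loaded.
        robot.keyPress(KeyEvent.VK_ENTER);
        robot.keyRelease(KeyEvent.VK_ENTER);
        robot.delay(2000);
    }
}
```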

This works, but can be unreliable if the application we are interacting with is delayed for any reason, for example by background tasks such as antivirus scans or operating system updates. The application may also behave differently than expected, perhaps by showing a dialog box we did not anticipate. With the program above, the robot would continue typing regardless. It is the equivalent of a human attempting to use a computer normally whilst, at the same time, being blindfolded.

To get around this, we can use the screenshot ability of the Robot class to visually verify that the results of our keyboard or mouse actions are as expected. The example below shows how we could re-work the previous set of instructions, to utilise visual verification.

  1. Press CTRL+O, the keyboard shortcut typically used to access the 'Open File' window.
  2. Take a screenshot and search the screen for the title bar of the expected 'Open File' window. Repeat until found.
  3. Type the file name 'test.jpg'.
  4. Press Enter.
  5. Take a screenshot and search the screen for the expected image that we just asked the image editor to open. Repeat until found.
  6. etc.

This method not only visually verifies that what we expected actually happened, but also speeds the program up. If loading the image took less than the '2 seconds' in the first example, the program would still wait regardless. In the second example, the robot would continue with its sequence of events as soon as it determined the image was loaded (and being shown in the image editor's window).
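
As a rough illustration of the 'repeat until found' idea, the sketch below takes repeated screenshots with createScreenCapture and performs a naive, exact pixel-by-pixel search for a previously saved reference image. The file name is just a placeholder, and a real implementation would tolerate slight colour differences and limit how long it waits.

```java
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class VisualWait {

    // Naive exact-match search: true if 'needle' appears anywhere inside 'haystack'.
    static boolean contains(BufferedImage haystack, BufferedImage needle) {
        for (int x = 0; x <= haystack.getWidth() - needle.getWidth(); x++) {
            for (int y = 0; y <= haystack.getHeight() - needle.getHeight(); y++) {
                if (matchesAt(haystack, needle, x, y)) {
                    return true;
                }
            }
        }
        return false;
    }

    static boolean matchesAt(BufferedImage haystack, BufferedImage needle, int ox, int oy) {
        for (int x = 0; x < needle.getWidth(); x++) {
            for (int y = 0; y < needle.getHeight(); y++) {
                if (haystack.getRGB(ox + x, oy + y) != needle.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Robot robot = new Robot();
        // Hypothetical reference image of the 'Open File' window's title bar.
        BufferedImage expected = ImageIO.read(new File("open-dialog-title.png"));
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());

        // Take a screenshot and search it; repeat until the expected image is found.
        while (true) {
            BufferedImage capture = robot.createScreenCapture(screen);
            if (contains(capture, expected)) {
                break;
            }
            robot.delay(200); // small pause before re-checking
        }
        // Safe to continue with the next keyboard or mouse action here.
    }
}
```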

I've recently become quite interested in automation via this class and have written a number of applications to do various tasks, from playing basic games to automating the operation of a complex industry-standard application.

Some of what I have done with these ideas is listed below.

Automated batch processing of files using an industry standard application

I am not currently able to discuss the precise details of the application this robot uses, so this description is necessarily vague, but I can say that this application of the Robot class does the following:

  1. Retrieves task information from a central database-driven API that holds a queue of tasks to complete.
  2. Downloads the prerequisite file required to complete the task.
  3. Emulates the pressing of keyboard shortcuts to make the application it interacts with complete the task requested, and save out the required results.
  4. Submits the results to the central API, which then marks that specific task as complete.

In this implementation, I wrote the server-side code and the client-side robot to work directly with one another to automate the batch production of these 'results'. These results are therefore generated at speeds much faster than any human could operate the application being used, via the rapid, automated use of keyboard shortcuts.
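
Since I can't share the real code, the following is nothing more than a hypothetical skeleton of the client-side loop; every method in it is a placeholder standing in for application-specific HTTP and Robot code.

```java
import java.io.File;

public class TaskWorker {

    public static void main(String[] args) throws Exception {
        while (true) {
            String taskId = fetchNextTaskId();      // ask the central API for the next queued task
            if (taskId == null) {
                Thread.sleep(10_000);               // nothing queued; poll again shortly
                continue;
            }
            File input = downloadInputFile(taskId); // prerequisite file for the task
            driveApplication(input);                // Robot-driven keyboard shortcuts do the work
            uploadResults(taskId);                  // the central API then marks the task complete
        }
    }

    // Placeholders only: the real HTTP client and Robot sequences are application-specific.
    static String fetchNextTaskId() { return null; }
    static File downloadInputFile(String taskId) { return null; }
    static void driveApplication(File input) { }
    static void uploadResults(String taskId) { }
}
```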

I'll post again about this when I'm permitted to add additional details.

Defeating basic 2D games

Using the 'Are you Human?' game-based CAPTCHAs as a test bed, I have successfully created a robot that can defeat the majority of the small games available on the demo page at http://areyouahuman.com/demo.

The program scans the screen for instructions to determine if a game it is capable of playing is shown and, when one is found, will click the start button and follow a sequence of steps to play and win that particular game type.

The majority of these games revolve around dragging moving objects to a destination object, such as dragging images of food to a fridge. This is completed in two steps.

The first step is scanning the screen for a known image that needs dragging, moving the mouse cursor to it gradually and pressing the left mouse button. In the second step, the program scans the screen for the image of the destination area and, when found, gradually moves the mouse cursor to it and releases the left mouse button. These two steps, repeated for all known images we need to drag, result in easily winning the game.
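
A drag of this kind can be implemented as a gradual 'glide' between two points with the left button held down. The sketch below shows the general idea; the coordinates and step counts are purely illustrative, and in practice the source and destination points come from the screen search described above.

```java
import java.awt.AWTException;
import java.awt.MouseInfo;
import java.awt.Point;
import java.awt.Robot;
import java.awt.event.InputEvent;

public class DragHelper {

    // Glide the pointer from one point to another in small steps, so the
    // movement looks more like a human drag than an instant jump.
    static void glideTo(Robot robot, Point from, Point to, int steps) {
        for (int i = 1; i <= steps; i++) {
            int x = from.x + (to.x - from.x) * i / steps;
            int y = from.y + (to.y - from.y) * i / steps;
            robot.mouseMove(x, y);
            robot.delay(10);
        }
    }

    // Drag whatever is at 'source' onto 'destination'.
    static void drag(Robot robot, Point source, Point destination) {
        glideTo(robot, MouseInfo.getPointerInfo().getLocation(), source, 40);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);   // pick the object up
        glideTo(robot, source, destination, 40);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK); // drop it on the destination
    }

    public static void main(String[] args) throws AWTException {
        Robot robot = new Robot();
        // Placeholder coordinates; real ones come from searching the screenshot.
        drag(robot, new Point(500, 400), new Point(800, 300));
    }
}
```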

Because these games are used as CAPTCHAs, I will not be releasing this code to anyone, so please do not ask. The exception is if the guys at 'Are you Human' wish to use it to help improve their system, in which case I will be happy to provide them with the working program.

It is worth noting that after working on this as a programming challenge for myself, I discovered someone else also had the same idea. An article at Spamtech discusses cracking the Are you human CAPTCHAs, this time in Python using existing computer vision libraries.

Tracking and following targets in a 3D world

This was another challenge I set myself. Being a fan of Minecraft (put very simply, a first-person 3D block-building game), I thought it might be interesting to see how easy it would be to make an automated bot that would identify an object, aim the centre-screen cross hair at it and then walk towards it.

I chose the common 'yellow flower' of Minecraft as my target object and proceeded to write the code to identify this object on the screen.

Due to the nature of 3D perspective, it was not possible to simply scan the screen for previously captured images of the yellow flower, as its size and orientation differed depending on how far away the object was from the player and which way the player was facing. To work around this issue, I chose to identify the yellow flower by searching for pixels on the screen within a certain RGB range that matched the colour of the yellow flower across the game's day and night cycles.
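
The colour search itself can be as simple as walking every pixel of a screen capture and testing it against a rough range. The thresholds below are illustrative only, not the exact values I used for the flower.

```java
import java.awt.Color;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

public class ColourSearch {

    // Returns the first pixel whose colour falls inside a rough 'yellow' range,
    // or null if nothing on screen matches.
    static Point findColour(BufferedImage capture) {
        for (int y = 0; y < capture.getHeight(); y++) {
            for (int x = 0; x < capture.getWidth(); x++) {
                Color c = new Color(capture.getRGB(x, y));
                boolean yellowish = c.getRed() > 200 && c.getGreen() > 200 && c.getBlue() < 100;
                if (yellowish) {
                    return new Point(x, y);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Robot robot = new Robot();
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        System.out.println(findColour(robot.createScreenCapture(screen)));
    }
}
```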

After identifying the flower, the robot needed to aim at it. The 3D nature of the game again complicates things here, as a single movement to slightly re-aim modifies the entire view of the world from the first-person perspective. I coded this such that when the flower was identified, its location on screen would be compared to the location of the cross hair. The mouse would then be moved a small number of pixels towards this location, thus moving the 3D view slightly closer to the target. Because this slight movement causes the view to change, the flower's location needs to be identified again.

The process of identifying the flower and moving the mouse cursor slightly towards it is repeated continually, until the yellow flower is identified as being within an acceptable range of the cross hair. It can then be said the flower is directly in front of the player. At this point, 'W' is held for 1 second (to move forward), released, and then the entire process repeats to keep tracking and moving towards the flower.

The entire process looks like this:

  1. Identify position of the yellow flower on the screen based on appropriate colour.
  2. If the yellow flower is close to the cross hair, and thus is directly in front of the player, move forward for 1 second.
  3. If the yellow flower is not close to the cross hair, move the mouse pointer slightly in the direction of the flower to adjust the view point.
  4. Go to step 1.
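
Put together, a very rough sketch of that loop might look like the following. The thresholds and step sizes are arbitrary, it assumes the game window is focused and keeps the grabbed cursor near the middle of the screen, and it also includes the fallback (described below) for when nothing yellow is in view.

```java
import java.awt.AWTException;
import java.awt.Color;
import java.awt.Point;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.event.KeyEvent;
import java.awt.image.BufferedImage;

public class FlowerTracker {

    public static void main(String[] args) throws AWTException {
        Robot robot = new Robot();
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        Point crosshair = new Point(screen.width / 2, screen.height / 2);

        while (true) {
            BufferedImage capture = robot.createScreenCapture(screen);
            Point flower = findYellow(capture); // colour search as sketched earlier

            if (flower == null) {
                // Nothing yellow in view: nudge the view to the right and look again.
                robot.mouseMove(crosshair.x + 15, crosshair.y);
            } else if (flower.distance(crosshair) < 20) {
                // Flower is roughly under the cross hair: walk forward for one second.
                robot.keyPress(KeyEvent.VK_W);
                robot.delay(1000);
                robot.keyRelease(KeyEvent.VK_W);
            } else {
                // Re-aim slightly towards the flower; the view changes, so re-scan next loop.
                int dx = Integer.signum(flower.x - crosshair.x) * 10;
                int dy = Integer.signum(flower.y - crosshair.y) * 10;
                robot.mouseMove(crosshair.x + dx, crosshair.y + dy);
            }
            robot.delay(50);
        }
    }

    // Same colour-range idea as before; thresholds are illustrative only.
    static Point findYellow(BufferedImage img) {
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                Color c = new Color(img.getRGB(x, y));
                if (c.getRed() > 200 && c.getGreen() > 200 && c.getBlue() < 100) {
                    return new Point(x, y);
                }
            }
        }
        return null;
    }
}
```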

The only other part of this little program was a small piece of code to handle the situation where no appropriate yellow was detected. I handled this simply by having the program move the mouse slowly to the right, a small number of pixels at a time, in order to turn the player's view right in the hope of finding something yellow.

It is worth noting that recognising the yellow flower simply by an RGB range similar to the colour of the flower turned out to be insufficient on its own. Left to its own devices in a standard Minecraft world, the robot was not only chasing down yellow flowers, but also being drawn to the orange-yellow of torch flames.

In addition, it attempted to commit suicide several times by trying to jump into pits of glowing red/yellow lava!

