GAE: Task Queue

[Source: https://developers.google.com/appengine/docs/java/taskqueue/]

With the Task Queue API, applications can perform work outside of a user request, initiated by a user request. If an app needs to execute some background work, it can use the Task Queue API to organize that work into small, discrete units, called tasks. The app adds tasks to task queues to be executed later.

App Engine provides two different queue configurations:

  1. Push queues process tasks based on the processing rate configured in the queue definition. App Engine automatically scales processing capacity to match your queue configuration and processing volume, and also deletes tasks after processing. Push queues are the default.
  2. Pull queues allow a task consumer (either your application or code external to your application) to lease tasks at a specific time for processing within a specific timeframe. Pull queues give you more control over when tasks are processed, and also allow you to integrate your application with non-App-Engine code using the experimental Task Queue REST API. When using pull queues, your application needs to handle scaling of instances based on processing volume, and also needs to delete tasks after processing.

This page provides the basic concepts common to both types of queues. Once you’ve understood the basics, you can check out the queue configuration page, push queue overview, and pull queue overview to see how to configure and use these two types of queues.

  1. Task Queue concepts
  2. Task concepts
  3. Asynchronous operations

Task Queue concepts

Task queues are an efficient and powerful tool for background processing; they allow your application to define tasks, add them to a queue, and then use the queue to process them in aggregate. You name queues and configure their properties in a configuration file named queue.xml.

Push queues function only within the App Engine environment. These queues are the best choice for applications whose tasks work only with App Engine tools and services. With push queues, you simply configure a queue and add tasks to it. App Engine handles the rest. Push queues are easier to implement, but are restricted to use within App Engine. For more information about push queues and examples of how to use them, see Using Push Queues.

If you want to use a different system to consume tasks, pull queues are your best choice. In pull queues, a task consumer (either in your App Engine application, a backend, or code outside of App Engine) leases a specific number of tasks from a specific queue for a specific timeframe. After leasing tasks from a pull queue, the task consumer is responsible for deleting them. If you are consuming tasks from within App Engine, you can use calls from the com.google.appengine.api.taskqueue package. If you are consuming tasks from outside of App Engine, you need to use the Task Queue REST API. Pull queues give you more power and flexibility over when and where tasks are processed, but they require you to handle scaling of workers based on processing volume. Your task consumer also needs to delete tasks after processing.

In summary, push queues allow you to process tasks within App Engine at a steady rate and App Engine scales computing resources according to the number of tasks in your queue. Pull queues allow an alternate task consumer to process tasks at a specific time, either in or outside App Engine, but your application needs to scale workers based on processing volume, as well as delete tasks after processing.
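To make the lease-then-delete cycle of pull queues more concrete, here is a minimal in-memory sketch. It is not the App Engine API: the class PullQueueSketch and its methods are hypothetical, time is passed in explicitly, and the sketch assumes that a task whose lease expires without being deleted becomes available to lease again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of pull-queue leasing; not the App Engine API. */
final class PullQueueSketch {
    // Task name -> time (ms) until which the task is leased; 0 means never leased.
    private final Map<String, Long> leasedUntil = new LinkedHashMap<>();

    void add(String taskName) {
        leasedUntil.put(taskName, 0L);
    }

    /** Leases up to maxTasks tasks that are not under an active lease at nowMillis. */
    List<String> lease(int maxTasks, long leaseMillis, long nowMillis) {
        List<String> leased = new ArrayList<>();
        for (Map.Entry<String, Long> entry : leasedUntil.entrySet()) {
            if (leased.size() == maxTasks) {
                break;
            }
            if (entry.getValue() <= nowMillis) {
                entry.setValue(nowMillis + leaseMillis);
                leased.add(entry.getKey());
            }
        }
        return leased;
    }

    /** The consumer, not the queue, removes a task once it has been processed. */
    void delete(String taskName) {
        leasedUntil.remove(taskName);
    }
}
```

In this model, a consumer that forgets to call delete() sees the same task again once its lease runs out, which is why the text stresses that pull-queue consumers must delete tasks themselves.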

You can read more about these two types of queues in Using Push Queues and Using Pull Queues.

The default queue

For convenience, App Engine provides a default push queue for each application (there is no default pull queue). If you do not name a queue for a task, App Engine automatically inserts it into the default queue. You can use this queue immediately without any additional configuration. All modules and versions of the same application share the same default task queue.

The default queue is preconfigured with a throughput rate of 5 task invocations per second. If you want to change the preconfigured settings, simply define a queue named default in queue.xml. Code may always insert new tasks into the default queue, but if you wish to disable execution of these tasks, you may do so by clicking the Pause Queue button in the Task Queues tab of the Administration Console.

Named queues

While the default queue makes it easy to enqueue tasks with no configuration, you can also create custom queues by defining them in queue.xml. Custom queues allow you to more effectively handle task processing by grouping similar types of tasks. You can control the processing rate—and a number of other settings—based specifically on the type of task in each queue.

All versions of an application share the same named task queues.

For more information about configuring queues in queue.xml, please see Task Queue Configuration.

Task queues in the Administration Console

You can manage task queues for an application using the Task Queue tab of the Administration Console. The Task Queue tab lists all of the queues in the application. Clicking on a queue name brings up the Task Queue Details page where you can see all of the tasks scheduled to run in a queue and you can manually delete individual tasks or purge every task from a queue. This is useful if a task in a push queue cannot be completed successfully and is stuck waiting to be retried. You can also pause and resume a queue on this page.

You can view details of individual tasks by clicking the task name from the list of tasks on the Task Queue Details page. This page allows you to debug why a task did not run successfully. You can also see information about the previous run of the task as well as the task’s body.

Checking task queue statistics

You can view queue statistics in the Console to determine performance and status. However, you can also access queue statistics programmatically using the QueueStatistics class.

Task concepts

A task is a unit of work to be performed by the application. Each task is an object of the TaskOptions class. Each task object contains an endpoint (with a request handler for the task) and an optional data payload that parameterizes the task. You can enqueue push tasks to a queue defined in queue.xml. Push tasks and pull tasks are defined differently; see Using Push Queues and Using Pull Queues for specific usage details.

Task names

In addition to a task’s contents, you can declare a task’s name. Once a task with name N is written, any subsequent attempts to insert a task named N fail.

While this is generally true, task names do not provide an absolute guarantee of once-only semantics. In rare cases, multiple calls to create a task of the same name may succeed. It’s also possible in exceptional cases for a task to run more than once—even if it was only created once.

All project, queue, and task names must be a combination of one or more digits, letters a–z, underscores, and/or dashes, satisfying the following regular expression:

[0-9a-zA-Z\-\_]+

Task names may be up to 500 characters long.
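If you do choose your own task names, you may want to check them before enqueuing. The following is a minimal sketch of such a check using only the rules stated above (the character set and the 500-character limit); the class and method names are hypothetical and are not part of the Task Queue API.

```java
import java.util.regex.Pattern;

/** Hypothetical client-side check of a task name against the documented rules. */
final class TaskNameCheck {
    // Same character class as the documented regular expression.
    private static final Pattern NAME = Pattern.compile("[0-9a-zA-Z\\-_]+");
    private static final int MAX_LENGTH = 500;

    /** Returns true if the name is non-empty, short enough, and uses only allowed characters. */
    static boolean isValid(String name) {
        return name != null
                && name.length() <= MAX_LENGTH
                && NAME.matcher(name).matches();
    }
}
```

For example, isValid("send-email_42") returns true, while isValid("send email") returns false because of the space.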

If a push task is created successfully, it will eventually be deleted (at most seven days after the task successfully executes). Once deleted, its name can be reused.

If a pull task is created successfully, your application needs to delete the task after processing. The system may take up to seven days to recognize that a task has been deleted; during this time, the task name remains unavailable. Attempting to create another task during this time with the same name will result in an “item exists” error. The system offers no method to determine if deleted task names are still in the system. To avoid these issues, we recommend that you let App Engine generate the task name automatically.

Tasks within transactions

You can enqueue a task as part of a datastore transaction, such that the task is only enqueued—and guaranteed to be enqueued—if the transaction is committed successfully. Tasks added within a transaction are considered to be a part of it and have the same level of isolation and consistency.

An application cannot insert more than five transactional tasks into task queues during a single transaction. Transactional tasks must not have user-specified names.

The following code sample demonstrates how to insert transactional tasks into a push queue as part of a datastore transaction:

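DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Queue queue = QueueFactory.getDefaultQueue();
Transaction txn = ds.beginTransaction();
try {
    // ...

    // Added while the transaction is active, so it is enqueued only if the commit succeeds.
    queue.add(TaskOptions.Builder.withUrl("/path/to/my/worker"));

    // ...
    txn.commit();
} finally {
    // Roll back if the commit did not happen, instead of silently swallowing the failure.
    if (txn.isActive()) {
        txn.rollback();
    }
}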

Deleting tasks

You have two ways to delete tasks:

  1. Using the Administration Console.
  2. Programmatically, using purge() to delete all tasks from the specified queue, or using deleteTask() to delete an individual task:
// Purge entire queue...
Queue queue = QueueFactory.getQueue("foo");
queue.purge();

// Delete an individual task...
Queue q = QueueFactory.getQueue("queue1");
q.deleteTask("foo");

Asynchronous operations

If you want to make asynchronous calls to a task queue, use the asynchronous methods provided by the Queue class. Call get() on the returned Future to force the request to complete. When asynchronously adding tasks in a transaction, you should call get() on the Future before committing the transaction to ensure that the request has finished.

 

Using Push Queues in Java

In App Engine push queues, a task is a unit of work to be performed by the application. Each task is an object of the TaskOptions class. Each Task object contains an application-specific URL with a request handler for the task, and an optional data payload that parameterizes the task.

For example, consider a calendaring application that needs to notify an invitee, via email, that an event has been updated. The data payload for this task consists of the email address and name of the invitee, along with a description of the event. The webhook might live at /app_worker/send_email and contain a function that adds the relevant strings to an email template and sends the email. The app can create a separate task for each email it needs to send.

You can use push queues only within the App Engine environment; if you need to access App Engine tasks from outside of App Engine, use pull queues.

  1. Using push queues
  2. Push task execution
  3. Deferred tasks
  4. URL endpoints
  5. Push queues and the development server
  6. Push queues and backends
  7. Quotas and limits for push queues

Using push queues

A Java app sets up queues using a configuration file named queue.xml, in the WEB-INF/ directory inside the WAR. See Java Task Queue Configuration. Every app has a push queue named default with some default settings.

To enqueue a task, you get a Queue using the QueueFactory, then call its add() method. You can get a named queue specified in the queue.xml file using the getQueue() method of the factory, or you can get the default queue using getDefaultQueue(). You can call the Queue's add() method with a TaskOptions instance (produced by TaskOptions.Builder), or you can call it with no arguments to create a task with the default options for the queue.

The following code adds a task to a queue with options.

In index.html:

<!-- A basic index.html file served from the "/" URL. -->
<html>
  <body>
    <p>Enqueue a value, to be processed by a worker.</p>
    <form action="/enqueue" method="post">
      <input type="text" name="key">
      <input type="submit">
    </form>
  </body>
</html>

In Enqueue.java:

// The Enqueue servlet should be mapped to the "/enqueue" URL.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import static com.google.appengine.api.taskqueue.TaskOptions.Builder.*;

public class Enqueue extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String key = request.getParameter("key");

        // Add the task to the default queue.
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(withUrl("/worker").param("key", key));

        response.sendRedirect("/");
    }
}

In Worker.java:

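// The Worker servlet should be mapped to the "/worker" URL.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class Worker extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String key = request.getParameter("key");
        // Do something with key.
    }
}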

Tasks added to this queue will execute by calling the request handler at the URL /worker with the parameter key. They will execute at the rate set in the queue.xml file, or the default rate of 5 tasks per second.

Push task execution

App Engine executes push tasks by sending HTTP requests to your app. Specifying a programmatic asynchronous callback as an HTTP request is sometimes called a web hook. The web hook model enables efficient parallel processing.

The task’s URL determines the handler for the task and the module that runs the handler.

The handler is determined by the path part of the URL (the forward-slash separated string following the hostname), which is specified by the url parameter in the TaskOptions that you include in your call to the Queue.add() method. The url must be relative and local to your application’s root directory.

 

The module (or frontend or backend) and version in which the handler runs is determined by:

 

If you do not specify any of these parameters, the task will run in the same module/version in which it was enqueued, subject to these rules:

  • If the default version of the app enqueues a task, the task will run on the default version. Note that if the app enqueues a task and the default version is changed before the task actually runs, the task will be executed in the new default version.
  • If a non-default version enqueues a task, the task will always run on that same version.

The namespace in which a push task runs is determined when the task is added to the queue. By default, a task will run in the current namespace of the process that created the task. You can override this behavior by explicitly setting the namespace before adding a task to a queue, as described on the multitenancy page.

A task must finish executing and send an HTTP response value between 200–299 within 10 minutes of the original request. This deadline is separate from user requests, which have a 60-second deadline. If your task’s execution nears the limit, App Engine raises a DeadlineExceededException that you can catch to save your work or log progress before the deadline passes. If the task failed to execute, App Engine retries it based on criteria that you can configure.

Task request headers

Requests from the Task Queue service contain the following HTTP headers:

  • X-AppEngine-QueueName, the name of the queue (possibly default)
  • X-AppEngine-TaskName, the name of the task, or a system-generated unique ID if no name was specified
  • X-AppEngine-TaskRetryCount, the number of times this task has been retried; for the first attempt, this value is 0. This number includes attempts where the task failed due to a lack of available instances and never reached the execution phase.
  • X-AppEngine-TaskExecutionCount, the number of times this task has previously failed during the execution phase. This number does not include failures due to a lack of available instances.
  • X-AppEngine-TaskETA, the target execution time of the task, specified in milliseconds since January 1st 1970.

These headers are set internally by Google App Engine. If your request handler finds any of these headers, it can trust that the request is a Task Queue request. If any of the above headers are present in an external user request to your app, they are stripped. The exception is requests from logged-in administrators of the application, who are allowed to set the headers for testing purposes.
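As a rough illustration of how a handler might use these headers, the sketch below inspects a plain map of header names to values. The class and method names are hypothetical; in a servlet you would normally read the same values with HttpServletRequest.getHeader(). Because HTTP header names are case-insensitive, the sketch copies them into a case-insensitive map first.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for reading Task Queue headers from a header map. */
final class TaskHeadersSketch {
    /** Copies the headers into a map whose keys compare without regard to case. */
    static Map<String, String> caseInsensitive(Map<String, String> headers) {
        Map<String, String> copy = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        copy.putAll(headers);
        return copy;
    }

    /** Treats the presence of the queue-name header as the mark of a Task Queue request. */
    static boolean isTaskQueueRequest(Map<String, String> headers) {
        return caseInsensitive(headers).containsKey("X-AppEngine-QueueName");
    }

    /** Returns the retry count, or 0 if the header is absent. */
    static int retryCount(Map<String, String> headers) {
        String value = caseInsensitive(headers).get("X-AppEngine-TaskRetryCount");
        return value == null ? 0 : Integer.parseInt(value.trim());
    }
}
```

A handler could, for instance, log a warning once retryCount() exceeds some threshold of its own choosing.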

Tasks may be created with the X-AppEngine-FailFast header, which specifies that a task running on a backend fails immediately instead of waiting in a pending queue.

Google App Engine issues Task Queue requests from the IP address 0.1.0.2.

The rate of task execution

You set the maximum processing rate for the entire queue when you configure the queue. App Engine uses a token bucket algorithm to execute tasks once they’ve been delivered to the queue. Each queue has a token bucket, and each bucket holds a certain number of tokens. Your app consumes a token each time it executes a task. If the bucket runs out of tokens, the system pauses until the bucket has more tokens. The rate at which the bucket is refilled is the limiting factor that determines the rate of the queue. See Defining Push Queues and Processing Rates for more details.
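The token bucket idea can be sketched in a few lines. This is a generic illustration of the algorithm, not App Engine's implementation: the class name, the choice to start with a full bucket, and the explicit time parameter (used to keep the example deterministic) are all assumptions.

```java
/** Generic token bucket sketch; time is passed in explicitly to keep it deterministic. */
final class TokenBucketSketch {
    private final double ratePerSecond;  // how fast the bucket refills
    private final double capacity;       // maximum number of tokens the bucket holds
    private double tokens;
    private long lastRefillMillis;

    TokenBucketSketch(double ratePerSecond, double capacity, long nowMillis) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity;          // assumption: the bucket starts full
        this.lastRefillMillis = nowMillis;
    }

    /** Refills for the elapsed time, then consumes one token if available. */
    boolean tryAcquire(long nowMillis) {
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;   // a task may run now
        }
        return false;      // bucket is empty; wait for the next refill
    }
}
```

With a rate of 5 and a capacity of 2, two tasks can run back to back, a third must wait, and the long-run throughput settles at the refill rate.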

To ensure that the Task Queue system does not overwhelm your application, it may throttle the rate at which requests are sent. This throttled rate is known as the enforced rate. The enforced rate may be decreased when your application returns a 503 HTTP response code, or if there are no instances able to execute a request for an extended period of time. You can view the enforced rate on the Task Queue tab of the Administration Console.

The order of task execution

The order in which tasks are executed depends on several factors:

  • The position of the task in the queue. App Engine attempts to process tasks based on FIFO (first in, first out) order. In general, tasks are inserted into the end of a queue, and executed from the head of the queue.
  • The backlog of tasks in the queue. The system attempts to deliver the lowest latency possible for any given task via specially optimized notifications to the scheduler. Thus, in the case that a queue has a large backlog of tasks, the system’s scheduling may “jump” new tasks to the head of the queue.
  • The value of the task’s etaMillis property. This specifies the earliest time that a task can execute. App Engine always waits until after the specified ETA to process push tasks.
  • The value of the task’s countdownMillis property. This specifies the minimum number of milliseconds to wait before executing a task. Countdown and ETA are mutually exclusive; if you specify one, do not specify the other.
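The relationship between an ETA and a countdown can be shown with a small helper. This is a sketch under stated assumptions, not SDK code: the class and method are hypothetical, null stands for "not set", and a task with neither value is treated as eligible immediately.

```java
/** Hypothetical helper that resolves a task's earliest execution time. */
final class TaskEtaSketch {
    /**
     * Returns the earliest run time in milliseconds since the epoch.
     * A null argument means the corresponding option was not set.
     */
    static long resolveEtaMillis(Long etaMillis, Long countdownMillis, long nowMillis) {
        if (etaMillis != null && countdownMillis != null) {
            throw new IllegalArgumentException("Set either an ETA or a countdown, not both");
        }
        if (etaMillis != null) {
            return etaMillis;                    // absolute time
        }
        if (countdownMillis != null) {
            return nowMillis + countdownMillis;  // relative to now
        }
        return nowMillis;                        // eligible immediately
    }
}
```

In other words, a countdown is just a relative way of expressing an ETA, which is why specifying both would be ambiguous.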

Task retries

If a push task request handler returns an HTTP status code within the range 200–299, App Engine considers the task to have completed successfully. If the task returns a status code outside of this range, App Engine retries the task until it succeeds. The system backs off gradually to avoid flooding your application with too many requests, but schedules retry attempts for failed tasks to recur at a maximum of once per hour.

You can also configure your own scheme for task retries using the retry-parameters element in queue.xml.
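To get a feel for what a gradual backoff with a one-hour ceiling looks like, consider the sketch below. The doubling strategy and the starting delay are assumptions chosen for illustration; the text above only states that retries back off gradually and recur at most once per hour. The class and method names are hypothetical.

```java
/** Illustrative backoff schedule: doubling delays capped at one hour. */
final class RetryBackoffSketch {
    static final long MAX_DELAY_SECONDS = 3600;  // retries recur at most once per hour

    /** Delay before retry number retryCount, where 0 is the first retry. */
    static long delaySeconds(long initialDelaySeconds, int retryCount) {
        long delay = initialDelaySeconds;
        for (int i = 0; i < retryCount && delay < MAX_DELAY_SECONDS; i++) {
            delay *= 2;  // assumption: the delay doubles after each failure
        }
        return Math.min(delay, MAX_DELAY_SECONDS);
    }

    /** A task attempt counts as successful only for status codes 200 through 299. */
    static boolean isSuccess(int httpStatus) {
        return httpStatus >= 200 && httpStatus <= 299;
    }
}
```

Starting from one second, the delays in this sketch grow as 1, 2, 4, 8 seconds and so on until they reach the one-hour cap.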

When implementing the code for tasks (as worker URLs within your app), it is important to consider whether the task is idempotent. App Engine’s Task Queue API is designed to only invoke a given task once; however, it is possible in exceptional circumstances that a task may execute multiple times (such as in the unlikely case of major system failure). Thus, your code must ensure that there are no harmful side-effects of repeated execution.
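One common way to make a worker tolerate repeated execution is to record which tasks have already been handled and skip duplicates. The sketch below shows the idea with an in-memory set; the class is hypothetical, and a real application would need to keep the marker in durable storage (for example the datastore) rather than in instance memory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical worker that performs its side effect at most once per task name. */
final class IdempotentWorkerSketch {
    // Stand-in for durable storage; instance memory alone is not sufficient in production.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger emailsSent = new AtomicInteger();

    /** Returns true if the side effect ran, false if this was a duplicate delivery. */
    boolean handle(String taskName) {
        if (!processed.add(taskName)) {
            return false;              // already handled: do nothing
        }
        emailsSent.incrementAndGet();  // the side effect, e.g. sending an email
        return true;
    }

    int emailsSent() {
        return emailsSent.get();
    }
}
```

The task name can be read from the X-AppEngine-TaskName request header listed earlier on this page.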

Deferred tasks

Setting up a handler for each distinct task (as described in the previous sections) can be cumbersome, as can serializing and deserializing complex arguments for the task — particularly if you have many diverse but small tasks that you want to run on the queue. The Java SDK includes an interface called DeferredTask. This interface lets you define a task as a single method. This interface uses Java serialization to package a unit of work into a Task Queue. A simple return from that method is considered success. Throwing any exception from that method is considered a failure.
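The mechanism behind this can be illustrated with plain Java serialization. The sketch below does not use the SDK: Work is a local stand-in interface, and toBytes() and fromBytes() are hypothetical helpers that show how a unit of work, together with its arguments, can be turned into a payload and restored on the other side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Illustration of packaging a unit of work with Java serialization. */
final class DeferredSketch {
    /** Local stand-in for a serializable, single-method task; not the SDK interface. */
    interface Work extends Runnable, Serializable {}

    /** Example unit of work that carries its arguments as fields. */
    static final class Greet implements Work {
        private static final long serialVersionUID = 1L;
        private final String name;

        Greet(String name) {
            this.name = name;
        }

        String message() {
            return "Hello, " + name;
        }

        @Override
        public void run() {
            System.out.println(message());
        }
    }

    /** Serializes the work into a byte payload, as a queue would store it. */
    static byte[] toBytes(Work work) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(work);
        }
        return bytes.toByteArray();
    }

    /** Restores the work from its payload so that run() can be called. */
    static Work fromBytes(byte[] payload) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return (Work) in.readObject();
        }
    }
}
```

Because the arguments travel as ordinary fields, there is no need to encode them by hand as request parameters.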

URL endpoints

Push tasks reference their implementation via URL. For example, a task which fetches and parses an RSS feed might use a worker URL called /app_worker/fetch_feed. You can specify this worker URL or use the default. In general, you can use any URL as the worker for a task, so long as it is within your application; all task worker URLs must be specified as relative URLs:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions.Method;
import static com.google.appengine.api.taskqueue.TaskOptions.Builder.*;

// ...
    Queue queue = QueueFactory.getDefaultQueue();
    queue.add(withUrl("/path/to/my/worker"));
    queue.add(withUrl("/path?a=b&c=d").method(Method.GET));

If you do not specify a worker URL, the task uses a default worker URL named after the queue:

/_ah/queue/queue_name

A queue’s default URL is used if, and only if, a task does not have a worker URL of its own. If a task does have its own worker URL, then it is only invoked at the worker URL, never another. Once a task is inserted into a queue, its URL endpoint cannot be changed.
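The rule for choosing between a task's own worker URL and the queue's default can be written out as a small function. This is only an illustration of the rule as described here; the class and method names are hypothetical, and the leading-slash check is a simple stand-in for the requirement that worker URLs be relative to the application.

```java
/** Hypothetical helper that applies the worker URL rule described above. */
final class WorkerUrlSketch {
    /** Returns the task's own URL if it has one, otherwise the queue's default URL. */
    static String effectiveUrl(String queueName, String taskUrl) {
        if (taskUrl == null || taskUrl.isEmpty()) {
            return "/_ah/queue/" + queueName;  // default worker URL named after the queue
        }
        if (!taskUrl.startsWith("/")) {
            throw new IllegalArgumentException("Worker URLs must be relative to the app root");
        }
        return taskUrl;
    }
}
```

For example, effectiveUrl("default", null) yields /_ah/queue/default, while a task created with /app_worker/fetch_feed keeps that URL regardless of the queue it is in.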

You can also target tasks to App Engine Backends. Backends allow you to process tasks beyond the 10-minute deadline for task execution. See Push Queues and Backends for more information.

Securing URLs for tasks

If a task performs sensitive operations (such as modifying important data), you might want to secure its worker URL to prevent a malicious external user from calling it directly. You can prevent users from accessing URLs of tasks by restricting access to administrator accounts. Task queues can access admin-only URLs. You can read about restricting URLs at Security and Authentication. An example you would use in web.xml to restrict everything starting with /tasks/ to admin-only is:

<security-constraint>
    <web-resource-collection>
        <web-resource-name>tasks</web-resource-name>
        <url-pattern>/tasks/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>admin</role-name>
    </auth-constraint>
</security-constraint>

For more on the format of web.xml, see the documentation on the deployment descriptor.

To test a task web hook, sign in as an administrator and visit the URL of the handler in your browser.

Push queues and the development server

When your app is running in the development server, tasks are automatically executed at the appropriate time just as in production.

To disable automatic execution of tasks, set the following JVM flag:

--jvm_flag=-Dtask_queue.disable_auto_task_execution=true

You can examine and manipulate tasks from the developer console at: http://localhost:8000/_ah/admin/taskqueue.

To execute tasks, select the queue by clicking on its name, select the tasks to execute, and click Run Now. To clear a queue without executing any tasks, click Purge Queue.

The development server and the production server behave differently:

  • The development server doesn’t respect the <rate> and <bucket-size> attributes of your queues. As a result, tasks are executed as close to their ETA as possible. Setting a rate of 0 doesn’t prevent tasks from being executed automatically.
  • The development server doesn’t retry tasks.
  • The development server doesn’t preserve queue state across server restarts.

Push queues and backends

Push tasks typically must finish execution within 10 minutes. If you have push tasks that require more time or computing resources to process, you can use App Engine Backends to process these tasks outside of the normal limits of App Engine applications. The following code sample demonstrates how to create a push task addressed to instance 1 of a backend named backend1:

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import static com.google.appengine.api.taskqueue.TaskOptions.Builder.*;
import com.google.appengine.api.backends.*;

// ...
    Queue queue = QueueFactory.getDefaultQueue();
    queue.add(withUrl("/path/to/my/worker").param("key", key).header("Host",
        BackendServiceFactory.getBackendService().getBackendAddress("backend1", 1)));

Quotas and limits for push queues

Enqueuing a task in a push queue counts toward the following quotas:

  • Task Queue Stored Task Count
  • Task Queue Stored Task Bytes
  • Task Queue API Calls

The Task Queue Stored Task Bytes quota is configurable in queue.xml by setting <total-storage-limit>. This quota counts towards your Stored Data (billable) quota.

Execution of a task counts toward the following quotas:

  • Requests
  • Incoming Bandwidth
  • Outgoing Bandwidth

The act of executing a task consumes bandwidth-related quotas for the request and response data, just as if the request handler were called by a remote client. When the task queue processes a task, the response data is discarded.

Once a task has been executed or deleted, the storage used by that task is reclaimed. The reclaiming of storage quota for tasks happens at regular intervals, and this may not be reflected in the storage quota immediately after the task is deleted.

For more information on quotas, see Quotas, and the “Quota Details” section of the Admin Console.

The following limits apply to the use of push queues:

Push Queue Limits

  • Maximum task size: 100KB
  • Maximum number of active queues (not including the default queue): 10 queues for free apps, 100 queues for billed apps
  • Queue execution rate: 500 task invocations per second per queue
  • Maximum countdown/ETA for a task: 30 days from the current date and time
  • Maximum number of tasks that can be added in a batch: 100 tasks
  • Maximum number of tasks that can be added in a transaction: 5 tasks
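When you have more tasks to enqueue than the batch limit of 100 allows, you can split them into several batches and add each one separately. The following is a minimal sketch of the splitting step only; the class and method names are hypothetical, and the element type is left generic so it does not depend on the SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a list of tasks into batches within the limit. */
final class BatchSketch {
    static final int MAX_BATCH_SIZE = 100;  // maximum tasks per batch, from the limits above

    /** Returns consecutive sublists of at most MAX_BATCH_SIZE elements each. */
    static <T> List<List<T>> batches(List<T> tasks) {
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < tasks.size(); start += MAX_BATCH_SIZE) {
            int end = Math.min(start + MAX_BATCH_SIZE, tasks.size());
            result.add(new ArrayList<>(tasks.subList(start, end)));
        }
        return result;
    }
}
```

With 250 tasks this produces three batches of 100, 100, and 50 tasks, each of which fits within a single batch add.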