GAE: Storing Data

Storing Data

The App Engine environment provides a range of options for storing your data:

  • App Engine Datastore provides a NoSQL schemaless object datastore, with a query engine and atomic transactions.
  • Google Cloud SQL provides a relational SQL database for your App Engine application, based on the familiar MySQL database.
  • Google Cloud Storage provides a storage service for objects and files up to terabytes in size.

This section details the API for accessing the App Engine Datastore from Java. For details of the other storage options, see the documentation for Cloud SQL and Cloud Storage.

The Java Datastore API

The App Engine Datastore is a schemaless object datastore providing robust, scalable storage for your web application, with the following features:

  • No planned downtime
  • Atomic transactions
  • High availability of reads and writes
  • Strong consistency for reads and ancestor queries
  • Eventual consistency for all other queries

The Java Datastore SDK includes implementations of the Java Data Objects (JDO) and Java Persistence API (JPA) interfaces, as well as a low-level Datastore API.

This reference describes the Java interfaces for the App Engine datastore, with an emphasis on JDO. It has the following sections:

  1. Datastore Overview
  2. Entities, Properties, and Keys
  3. Datastore Queries
  4. Datastore Indexes
  5. Datastore Transactions
  6. Structuring Data for Strong Consistency
  7. Using the Master/Slave Datastore
  8. Metadata
  9. Datastore Statistics
  10. Async Datastore API
  11. Datastore Callbacks
  12. JDO
  13. JPA
  14. Google Cloud Storage
  15. Javadoc Reference

Datastore Overview

App Engine’s primary data repository is the High Replication Datastore (HRD), in which data is replicated across multiple data centers using a system based on the Paxos algorithm. This provides a high level of availability for reads and writes. Most queries are eventually consistent.

A second storage option, the Master/Slave Datastore, has been deprecated in favor of the HRD as of April 4, 2012. Although Google will continue to support the Master/Slave Datastore in accordance with our terms of service, it is strongly recommended that all new applications use the HRD instead, and that existing applications using the Master/Slave Datastore migrate to the HRD.


The Datastore holds data objects known as entities. An entity has one or more properties, named values of one of several supported data types: for instance, a property can be a string, an integer, or a reference to another entity. Each entity is identified by its kind, which categorizes the entity for the purpose of queries, and a key that uniquely identifies it within its kind.

The Datastore can execute multiple operations in a single transaction. By definition, a transaction cannot succeed unless every one of its operations succeeds; if any of the operations fails, the transaction is automatically rolled back. This is especially useful for distributed web applications, where multiple users may be accessing or manipulating the same data at the same time.

Contents

  1. Comparison with Traditional Databases
  2. Java Datastore API
  3. Entities
    1. Kinds, Keys, and Identifiers
    2. Ancestor Paths
  4. Queries and Indexes
  5. Transactions
    1. Transactions and Entity Groups
    2. Cross-Group Transactions
  6. Datastore Writes and Data Visibility
  7. Datastore Statistics
  8. Quotas and Limits

Comparison with Traditional Databases

Unlike traditional relational databases, the App Engine Datastore uses a distributed architecture to automatically manage scaling to very large data sets. While the Datastore interface has many of the same features as traditional databases, it differs from them in the way it describes relationships between data objects. Entities of the same kind can have different properties, and different entities can have properties with the same name but different value types.

These unique characteristics imply a different way of designing and managing data to take advantage of the ability to scale automatically. In particular, the App Engine Datastore differs from a traditional relational database in the following important ways:

  • The App Engine Datastore is designed to scale, allowing applications to maintain high performance as they receive more traffic:
    • Datastore writes scale by automatically distributing data as necessary.
    • Datastore reads scale because the only queries supported are those whose performance scales with the size of the result set (as opposed to the data set). This means that a query whose result set contains 100 entities performs the same whether it searches over a hundred entities or a million. This property is the key reason some types of query are not supported.
  • Because all queries on App Engine are served by pre-built indexes, the types of query that can be executed are more restrictive than those allowed on a relational database with SQL. In particular, the following are not supported:
    • Join operations
    • Inequality filtering on multiple properties
    • Filtering of data based on results of a subquery
  • Unlike traditional relational databases, the App Engine Datastore doesn’t require entities of the same kind to have a consistent property set (although you can choose to enforce such a requirement in your own application code).
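The restriction on inequality filters is generally explained by the index design: one scan of a sorted index can cover only one contiguous range of one property. The plain-Java sketch below does not use the App Engine Query API, and its class and enum names are hypothetical. It shows the kind of check involved: count the distinct properties that carry an inequality filter and accept the query only if there is at most one. In this model, NOT_EQUAL also counts as an inequality.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of query filters; not the App Engine Query API.
final class InequalityRule {
    enum Op { EQUAL, NOT_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL }

    static final class Filter {
        final String property;
        final Op op;
        final Object value;
        Filter(String property, Op op, Object value) {
            this.property = property;
            this.op = op;
            this.value = value;
        }
    }

    /** Distinct properties that carry an inequality filter (anything but EQUAL). */
    static Set<String> inequalityProperties(List<Filter> filters) {
        Set<String> props = new TreeSet<>();
        for (Filter f : filters) {
            if (f.op != Op.EQUAL) props.add(f.property);
        }
        return props;
    }

    /** Accepted only if inequality filters touch at most one property. */
    static boolean isSupported(List<Filter> filters) {
        return inequalityProperties(filters).size() <= 1;
    }

    public static void main(String[] args) {
        // Two inequalities on the same property form one range: supported.
        List<Filter> ok = List.of(
            new Filter("lastName", Op.EQUAL, "Smith"),
            new Filter("height", Op.GREATER_THAN, 150),
            new Filter("height", Op.LESS_THAN, 200));
        // Inequalities on two different properties: not supported.
        List<Filter> rejected = List.of(
            new Filter("height", Op.GREATER_THAN, 150),
            new Filter("birthYear", Op.LESS_THAN, 1980));
        System.out.println(isSupported(ok));       // true
        System.out.println(isSupported(rejected)); // false
    }
}
```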

For more in-depth information about the design of the Datastore, read our series of articles on Mastering the Datastore.

Java Datastore API

The App Engine Java SDK provides a low-level Datastore API with simple operations on entities, including get, put, delete, and query. You can use this low-level API directly in your applications, or use it as a base on which to implement other interface adapters. The SDK also includes implementations of the Java Data Objects (JDO) and Java Persistence API (JPA) interfaces for modeling and persisting data. These standard interfaces include mechanisms for defining classes for data objects and for performing queries.

In addition to the standard frameworks and low-level Datastore API, the Java SDK supports other frameworks designed to simplify Datastore usage for Java developers. Many Java developers use these frameworks; the Google App Engine team highly recommends them and encourages you to investigate them.

  • Objectify is a very simple and convenient interface to the App Engine Datastore that helps you avoid some of the complexities presented by JDO/JPA and the low-level Datastore.
  • Twig is a configurable object persistence interface that improves support for inheritance, polymorphism, and generic types. Like Objectify, Twig also helps you avoid complexities posed by JDO and the low-level Datastore.
  • Slim3 is a full-stack model-view-controller framework that you can use for a wide variety of App Engine functions, including (but not limited to) the Datastore.

Entities

Objects in the App Engine Datastore are known as entities. An entity has one or more named properties, each of which can have one or more values. Property values can belong to a variety of data types, including integers, floating-point numbers, strings, dates, and binary data, among others. A query on a property with multiple values tests whether any of the values meets the query criteria. This makes such properties useful for membership testing.
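A minimal in-memory sketch in plain Java makes the membership-testing semantics concrete. The Doc class and the queryEquals method are hypothetical stand-ins, not the App Engine Entity or Query classes. The only point is that an equality filter on a multi-valued property matches when any one of its values equals the target.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each property maps to a list of values.
final class MultiValuedMatch {
    static final class Doc {
        final String name;
        final Map<String, List<Object>> props = new HashMap<>();
        Doc(String name) { this.name = name; }

        /** Stores one or more values under a single property name. */
        Doc set(String property, Object... values) {
            props.put(property, List.of(values));
            return this;
        }
    }

    /** An equality filter matches if ANY value of the property equals the target. */
    static boolean matchesEquals(Doc d, String property, Object target) {
        List<Object> values = d.props.get(property);
        return values != null && values.contains(target);
    }

    static List<String> queryEquals(List<Doc> docs, String property, Object target) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            if (matchesEquals(d, property, target)) out.add(d.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("article1").set("tags", "java", "datastore"),
            new Doc("article2").set("tags", "python"),
            new Doc("article3").set("title", "No tags here"));
        // Membership test: which articles are tagged "java"?
        System.out.println(queryEquals(docs, "tags", "java")); // [article1]
    }
}
```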

Note: Datastore entities are schemaless: unlike traditional relational databases, the App Engine Datastore does not require that all entities of a given kind have the same properties or that all of an entity’s values for a given property be of the same data type. If a formal schema is needed, the application itself is responsible for ensuring that entities conform to it.

Kinds, Keys, and Identifiers

Each Datastore entity is of a particular kind, which categorizes the entity for the purpose of queries: for instance, a human resources application might represent each employee at a company with an entity of kind Employee. In addition, each entity has its own key, which uniquely identifies it. The key consists of the following components:

  • The entity’s kind
  • An identifier, which can be either
    • a key name string
    • an integer numeric ID
  • An optional ancestor path locating the entity within the Datastore hierarchy

The identifier is assigned when the entity is created. Because it is part of the entity’s key, it is associated permanently with the entity and cannot be changed. It can be assigned in either of two ways:

  • Your application can specify its own key name string for the entity.
  • You can have the Datastore automatically assign the entity an integer numeric ID.

Note: Instead of using key name strings or generating numeric IDs automatically, advanced applications may sometimes wish to assign their own numeric IDs manually to the entities they create. Be aware, however, that if you choose this option you must take special steps to prevent your manually assigned numeric IDs from conflicting with those assigned automatically by the Datastore; see the Entities, Properties, and Keys page for further details.

Ancestor Paths

Entities in the Datastore form a hierarchically structured space similar to the directory structure of a file system. When you create an entity, you can optionally designate another entity as its parent; the new entity is a child of the parent entity. An entity without a parent is a root entity. The association between an entity and its parent is permanent, and cannot be changed once the entity is created. The Datastore will never assign the same numeric ID to two entities with the same parent, or to two root entities (those without a parent).
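The scoping rule for automatically assigned IDs can be sketched with one counter per parent. This is a simplified, hypothetical model: the IdAllocator class is not part of any SDK, and the real Datastore does not promise sequential numbering. It only illustrates that an assigned ID is unique among entities sharing a parent, and among root entities, while the same number may appear under two different parents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one counter per parent; root entities share the "" parent.
// Sequential numbering is a simplification made for this sketch.
final class IdAllocator {
    private static final String NO_PARENT = "";
    private final Map<String, Long> next = new HashMap<>();

    /** Returns a fresh ID for a new child of parentPath. */
    long allocate(String parentPath) {
        long id = next.getOrDefault(parentPath, 1L);
        next.put(parentPath, id + 1);
        return id;
    }

    /** Root entities (no parent) draw from their own shared counter. */
    long allocateRoot() {
        return allocate(NO_PARENT);
    }

    public static void main(String[] args) {
        IdAllocator ids = new IdAllocator();
        System.out.println(ids.allocateRoot());        // 1
        System.out.println(ids.allocateRoot());        // 2
        System.out.println(ids.allocate("Person:1"));  // 1 (unique among Person:1's children)
        System.out.println(ids.allocate("Person:2"));  // 1 (same number, different parent)
    }
}
```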

An entity’s parent, parent’s parent, and so on recursively, are its ancestors; its children, children’s children, and so on, are its descendants. An entity and its descendants are said to belong to the same entity group. The sequence of entities beginning with a root entity and proceeding from parent to child, leading to a given entity, constitute that entity’s ancestor path. The complete key identifying the entity consists of a sequence of kind-identifier pairs specifying its ancestor path and terminating with those of the entity itself:

Person:GreatGrandpa / Person:Grandpa / Person:Dad / Person:Me

For a root entity, the ancestor path is empty and the key consists solely of the entity’s own kind and identifier:

Person:GreatGrandpa
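The following plain-Java sketch models a key as the ordered list of kind-identifier pairs described above. The KeyPath class is hypothetical and is not the App Engine Key class. It reproduces the path notation used in the examples and treats the root element as the identity of the entity group.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical model of a key: (kind, identifier) pairs from root to entity.
final class KeyPath {
    static final class Element {
        final String kind;
        final String identifier; // a key name, or a numeric ID rendered as text
        Element(String kind, String identifier) {
            this.kind = kind;
            this.identifier = identifier;
        }
        @Override public String toString() { return kind + ":" + identifier; }
    }

    private final List<Element> path;

    private KeyPath(List<Element> path) { this.path = Collections.unmodifiableList(path); }

    /** A root key: the ancestor path is empty. */
    static KeyPath root(String kind, String identifier) {
        List<Element> p = new ArrayList<>();
        p.add(new Element(kind, identifier));
        return new KeyPath(p);
    }

    /** A child key copies the parent's full path and appends its own pair. */
    KeyPath child(String kind, String identifier) {
        List<Element> p = new ArrayList<>(path);
        p.add(new Element(kind, identifier));
        return new KeyPath(p);
    }

    /** Every entity in a group shares the same root element. */
    String entityGroup() { return path.get(0).toString(); }

    @Override public String toString() {
        StringJoiner j = new StringJoiner(" / ");
        for (Element e : path) j.add(e.toString());
        return j.toString();
    }

    public static void main(String[] args) {
        KeyPath me = KeyPath.root("Person", "GreatGrandpa")
            .child("Person", "Grandpa").child("Person", "Dad").child("Person", "Me");
        System.out.println(me);               // Person:GreatGrandpa / Person:Grandpa / Person:Dad / Person:Me
        System.out.println(me.entityGroup()); // Person:GreatGrandpa
    }
}
```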

Queries and Indexes

In addition to retrieving entities from the Datastore directly by their keys, an application can perform a query to retrieve them by the values of their properties. The query operates on entities of a given kind; it can specify filters on the entities’ property values, keys, and ancestors, and can return zero or more entities as results. A query can also specify sort orders to sequence the results by their property values. The results include all entities that have at least one (possibly null) value for every property named in the filters and sort orders, and whose property values meet all the specified filter criteria. The query can return entire entities, projected entities, or just entity keys.

A typical query includes the following:

  • An entity kind to which the query applies
  • Zero or more filters based on the entities’ property values, keys, and ancestors
  • Zero or more sort orders to sequence the results

When executed, the query retrieves all entities of the given kind that satisfy all of the given filters, sorted in the specified order.

Note: To conserve memory and improve performance, a query should, whenever possible, specify a limit on the number of results returned.
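The rules above can be summarized as an in-memory evaluator. It selects by kind, drops entities lacking a property named in the filter or sort order, applies the filter, sorts, and stops at the limit. This is a hypothetical sketch for illustration only: QuerySketch and its single-filter, single-sort signature are assumptions. The real Datastore answers queries from indexes rather than by scanning every entity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical evaluator mirroring the query rules in the text.
final class QuerySketch {
    static final class Ent {
        final String kind;
        final String name;
        final Map<String, Integer> props;
        Ent(String kind, String name, Map<String, Integer> props) {
            this.kind = kind;
            this.name = name;
            this.props = props;
        }
    }

    static List<String> run(List<Ent> all, String kind, String filterProp,
                            Predicate<Integer> filter, String sortProp, int limit) {
        List<Ent> hits = new ArrayList<>();
        for (Ent e : all) {
            if (!e.kind.equals(kind)) continue;                        // 1. kind
            if (!e.props.containsKey(filterProp)
                    || !e.props.containsKey(sortProp)) continue;       // 2. named properties must exist
            if (!filter.test(e.props.get(filterProp))) continue;       // 3. filter
            hits.add(e);
        }
        hits.sort(Comparator.comparing(e -> e.props.get(sortProp)));  // 4. ascending sort
        List<String> names = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, hits.size()); i++) {
            names.add(hits.get(i).name);                               // 5. limit
        }
        return names;
    }

    public static void main(String[] args) {
        List<Ent> all = List.of(
            new Ent("Employee", "alice", Map.of("age", 34)),
            new Ent("Employee", "bob", Map.of("age", 41)),
            new Ent("Employee", "carol", Map.of("age", 29)),
            new Ent("Employee", "dave", Map.of()),             // no "age": excluded
            new Ent("Customer", "erin", Map.of("age", 50)));   // wrong kind: excluded
        System.out.println(run(all, "Employee", "age", a -> a >= 30, "age", 10)); // [alice, bob]
    }
}
```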

A query can also include an ancestor filter limiting the results to just the entity group descended from a specified ancestor. Such a query is known as an ancestor query. By default, ancestor queries return strongly consistent results, which are guaranteed to be up to date with the latest changes to the data. Non-ancestor queries, by contrast, can span the entire Datastore rather than just a single entity group, but are only eventually consistent and may return stale results. If strong consistency is important to your application, you may need to take this into account when structuring your data, placing related entities in the same entity group so they can be retrieved with an ancestor rather than a non-ancestor query; see Structuring Data for Strong Consistency for more information.
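Because a key carries its full ancestor path, an ancestor filter can be thought of as a prefix test on that path. The sketch below is a hypothetical model in plain Java: paths are lists of "Kind:identifier" strings, not App Engine Key objects, and AncestorFilter is illustrative only. In this model the ancestor itself also matches, since its path is trivially a prefix of itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: an ancestor filter is a prefix match on the key path,
// which confines the results to a single entity group.
final class AncestorFilter {
    static boolean hasAncestor(List<String> keyPath, List<String> ancestorPath) {
        return keyPath.size() >= ancestorPath.size()
            && keyPath.subList(0, ancestorPath.size()).equals(ancestorPath);
    }

    static List<List<String>> ancestorQuery(List<List<String>> keys, List<String> ancestor) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> k : keys) {
            if (hasAncestor(k, ancestor)) out.add(k);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> grandpa = List.of("Person:GreatGrandpa", "Person:Grandpa");
        List<String> me = List.of("Person:GreatGrandpa", "Person:Grandpa", "Person:Dad", "Person:Me");
        List<String> stranger = List.of("Person:Stranger");
        // Returns the paths of grandpa and me; stranger is in another entity group.
        System.out.println(ancestorQuery(List.of(grandpa, me, stranger), grandpa));
    }
}
```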

Note: Consistency considerations are a bit different with the Master/Slave Datastore; see Using the Master/Slave Datastore for details.

Every Datastore query computes its results using one or more indexes, tables containing entities in a sequence specified by the index’s properties and, optionally, the entity’s ancestors. The indexes are updated incrementally to reflect any changes the application makes to its entities, so that the correct results of all queries are immediately available with no further computation needed.
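A single-property index can be modeled as a sorted map from property value to the keys of the entities holding that value. In the hypothetical sketch below, a range filter becomes one contiguous slice of a TreeMap. Locating the start of the slice costs O(log n), and the remaining work is proportional to the number of matching rows. This is why query cost tracks the size of the result set rather than the size of the data set. PropertyIndex is a conceptual model, not a description of how App Engine physically stores indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical single-property index: rows sorted by value, then by entity key.
final class PropertyIndex {
    private final TreeMap<Integer, TreeSet<String>> rows = new TreeMap<>();

    /** Updating the index when an entity is written keeps query results ready. */
    void put(int value, String entityKey) {
        rows.computeIfAbsent(value, v -> new TreeSet<>()).add(entityKey);
    }

    /** Keys with lo <= value < hi, read from one contiguous slice of the index. */
    List<String> range(int lo, int hi) {
        List<String> out = new ArrayList<>();
        for (TreeSet<String> keys : rows.subMap(lo, true, hi, false).values()) {
            out.addAll(keys);
        }
        return out;
    }

    public static void main(String[] args) {
        PropertyIndex byAge = new PropertyIndex();
        for (int i = 0; i < 100_000; i++) {
            byAge.put(i, "Employee:" + i);
        }
        // Only the 100 matching rows are visited, not all 100,000.
        System.out.println(byAge.range(500, 600).size()); // 100
    }
}
```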

App Engine predefines a simple index on each property of an entity. An App Engine application can define further custom indexes in an index configuration file named datastore-indexes.xml. The development web server automatically adds suggestions to this file as it encounters queries that cannot be executed with the existing indexes. You can tune indexes manually by editing the file before uploading the application.

Note: This index-based query mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of query common in other database technologies: in particular, joins and aggregate queries aren’t supported within the Datastore query engine. See the Datastore Queries page for limitations on App Engine Datastore queries.

Transactions

Every attempt to insert, update, or delete an entity takes place in the context of a transaction. A single transaction can include any number of such operations. To maintain the consistency of the data, the transaction ensures that all of the operations it contains are applied to the Datastore as a unit or, if any of the operations fails, that none of them are applied.

You can perform multiple actions on an entity within a single transaction. For example, to increment a counter field in an object, you need to read the value of the counter, calculate the new value, and then store it back. Without a transaction, it is possible for another process to increment the counter between the time you read the value and the time you update it, causing your application to overwrite the updated value. Doing the read, calculation, and write in a single transaction ensures that no other process can interfere with the increment.

Transactions and Entity Groups

Only ancestor queries are allowed within a transaction: that is, each query must be limited to a single entity group. The transaction itself can apply to multiple entities, which can belong either to a single entity group or (in the case of a cross-group transaction) to as many as five different entity groups.

The Datastore uses optimistic concurrency to manage transactions. When two or more application instances try to change the same entity group at the same time (either updating existing entities or creating new ones), the first application to commit its changes will succeed and all others will fail on commit. These other applications can then try their transactions again to apply them to the updated data. Note that this limits the number of concurrent writes you can do to any entity in a given entity group.
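The optimistic scheme can be illustrated with the counter increment described earlier. In the hypothetical model below, OptimisticGroup is not an App Engine class. Each transaction records the entity group's version when it starts, and a commit succeeds only if that version is unchanged. As a result, the second of two concurrent increments fails and must retry on fresh data instead of silently overwriting the first.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical model of optimistic concurrency on one entity group.
final class OptimisticGroup {
    private int value;
    private long version;

    /** Snapshot taken when a transaction begins. */
    static final class Txn {
        final long startVersion;
        final int readValue;
        Txn(long startVersion, int readValue) {
            this.startVersion = startVersion;
            this.readValue = readValue;
        }
    }

    synchronized Txn begin() { return new Txn(version, value); }

    /** First committer wins; a commit based on a stale snapshot is rejected. */
    synchronized boolean commit(Txn t, int newValue) {
        if (t.startVersion != version) return false; // another transaction committed first
        value = newValue;
        version++;
        return true;
    }

    synchronized int value() { return value; }

    /** Re-read and recompute until a commit succeeds or attempts run out. */
    boolean runWithRetries(IntUnaryOperator update, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            Txn t = begin();
            if (commit(t, update.applyAsInt(t.readValue))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        OptimisticGroup counter = new OptimisticGroup();
        Txn a = counter.begin();                                  // A reads 0
        Txn b = counter.begin();                                  // B reads 0
        System.out.println(counter.commit(a, a.readValue + 1));   // true: A commits first
        System.out.println(counter.commit(b, b.readValue + 1));   // false: B's snapshot is stale
        counter.runWithRetries(v -> v + 1, 3);                    // B retries on fresh data
        System.out.println(counter.value());                      // 2
    }
}
```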

Cross-Group Transactions

A transaction on entities belonging to different entity groups is called a cross-group (XG) transaction. The transaction can be applied across a maximum of five entity groups, and will succeed as long as no concurrent transaction touches any of the entity groups to which it applies. This gives you more flexibility in organizing your data, because you aren’t forced to put disparate pieces of data under the same ancestor just to perform atomic writes on them.
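A minimal pre-flight check for the five-group limit might derive each key's entity group from the root of its path and count the distinct groups. The XgLimit class and its "Kind:id / Kind:id" string key format are assumptions made for this sketch. The Datastore enforces the limit itself, so a check like this only helps an application fail early.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for the cross-group transaction limit.
final class XgLimit {
    static final int MAX_ENTITY_GROUPS = 5; // limit stated in the text

    /** The entity group is identified by the first (root) element of the path. */
    static String entityGroupOf(String keyPath) {
        int sep = keyPath.indexOf(" / ");
        return sep < 0 ? keyPath : keyPath.substring(0, sep);
    }

    static Set<String> groupsTouched(List<String> keyPaths) {
        Set<String> groups = new LinkedHashSet<>();
        for (String k : keyPaths) groups.add(entityGroupOf(k));
        return groups;
    }

    static boolean fitsInXgTransaction(List<String> keyPaths) {
        return groupsTouched(keyPaths).size() <= MAX_ENTITY_GROUPS;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            "Account:alice", "Account:alice / Entry:1", "Account:bob", "Account:carol");
        System.out.println(groupsTouched(keys));       // [Account:alice, Account:bob, Account:carol]
        System.out.println(fitsInXgTransaction(keys)); // true
    }
}
```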

As in a single-group transaction, you cannot perform a non-ancestor query in an XG transaction. You can, however, perform ancestor queries on separate entity groups. Nontransactional (non-ancestor) queries may see all, some, or none of the results of a previously committed transaction. (For background on this issue, see Datastore Writes and Data Visibility.) However, such nontransactional queries are more likely to see the results of a partially committed XG transaction than those of a partially committed single-group transaction.

Note: The first read of an entity group in an XG transaction may throw a ConcurrentModificationException if there is a conflict with other transactions accessing that same entity group. This means that even an XG transaction that performs only reads can fail with a concurrency exception.

An XG transaction that touches only a single entity group has exactly the same performance and cost as a single-group, non-XG transaction. In an XG transaction that touches multiple entity groups, operations cost the same as if they were performed in a non-XG transaction, but may experience higher latency.

Datastore Writes and Data Visibility

This section describes the behavior of App Engine with the High Replication Datastore (HRD). The deprecated Master/Slave Datastore behaves differently in a few ways; see the Using the Master/Slave Datastore page for details. For a fuller discussion of this topic, see the articles Life of a Datastore Write and Transaction Isolation in App Engine.

Data is written to the Datastore in two phases:

  1. In the Commit phase, the entity data is recorded in a log.
  2. The Apply phase consists of two actions performed in parallel:
    • The entity data is written.
    • The index rows for the entity are written. (Note that this can take longer than writing the data itself.)

The write operation returns immediately after the Commit phase and the Apply phase then takes place asynchronously. If a failure occurs during the Commit phase, there are automatic retries; but if failures continue, the Datastore returns an error message that your application receives as an exception. If the Commit phase succeeds but the Apply fails, the Apply is rolled forward to completion when one of the following occurs:

  • Periodic Datastore sweeps check for uncompleted Commit jobs and apply them.
  • Certain application operations (get, put, delete, and ancestor queries) that use the affected entity group cause any changes that have been committed but not yet applied to be completed before proceeding with the new operation.
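The commit and roll-forward behavior for a single entity group can be modeled with a log and a table. In the hypothetical sketch below, CommitApplyGroup is illustrative only and not an App Engine API. Here put returns right after appending to the log, applyPending plays the role of a periodic sweep, and get rolls forward any outstanding writes before reading, so it always returns the latest committed value.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one entity group's two-phase write.
final class CommitApplyGroup {
    private final Deque<String[]> log = new ArrayDeque<>();       // committed, not yet applied
    private final Map<String, String> entities = new HashMap<>(); // applied entity data

    /** Commit phase: record the write in the log and return immediately. */
    void put(String key, String value) {
        log.addLast(new String[] {key, value});
    }

    /** Apply phase: drain the log in commit order (a sweep, or on demand). */
    void applyPending() {
        while (!log.isEmpty()) {
            String[] w = log.removeFirst();
            entities.put(w[0], w[1]);
        }
    }

    /** A get completes outstanding applies before reading. */
    String get(String key) {
        applyPending();
        return entities.get(key);
    }

    /** Reading the table without rolling forward can return stale data. */
    String peekWithoutRollForward(String key) {
        return entities.get(key);
    }

    int pendingWrites() { return log.size(); }

    public static void main(String[] args) {
        CommitApplyGroup group = new CommitApplyGroup();
        group.put("Person:Me", "v1");
        group.applyPending();
        group.put("Person:Me", "v2");                                // committed, not applied
        System.out.println(group.peekWithoutRollForward("Person:Me")); // v1
        System.out.println(group.get("Person:Me"));                    // v2
        System.out.println(group.pendingWrites());                     // 0
    }
}
```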

This write behavior can have several implications for how and when data is visible to your application at different parts of the Commit and Apply phases:

  • If a write operation reports a timeout error, it cannot be determined (without attempting to read the data) whether the operation succeeded or failed.
  • Because Datastore gets and ancestor queries apply any outstanding modifications before executing, these operations always see a consistent view of all previous successful transactions. This means that a get operation (looking up an updated entity by its key) is guaranteed to see the latest version of that entity.
  • A few hundred milliseconds may elapse from the time a write operation returns until the transaction is completely applied. During this window, queries spanning more than one entity group cannot determine whether there are any outstanding modifications before executing and may return stale results.
  • The timing of concurrent query requests may affect their results. If an entity initially satisfies a query but is later changed so that it no longer does, the entity may still be included in the query’s result set; it will be omitted only if the query executes after the Apply phase of the update has been completed (that is, after the indexes have been written).
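The last point can be reproduced with a model in which the entity write finishes before the index write, which matches the note that writing index rows can take longer than writing the data. In the hypothetical sketch below, the query reads only the index, so for a moment it still returns an entity whose stored value no longer matches the filter. The StaleIndexQuery class is illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: entity data and index rows are written separately.
final class StaleIndexQuery {
    final Map<String, String> entities = new HashMap<>();          // key -> status
    final Map<String, TreeSet<String>> index = new TreeMap<>();    // status -> keys

    void writeEntity(String key, String status) {
        entities.put(key, status);
    }

    void writeIndex(String key, String oldStatus, String newStatus) {
        if (oldStatus != null) index.getOrDefault(oldStatus, new TreeSet<>()).remove(key);
        index.computeIfAbsent(newStatus, s -> new TreeSet<>()).add(key);
    }

    /** Equality query answered from the index alone. */
    List<String> queryByStatus(String status) {
        return new ArrayList<>(index.getOrDefault(status, new TreeSet<>()));
    }

    public static void main(String[] args) {
        StaleIndexQuery db = new StaleIndexQuery();
        db.writeEntity("Task:1", "open");
        db.writeIndex("Task:1", null, "open");
        // Update in progress: entity data written, index row not yet rewritten.
        db.writeEntity("Task:1", "closed");
        System.out.println(db.queryByStatus("open"));   // [Task:1] (stale)
        db.writeIndex("Task:1", "open", "closed");      // Apply phase completes
        System.out.println(db.queryByStatus("open"));   // []
        System.out.println(db.queryByStatus("closed")); // [Task:1]
    }
}
```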

Datastore Statistics

The Datastore maintains statistics about the data stored for an application, such as how many entities there are of a given kind or how much space is used by property values of a given type. You can view these statistics in the Administration Console under Datastore > Statistics. You can also use the Datastore API to access these values programmatically from within the application by querying for specially named entities; see Datastore Statistics in Java for more information.

Quotas and Limits

Various aspects of your application’s Datastore usage are counted toward your App Engine resource quotas:

  • Each call to the Datastore API counts toward the Datastore API Calls quota. (Note that some library calls result in multiple calls to the API, and so use more of your quota.)
  • Data sent to the Datastore by the application counts toward the Data Sent to Datastore API quota.
  • Data received by the application from the Datastore counts toward the Data Received from Datastore API quota.
  • The total amount of data currently stored in the Datastore for the application cannot exceed the Stored Data (billable) quota. This includes all entity properties and keys, as well as the indexes needed to support querying those entities. See the article How Entities and Indexes Are Stored for a complete breakdown of the metadata required to store entities and indexes at the Bigtable level.

For information on systemwide safety limits, see the Quotas and Limits page and the Quota Details section of the Administration Console. In addition to such systemwide limits, the following limits apply specifically to the use of the Datastore:

Limit                                                         Amount
Maximum entity size                                           1 megabyte
Maximum transaction size                                      10 megabytes
Maximum number of index entries for an entity                 20,000
Maximum number of bytes in composite indexes for an entity    2 megabytes
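An application that writes large entities may want to check the size limits before calling the Datastore. The sketch below is hypothetical. It assumes a megabyte is 1,048,576 bytes and treats a transaction's size as the sum of the serialized sizes of the entities it writes. It also expects the caller to supply those sizes, since estimating serialized size is outside its scope. The Datastore remains the authority on whether a write is accepted.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical pre-flight check against two limits from the table.
final class DatastoreLimits {
    static final long MB = 1024L * 1024L;              // assumption: binary megabyte
    static final long MAX_ENTITY_BYTES = 1 * MB;       // "Maximum entity size"
    static final long MAX_TRANSACTION_BYTES = 10 * MB; // "Maximum transaction size"

    /** Empty if the batch fits; otherwise a description of the first violated limit. */
    static Optional<String> check(List<Long> entitySizes) {
        long total = 0;
        for (int i = 0; i < entitySizes.size(); i++) {
            long size = entitySizes.get(i);
            if (size > MAX_ENTITY_BYTES) {
                return Optional.of("entity " + i + " exceeds the 1 MB entity limit");
            }
            total += size;
        }
        if (total > MAX_TRANSACTION_BYTES) {
            return Optional.of("batch exceeds the 10 MB transaction limit");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(check(List.of(500_000L, 900_000L)).isPresent()); // false
        System.out.println(check(List.of(2 * MB)).get());  // entity 0 exceeds the 1 MB entity limit
    }
}
```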