Documents reference

📝 NOTE: This content was pulled from PR 308

Documents represent a single row or record of data in Astra DB Serverless.

Use the Collection class to work with documents.

If you haven’t already, consult the Collections reference topic for details on how to get a Collection object.

Working with dates

  • Python

  • TypeScript

  • Java

  • cURL

Date and datetime objects, which are instances of the Python standard library datetime.datetime and datetime.date classes, can be used anywhere in documents.

Read operations from a collection always return the datetime class regardless of whether a date or a datetime was provided in the insertion.

import datetime

from astrapy import DataAPIClient
from astrapy.ids import ObjectId, uuid8, UUID
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_one({"when": datetime.datetime.now()})
collection.insert_one({"date_of_birth": datetime.date(2000, 1, 1)})

collection.update_one(
    {"registered_at": datetime.date(1999, 11, 14)},
    {"$set": {"message": "happy Sunday!"}},
)

print(
    collection.find_one(
        {"date_of_birth": {"$lt": datetime.date(2001, 1, 1)}},
        projection={"_id": False},
    )
)
# will print:
#    {'date_of_birth': datetime.datetime(2000, 1, 1, 0, 0)}

As shown in the example, read operations from a collection always return the datetime class regardless of whether a date or a datetime was provided in the insertion.

Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Working with document IDs

Documents in a collection are always identified by an ID that is unique within the collection. The ID can be any of several types, such as a string, integer, or datetime. However, it’s recommended to instead prefer the uuid or the ObjectId types.

The Data API supports uuid identifiers up to version 8, as well as ObjectId identifiers as provided by the bson library. These can appear anywhere in the document, not only in its _id field. Moreover, different types of identifier can appear in different parts of the same document. And these identifiers can be part of filtering clauses and update/replace directives just like any other data type.

One of the optional settings of a collection is the "default ID type": that is, it is possible to specify what kind of identifiers the server should supply for documents without an explicit _id field. (For details, see the create_collection method and Data API createCollection command in the Collections reference.) Regardless of the defaultId setting, however, identifiers of any type can be explicitly provided for documents. For example, during insertions, and will be honored by the insertion process.

  • Python

  • TypeScript

  • Java

  • cURL

from astrapy.ids import (
    ObjectId,
    uuid1,
    uuid3,
    uuid4,
    uuid5,
    uuid6,
    uuid7,
    uuid8,
    UUID,
)

AstraPy recognizes uuid versions 1 through 8 (with the exception of 2) as provided by the uuid and uuid6 Python libraries, as well as the ObjectId from the bson package. Furthermore, out of convenience, these same utilities are exposed as shown in the example above.

You can then generate new identifiers with statements such as new_id = uuid8() or new_obj_id = ObjectId(). Keep in mind that all uuid versions are instances of the same class (UUID), which exposes a version property, should you need to access it.

Here is a short example of the concepts:

from astrapy import DataAPIClient
from astrapy.ids import ObjectId, uuid8, UUID
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_one({"_id": uuid8(), "tag": "new_id_v_8"})
collection.insert_one(
    {"_id": UUID("018e77bc-648d-8795-a0e2-1cad0fdd53f5"), "tag": "id_v_8"}
)
collection.insert_one({"id": ObjectId(), "tag": "new_obj_id"})
collection.insert_one(
    {"id": ObjectId("6601fb0f83ffc5f51ba22b88"), "tag": "obj_id"}
)
collection.find_one_and_update(
    {"_id": ObjectId("6601fb0f83ffc5f51ba22b88")},
    {"$set": {"item_inventory_id": UUID("1eeeaf80-e333-6613-b42f-f739b95106e6")}},
)
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Insert a single document

Insert a single document into a collection.

  • Python

  • TypeScript

  • Java

  • cURL

insert_result = collection.insert_one({"name": "Jane Doe"})

Insert a document with an associated vector.

insert_result = collection.insert_one(
    {"name": "Jane Doe"},
    vector=[.08, .68, .30],
)

Parameters:

Name Type Description

document

Dict

The dictionary expressing the document to insert. The _id field of the document can be left out, in which case it will be created automatically.

vector

Dict[str, Any]

A vector (a list of numbers appropriate for the collection) for the document. Passing this parameter is equivalent to providing the vector in the "$vector" field of the document itself, however the two are mutually exclusive.

max_time_ms

Dict[str, Any]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

# Insert a document with a specific ID
response1 = collection.insert_one(
    {
        "_id": 101,
        "name": "John Doe",
    },
    vector=[.12, .52, .32],
)

# Insert a document without specifying an ID
# so that _id is generated automatically
response2 = collection.insert_one(
    {
        "name": "Jane Doe",
    },
    vector=[.08, .68, .30],
)
Collection<Schema>.insertOne(
  document: MaybeId<Schema>,
  options?: InsertOneOptions,
): Promise<InsertOneResult<Schema>>

Parameters:

Name Type Description

document

MaybeId<Schema>

The document to insert. If the document does not have an _id field, the server generates one.

options?

InsertOneOptions

The options for this operation.

Options (InsertOneOptions):

Name Type Description

vector?

number[]

The vector for the document. Equivalent to providing the vector in the $vector field of the document itself; however, the two are mutually exclusive.

maxTimeMS?

number

The maximum time in milliseconds that the client should wait for the operation to complete.

Returns:

Promise<InsertOneResult<Schema>> - A promise that resolves to the inserted ID.

Example:

import { DataApiClient } from '@datastax/astra-db-ts';

// Reference an untyped collection
const db = new DataApiClient('TOKEN').db('API_ENDPOINT');
const collection = db.collection('my_collection');

(async () => {
  // Insert a document with a specific ID
  await collection.insertOne({ _id: '1', name: 'John Doe' });

  // Insert a document with an autogenerated ID
  await collection.insertOne({ name: 'Jane Doe' });

  // Insert a document with a vector
  await collection.insertOne({ name: 'Jane Doe' }, { vector: [.12, .52, .32] });
  await collection.insertOne({ name: 'Jane Doe', $vector: [.12, .52, .32] });
})();

TBD

cURL -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE}/${ASTRA_DB_COLLECTION} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "insertOne": {
    "document": {
      "_id": "1",
      "purchase_type": "Online",
      "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
      "customer": {
        "name": "Jim A.",
        "phone": "123-456-1111",
        "age": 51,
        "credit_score": 782,
        "address": {
          "address_line": "1234 Broadway",
          "city": "New York",
          "state": "NY"
        }
      },
      "purchase_date": {"$date": 1690045891},
      "seller": {
        "name": "Jon B.",
        "location": "Manhattan NYC"
      },
      "items": [
        {
          "car" : "BMW 330i Sedan",
          "color": "Silver"
        },
        "Extended warranty - 5 years"
      ],
      "amount": 47601,
      "status" : "active",
      "preferred_customer" : true
    }
  }
}' | json_pp

Properties:

Name Type Description

insertOne

command

Data API designation that a single document is inserted.

api-reference:partial$insert-command-payload.adoc

Response
{
    "status": {
        "insertedIds": [
            "1"
        ]
    }
}

Insert many documents

Insert multiple documents into a collection.

  • Python

  • TypeScript

  • Java

  • cURL

response = collection.insert_many(
    [
        {
            "_id": 101,
            "name": "John Doe",
        },
        {
            # ID is generated automatically
            "name": "Jane Doe",
        },
    ],
    vectors=[
        [.12, .52, .32],
        [.08, .68, .30],
    ],
    ordered=True,
)

Returns:

InsertManyResult - An object representing the response from the database after the insert operation. It includes information about the success of the operation and details of the inserted documents.

Example response
InsertManyResult(raw_results=[{'status': {'insertedIds': [101, '81077d86-05dc-43ca-877d-8605dce3ca4d']}}], inserted_ids=[101, '81077d86-05dc-43ca-877d-8605dce3ca4d'])

Parameters:

Name Type Description

documents

Iterable[Dict[str, Any]],

An iterable of dictionaries, each a document to insert. Documents may specify their _id field or leave it out, in which case it will be added automatically.

vectors

Optional[Iterable[Optional[Iterable[float]]]]

An optional list of vectors (as many vectors as the provided documents) to associate to the documents when inserting. Each vector is added to the corresponding document prior to insertion on database. The list can be a mixture of None and vectors, in which case some documents will not have a vector, unless it is specified in their "$vector" field already. Passing vectors this way is indeed equivalent to the "$vector" field of the documents, however the two are mutually exclusive.

ordered

bool

If True (default), the insertions are processed sequentially. If False, they can occur in arbitrary order and possibly concurrently.

chunk_size

Optional[int]

How many documents to include in a single API request. Exceeding the server maximum allowed value results in an error. Leave it unspecified (recommended) to use the system default.

concurrency

Optional[int]

Maximum number of concurrent requests to the API at a given time. It cannot be more than one for ordered insertions.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation.

Unless there are specific reasons not to, it is recommended to prefer ordered = False as it will result in a much higher insert throughput than an equivalent ordered insertion.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_many([{"a": 10}, {"a": 5}, {"b": [True, False, False]}])

collection.insert_many(
    [{"seq": i} for i in range(50)],
    ordered=False,
    concurrency=5,
)

# The following are three equivalent statements:
collection.insert_many(
    [{"tag": "a"}, {"tag": "b"}],
    vectors=[[1, 2], [3, 4]],
)

collection.insert_many(
    [{"tag": "a", "$vector": [1, 2]}, {"tag": "b"}],
    vectors=[None, [3, 4]],
)

collection.insert_many(
    [
        {"tag": "a", "$vector": [1, 2]},
        {"tag": "b", "$vector": [3, 4]},
    ]
)
Collection<Schema>.insertMany(
  documents: MaybeId<Schema>[],
  options?: InsertManyOptions,
): Promise<InsertManyResult<Schema>>

Parameters:

Name Type Description

documents

MaybeId<Schema>[]

The documents to insert. If any document does not have an _id field, the server generates one.

options?

InsertManyOptions

The options for this operation.

Options (InsertManyOptions):

Name Type Description

ordered?

boolean

You may set the ordered option to true to stop the operation after the first error; otherwise all documents may be parallelized and processed in arbitrary order, improving, perhaps vastly, performance.

concurrency?

number

You can set the concurrency option to control how many network requests are made in parallel on unordered insertions. Defaults to 8. Not available for ordered insertions.

chunkSize?

number

Control how many documents are sent each network request. Defaults to 20.

vectors?

(number[] | null | undefined)[]

An array of vectors to associate with each document. If a vector is null or undefined, the document will not have a vector. Must equal the number of documents if provided. Equivalent to providing the vector in the $vector field of the documents themselves; however, the two are mutually exclusive.

Unless there are specific reasons not to, it is recommended to prefer to leave ordered false as it will result in a much higher insert throughput than an equivalent ordered insertion.

Returns:

Promise<InsertManyResult<Schema>> - A promise that resolves to the inserted IDs.

Example:

import { DataApiClient, InsertManyError } from '@datastax/astra-db-ts';

// Reference an untyped collection
const db = new DataApiClient('TOKEN').db('API_ENDPOINT');
const collection = db.collection('my_collection');

(async () => {
  try {
    // Insert many documents
    await collection.insertMany([
      { _id: '1', name: 'John Doe' },
      { name: 'Jane Doe' }, // Will autogen ID
    ], { ordered: true });

    // Insert many with vectors
    await collection.insertMany([
      { name: 'John Doe', $vector: [.12, .52, .32] },
      { name: 'Jane Doe' },
      { name: 'Jane Doe', $vector: [.32, .52, .12] },
    ]);

    await collection.insertMany([
      { name: 'John Doe' },
      { name: 'Jane Doe' },
      { name: 'Dane Joe' },
    ], {
      vectors: [
        [.12, .52, .32],
        null,
        [.32, .52, .12],
      ],
      ordered: true,
    });
  } catch (e) {
    if (e instanceof InsertManyError) {
      console.log(e.insertedIds);
    }
  }
})();

TBD

The following Data API insertMany command adds 20 documents to a collection.

cURL -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE}/${ASTRA_DB_COLLECTION} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "insertMany": {
    "documents": [
      {
        "_id": "2",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        "customer": {
          "name": "Jack B.",
          "phone": "123-456-2222",
        "age": 34,
        "credit_score": 700,
          "address": {
            "address_line": "888 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1690391491},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [
            {
          "car" : "Tesla Model 3",
          "color": "White"
            },
            "Extended warranty - 10 years",
            "Service - 5 years"
        ],
        "amount": 53990,
      "status" : "active"
      },
      {
        "_id": "3",
        "purchase_type": "Online",
        "$vector": [0.15, 0.1, 0.1, 0.35, 0.55],
        "customer": {
          "name": "Jill D.",
          "phone": "123-456-3333",
        "age": 30,
        "credit_score": 742,
          "address": {
            "address_line": "12345 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1690564291},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": "Extended warranty - 10 years",
        "amount": 4600,
      "status" : "active"
      },
      {
        "_id": "4",
        "purchase_type": "In Person",
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.26],
        "customer": {
          "name": "Lester M.",
          "phone": "123-456-4444",
        "age": 40,
        "credit_score": 802,
          "address": {
            "address_line": "12346 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1690909891},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [
            {
          "car" : "BMW 330i Sedan",
          "color": "Red"
            },
            "Extended warranty - 5 years",
            "Service - 5 years"
        ],
        "amount": 48510,
      "status" : "active"
      },
      {
        "_id": "5",
        "purchase_type": "Online",
        "$vector": [0.25, 0.045, 0.38, 0.31, 0.67],
        "customer": {
          "name": "David C.",
          "phone": "123-456-5555",
        "age": 50,
        "credit_score": 800,
          "address": {
            "address_line": "32345 Main Ave",
            "city": "Jersey City",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1690996291},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [
          {
          "car" : "Tesla Model S",
          "color": "Red"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 94990,
      "status" : "active"
      },
      {
        "_id": "6",
        "purchase_type": "In Person",
        "$vector": [0.11, 0.02, 0.78, 0.10, 0.27],
        "customer": {
          "name": "Chris E.",
          "phone": "123-456-6666",
        "age": 43,
        "credit_score": 764,
          "address": {
            "address_line": "32346 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1691860291},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [
          {
          "car" : "Tesla Model X",
          "color": "Blue"
            }
        ],
        "amount": 109990,
      "status" : "active"
      },
      {
        "_id": "7",
        "purchase_type": "Online",
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "customer": {
          "name": "Jeff G.",
          "phone": "123-456-7777",
        "age": 66,
        "credit_score": 802,
          "address": {
            "address_line": "22999 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1692119491},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "BMW M440i Gran Coupe",
          "color": "Black"
            },
            "Extended warranty - 5 years"],
        "amount": 61050,
      "status" : "active"
      },
      {
        "_id": "8",
        "purchase_type": "In Person",
        "$vector": [0.3, 0.23, 0.15, 0.17, 0.4],
        "customer": {
          "name": "Harold S.",
          "phone": "123-456-8888",
        "age": 29,
        "credit_score": 710,
          "address": {
            "address_line": "1234 Main St",
            "city": "Orange",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1693329091},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [{
          "car" : "BMW X3 SUV",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 46900,
      "status" : "active"
      },
      {
        "_id": "9",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.06],
        "customer": {
          "name": "Richard Z.",
          "phone": "123-456-9999",
        "age": 22,
        "credit_score": 690,
          "address": {
            "address_line": "22345 Broadway",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1693588291},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "Tesla Model 3",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 53990,
      "status" : "active"
      },
      {
        "_id": "10",
        "purchase_type": "In Person",
        "$vector": [0.25, 0.045, 0.38, 0.31, 0.68],
        "customer": {
          "name": "Eric B.",
          "phone": null,
        "age": 54,
        "credit_score": 780,
          "address": {
            "address_line": "9999 River Rd",
            "city": "Fair Haven",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1694797891},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model S",
          "color": "Black"
            }
        ],
        "amount": 93800,
      "status" : "active"
      },
      {
        "_id": "11",
        "purchase_type": "Online",
        "$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
        "customer": {
          "name": "Ann J.",
          "phone": "123-456-1112",
        "age": 47,
        "credit_score": 660,
          "address": {
            "address_line": "99 Elm St",
            "city": "Fair Lawn",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1695921091},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model Y",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 57500,
      "status" : "active"
      },
      {
        "_id": "12",
        "purchase_type": "In Person",
        "$vector": [0.33, 0.44, 0.55, 0.77, 0.66],
        "customer": {
          "name": "John T.",
          "phone": "123-456-1123",
        "age": 55,
        "credit_score": 786,
          "address": {
            "address_line": "23 Main Blvd",
            "city": "Staten Island",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1696180291},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [{
          "car" : "BMW 540i xDrive Sedan",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 64900,
      "status" : "active"
      },
      {
        "_id": "13",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.07],
        "customer": {
          "name": "Aaron W.",
          "phone": "123-456-1133",
        "age": 60,
        "credit_score": 702,
          "address": {
            "address_line": "1234 4th Ave",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1697389891},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [{
          "car" : "Tesla Model 3",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 55000,
      "status" : "active"
      },
      {
        "_id": "14",
        "purchase_type": "In Person",
        "$vector": [0.11, 0.02, 0.78, 0.21, 0.27],
        "customer": {
          "name": "Kris S.",
          "phone": "123-456-1144",
        "age": 44,
        "credit_score": 702,
          "address": {
            "address_line": "1414 14th Pl",
            "city": "Brooklyn",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1698513091},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "Tesla Model X",
          "color": "White"
            }
        ],
        "amount": 110400,
      "status" : "active"
      },
      {
        "_id": "15",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.08],
        "customer": {
          "name": "Maddy O.",
          "phone": "123-456-1155",
        "age": 41,
        "credit_score": 782,
          "address": {
            "address_line": "1234 Maple Ave",
            "city": "West New York",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1701191491},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": {
          "car" : "Tesla Model 3",
          "color": "White"
            },
        "amount": 52990,
      "status" : "active"
      },
      {
        "_id": "16",
        "purchase_type": "In Person",
        "$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
        "customer": {
          "name": "Tim C.",
          "phone": "123-456-1166",
        "age": 38,
        "credit_score": 700,
          "address": {
            "address_line": "1234 Main St",
            "city": "Staten Island",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1701450691},
        "seller": {
          "name": "Tammy S.",
          "location": "Staten Island NYC"
        },
        "items": [{
          "car" : "Tesla Model Y",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 58990,
      "status" : "active"
      },
      {
        "_id": "17",
        "purchase_type": "Online",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.09],
        "customer": {
          "name": "Yolanda Z.",
          "phone": "123-456-1177",
        "age": 61,
        "credit_score": 694,
          "address": {
            "address_line": "1234 Main St",
            "city": "Hoboken",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1702660291},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model 3",
          "color": "Blue"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 54900,
      "status" : "active"
      },
      {
        "_id": "18",
        "purchase_type": "Online",
        "$vector": [0.15, 0.17, 0.15, 0.43, 0.55],
        "customer": {
          "name": "Thomas D.",
          "phone": "123-456-1188",
        "age": 45,
        "credit_score": 724,
          "address": {
            "address_line": "98980 20th St",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1703092291},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [{
          "car" : "BMW 750e xDrive Sedan",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 106900,
      "status" : "active"
      },
      {
        "_id": "19",
        "purchase_type": "Online",
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.27],
        "customer": {
          "name": "Vivian W.",
          "phone": "123-456-1199",
        "age": 20,
        "credit_score": 698,
          "address": {
            "address_line": "5678 Elm St",
            "city": "Hartford",
            "state": "CT"
          }
        },
        "purchase_date": {"$date": 1704215491},
        "seller": {
          "name": "Jasmine S.",
          "location": "Brooklyn NYC"
        },
        "items": [{
          "car" : "BMW 330i Sedan",
          "color": "White"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 46980,
      "status" : "active"
      },
      {
        "_id": "20",
        "purchase_type": "In Person",
        "$vector": [0.44, 0.11, 0.33, 0.22, 0.88],
        "customer": {
          "name": "Leslie E.",
          "phone": null,
        "age": 44,
        "credit_score": 782,
          "address": {
            "address_line": "1234 Main St",
            "city": "Newark",
            "state": "NJ"
          }
        },
        "purchase_date": {"$date": 1705338691},
        "seller": {
          "name": "Jim A.",
          "location": "Jersey City NJ"
        },
        "items": [{
          "car" : "Tesla Model Y",
          "color": "Black"
            },
            "Extended warranty - 5 years"
        ],
        "amount": 59800,
      "status" : "active"
      },
      {
        "_id": "21",
        "purchase_type": "In Person",
        "$vector": [0.21, 0.22, 0.33, 0.44, 0.53],
        "customer": {
          "name": "Rachel I.",
          "phone": null,
        "age": 62,
        "credit_score": 786,
          "address": {
            "address_line": "1234 Park Ave",
            "city": "New York",
            "state": "NY"
          }
        },
        "purchase_date": {"$date": 1706202691},
        "seller": {
          "name": "Jon B.",
          "location": "Manhattan NYC"
        },
        "items": [{
          "car" : "BMW M440i Gran Coupe",
          "color": "Silver"
            },
            "Extended warranty - 5 years",
            "Gap Insurance - 5 years"
        ],
        "amount": 65250,
      "status" : "active"
      }
    ],
    "options": {
        "ordered": false
    }
  }
}' | json_pp
Response
{
   "status" : {
      "insertedIds" : [
         "4",
         "7",
         "10",
         "13",
         "16",
         "19",
         "21",
         "18",
         "6",
         "12",
         "15",
         "9",
         "3",
         "11",
         "2",
         "17",
         "14",
         "8",
         "20",
         "5"
      ]
   }
}

Properties:

Name Type Description

insertMany

command

Data API designation that many documents (up to 20 at a time) are being inserted.

api-reference:partial$insert-command-payload.adoc

Find a document

Retrieve a single document from a collection using various options.

  • Python

  • TypeScript

  • Java

  • cURL

Retrieve a single document from a collection by its _id.

document = collection.find_one({"_id": 101})

Retrieve a single document from a collection by any attribute, as long as it is covered by the collection’s indexing configuration.

As noted in The Indexing option in the Collections reference topic, any field that is part of a subsequent filter or sort operation must be an indexed field. If you elected to not index certain or all fields when you created the collection, you cannot reference that field in a filter/sort query.

document = collection.find_one({"location": "warehouse_C"})

Retrieve a single document from a collection by an arbitrary filtering clause.

document = collection.find_one({"tag": {"$exists": True}})

Retrieve the most similar document to a given vector.

result = collection.find_one({}, vector=[.12, .52, .32])

Retrieve only specific fields from a document.

result = collection.find_one({"_id": 101}, projection={"name": True})

Returns:

Union[Dict[str, Any], None] - Either the found document as a dictionary or None if no matching document is found.

Example response
{'_id': 101, 'name': 'John Doe', '$vector': [0.12, 0.52, 0.32]}

Parameters:

Name Type Description

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the field names to return; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to discard some fields from the response. The default is to return the whole documents.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform vector search. That is, Approximate Nearest Neighbors (ANN) search, extracting the most similar document in the collection matching the filter. This parameter cannot be used together with sort. See the find method for more details on this parameter.

include_similarity

Optional[bool]

A boolean to request the numeric value of the similarity to be returned as an added "$similarity" key in the returned document. Can only be used for vector ANN search, i.e. when either vector is supplied or the sort parameter has the shape {"$vector": …​}.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the order the documents are returned. See the Note about sorting for details.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.find_one({})
# prints: {'_id': '68d1e515-...', 'seq': 37}
collection.find_one({"seq": 10})
# prints: {'_id': 'd560e217-...', 'seq': 10}
collection.find_one({"seq": 1011})
# (returns None for no matches)
collection.find_one({}, projection={"seq": False})
# prints: {'_id': '68d1e515-...'}
collection.find_one(
    {},
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: {'_id': '97e85f81-...', 'seq': 69}
collection.find_one({}, vector=[1, 0])
# prints: {'_id': '...', 'tag': 'D', '$vector': [4.0, 1.0]}

TBD

TBD

This Data API findOne command retrieves a document based on a filter using a specific _id value.

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE}/${ASTRA_DB_COLLECTION} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "findOne": {
    "filter": {"_id" : "14"}
  }
}' | json_pp

Result:

{
   "data" : {
      "document" : {
         "$vector" : [
            0.11,
            0.02,
            0.78,
            0.21,
            0.27
         ],
         "_id" : "14",
         "amount" : 110400,
         "customer" : {
            "address" : {
               "address_line" : "1414 14th Pl",
               "city" : "Brooklyn",
               "state" : "NY"
            },
            "age" : 44,
            "credit_score" : 702,
            "name" : "Kris S.",
            "phone" : "123-456-1144"
         },
         "items" : [
            {
               "car" : "Tesla Model X",
               "color" : "White"
            }
         ],
         "purchase_date" : {
            "$date" : 1698513091
         },
         "purchase_type" : "In Person",
         "seller" : {
            "location" : "Brooklyn NYC",
            "name" : "Jasmine S."
         },
         "status" : "active"
      }
   }
}

Find documents using filtering options

Iterate over documents in a collection matching a given filter.

  • Python

  • TypeScript

  • Java

  • cURL

doc_iterator = collection.find({"category": "house_appliance"}, limit=10)

Iterate over the documents most similar to a given query vector.

doc_iterator = collection.find({}, vector=[0.55, -0.40, 0.08], limit=5)

Returns:

Cursor - A cursor for iterating over documents. An AstraPy cursor can be used in a for loop, and provides a few additional features.

Example response
Cursor("vector_collection", new, retrieved: 0)

Parameters:

Name Type Description

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the field names to return; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to discard some fields from the response. The default is to return the whole documents.

skip

Optional[int]

With this integer parameter, what would be the first skip documents returned by the query are discarded, and the results start from the (skip+1)-th document. This parameter can be used only in conjunction with an explicit sort criterion of the ascending/descending type (i.e. it cannot be used when not sorting, nor with vector-based ANN search).

limit

Optional[int]

This (integer) parameter sets a limit over how many documents are returned. Once limit is reached (or the cursor is exhausted for lack of matching documents), nothing more is returned.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform vector search; that is, Approximate Nearest Neighbors (ANN) search. When running similarity search on a collection, no other sorting criteria can be specified. Moreover, there is an upper bound to the number of documents that can be returned. For details, see the Data API Limits.

include_similarity

Optional[bool]

A boolean to request the numeric value of the similarity to be returned as an added "$similarity" key in each returned document. Can only be used for vector ANN search, i.e. when either vector is supplied or the sort parameter has the shape {"$vector": …​}.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the order the documents are returned. See the Note about sorting, as well as the one about upper bounds, for details.

max_time_ms

Optional[int]

A timeout, in milliseconds, for each single one of the underlying HTTP requests used to fetch documents as the cursor is iterated over.

TBD

TBD

TBD

Example values for sort operations

  • Python

  • TypeScript

  • Java

  • cURL

When no particular order is required:

sort={}  # (default when parameter not provided)

When sorting by a certain value in ascending/descending order:

from astrapy.constants import SortDocuments
sort={"field": SortDocuments.ASCENDING}
sort={"field": SortDocuments.DESCENDING}

When sorting first by "field" and then by "subfield" (while modern Python versions preserve the order of dictionaries, it is suggested for clarity to employ a collections.OrderedDict in these cases):

sort={
    "field": SortDocuments.ASCENDING,
    "subfield": SortDocuments.ASCENDING,
}

When running a vector similarity (ANN) search:

sort={"$vector": [0.4, 0.15, -0.5]}

Some combinations of arguments impose an implicit upper bound on the number of documents that are returned by the Data API. More specifically:

  • Vector ANN searches cannot return more than a certain number of documents; currently, 1000 per search operation.

  • When using a sort criterion of the ascending/descending type, the Data API returns a smaller number of documents, currently set to 20, and stops there. The returned documents are the top results across the whole collection according to the requested criterion.

Keep in mind these provisions even when subsequently running a command such as .distinct() on a cursor.

When not specifying sorting criteria at all (by vector or otherwise), the cursor can scroll through an arbitrary number of documents as the Data API and the client periodically exchange new chunks of documents.

The behavior of the cursor — in the case that documents have been added/removed after the find was started — depends on database internals. It it is not guaranteed, nor excluded, that such "real-time" changes in the data would be picked up by the cursor.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

filter = {"seq": {"$exists": True}}
for doc in collection.find(filter, projection={"seq": True}, limit=5):
    print(doc["seq"])
...
# will print e.g.:
#   37
#   35
#   10
#   36
#   27
cursor1 = collection.find(
    {},
    limit=4,
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
[doc["_id"] for doc in cursor1]
# prints: ['97e85f81-...', '1581efe4-...', '...', '...']
cursor2 = collection.find({}, limit=3)
cursor2.distinct("seq")
# prints: [37, 35, 10]
collection.insert_many([
    {"tag": "A", "$vector": [4, 5]},
    {"tag": "B", "$vector": [3, 4]},
    {"tag": "C", "$vector": [3, 2]},
    {"tag": "D", "$vector": [4, 1]},
    {"tag": "E", "$vector": [2, 5]},
])
ann_tags = [
    document["tag"]
    for document in collection.find(
        {},
        limit=3,
        vector=[3, 3],
    )
]
ann_tags
# prints: ['A', 'B', 'C']
# (assuming the collection has metric VectorMetric.COSINE)
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Find and update a document

Locate a document matching a filter condition and apply changes to it, returning the document itself.

  • Python

  • TypeScript

  • Java

  • cURL

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
)

Locate and update a document, returning the document itself, creating a new one if nothing is found.

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
    upsert=True,
)

Returns:

Dict[str, Any] - The document that was found, either before or after the update (or a projection thereof, as requested). If no matches are found, None is returned.

Example response
{'_id': 999, 'Marco': 'Polo'}

Parameters:

Name Type Description

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

update

Dict[str, Any]

The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: {"$set": {"field": "value}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. See Data API operators for the full syntax.

projection

Optional[ProjectionType]

Used to select a subset of fields in the document being returned. The projection can be: an iterable over the field names to return; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to discard some fields from the response. The default is to return the whole documents.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use vector search. That is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with sort. See the find method for more details on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the updated one. See the find method for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, a new document (resulting from applying the update to an empty document) is inserted if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

return_document

str

A flag controlling what document is returned: if set to ReturnDocument.BEFORE, or the string "before", the document found on database is returned; if set to ReturnDocument.AFTER, or the string "after", the new document is returned. The default is "before".

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})

collection.find_one_and_update(
    {"Marco": {"$exists": True}},
    {"$set": {"title": "Mr."}},
)
# prints: {'_id': 'a80106f2-...', 'Marco': 'Polo'}
collection.find_one_and_update(
    {"title": "Mr."},
    {"$inc": {"rank": 3}},
    projection=["title", "rank"],
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'a80106f2-...', 'title': 'Mr.', 'rank': 3}
collection.find_one_and_update(
    {"name": "Johnny"},
    {"$set": {"rank": 0}},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_update(
    {"name": "Johnny"},
    {"$set": {"rank": 0}},
    upsert=True,
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'cb4ef2ab-...', 'name': 'Johnny', 'rank': 0}
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello DevOps cURL world. TBD.

Update a document

Update a single document on the collection as requested.

  • Python

  • TypeScript

  • Java

  • cURL

update_result = collection.update_one(
    {"_id": 456},
    {"$set": {"name": "John Smith"}},
)

Update a single document on the collection, inserting a new one if no match is found.

update_result = collection.update_one(
    {"_id": 456},
    {"$set": {"name": "John Smith"}},
    upsert=True,
)

Returns:

UpdateResult - An object representing the response from the database after the update operation. It includes information about the operation.

Example response
UpdateResult(raw_results=[{'data': {'document': {'_id': '1', 'name': 'John Doe'}}, 'status': {'matchedCount': 1, 'modifiedCount': 1}}], update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})

Parameters:

Name Type Description

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

update

Dict[str, Any]

The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: {"$set": {"field": "value}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. See Data API operators for the full syntax.

vector

Dict[str, Any]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use vector search. That is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with sort. See the find method for more details on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the updated one. See the find method for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, a new document (resulting from applying the update to an empty document) is inserted if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})

collection.update_one({"Marco": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
collection.update_one({"Mirko": {"$exists": True}}, {"$inc": {"rank": 3}})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.update_one(
    {"Mirko": {"$exists": True}},
    {"$inc": {"rank": 3}},
    upsert=True,
)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '2a45ff60-...'})
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Update multiple documents

Update multiple documents in a collection.

  • Python

  • TypeScript

  • Java

  • cURL

results = collection.update_many(
    {"name": {"$exists": False}},
    {"$set": {"name": "unknown"}},
)

Update multiple documents in a collection, inserting a new one if no matches are found.

results = collection.update_many(
    {"name": {"$exists": False}},
    {"$set": {"name": "unknown"}},
    upsert=True,
)

Returns:

UpdateResult - An object representing the response from the database after the update operation. It includes information about the operation.

Example response
UpdateResult(raw_results=[{'status': {'matchedCount': 2, 'modifiedCount': 2}}], update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2})

Parameters:

Name Type Description

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

update

Dict[str, Any]

The update prescription to apply to the document, expressed as a dictionary as per Data API syntax. Examples are: {"$set": {"field": "value}}, {"$inc": {"counter": 10}} and {"$unset": {"field": ""}}. See Data API operators for the full syntax.

upsert

bool

This parameter controls the behavior in absence of matches. If True, a single new document (resulting from applying update to an empty document) is inserted if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_many([{"c": "red"}, {"c": "green"}, {"c": "blue"}])

collection.update_many({"c": {"$ne": "green"}}, {"$set": {"nongreen": True}})
# prints: UpdateResult(raw_results=..., update_info={'n': 2, 'updatedExisting': True, 'ok': 1.0, 'nModified': 2})
collection.update_many({"c": "orange"}, {"$set": {"is_also_fruit": True}})
# prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.update_many(
    {"c": "orange"},
    {"$set": {"is_also_fruit": True}},
    upsert=True,
)
# prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '46643050-...'})
Hello TypeScript world. TBD.
Hello Java world. TBD.

Use the Data API updateMany command to update multiple documents in a collection.

In this example, the JSON payload uses the $set update operator to change a status to "inactive" for those documents that have an "active" status.

The updateMany command includes pagination support in the event more documents that matched the filter are on a subsequent page. For more, see the pagination note after the cURL example.

Example:

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE}/${ASTRA_DB_COLLECTION} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "updateMany": {
    "filter": {"status" : "active" },
    "update" : {"$set" : { "status" : "inactive"}}
  }
}' | json_pp

Result:

{
   "status" : {
      "matchedCount" : 20,
      "modifiedCount" : 20,
      "moreData" : true
   }
}
Name Type Description

updateMany

command

Updates multiple documents in the database’s collection.

filter

object

Defines the criteria for selecting documents to which the command applies. The filter looks for documents where: * status: The key being evaluated in each document; a property within the documents in the database. * active: The value that the status property must match for the document to be selected. In this case, it’s targeting documents that currently have a status of active.

update

object

Specifies the modifications to be applied to all documents that match the criteria set by the filter.

$set

operator

An update operator indicating that the operation should overwrite the value of a property (or properties) in the selected documents.

status

String

Specifies the property in the document to update. In this example, active or inactive will be set for all selected documents. In this context, it’s changing the status from active to inactive.

In the updateMany response, check whether a nextPageState ID was returned. The updateMany command includes pagination support. You can update one page of matching documents at a time. If there is a subsequent page with matching documents to update, the transaction returns a nextPageState ID. You would then submit the insertMany command again and include the pageState ID in the new request to update the next page of documents that matched the filter:

{
    "updateMany": {
        "filter": {
            "active_user": true
        },
        "update": {
            "$set": {
                "new_data": "new_data_value"
            }
        },
        "options": {
            "pageState": "<id-value-from-prior-response>"
        }
    }
}

Follow the sequence of one or more insertMany commands until all pages with documents matching the filter have the update applied.

Find distinct values across documents

Get a list of the distinct values of a certain key in a collection.

  • Python

  • TypeScript

  • Java

  • cURL

collection.distinct("category")

Get the distinct values in a subset of documents, with a key defined by a dot-syntax path.

collection.distinct(
    "food.allergies",
    filter={"registered_for_dinner": True},
)

Returns:

List[Any] - A list of the distinct values encountered. Documents that lack the requested key are ignored.

Example response
['home_appliance', None, 'sports_equipment', {'cat_id': 54, 'cat_name': 'gardening_gear'}]

Parameters:

Name Type Description

key

str

The name of the field whose value is inspected across documents. Keys can use dot-notation to descend to deeper document levels. Example of acceptable key values: "field", "field.subfield", "field.3", and "field.3.subfield". If lists are encountered and no numeric index is specified, all items in the list are visited.

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

max_time_ms

Type

A timeout, in milliseconds, for the operation.

Keep in mind that distinct is a client-side operation, which effectively browses all required documents using the logic of the find method and collects the unique values found for key. As such, there may be performance, latency and ultimately billing implications if the amount of matching documents is large.

For details on the behavior of "distinct" in conjunction with real-time changes in the collection contents, see the discussion in the Sort examples values section.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_many(
    [
        {"name": "Marco", "food": ["apple", "orange"], "city": "Helsinki"},
        {"name": "Emma", "food": {"likes_fruit": True, "allergies": []}},
    ]
)

collection.distinct("name")
# prints: ['Marco', 'Emma']
collection.distinct("city")
# prints: ['Helsinki']
collection.distinct("food")
# prints: ['apple', 'orange', {'likes_fruit': True, 'allergies': []}]
collection.distinct("food.1")
# prints: ['orange']
collection.distinct("food.allergies")
# prints: []
collection.distinct("food.likes_fruit")
# prints: [True]
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Count documents in a collection

Get the count of all documents in a collection.

  • Python

  • TypeScript

  • Java

  • cURL

collection.count_documents({}, upper_bound=500)

Get the count of the documents in a collection matching a condition.

collection.count_documents({"seq":{"$gt": 15}}, upper_bound=50)

Returns:

int - The exact count of the documents counted as requested, unless it exceeds the caller-provided or API-set upper bound. In case of overflow, an exception is raised.

Example response
320

Parameters:

Name Type Description

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

upper_bound

int

A required ceiling on the result of the count operation. If the actual number of documents exceeds this value, an exception will be raised. Furthermore, if the actual number of documents exceeds the maximum count that the Data API can reach (regardless of upper_bound), an exception will be raised.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_many([{"seq": i} for i in range(20)])

collection.count_documents({}, upper_bound=100)
# prints: 20
collection.count_documents({"seq":{"$gt": 15}}, upper_bound=100)
# prints: 4
collection.count_documents({}, upper_bound=10)
# Raises: astrapy.exceptions.TooManyDocumentsToCountException
Hello TypeScript world. TBD.
Hello Java world. TBD.

Use the Data API estimatedDocumentCount command to return the approximate number of documents in the collection.

In the estimatedDocumentCount command’s response, the document count is based on the current system statistics at the time the request was received by the database server. Due to potentially in-progress updates (document additions and/or deletions), the actual number of documents in the collection may be lower or higher in the database.

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE}/${ASTRA_DB_COLLECTION} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
            "estimatedDocumentCount": {
            }
}' | json_pp

Result:

{
    "status": {
        "count": 21
    }
}

Properties:

Name Type Description

estimatedDocumentCount

command

Returns an estimated count of documents within the context of the specified collection.

The object is { } empty, meaning there are no filters or options for this implementation of the estimatedDocumentCount command.

Find and replace a document

Locate a document matching a filter condition and replace it with a new document, returning the document itself.

  • Python

  • TypeScript

  • Java

  • cURL

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
)

Locate and replace a document, returning the document itself, additionally creating it if nothing is found.

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
    upsert=True,
)

Returns:

Dict[str, Any] - The document that was found, either before or after the replacement (or a projection thereof, as requested). If no matches are found, None is returned.

Example response
{'_id': 'rule1', 'text': 'all animals are equal'}

Parameters:

Name Type Description

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

replacement

Dict[str, Any]

the new document to write into the collection.

projection

Optional[ProjectionType]

Used to select a subset of fields in the document being returned. The projection can be: an iterable over the field names to return; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to discard some fields from the response. The default is to return the whole documents.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use vector search. That is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with sort. See the find method for more details on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the replaced one. See the find method for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, replacement is inserted as a new document if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

return_document

str

A flag controlling what document is returned: if set to ReturnDocument.BEFORE, or the string "before", the document found on database is returned; if set to ReturnDocument.AFTER, or the string "after", the new document is returned. The default is "before".

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection
import astrapy

collection.insert_one({"_id": "rule1", "text": "all animals are equal"})

collection.find_one_and_replace(
    {"_id": "rule1"},
    {"text": "some animals are more equal!"},
)
# prints: {'_id': 'rule1', 'text': 'all animals are equal'}
collection.find_one_and_replace(
    {"text": "some animals are more equal!"},
    {"text": "and the pigs are the rulers"},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# prints: {'_id': 'rule1', 'text': 'and the pigs are the rulers'}
collection.find_one_and_replace(
    {"_id": "rule2"},
    {"text": "F=ma^2"},
    return_document=astrapy.constants.ReturnDocument.AFTER,
)
# (returns None for no matches)
collection.find_one_and_replace(
    {"_id": "rule2"},
    {"text": "F=ma"},
    upsert=True,
    return_document=astrapy.constants.ReturnDocument.AFTER,
    projection={"_id": False},
)
# prints: {'text': 'F=ma'}
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Replace a document

Replace a document in the collection with a new one.

  • Python

  • TypeScript

  • Java

  • cURL

replace_result = collection.replace_one(
    {"Marco": {"$exists": True}},
    {"Buda": "Pest"},
)

Replace a document in the collection with a new one, creating a new one if no match is found.

replace_result = collection.replace_one(
    {"Marco": {"$exists": True}},
    {"Buda": "Pest"},
    upsert=True,
)

Returns:

UpdateResult - An object representing the response from the database after the replace operation. It includes information about the operation.

Example response
UpdateResult(raw_results=[{'data': {'document': {'_id': '1', 'Marco': 'Polo'}}, 'status': {'matchedCount': 1, 'modifiedCount': 1}}], update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})

Parameters:

Name Type Description

filter

Dict[str, Any]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

replacement

Dict[str, Any]

the new document to write into the collection.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use vector search. That is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with sort. See the find method for more details on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the replaced one. See the find method for more on sorting.

upsert

bool = False

This parameter controls the behavior in absence of matches. If True, replacement is inserted as a new document if no matches are found on the collection. If False, the operation silently does nothing in case of no matches.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_one({"Marco": "Polo"})
collection.replace_one({"Marco": {"$exists": True}}, {"Buda": "Pest"})
 prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': True, 'ok': 1.0, 'nModified': 1})
collection.find_one({"Buda": "Pest"})
 prints: {'_id': '8424905a-...', 'Buda': 'Pest'}
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"})
 prints: UpdateResult(raw_results=..., update_info={'n': 0, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0})
collection.replace_one({"Mirco": {"$exists": True}}, {"Oh": "yeah?"}, upsert=True)
 prints: UpdateResult(raw_results=..., update_info={'n': 1, 'updatedExisting': False, 'ok': 1.0, 'nModified': 0, 'upserted': '931b47d6-...'})
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Find and delete a document

Locate a document matching a filter condition and delete it, returning the document itself.

  • Python

  • TypeScript

  • Java

  • cURL

collection.find_one_and_delete({"status": "stale_entry"})

Returns:

Dict[str, Any] - The document that was just deleted (or a projection thereof, as requested). If no matches are found, None is returned.

Example response
{'_id': 199, 'status': 'stale_entry', 'request_id': 'A4431'}

Parameters:

Name Type Description

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

projection

Optional[Union[Iterable[str], Dict[str, bool]]]

Used to select a subset of fields in the documents being returned. The projection can be: an iterable over the field names to return; a dictionary {field_name: True} to positively select certain fields; or a dictionary {field_name: False} if one wants to discard some fields from the response. The default is to return the whole documents.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to perform vector search. That is, Approximate Nearest Neighbors (ANN) search, extracting the most similar document in the collection matching the filter. This parameter cannot be used together with sort. See the find method for more details on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the deleted one. See the find method for more on sorting.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_many(
    [
        {"species": "swan", "class": "Aves"},
        {"species": "frog", "class": "Amphibia"},
    ],
)
collection.find_one_and_delete(
    {"species": {"$ne": "frog"}},
    projection=["species"],
)
# prints: {'_id': '5997fb48-...', 'species': 'swan'}
collection.find_one_and_delete({"species": {"$ne": "frog"}})
# (returns None for no matches)
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Delete a document

Locate and delete a single document from a collection.

  • Python

  • TypeScript

  • Java

  • cURL

response = collection.delete_one({ "_id": "1" })

Locate and delete a single document from a collection by any attribute (as long as it is covered by the collection’s indexing configuration).

document = collection.delete_one({"location": "warehouse_C"})

Locate and delete a single document from a collection by an arbitrary filtering clause.

document = collection.delete_one({"tag": {"$exists": True}})

Delete the most similar document to a given vector.

result = collection.delete_one({}, vector=[.12, .52, .32])

Returns:

DeleteResult - An object representing the response from the database after the delete operation. It includes information about the success of the operation.

Example response
DeleteResult(raw_results=[{'status': {'deletedCount': 1}}], deleted_count=1)

Parameters:

Name Type Description

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators.

vector

Optional[Iterable[float]]

A suitable vector, meaning a list of float numbers of the appropriate dimensionality, to use vector search. That is, Approximate Nearest Neighbors (ANN) search, as the sorting criterion. In this way, the matched document (if any) will be the one that is most similar to the provided vector. This parameter cannot be used together with sort. See the find method for more details on this parameter.

sort

Optional[Dict[str, Any]]

With this dictionary parameter one can control the sorting order of the documents matching the filter, effectively determining what document will come first and hence be the deleted one. See the find method for more on sorting.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])

collection.delete_one({"seq": 1})
# prints: DeleteResult(raw_results=..., deleted_count=1)
collection.distinct("seq")
# prints: [0, 2]
collection.delete_one(
    {"seq": {"$exists": True}},
    sort={"seq": astrapy.constants.SortDocuments.DESCENDING},
)
# prints: DeleteResult(raw_results=..., deleted_count=1)
collection.distinct("seq")
# prints: [0]
collection.delete_one({"seq": 2})
# prints: DeleteResult(raw_results=..., deleted_count=0)
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Delete documents

Delete multiple documents from a collection.

  • Python

  • TypeScript

  • Java

  • cURL

delete_result = collection.delete_many({"status": "processed"})

Returns:

DeleteResult - An object representing the response from the database after the delete operation. It includes information about the success of the operation.

Example response
DeleteResult(raw_results=[{'status': {'deletedCount': 2}}], deleted_count=2)

Parameters:

Name Type Description

filter

Optional[Dict[str, Any]]

A predicate expressed as a dictionary according to the Data API filter syntax. Examples are {}, {"name": "John"}, {"price": {"$le": 100}}, {"$and": [{"name": "John"}, {"price": {"$le": 100}}]}. See Data API operators for the full list of operators. The delete_many method does not accept an empty filter: see delete_all to completely erase all contents of a collection

max_time_ms

Optional[int]

A timeout, in milliseconds, for the operation.

This method would not admit an empty filter clause: use the delete_all method to delete all documents in the collection. If you want to delete all documents, use delete_all instead.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

collection.insert_many([{"seq": 1}, {"seq": 0}, {"seq": 2}])

collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=2)
collection.distinct("seq")
# prints: [2]
collection.delete_many({"seq": {"$lte": 1}})
# prints: DeleteResult(raw_results=..., deleted_count=0)
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Execute multiple write operations

Execute a (reusable) list of write operations on a collection with a single command.

  • Python

  • TypeScript

  • Java

  • cURL

bw_results = collection.bulk_write(
    [
        InsertMany([{"a": 1}, {"a": 2}]),
        ReplaceOne(
            {"z": 9},
            replacement={"z": 9, "replaced": True},
            upsert=True,
        ),
    ],
)

Returns:

BulkWriteResult - A single object summarizing the whole list of requested operations. The keys in the map attributes of the result (when present) are the integer indices of the corresponding operation in the requests iterable.

Example response
BulkWriteResult(bulk_api_results={0: ..., 1: ...}, deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'})

Parameters:

Name Type Description

requests

Iterable[BaseOperation]

An iterable over concrete subclasses of BaseOperation, such as InsertMany or ReplaceOne. Each such object represents an operation ready to be executed on a collection, and is instantiated by passing the same parameters as one would the corresponding collection method.

ordered

bool

Whether to launch the requests one after the other or in arbitrary order, possibly in a concurrent fashion. For performance reasons, ordered=False should be preferred when compatible with the needs of the application flow.

concurrency

Optional[int]

Maximum number of concurrent operations executing at a given time. It cannot be more than one for ordered bulk writes.

max_time_ms

Optional[int]

A timeout, in milliseconds, for the whole bulk write. Remember that, if the method call times out, then there’s no guarantee about what portion of the bulk write has been received and successfully executed by the Data API.

Example:

from astrapy import DataAPIClient
from astrapy.operations import (
    InsertOne,
    InsertMany,
    UpdateOne,
    UpdateMany,
    ReplaceOne,
    DeleteOne,
    DeleteMany,
)
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

op1 = InsertMany([{"a": 1}, {"a": 2}])
op2 = ReplaceOne({"z": 9}, replacement={"z": 9, "replaced": True}, upsert=True)
collection.bulk_write([op1, op2])
# prints: BulkWriteResult(bulk_api_results={0: ..., 1: ...}, deleted_count=0, inserted_count=3, matched_count=0, modified_count=0, upserted_count=1, upserted_ids={1: '2addd676-...'})
collection.count_documents({}, upper_bound=100)
# prints: 3
collection.distinct("replaced")
# prints: [True]
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Delete all documents from a collection

Delete all documents in a collection.

  • Python

  • TypeScript

  • Java

  • cURL

result = collection.delete_all()

Returns:

Dict - A dictionary in the form {"ok": 1} if the method succeeds.

Example response
{'ok': 1}

Parameters:

Name Type Description

max_time_ms

Optional[int]

A timeout, in milliseconds, for the underlying HTTP request.

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = my_client.get_database_by_api_endpoint("01234567-...")
collection = database.my_collection

my_coll.distinct("seq")
# prints: [2, 1, 0]
my_coll.count_documents({}, upper_bound=100)
# prints: 4
my_coll.delete_all()
# prints: {'ok': 1}
my_coll.count_documents({}, upper_bound=100)
# prints: 0
Hello TypeScript world. TBD.
Hello Java world. TBD.
Hello cURL world. TBD.

Next steps

See the Administration reference topic.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com