[![Visit Carbon](./header.png)](https://carbon.ai)

# [Carbon](https://carbon.ai)

Connect external data to LLMs, no matter the source.

[![gem](https://img.shields.io/badge/gem-v0.2.36-blue)](https://rubygems.org/gems/carbon_ruby_sdk/versions/0.2.36)
## Table of Contents - [Installation](#installation) - [Getting Started](#getting-started) - [Raw HTTP Response](#raw-http-response) - [Reference](#reference) * [`carbon.auth.get_access_token`](#carbonauthget_access_token) * [`carbon.auth.get_white_labeling`](#carbonauthget_white_labeling) * [`carbon.crm.get_account`](#carboncrmget_account) * [`carbon.crm.get_accounts`](#carboncrmget_accounts) * [`carbon.crm.get_contact`](#carboncrmget_contact) * [`carbon.crm.get_contacts`](#carboncrmget_contacts) * [`carbon.crm.get_lead`](#carboncrmget_lead) * [`carbon.crm.get_leads`](#carboncrmget_leads) * [`carbon.crm.get_opportunities`](#carboncrmget_opportunities) * [`carbon.crm.get_opportunity`](#carboncrmget_opportunity) * [`carbon.data_sources.query_user_data_sources`](#carbondata_sourcesquery_user_data_sources) * [`carbon.data_sources.revoke_access_token`](#carbondata_sourcesrevoke_access_token) * [`carbon.embeddings.get_documents`](#carbonembeddingsget_documents) * [`carbon.embeddings.get_embeddings_and_chunks`](#carbonembeddingsget_embeddings_and_chunks) * [`carbon.embeddings.list`](#carbonembeddingslist) * [`carbon.embeddings.upload_chunks_and_embeddings`](#carbonembeddingsupload_chunks_and_embeddings) * [`carbon.files.create_user_file_tags`](#carbonfilescreate_user_file_tags) * [`carbon.files.delete`](#carbonfilesdelete) * [`carbon.files.delete_file_tags`](#carbonfilesdelete_file_tags) * [`carbon.files.delete_many`](#carbonfilesdelete_many) * [`carbon.files.delete_v2`](#carbonfilesdelete_v2) * [`carbon.files.get_parsed_file`](#carbonfilesget_parsed_file) * [`carbon.files.get_raw_file`](#carbonfilesget_raw_file) * [`carbon.files.modify_cold_storage_parameters`](#carbonfilesmodify_cold_storage_parameters) * [`carbon.files.move_to_hot_storage`](#carbonfilesmove_to_hot_storage) * [`carbon.files.query_user_files`](#carbonfilesquery_user_files) * [`carbon.files.query_user_files_deprecated`](#carbonfilesquery_user_files_deprecated) * 
[`carbon.files.resync`](#carbonfilesresync) * [`carbon.files.upload`](#carbonfilesupload) * [`carbon.files.upload_from_url`](#carbonfilesupload_from_url) * [`carbon.files.upload_text`](#carbonfilesupload_text) * [`carbon.integrations.cancel`](#carbonintegrationscancel) * [`carbon.integrations.connect_data_source`](#carbonintegrationsconnect_data_source) * [`carbon.integrations.connect_freshdesk`](#carbonintegrationsconnect_freshdesk) * [`carbon.integrations.connect_gitbook`](#carbonintegrationsconnect_gitbook) * [`carbon.integrations.connect_guru`](#carbonintegrationsconnect_guru) * [`carbon.integrations.create_aws_iam_user`](#carbonintegrationscreate_aws_iam_user) * [`carbon.integrations.get_oauth_url`](#carbonintegrationsget_oauth_url) * [`carbon.integrations.list_confluence_pages`](#carbonintegrationslist_confluence_pages) * [`carbon.integrations.list_conversations`](#carbonintegrationslist_conversations) * [`carbon.integrations.list_data_source_items`](#carbonintegrationslist_data_source_items) * [`carbon.integrations.list_folders`](#carbonintegrationslist_folders) * [`carbon.integrations.list_gitbook_spaces`](#carbonintegrationslist_gitbook_spaces) * [`carbon.integrations.list_labels`](#carbonintegrationslist_labels) * [`carbon.integrations.list_outlook_categories`](#carbonintegrationslist_outlook_categories) * [`carbon.integrations.list_repos`](#carbonintegrationslist_repos) * [`carbon.integrations.sync_azure_blob_files`](#carbonintegrationssync_azure_blob_files) * [`carbon.integrations.sync_azure_blob_storage`](#carbonintegrationssync_azure_blob_storage) * [`carbon.integrations.sync_confluence`](#carbonintegrationssync_confluence) * [`carbon.integrations.sync_data_source_items`](#carbonintegrationssync_data_source_items) * [`carbon.integrations.sync_files`](#carbonintegrationssync_files) * [`carbon.integrations.sync_git_hub`](#carbonintegrationssync_git_hub) * [`carbon.integrations.sync_gitbook`](#carbonintegrationssync_gitbook) * 
[`carbon.integrations.sync_gmail`](#carbonintegrationssync_gmail) * [`carbon.integrations.sync_outlook`](#carbonintegrationssync_outlook) * [`carbon.integrations.sync_repos`](#carbonintegrationssync_repos) * [`carbon.integrations.sync_rss_feed`](#carbonintegrationssync_rss_feed) * [`carbon.integrations.sync_s3_files`](#carbonintegrationssync_s3_files) * [`carbon.integrations.sync_slack`](#carbonintegrationssync_slack) * [`carbon.organizations.get`](#carbonorganizationsget) * [`carbon.organizations.update`](#carbonorganizationsupdate) * [`carbon.organizations.update_stats`](#carbonorganizationsupdate_stats) * [`carbon.users.delete`](#carbonusersdelete) * [`carbon.users.get`](#carbonusersget) * [`carbon.users.list`](#carbonuserslist) * [`carbon.users.toggle_user_features`](#carbonuserstoggle_user_features) * [`carbon.users.update_users`](#carbonusersupdate_users) * [`carbon.utilities.fetch_urls`](#carbonutilitiesfetch_urls) * [`carbon.utilities.fetch_webpage`](#carbonutilitiesfetch_webpage) * [`carbon.utilities.fetch_youtube_transcripts`](#carbonutilitiesfetch_youtube_transcripts) * [`carbon.utilities.process_sitemap`](#carbonutilitiesprocess_sitemap) * [`carbon.utilities.scrape_sitemap`](#carbonutilitiesscrape_sitemap) * [`carbon.utilities.scrape_web`](#carbonutilitiesscrape_web) * [`carbon.utilities.search_urls`](#carbonutilitiessearch_urls) * [`carbon.utilities.user_webpages`](#carbonutilitiesuser_webpages) * [`carbon.webhooks.add_url`](#carbonwebhooksadd_url) * [`carbon.webhooks.delete_url`](#carbonwebhooksdelete_url) * [`carbon.webhooks.urls`](#carbonwebhooksurls)

## Installation

Add to Gemfile:

```ruby
gem 'carbon_ruby_sdk', '~> 0.2.36'
```

## Getting Started

```ruby
require 'carbon_ruby_sdk'

# 1) Get an access token for a customer
configuration = Carbon::Configuration.new
configuration.api_key = "YOUR_API_KEY"
configuration.customer_id = "YOUR_CUSTOMER_ID"
carbon = Carbon::Client.new(configuration)
token = carbon.auth.get_access_token

# 2) Use the access token to authenticate moving forward
configuration = Carbon::Configuration.new
configuration.access_token = token.access_token
carbon = Carbon::Client.new(configuration)

# use SDK as usual
white_labeling = carbon.auth.get_white_labeling
```

## Raw HTTP Response

To access the raw HTTP response, suffix any method with `_with_http_info`.

```ruby
result = carbon.auth.get_access_token_with_http_info
p result.data # [TokenResponse] Deserialized data
p result.status_code # [Integer] HTTP status code
p result.headers # [Hash] HTTP headers
p result.response # [Faraday::Response] Raw HTTP response
```

## Reference

### `carbon.auth.get_access_token`

Get Access Token

#### 🛠️ Usage

```ruby
result = carbon.auth.get_access_token
p result
```

#### 🔄 Return

[TokenResponse](./lib/carbon_ruby_sdk/models/token_response.rb)

#### 🌐 Endpoint

`/auth/v1/access_token` `GET`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.auth.get_white_labeling`

Returns whether or not the organization is white labeled and which integrations are white labeled.

#### 🛠️ Usage

```ruby
result = carbon.auth.get_white_labeling
p result
```

#### 🔄 Return

[WhiteLabelingResponse](./lib/carbon_ruby_sdk/models/white_labeling_response.rb)

#### 🌐 Endpoint

`/auth/v1/white_labeling` `GET`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.crm.get_account`

Get Account

#### 🛠️ Usage

```ruby
result = carbon.crm.get_account(
  id: "id_example",
  data_source_id: 1,
  include_remote_data: false,
  includes: [
    "string_example"
  ],
)
p result
```

#### ⚙️ Parameters

##### id: `String`

##### data_source_id: `Integer`

##### include_remote_data: `Boolean`

##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)>

#### 🔄 Return

[Account](./lib/carbon_ruby_sdk/models/account.rb)

#### 🌐 Endpoint

`/integrations/data/crm/accounts/{id}` `GET`

[🔙 
**Back to Table of Contents**](#table-of-contents) --- ### `carbon.crm.get_accounts` Get Accounts #### πŸ› οΈ Usage ```ruby result = carbon.crm.get_accounts( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: { }, order_by: "created_at", ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` ##### include_remote_data: `Boolean` ##### next_cursor: `String` ##### page_size: `Integer` ##### order_dir: [`OrderDirV2Nullable`](./lib/carbon_ruby_sdk/models/order_dir_v2_nullable.rb) ##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)> ##### filters: [`AccountFilters`](./lib/carbon_ruby_sdk/models/account_filters.rb) ##### order_by: [`AccountsOrderByNullable`](./lib/carbon_ruby_sdk/models/accounts_order_by_nullable.rb) #### πŸ”„ Return [AccountResponse](./lib/carbon_ruby_sdk/models/account_response.rb) #### 🌐 Endpoint `/integrations/data/crm/accounts` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.crm.get_contact` Get Contact #### πŸ› οΈ Usage ```ruby result = carbon.crm.get_contact( id: "id_example", data_source_id: 1, include_remote_data: false, includes: [ "string_example" ], ) p result ``` #### βš™οΈ Parameters ##### id: `String` ##### data_source_id: `Integer` ##### include_remote_data: `Boolean` ##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)> #### πŸ”„ Return [Contact](./lib/carbon_ruby_sdk/models/contact.rb) #### 🌐 Endpoint `/integrations/data/crm/contacts/{id}` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.crm.get_contacts` Get Contacts #### πŸ› οΈ Usage ```ruby result = carbon.crm.get_contacts( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: { }, order_by: "created_at", ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` ##### 
include_remote_data: `Boolean` ##### next_cursor: `String` ##### page_size: `Integer` ##### order_dir: [`OrderDirV2Nullable`](./lib/carbon_ruby_sdk/models/order_dir_v2_nullable.rb) ##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)> ##### filters: [`ContactFilters`](./lib/carbon_ruby_sdk/models/contact_filters.rb) ##### order_by: [`ContactsOrderByNullable`](./lib/carbon_ruby_sdk/models/contacts_order_by_nullable.rb) #### πŸ”„ Return [ContactsResponse](./lib/carbon_ruby_sdk/models/contacts_response.rb) #### 🌐 Endpoint `/integrations/data/crm/contacts` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.crm.get_lead` Get Lead #### πŸ› οΈ Usage ```ruby result = carbon.crm.get_lead( id: "id_example", data_source_id: 1, include_remote_data: false, includes: [ "string_example" ], ) p result ``` #### βš™οΈ Parameters ##### id: `String` ##### data_source_id: `Integer` ##### include_remote_data: `Boolean` ##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)> #### πŸ”„ Return [Lead](./lib/carbon_ruby_sdk/models/lead.rb) #### 🌐 Endpoint `/integrations/data/crm/leads/{id}` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.crm.get_leads` Get Leads #### πŸ› οΈ Usage ```ruby result = carbon.crm.get_leads( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: { }, order_by: "created_at", ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` ##### include_remote_data: `Boolean` ##### next_cursor: `String` ##### page_size: `Integer` ##### order_dir: [`OrderDirV2Nullable`](./lib/carbon_ruby_sdk/models/order_dir_v2_nullable.rb) ##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)> ##### filters: [`LeadFilters`](./lib/carbon_ruby_sdk/models/lead_filters.rb) ##### order_by: 
[`LeadsOrderByNullable`](./lib/carbon_ruby_sdk/models/leads_order_by_nullable.rb) #### πŸ”„ Return [LeadsResponse](./lib/carbon_ruby_sdk/models/leads_response.rb) #### 🌐 Endpoint `/integrations/data/crm/leads` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.crm.get_opportunities` Get Opportunities #### πŸ› οΈ Usage ```ruby result = carbon.crm.get_opportunities( data_source_id: 1, include_remote_data: false, next_cursor: "string_example", page_size: 1, order_dir: "asc", includes: [], filters: { "status" => "WON", }, order_by: "created_at", ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` ##### include_remote_data: `Boolean` ##### next_cursor: `String` ##### page_size: `Integer` ##### order_dir: [`OrderDirV2Nullable`](./lib/carbon_ruby_sdk/models/order_dir_v2_nullable.rb) ##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)> ##### filters: [`OpportunityFilters`](./lib/carbon_ruby_sdk/models/opportunity_filters.rb) ##### order_by: [`OpportunitiesOrderByNullable`](./lib/carbon_ruby_sdk/models/opportunities_order_by_nullable.rb) #### πŸ”„ Return [OpportunitiesResponse](./lib/carbon_ruby_sdk/models/opportunities_response.rb) #### 🌐 Endpoint `/integrations/data/crm/opportunities` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.crm.get_opportunity` Get Opportunity #### πŸ› οΈ Usage ```ruby result = carbon.crm.get_opportunity( id: "id_example", data_source_id: 1, include_remote_data: false, includes: [ "string_example" ], ) p result ``` #### βš™οΈ Parameters ##### id: `String` ##### data_source_id: `Integer` ##### include_remote_data: `Boolean` ##### includes: Array<[`BaseIncludes`](./lib/carbon_ruby_sdk/models/base_includes.rb)> #### πŸ”„ Return [Opportunity](./lib/carbon_ruby_sdk/models/opportunity.rb) #### 🌐 Endpoint `/integrations/data/crm/opportunities/{id}` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### 
`carbon.data_sources.query_user_data_sources` User Data Sources #### πŸ› οΈ Usage ```ruby result = carbon.data_sources.query_user_data_sources( pagination: { "limit" => 10, "offset" => 0, }, order_by: "created_at", order_dir: "desc", filters: { "source" => "GOOGLE_CLOUD_STORAGE", }, ) p result ``` #### βš™οΈ Parameters ##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb) ##### order_by: [`OrganizationUserDataSourceOrderByColumns`](./lib/carbon_ruby_sdk/models/organization_user_data_source_order_by_columns.rb) ##### order_dir: [`OrderDir`](./lib/carbon_ruby_sdk/models/order_dir.rb) ##### filters: [`OrganizationUserDataSourceFilters`](./lib/carbon_ruby_sdk/models/organization_user_data_source_filters.rb) #### πŸ”„ Return [OrganizationUserDataSourceResponse](./lib/carbon_ruby_sdk/models/organization_user_data_source_response.rb) #### 🌐 Endpoint `/user_data_sources` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.data_sources.revoke_access_token` Revoke Access Token #### πŸ› οΈ Usage ```ruby result = carbon.data_sources.revoke_access_token( data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/revoke_access_token` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.embeddings.get_documents` For pre-filtering documents, using `tags_v2` is preferred to using `tags` (which is now deprecated). If both `tags_v2` and `tags` are specified, `tags` is ignored. `tags_v2` enables building complex filters through the use of "AND", "OR", and negation logic. 
Take the input below as an example:

```json
{
  "OR": [
    {
      "key": "subject",
      "value": "holy-bible",
      "negate": false
    },
    {
      "key": "person-of-interest",
      "value": "jesus christ",
      "negate": false
    },
    {
      "key": "genre",
      "value": "religion",
      "negate": true
    },
    {
      "AND": [
        {
          "key": "subject",
          "value": "tao-te-ching",
          "negate": false
        },
        {
          "key": "author",
          "value": "lao-tzu",
          "negate": false
        }
      ]
    }
  ]
}
```

In this case, files will be filtered such that:

1. "subject" = "holy-bible" OR
2. "person-of-interest" = "jesus christ" OR
3. "genre" != "religion" OR
4. "subject" = "tao-te-ching" AND "author" = "lao-tzu"

Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to a depth of 3.

For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:

1. "key" is required and must be a `string`
2. "value" is required and can be `any` or list[`any`]
3. "negate" is optional and must be `true` or `false`. If present and `true`, then the filter block is negated in the resulting query. It is `false` by default.

When querying embeddings, you can optionally specify the `media_type` parameter in your request. By default (if not set), it is equal to "TEXT". This means that the query will be performed over files that have been parsed as text (for now, this covers all files except image files). If it is equal to "IMAGE", the query will be performed over image files (for now, `.jpg` and `.png` files). You can think of this field as an additional filter on top of any filters set in `file_ids` and

When `hybrid_search` is set to true, a combination of keyword search and semantic search is used to rank and select candidate embeddings during information retrieval. By default, these search methods are weighted equally during the ranking process. To adjust the weight (or "importance") of each search method, you can use the `hybrid_search_tuning_parameters` property.
The tuning parameters are:

- `weight_a`: weight to assign to semantic search
- `weight_b`: weight to assign to keyword search

You must ensure that `sum(weight_a, weight_b,..., weight_n)` for all *n* weights is equal to 1. The equality has an error tolerance of 0.001 to account for possible floating point issues.

To use hybrid search for a customer across a set of documents, two flags need to be enabled:

1. Use the `/modify_user_configuration` endpoint to enable `sparse_vectors` for the customer. The payload body for this request is below:

   ```json
   {
     "configuration_key_name": "sparse_vectors",
     "value": {
       "enabled": true
     }
   }
   ```

2. Make sure hybrid search is enabled for the documents across which you want to perform the search. For the `/uploadfile` endpoint, this can be done by setting the following query parameter: `generate_sparse_vectors=true`

Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's `embed-multilingual-v3.0`. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query parameter in `/uploadfile`). If no model is supplied, `text-embedding-ada-002` is used by default. When performing embedding queries, only embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated via the same model. For now, **do not** set `VERTEX_MULTIMODAL` as an `embedding_model`.
This model is used automatically by Carbon when it detects an image file. #### πŸ› οΈ Usage ```ruby result = carbon.embeddings.get_documents( query: "a", k: 1, tags: { "key": "string_example", }, query_vector: [ 3.14 ], file_ids: [ 1 ], parent_file_ids: [ 1 ], include_all_children: false, tags_v2: { }, include_tags: true, include_vectors: true, include_raw_file: true, hybrid_search: true, hybrid_search_tuning_parameters: { "weight_a" => 0.5, "weight_b" => 0.5, }, media_type: "TEXT", embedding_model: "OPENAI", include_file_level_metadata: false, high_accuracy: false, rerank: { "model" => "model_example", }, file_types_at_source: [ "string_example" ], exclude_cold_storage_files: false, ) p result ``` #### βš™οΈ Parameters ##### query: `String` Query for which to get related chunks and embeddings. ##### k: `Integer` Number of related chunks to return. ##### tags: Hash A set of tags to limit the search to. Deprecated and may be removed in the future. ##### query_vector: Array<`Float`> Optional query vector for which to get related chunks and embeddings. It must have been generated by the same model used to generate the embeddings across which the search is being conducted. Cannot provide both `query` and `query_vector`. ##### file_ids: Array<`Integer`> Optional list of file IDs to limit the search to ##### parent_file_ids: Array<`Integer`> Optional list of parent file IDs to limit the search to. A parent file describes a file to which another file belongs (e.g. a folder) ##### include_all_children: `Boolean` Flag to control whether or not to include all children of filtered files in the embedding search. ##### tags_v2: `Object` A set of tags to limit the search to. Use this instead of `tags`, which is deprecated. ##### include_tags: `Boolean` Flag to control whether or not to include tags for each chunk in the response. ##### include_vectors: `Boolean` Flag to control whether or not to include embedding vectors in the response. 
##### include_raw_file: `Boolean` Flag to control whether or not to include a signed URL to the raw file containing each chunk in the response. ##### hybrid_search: `Boolean` Flag to control whether or not to perform hybrid search. ##### hybrid_search_tuning_parameters: [`HybridSearchTuningParamsNullable`](./lib/carbon_ruby_sdk/models/hybrid_search_tuning_params_nullable.rb) ##### media_type: [`FileContentTypesNullable`](./lib/carbon_ruby_sdk/models/file_content_types_nullable.rb) ##### embedding_model: [`EmbeddingGeneratorsNullable`](./lib/carbon_ruby_sdk/models/embedding_generators_nullable.rb) ##### include_file_level_metadata: `Boolean` Flag to control whether or not to include file-level metadata in the response. This metadata will be included in the `content_metadata` field of each document along with chunk/embedding level metadata. ##### high_accuracy: `Boolean` Flag to control whether or not to perform a high accuracy embedding search. By default, this is set to false. If true, the search may return more accurate results, but may take longer to complete. ##### rerank: [`RerankParamsNullable`](./lib/carbon_ruby_sdk/models/rerank_params_nullable.rb) ##### file_types_at_source: Array<[`AutoSyncedSourceTypesPropertyInner`](./lib/carbon_ruby_sdk/models/auto_synced_source_types_property_inner.rb)> Filter files based on their type at the source (for example help center tickets and articles) ##### exclude_cold_storage_files: `Boolean` Flag to control whether or not to exclude files that are not in hot storage. If set to False, then an error will be returned if any filtered files are in cold storage. 
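
The weight constraint described above for `hybrid_search_tuning_parameters` (weights summing to 1 within a tolerance of 0.001) can be checked on the client before a request is sent. The following is only a sketch: `valid_hybrid_weights?` and `WEIGHT_TOLERANCE` are hypothetical names that are not part of `carbon_ruby_sdk`, and the API remains the authority on what it accepts.

```ruby
# Hypothetical helper, not part of carbon_ruby_sdk.
# Checks that all hybrid search weights are numeric and sum to 1
# within the documented error tolerance of 0.001.
WEIGHT_TOLERANCE = 0.001

def valid_hybrid_weights?(params, tolerance: WEIGHT_TOLERANCE)
  weights = params.values
  return false if weights.empty?
  return false unless weights.all? { |w| w.is_a?(Numeric) }

  (weights.sum - 1.0).abs <= tolerance
end

# Example: weight semantic search (weight_a) above keyword search (weight_b).
tuning = { "weight_a" => 0.7, "weight_b" => 0.3 }
puts valid_hybrid_weights?(tuning)
```

If the check passes, the same hash could be supplied as `hybrid_search_tuning_parameters:` together with `hybrid_search: true` in the call shown in the usage example.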
#### πŸ”„ Return [DocumentResponseList](./lib/carbon_ruby_sdk/models/document_response_list.rb) #### 🌐 Endpoint `/embeddings` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.embeddings.get_embeddings_and_chunks` Retrieve Embeddings And Content #### πŸ› οΈ Usage ```ruby result = carbon.embeddings.get_embeddings_and_chunks( filters: { "user_file_id" => 1, "embedding_model" => "OPENAI", }, pagination: { "limit" => 10, "offset" => 0, }, order_by: "created_at", order_dir: "desc", include_vectors: false, ) p result ``` #### βš™οΈ Parameters ##### filters: [`EmbeddingsAndChunksFilters`](./lib/carbon_ruby_sdk/models/embeddings_and_chunks_filters.rb) ##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb) ##### order_by: [`EmbeddingsAndChunksOrderByColumns`](./lib/carbon_ruby_sdk/models/embeddings_and_chunks_order_by_columns.rb) ##### order_dir: [`OrderDir`](./lib/carbon_ruby_sdk/models/order_dir.rb) ##### include_vectors: `Boolean` #### πŸ”„ Return [EmbeddingsAndChunksResponse](./lib/carbon_ruby_sdk/models/embeddings_and_chunks_response.rb) #### 🌐 Endpoint `/text_chunks` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.embeddings.list` Retrieve Embeddings And Content V2 #### πŸ› οΈ Usage ```ruby result = carbon.embeddings.list( filters: { "include_all_children" => false, "non_synced_only" => false, }, pagination: { "limit" => 10, "offset" => 0, }, order_by: "created_at", order_dir: "desc", include_vectors: false, ) p result ``` #### βš™οΈ Parameters ##### filters: [`OrganizationUserFilesToSyncFilters`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_filters.rb) ##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb) ##### order_by: [`OrganizationUserFilesToSyncOrderByTypes`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_order_by_types.rb) ##### order_dir: [`OrderDir`](./lib/carbon_ruby_sdk/models/order_dir.rb) ##### include_vectors: 
`Boolean` #### πŸ”„ Return [EmbeddingsAndChunksResponse](./lib/carbon_ruby_sdk/models/embeddings_and_chunks_response.rb) #### 🌐 Endpoint `/list_chunks_and_embeddings` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.embeddings.upload_chunks_and_embeddings` Upload Chunks And Embeddings #### πŸ› οΈ Usage ```ruby result = carbon.embeddings.upload_chunks_and_embeddings( embedding_model: "OPENAI", chunks_and_embeddings: [ { "file_id" => 1, "chunks_and_embeddings" => [ { "chunk_number" => 1, "chunk" => "chunk_example", } ], } ], overwrite_existing: false, chunks_only: false, custom_credentials: { "key": {}, }, ) p result ``` #### βš™οΈ Parameters ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### chunks_and_embeddings: Array<[`SingleChunksAndEmbeddingsUploadInput`](./lib/carbon_ruby_sdk/models/single_chunks_and_embeddings_upload_input.rb)> ##### overwrite_existing: `Boolean` ##### chunks_only: `Boolean` ##### custom_credentials: `Hash` #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/upload_chunks_and_embeddings` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.create_user_file_tags` A tag is a key-value pair that can be added to a file. This pair can then be used for searches (e.g. embedding searches) in order to narrow down the scope of the search. A file can have any number of tags. The following are reserved keys that cannot be used: - db_embedding_id - organization_id - user_id - organization_user_file_id Carbon currently supports two data types for tag values - `string` and `list`. Keys can only be `string`. If values other than `string` and `list` are used, they're automatically converted to strings (e.g. 4 will become "4"). 
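
The tag rules above (reserved keys, string keys, `string` or `list` values) can be applied locally before calling `carbon.files.create_user_file_tags`. This is a minimal sketch under stated assumptions: `prepare_tags` and `RESERVED_TAG_KEYS` are hypothetical names, not SDK features, and the string conversion only mirrors the server-side behavior described above so that the stored values are predictable.

```ruby
# Hypothetical helper, not part of carbon_ruby_sdk.
# Keys listed above as reserved and therefore not usable as tag keys.
RESERVED_TAG_KEYS = %w[
  db_embedding_id
  organization_id
  user_id
  organization_user_file_id
].freeze

# Returns a new hash with string keys. Strings and arrays are kept as-is;
# any other value is converted to a string (e.g. 4 becomes "4").
def prepare_tags(tags)
  tags.each_with_object({}) do |(key, value), prepared|
    key = key.to_s
    raise ArgumentError, "reserved tag key: #{key}" if RESERVED_TAG_KEYS.include?(key)

    keep_as_is = value.is_a?(String) || value.is_a?(Array)
    prepared[key] = keep_as_is ? value : value.to_s
  end
end

tags = prepare_tags({ "topic" => "billing", "priority" => 4, "labels" => ["a", "b"] })
p tags
```

The resulting hash could then be passed as the `tags:` argument in the usage example below.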
#### πŸ› οΈ Usage ```ruby result = carbon.files.create_user_file_tags( tags: { "key": "string_example", }, organization_user_file_id: 1, ) p result ``` #### βš™οΈ Parameters ##### tags: Hash ##### organization_user_file_id: `Integer` #### πŸ”„ Return [UserFile](./lib/carbon_ruby_sdk/models/user_file.rb) #### 🌐 Endpoint `/create_user_file_tags` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.delete` ![Deprecated](https://img.shields.io/badge/deprecated-yellow) Delete File Endpoint #### πŸ› οΈ Usage ```ruby result = carbon.files.delete( file_id: 1, ) p result ``` #### βš™οΈ Parameters ##### file_id: `Integer` #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/deletefile/{file_id}` `DELETE` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.delete_file_tags` Delete File Tags #### πŸ› οΈ Usage ```ruby result = carbon.files.delete_file_tags( tags: [ "string_example" ], organization_user_file_id: 1, ) p result ``` #### βš™οΈ Parameters ##### tags: Array<`String`> ##### organization_user_file_id: `Integer` #### πŸ”„ Return [UserFile](./lib/carbon_ruby_sdk/models/user_file.rb) #### 🌐 Endpoint `/delete_user_file_tags` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.delete_many` ![Deprecated](https://img.shields.io/badge/deprecated-yellow) Delete Files Endpoint #### πŸ› οΈ Usage ```ruby result = carbon.files.delete_many( file_ids: [ 1 ], sync_statuses: [ "string_example" ], delete_non_synced_only: false, send_webhook: false, delete_child_files: false, ) p result ``` #### βš™οΈ Parameters ##### file_ids: Array<`Integer`> ##### sync_statuses: Array<[`ExternalFileSyncStatuses`](./lib/carbon_ruby_sdk/models/external_file_sync_statuses.rb)> ##### delete_non_synced_only: `Boolean` ##### send_webhook: `Boolean` ##### delete_child_files: `Boolean` #### πŸ”„ Return 
[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/delete_files` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.delete_v2` Delete Files V2 Endpoint #### πŸ› οΈ Usage ```ruby result = carbon.files.delete_v2( filters: { "include_all_children" => false, "non_synced_only" => false, }, send_webhook: false, preserve_file_record: false, ) p result ``` #### βš™οΈ Parameters ##### filters: [`OrganizationUserFilesToSyncFilters`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_filters.rb) ##### send_webhook: `Boolean` ##### preserve_file_record: `Boolean` Whether or not to delete all data related to the file from the database, BUT to preserve the file metadata, allowing for resyncs. By default `preserve_file_record` is false, which means that all data related to the file *as well as* its metadata will be deleted. Note that even if `preserve_file_record` is true, raw files uploaded via the `uploadfile` endpoint still cannot be resynced. #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/delete_files_v2` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.get_parsed_file` ![Deprecated](https://img.shields.io/badge/deprecated-yellow) This route is deprecated. Use `/user_files_v2` instead. #### πŸ› οΈ Usage ```ruby result = carbon.files.get_parsed_file( file_id: 1, ) p result ``` #### βš™οΈ Parameters ##### file_id: `Integer` #### πŸ”„ Return [PresignedURLResponse](./lib/carbon_ruby_sdk/models/presigned_url_response.rb) #### 🌐 Endpoint `/parsed_file/{file_id}` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.get_raw_file` ![Deprecated](https://img.shields.io/badge/deprecated-yellow) This route is deprecated. Use `/user_files_v2` instead. 
#### πŸ› οΈ Usage ```ruby result = carbon.files.get_raw_file( file_id: 1, ) p result ``` #### βš™οΈ Parameters ##### file_id: `Integer` #### πŸ”„ Return [PresignedURLResponse](./lib/carbon_ruby_sdk/models/presigned_url_response.rb) #### 🌐 Endpoint `/raw_file/{file_id}` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.modify_cold_storage_parameters` Modify Cold Storage Parameters #### πŸ› οΈ Usage ```ruby result = carbon.files.modify_cold_storage_parameters( filters: { "include_all_children" => false, "non_synced_only" => false, }, enable_cold_storage: true, hot_storage_time_to_live: 1, ) p result ``` #### βš™οΈ Parameters ##### filters: [`OrganizationUserFilesToSyncFilters`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_filters.rb) ##### enable_cold_storage: `Boolean` ##### hot_storage_time_to_live: `Integer` #### 🌐 Endpoint `/modify_cold_storage_parameters` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.move_to_hot_storage` Move To Hot Storage #### πŸ› οΈ Usage ```ruby result = carbon.files.move_to_hot_storage( filters: { "include_all_children" => false, "non_synced_only" => false, }, ) p result ``` #### βš™οΈ Parameters ##### filters: [`OrganizationUserFilesToSyncFilters`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_filters.rb) #### 🌐 Endpoint `/move_to_hot_storage` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.query_user_files` For pre-filtering documents, using `tags_v2` is preferred to using `tags` (which is now deprecated). If both `tags_v2` and `tags` are specified, `tags` is ignored. `tags_v2` enables building complex filters through the use of "AND", "OR", and negation logic. 
Take the input below as an example:

```json
{
  "OR": [
    { "key": "subject", "value": "holy-bible", "negate": false },
    { "key": "person-of-interest", "value": "jesus christ", "negate": false },
    { "key": "genre", "value": "religion", "negate": true },
    {
      "AND": [
        { "key": "subject", "value": "tao-te-ching", "negate": false },
        { "key": "author", "value": "lao-tzu", "negate": false }
      ]
    }
  ]
}
```

In this case, files will be filtered such that:

1. "subject" = "holy-bible" OR
2. "person-of-interest" = "jesus christ" OR
3. "genre" != "religion" OR
4. "subject" = "tao-te-ching" AND "author" = "lao-tzu"

Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to 3 levels.

For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:

1. "key" is required and must be a `string`
2. "value" is required and can be `any` or list[`any`]
3. "negate" is optional and must be `true` or `false`. If present and `true`, then the filter block is negated in the resulting query. It is `false` by default.
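The rules above can be checked on the client before a query is sent. The following is a minimal sketch, not part of `carbon_ruby_sdk`: `tag_block` and `valid_tags_v2?` are hypothetical helper names, and counting each "AND"/"OR" array as one level of nesting is an assumption about how the limit of 3 is measured.

```ruby
# Hypothetical helpers for illustration; they are not part of carbon_ruby_sdk.

# Build a tag block. "negate" is optional in the API and defaults to false.
def tag_block(key, value, negate: false)
  { "key" => key, "value" => value, "negate" => negate }
end

# Check a tags_v2 structure against the typing rules described above.
# Assumption: each "AND"/"OR" array counts as one level of nesting.
def valid_tags_v2?(node, depth: 1, max_depth: 3)
  return false unless node.is_a?(Hash)

  operator = node.keys.find { |k| %w[AND OR].include?(k) }
  if operator
    # Logical node: exactly one key ("AND" or "OR") holding an array.
    return false unless node.size == 1 && node[operator].is_a?(Array)
    return false if depth > max_depth

    node[operator].all? do |child|
      valid_tags_v2?(child, depth: depth + 1, max_depth: max_depth)
    end
  else
    # Tag block: not allowed at the top level; "key" and "value" are required.
    return false if depth == 1
    return false unless node["key"].is_a?(String) && node.key?("value")

    !node.key?("negate") || [true, false].include?(node["negate"])
  end
end

# The example filter from the text, rebuilt with the helper.
tags_v2 = {
  "OR" => [
    tag_block("subject", "holy-bible"),
    tag_block("person-of-interest", "jesus christ"),
    tag_block("genre", "religion", negate: true),
    { "AND" => [tag_block("subject", "tao-te-ching"), tag_block("author", "lao-tzu")] },
  ],
}

p valid_tags_v2?(tags_v2) # => true
```

If the filters object accepts a `tags_v2` key, as the description above suggests, the validated hash could then be passed along the lines of `filters: { "tags_v2" => tags_v2 }`. Treat that key placement as an assumption and confirm it against the `OrganizationUserFilesToSyncFilters` model.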
#### πŸ› οΈ Usage ```ruby result = carbon.files.query_user_files( pagination: { "limit" => 10, "offset" => 0, }, order_by: "created_at", order_dir: "desc", filters: { "include_all_children" => false, "non_synced_only" => false, }, include_raw_file: true, include_parsed_text_file: true, include_additional_files: true, ) p result ``` #### βš™οΈ Parameters ##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb) ##### order_by: [`OrganizationUserFilesToSyncOrderByTypes`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_order_by_types.rb) ##### order_dir: [`OrderDir`](./lib/carbon_ruby_sdk/models/order_dir.rb) ##### filters: [`OrganizationUserFilesToSyncFilters`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_filters.rb) ##### include_raw_file: `Boolean` ##### include_parsed_text_file: `Boolean` ##### include_additional_files: `Boolean` #### πŸ”„ Return [UserFilesV2](./lib/carbon_ruby_sdk/models/user_files_v2.rb) #### 🌐 Endpoint `/user_files_v2` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.query_user_files_deprecated` ![Deprecated](https://img.shields.io/badge/deprecated-yellow) This route is deprecated. Use `/user_files_v2` instead. 
#### πŸ› οΈ Usage ```ruby result = carbon.files.query_user_files_deprecated( pagination: { "limit" => 10, "offset" => 0, }, order_by: "created_at", order_dir: "desc", filters: { "include_all_children" => false, "non_synced_only" => false, }, include_raw_file: true, include_parsed_text_file: true, include_additional_files: true, ) p result ``` #### βš™οΈ Parameters ##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb) ##### order_by: [`OrganizationUserFilesToSyncOrderByTypes`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_order_by_types.rb) ##### order_dir: [`OrderDir`](./lib/carbon_ruby_sdk/models/order_dir.rb) ##### filters: [`OrganizationUserFilesToSyncFilters`](./lib/carbon_ruby_sdk/models/organization_user_files_to_sync_filters.rb) ##### include_raw_file: `Boolean` ##### include_parsed_text_file: `Boolean` ##### include_additional_files: `Boolean` #### πŸ”„ Return [UserFile](./lib/carbon_ruby_sdk/models/user_file.rb) #### 🌐 Endpoint `/user_files` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.resync` Resync File #### πŸ› οΈ Usage ```ruby result = carbon.files.resync( file_id: 1, chunk_size: 1, chunk_overlap: 1, force_embedding_generation: false, skip_file_processing: false, ) p result ``` #### βš™οΈ Parameters ##### file_id: `Integer` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### force_embedding_generation: `Boolean` ##### skip_file_processing: `Boolean` #### πŸ”„ Return [UserFile](./lib/carbon_ruby_sdk/models/user_file.rb) #### 🌐 Endpoint `/resync_file` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.upload` This endpoint is used to directly upload local files to Carbon. The `POST` request should be a multipart form request. Note that the `set_page_as_boundary` query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. 
Additional information can be retrieved for each chunk, however, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). Following is a description of all possible query parameters: - `chunk_size`: the chunk size (in tokens) applied when splitting the document - `chunk_overlap`: the chunk overlap (in tokens) applied when splitting the document - `skip_embedding_generation`: whether or not to skip the generation of chunks and embeddings - `set_page_as_boundary`: described above - `embedding_model`: the model used to generate embeddings for the document chunks - `use_ocr`: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs - `generate_sparse_vectors`: whether or not to generate sparse vectors for the file. Required for hybrid search. - `prepend_filename_to_chunks`: whether or not to prepend the filename to the chunk text Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's embed-multilingual-v3.0. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query parameter in `/uploadfile`). If no model is supplied, the `text-embedding-ada-002` is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. 
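The model-matching behaviour described above can be pictured with a small local simulation. This is only a sketch of the selection rule: it does not call Carbon, and the file records and the `files_considered` helper are hypothetical.

```ruby
# Illustrative data only: the records and their "embedding_model" key are hypothetical.
files = [
  { "name" => "A", "embedding_model" => "OPENAI" },
  { "name" => "B", "embedding_model" => "OPENAI" },
  { "name" => "C", "embedding_model" => "COHERE_MULTILINGUAL_V3" },
  { "name" => "D", "embedding_model" => "COHERE_MULTILINGUAL_V3" },
]

# Return the names of files whose embeddings were generated with the queried model.
# "OPENAI" stands in for the default described in the text.
def files_considered(files, embedding_model: "OPENAI")
  files.select { |f| f["embedding_model"] == embedding_model }
       .map { |f| f["name"] }
end

p files_considered(files)                                            # => ["A", "B"]
p files_considered(files, embedding_model: "COHERE_MULTILINGUAL_V3") # => ["C", "D"]
```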
For now, **do not** set `VERTEX_MULTIMODAL` as an `embedding_model`. This model is used automatically by Carbon when it detects an image file.

#### 🛠️ Usage

```ruby
result = carbon.files.upload(
  file: File.open("path/to/file", "rb"),
  chunk_size: 1,
  chunk_overlap: 1,
  skip_embedding_generation: false,
  set_page_as_boundary: false,
  embedding_model: "string_example",
  use_ocr: false,
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  max_items_per_chunk: 1,
  parse_pdf_tables_with_ocr: false,
  detect_audio_language: false,
  transcription_service: "assemblyai",
  include_speaker_labels: false,
  media_type: "TEXT",
  split_rows: false,
  enable_cold_storage: false,
  hot_storage_time_to_live: 1,
  generate_chunks_only: false,
)
p result
```

#### ⚙️ Parameters

##### file: `File`

##### chunk_size: `Integer`
Chunk size in tiktoken tokens to be used when processing the file.

##### chunk_overlap: `Integer`
Chunk overlap in tiktoken tokens to be used when processing the file.

##### skip_embedding_generation: `Boolean`
Flag to control whether or not embeddings should be generated and stored when processing the file.

##### set_page_as_boundary: `Boolean`
Flag to control whether or not to set a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See the route description for more information.

##### embedding_model: [`EmbeddingModel`](./lib/carbon_ruby_sdk/models/embedding_model.rb)
Embedding model that will be used to embed file chunks.

##### use_ocr: `Boolean`
Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with tables, images, and/or scanned text.

##### generate_sparse_vectors: `Boolean`
Whether or not to generate sparse vectors for the file. This is *required* for the file to be a candidate for hybrid search.

##### prepend_filename_to_chunks: `Boolean`
Whether or not to prepend the file's name to chunks.
##### max_items_per_chunk: `Integer` Number of objects per chunk. For csv, tsv, xlsx, and json files only. ##### parse_pdf_tables_with_ocr: `Boolean` Whether to use rich table parsing when `use_ocr` is enabled. ##### detect_audio_language: `Boolean` Whether to automatically detect the language of the uploaded audio file. ##### transcription_service: [`TranscriptionServiceNullable`](./lib/carbon_ruby_sdk/models/transcription_service_nullable.rb) The transcription service to use for audio files. If no service is specified, 'deepgram' will be used. ##### include_speaker_labels: `Boolean` Detect multiple speakers and label segments of speech by speaker for audio files. ##### media_type: [`FileContentTypesNullable`](./lib/carbon_ruby_sdk/models/file_content_types_nullable.rb) The media type of the file. If not provided, it will be inferred from the file extension. ##### split_rows: `Boolean` Whether to split tabular rows into chunks. Currently only valid for CSV, TSV, and XLSX files. ##### enable_cold_storage: `Boolean` Enable cold storage for the file. If set to true, the file will be moved to cold storage after a certain period of inactivity. Default is false. ##### hot_storage_time_to_live: `Integer` Time in seconds after which the file will be moved to cold storage. ##### generate_chunks_only: `Boolean` If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag. 
#### πŸ”„ Return [UserFile](./lib/carbon_ruby_sdk/models/user_file.rb) #### 🌐 Endpoint `/uploadfile` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.upload_from_url` Create Upload File From Url #### πŸ› οΈ Usage ```ruby result = carbon.files.upload_from_url( url: "string_example", file_name: "string_example", chunk_size: 1, chunk_overlap: 1, skip_embedding_generation: false, set_page_as_boundary: false, embedding_model: "OPENAI", generate_sparse_vectors: false, use_textract: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, parse_pdf_tables_with_ocr: false, detect_audio_language: false, transcription_service: "assemblyai", include_speaker_labels: false, media_type: "TEXT", split_rows: false, cold_storage_params: { "enable_cold_storage" => false, }, generate_chunks_only: false, ) p result ``` #### βš™οΈ Parameters ##### url: `String` ##### file_name: `String` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### set_page_as_boundary: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### use_textract: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### max_items_per_chunk: `Integer` Number of objects per chunk. For csv, tsv, xlsx, and json files only. 
##### parse_pdf_tables_with_ocr: `Boolean` ##### detect_audio_language: `Boolean` ##### transcription_service: [`TranscriptionServiceNullable`](./lib/carbon_ruby_sdk/models/transcription_service_nullable.rb) ##### include_speaker_labels: `Boolean` ##### media_type: [`FileContentTypesNullable`](./lib/carbon_ruby_sdk/models/file_content_types_nullable.rb) ##### split_rows: `Boolean` ##### cold_storage_params: [`ColdStorageProps`](./lib/carbon_ruby_sdk/models/cold_storage_props.rb) ##### generate_chunks_only: `Boolean` If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag. #### πŸ”„ Return [UserFile](./lib/carbon_ruby_sdk/models/user_file.rb) #### 🌐 Endpoint `/upload_file_from_url` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.files.upload_text` Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's `text-embedding-ada-002` and Cohere's embed-multilingual-v3.0. The model can be specified via the `embedding_model` parameter (in the POST body for `/embeddings`, and a query parameter in `/uploadfile`). If no model is supplied, the `text-embedding-ada-002` is used by default. When performing embedding queries, embeddings from files that used the specified model will be considered in the query. For example, if files A and B have embeddings generated with `OPENAI`, and files C and D have embeddings generated with `COHERE_MULTILINGUAL_V3`, then by default, queries will only consider files A and B. If `COHERE_MULTILINGUAL_V3` is specified as the `embedding_model` in `/embeddings`, then only files C and D will be considered. Make sure that the set of all files you want considered for a query have embeddings generated via the same model. For now, **do not** set `VERTEX_MULTIMODAL` as an `embedding_model`. 
This model is used automatically by Carbon when it detects an image file. #### πŸ› οΈ Usage ```ruby result = carbon.files.upload_text( contents: "aaaaa", name: "string_example", chunk_size: 1, chunk_overlap: 1, skip_embedding_generation: false, overwrite_file_id: 1, embedding_model: "OPENAI", generate_sparse_vectors: false, cold_storage_params: { "enable_cold_storage" => false, }, generate_chunks_only: false, ) p result ``` #### βš™οΈ Parameters ##### contents: `String` ##### name: `String` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### overwrite_file_id: `Integer` ##### embedding_model: [`EmbeddingGeneratorsNullable`](./lib/carbon_ruby_sdk/models/embedding_generators_nullable.rb) ##### generate_sparse_vectors: `Boolean` ##### cold_storage_params: [`ColdStorageProps`](./lib/carbon_ruby_sdk/models/cold_storage_props.rb) ##### generate_chunks_only: `Boolean` If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag. 
#### πŸ”„ Return [UserFile](./lib/carbon_ruby_sdk/models/user_file.rb) #### 🌐 Endpoint `/upload_text` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.cancel` Cancel Data Source Items Sync #### πŸ› οΈ Usage ```ruby result = carbon.integrations.cancel( data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` #### πŸ”„ Return [OrganizationUserDataSourceAPI](./lib/carbon_ruby_sdk/models/organization_user_data_source_api.rb) #### 🌐 Endpoint `/integrations/items/sync/cancel` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.connect_data_source` Connect Data Source #### πŸ› οΈ Usage ```ruby result = carbon.integrations.connect_data_source( authentication: { "source" => "GOOGLE_DRIVE", "access_token" => "access_token_example", }, sync_options: { "chunk_size" => 1500, "chunk_overlap" => 20, "skip_embedding_generation" => false, "embedding_model" => "OPENAI", "generate_sparse_vectors" => false, "prepend_filename_to_chunks" => false, "sync_files_on_connection" => true, "set_page_as_boundary" => false, "enable_file_picker" => true, "sync_source_items" => true, "incremental_sync" => false, }, ) p result ``` #### βš™οΈ Parameters ##### authentication: [`AuthenticationProperty`](./lib/carbon_ruby_sdk/models/authentication_property.rb) ##### sync_options: [`SyncOptions`](./lib/carbon_ruby_sdk/models/sync_options.rb) #### πŸ”„ Return [ConnectDataSourceResponse](./lib/carbon_ruby_sdk/models/connect_data_source_response.rb) #### 🌐 Endpoint `/integrations/connect` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.connect_freshdesk` Refer this article to obtain an API key https://support.freshdesk.com/en/support/solutions/articles/215517. Make sure that your API key has the permission to read solutions from your account and you are on a paid plan. 
Once you have an API key, you can make a request to this endpoint along with your freshdesk domain. This will trigger an automatic sync of the articles in your "solutions" tab. Additional parameters below can be used to associate data with the synced articles or modify the sync behavior. #### πŸ› οΈ Usage ```ruby result = carbon.integrations.connect_freshdesk( domain: "string_example", api_key: "string_example", tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, sync_files_on_connection: true, request_id: "string_example", sync_source_items: true, file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, ) p result ``` #### βš™οΈ Parameters ##### domain: `String` ##### api_key: `String` ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGeneratorsNullable`](./lib/carbon_ruby_sdk/models/embedding_generators_nullable.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### sync_files_on_connection: `Boolean` ##### request_id: `String` ##### sync_source_items: `Boolean` Enabling this flag will fetch all available content from the source to be listed via list items endpoint ##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb) #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/integrations/freshdesk` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.connect_gitbook` You will need an access token to connect 
your Gitbook account. Note that permissions are determined by the user who generates the access token, so make sure you have permission to access the spaces you will be syncing. Refer to this article for more details: https://developer.gitbook.com/gitbook-api/authentication. Additionally, you need to specify the name of the organization you will be syncing data from.

#### 🛠️ Usage

```ruby
result = carbon.integrations.connect_gitbook(
  organization: "string_example",
  access_token: "string_example",
  tags: {},
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  sync_files_on_connection: true,
  request_id: "string_example",
  sync_source_items: true,
  file_sync_config: {
    "auto_synced_source_types" => ["ARTICLE"],
    "sync_attachments" => false,
    "detect_audio_language" => false,
    "transcription_service" => "assemblyai",
    "include_speaker_labels" => false,
    "split_rows" => false,
    "generate_chunks_only" => false,
    "skip_file_processing" => false,
  },
)
p result
```

#### ⚙️ Parameters

##### organization: `String`
##### access_token: `String`
##### tags: `Object`
##### chunk_size: `Integer`
##### chunk_overlap: `Integer`
##### skip_embedding_generation: `Boolean`
##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb)
##### generate_sparse_vectors: `Boolean`
##### prepend_filename_to_chunks: `Boolean`
##### sync_files_on_connection: `Boolean`
##### request_id: `String`
##### sync_source_items: `Boolean`
Enabling this flag will fetch all available content from the source so that it can be listed via the list items endpoint.

##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb)

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/integrations/gitbook` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

###
`carbon.integrations.connect_guru` You will need an access token to connect your Guru account. To obtain an access token, follow the steps highlighted here https://help.getguru.com/docs/gurus-api#obtaining-a-user-token. The username should be your Guru username. #### πŸ› οΈ Usage ```ruby result = carbon.integrations.connect_guru( username: "string_example", access_token: "string_example", tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, sync_files_on_connection: true, request_id: "string_example", sync_source_items: true, file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, ) p result ``` #### βš™οΈ Parameters ##### username: `String` ##### access_token: `String` ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### sync_files_on_connection: `Boolean` ##### request_id: `String` ##### sync_source_items: `Boolean` Enabling this flag will fetch all available content from the source to be listed via list items endpoint ##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb) #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/integrations/guru` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.create_aws_iam_user` This endpoint can be used to connect S3 as well as Digital Ocean Spaces (S3 
compatible) For S3, create a new IAM user with permissions to:
  1. List all buckets.
  2. Read from the specific buckets and objects to sync with Carbon. Ensure any future buckets or objects carry the same permissions.
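One way to express the two permissions above as an IAM policy document is sketched below. Carbon's documentation does not prescribe an exact policy, so the mapping to the `s3:ListAllMyBuckets`, `s3:ListBucket`, and `s3:GetObject` actions is an assumption, and `example-bucket` is a hypothetical bucket name.

```ruby
require "json"

# Hypothetical bucket name; replace it with the bucket you intend to sync.
bucket = "example-bucket"

# Assumed mapping of the two listed permissions to S3 actions.
policy = {
  "Version" => "2012-10-17",
  "Statement" => [
    # 1. List all buckets.
    { "Effect" => "Allow", "Action" => ["s3:ListAllMyBuckets"], "Resource" => "*" },
    # 2. Read from the specific bucket: list its keys and fetch its objects.
    { "Effect" => "Allow", "Action" => ["s3:ListBucket"], "Resource" => "arn:aws:s3:::#{bucket}" },
    { "Effect" => "Allow", "Action" => ["s3:GetObject"], "Resource" => "arn:aws:s3:::#{bucket}/*" },
  ],
}

# Print the JSON document that could be attached to the new IAM user.
puts JSON.pretty_generate(policy)
```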
Once created, generate an access key for this user and share the credentials with us. We recommend testing this key beforehand. For Digital Ocean Spaces, generate the above credentials in your Applications and API page here https://cloud.digitalocean.com/account/api/spaces. Endpoint URL is required to connect Digital Ocean Spaces. It should look like <>.digitaloceanspaces.com #### πŸ› οΈ Usage ```ruby result = carbon.integrations.create_aws_iam_user( access_key: "string_example", access_key_secret: "string_example", sync_source_items: true, endpoint_url: "string_example", ) p result ``` #### βš™οΈ Parameters ##### access_key: `String` ##### access_key_secret: `String` ##### sync_source_items: `Boolean` Enabling this flag will fetch all available content from the source to be listed via list items endpoint ##### endpoint_url: `String` You can specify a Digital Ocean endpoint URL to connect a Digital Ocean Space through this endpoint. The URL should be of format .digitaloceanspaces.com. It's not required for S3 buckets. #### πŸ”„ Return [OrganizationUserDataSourceAPI](./lib/carbon_ruby_sdk/models/organization_user_data_source_api.rb) #### 🌐 Endpoint `/integrations/s3` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.get_oauth_url` This endpoint can be used to generate the following URLs - An OAuth URL for OAuth based connectors - A file syncing URL which skips the OAuth flow if the user already has a valid access token and takes them to the success state. 
#### 🛠️ Usage

```ruby
result = carbon.integrations.get_oauth_url(
  service: "BOX",
  tags: nil,
  scope: "string_example",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  zendesk_subdomain: "string_example",
  microsoft_tenant: "string_example",
  sharepoint_site_name: "string_example",
  confluence_subdomain: "string_example",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  max_items_per_chunk: 1,
  salesforce_domain: "string_example",
  sync_files_on_connection: true,
  set_page_as_boundary: false,
  data_source_id: 1,
  connecting_new_account: false,
  request_id: "string_example",
  use_ocr: false,
  parse_pdf_tables_with_ocr: false,
  enable_file_picker: true,
  sync_source_items: true,
  incremental_sync: false,
  file_sync_config: {
    "auto_synced_source_types" => ["ARTICLE"],
    "sync_attachments" => false,
    "detect_audio_language" => false,
    "transcription_service" => "assemblyai",
    "include_speaker_labels" => false,
    "split_rows" => false,
    "generate_chunks_only" => false,
    "skip_file_processing" => false,
  },
  automatically_open_file_picker: true,
  gong_account_email: "string_example",
  servicenow_credentials: {
    "instance_subdomain" => "instance_subdomain_example",
    "client_id" => "client_id_example",
    "client_secret" => "client_secret_example",
    "redirect_uri" => "redirect_uri_example",
  },
)
p result
```

#### ⚙️ Parameters

##### service: [`OauthBasedConnectors`](./lib/carbon_ruby_sdk/models/oauth_based_connectors.rb)
##### tags: `Object`
##### scope: `String`
##### chunk_size: `Integer`
##### chunk_overlap: `Integer`
##### skip_embedding_generation: `Boolean`
##### embedding_model: [`EmbeddingGeneratorsNullable`](./lib/carbon_ruby_sdk/models/embedding_generators_nullable.rb)
##### zendesk_subdomain: `String`
##### microsoft_tenant: `String`
##### sharepoint_site_name: `String`
##### confluence_subdomain: `String`
##### generate_sparse_vectors: `Boolean`
##### prepend_filename_to_chunks: `Boolean`
##### max_items_per_chunk:
`Integer` Number of objects per chunk. For csv, tsv, xlsx, and json files only. ##### salesforce_domain: `String` ##### sync_files_on_connection: `Boolean` Used to specify whether Carbon should attempt to sync all your files automatically when authorization is complete. This is only supported for a subset of connectors and will be ignored for the rest. Supported connectors: Intercom, Zendesk, Gitbook, Confluence, Salesforce, Freshdesk ##### set_page_as_boundary: `Boolean` ##### data_source_id: `Integer` Used to specify a data source to sync from if you have multiple connected. It can be skipped if you only have one data source of that type connected or are connecting a new account. ##### connecting_new_account: `Boolean` Used to connect a new data source. If not specified, we will attempt to create a sync URL for an existing data source based on type and ID. ##### request_id: `String` This request id will be added to all files that get synced using the generated OAuth URL ##### use_ocr: `Boolean` Enable OCR for files that support it. Supported formats: pdf, png, jpg ##### parse_pdf_tables_with_ocr: `Boolean` ##### enable_file_picker: `Boolean` Enable integration's file picker for sources that support it. Supported sources: BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT ##### sync_source_items: `Boolean` Enabling this flag will fetch all available content from the source to be listed via list items endpoint ##### incremental_sync: `Boolean` Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources. ##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb) ##### automatically_open_file_picker: `Boolean` Automatically open source file picker after the OAuth flow is complete. 
This flag is currently supported by BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT. It will be ignored for other data sources.

##### gong_account_email: `String`
If you are connecting a Gong account, you need to input the email of the account you wish to connect. This email will be used to identify your Carbon data source.

##### servicenow_credentials: [`ServiceNowCredentialsNullable`](./lib/carbon_ruby_sdk/models/service_now_credentials_nullable.rb)

#### 🔄 Return

[OuthURLResponse](./lib/carbon_ruby_sdk/models/outh_url_response.rb)

#### 🌐 Endpoint

`/integrations/oauth_url` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.list_confluence_pages`
![Deprecated](https://img.shields.io/badge/deprecated-yellow)

This endpoint has been deprecated. Use `/integrations/items/list` instead.

To begin listing a user's Confluence pages, at least a `data_source_id` of a connected Confluence account must be specified. This base request returns a list of root pages for every space the user has access to in a Confluence instance. To traverse further down the user's page directory, make additional requests to this endpoint with the same `data_source_id` and with `parent_id` set to the ID of a page from a previous request. For convenience, the `has_children` property of each directory item in the response list flags which pages will return non-empty lists of pages when set as the `parent_id`.

#### 🛠️ Usage

```ruby
result = carbon.integrations.list_confluence_pages(
  data_source_id: 1,
  parent_id: "string_example",
)
p result
```

#### ⚙️ Parameters

##### data_source_id: `Integer`
##### parent_id: `String`

#### 🔄 Return

[ListResponse](./lib/carbon_ruby_sdk/models/list_response.rb)

#### 🌐 Endpoint

`/integrations/confluence/list` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.list_conversations`
List all of your public and private channels, DMs, and Group DMs.
The ID from the response can be used as a filter to sync messages to Carbon.

- `types`: Comma-separated list of types. Available types are `im` (DMs), `mpim` (group DMs), `public_channel`, and `private_channel`. Defaults to `public_channel`.
- `cursor`: Used for pagination. If `next_cursor` is returned in the response, pass it as the `cursor` in the next request.
- `data_source_id`: Must be specified if you have linked multiple Slack accounts.
- `exclude_archived`: Whether archived conversations should be excluded. Defaults to `true`.

#### 🛠️ Usage

```ruby
result = carbon.integrations.list_conversations(
  types: "public_channel",
  cursor: "string_example",
  data_source_id: 1,
  exclude_archived: true,
)
p result
```

#### ⚙️ Parameters

##### types: `String`
##### cursor: `String`
##### data_source_id: `Integer`
##### exclude_archived: `Boolean`

#### 🌐 Endpoint

`/integrations/slack/conversations` `GET`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.list_data_source_items`
List Data Source Items

#### 🛠️ Usage

```ruby
result = carbon.integrations.list_data_source_items(
  data_source_id: 1,
  parent_id: "string_example",
  filters: {
  },
  pagination: {
    "limit" => 10,
    "offset" => 0,
  },
  order_by: "name",
  order_dir: "asc",
)
p result
```

#### ⚙️ Parameters

##### data_source_id: `Integer`
##### parent_id: `String`
##### filters: [`ListItemsFiltersNullable`](./lib/carbon_ruby_sdk/models/list_items_filters_nullable.rb)
##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb)
##### order_by: [`ExternalSourceItemsOrderBy`](./lib/carbon_ruby_sdk/models/external_source_items_order_by.rb)
##### order_dir: [`OrderDirV2`](./lib/carbon_ruby_sdk/models/order_dir_v2.rb)

#### 🔄 Return

[ListDataSourceItemsResponse](./lib/carbon_ruby_sdk/models/list_data_source_items_response.rb)

#### 🌐 Endpoint

`/integrations/items/list` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.list_folders`
After connecting
your Outlook account, you can use this endpoint to list all of your folders on outlook. This includes both system folders like "inbox" and user created folders. #### πŸ› οΈ Usage ```ruby result = carbon.integrations.list_folders( data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` #### 🌐 Endpoint `/integrations/outlook/user_folders` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.list_gitbook_spaces` After connecting your Gitbook account, you can use this endpoint to list all of your spaces under current organization. #### πŸ› οΈ Usage ```ruby result = carbon.integrations.list_gitbook_spaces( data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` #### 🌐 Endpoint `/integrations/gitbook/spaces` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.list_labels` After connecting your Gmail account, you can use this endpoint to list all of your labels. User created labels will have the type "user" and Gmail's default labels will have the type "system" #### πŸ› οΈ Usage ```ruby result = carbon.integrations.list_labels( data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` #### 🌐 Endpoint `/integrations/gmail/user_labels` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.list_outlook_categories` After connecting your Outlook account, you can use this endpoint to list all of your categories on outlook. We currently support listing up to 250 categories. 
#### πŸ› οΈ Usage ```ruby result = carbon.integrations.list_outlook_categories( data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` #### 🌐 Endpoint `/integrations/outlook/user_categories` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.list_repos` Once you have connected your GitHub account, you can use this endpoint to list the repositories your account has access to. You can use a data source ID or username to fetch from a specific account. #### πŸ› οΈ Usage ```ruby result = carbon.integrations.list_repos( per_page: 30, page: 1, data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### per_page: `Integer` ##### page: `Integer` ##### data_source_id: `Integer` #### 🌐 Endpoint `/integrations/github/repos` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_azure_blob_files` After optionally loading the items via /integrations/items/sync and integrations/items/list, use the container name and file name as the ID in this endpoint to sync them into Carbon. 
Additional parameters below can associate data with the selected items or modify the sync behavior #### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_azure_blob_files( ids: [ { } ], tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, data_source_id: 1, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, ) p result ``` #### βš™οΈ Parameters ##### ids: Array<[`AzureBlobGetFileInput`](./lib/carbon_ruby_sdk/models/azure_blob_get_file_input.rb)> ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### max_items_per_chunk: `Integer` Number of objects per chunk. For csv, tsv, xlsx, and json files only. ##### set_page_as_boundary: `Boolean` ##### data_source_id: `Integer` ##### request_id: `String` ##### use_ocr: `Boolean` ##### parse_pdf_tables_with_ocr: `Boolean` ##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb) #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/integrations/azure_blob_storage/files` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_azure_blob_storage` This endpoint can be used to connect Azure Blob Storage. 
For Azure Blob Storage, follow these steps:
  1. Create a new Azure Storage account and grant the following permissions:
    • List containers.
    • Read from specific containers and blobs to sync with Carbon. Ensure any future containers or blobs carry the same permissions.
  2. Generate a shared access signature (SAS) token or an access key for the storage account.
Once created, provide us with the following details to generate the connection URL:
  1. Storage Account KeyName.
  2. Storage Account Name.
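
As a minimal sketch of the hand-off described above, the helper below collects the two details before they are passed to `sync_azure_blob_storage`. The helper name and the environment variable names (`AZURE_STORAGE_ACCOUNT_NAME`, `AZURE_STORAGE_ACCOUNT_KEY`) are assumptions made for illustration; they are not defined by the SDK.

```ruby
# Hypothetical helper, not part of the SDK: reads the storage account
# details from the environment and fails early if either one is missing.
# The environment variable names are illustrative assumptions.
def azure_blob_credentials(env = ENV)
  name = env.fetch("AZURE_STORAGE_ACCOUNT_NAME", "").strip
  key = env.fetch("AZURE_STORAGE_ACCOUNT_KEY", "").strip
  if name.empty? || key.empty?
    raise ArgumentError, "Azure storage account name and key are both required"
  end

  # Keys mirror the account_name and account_key parameters of the endpoint.
  { account_name: name, account_key: key }
end
```

The returned hash uses the same keys as the `account_name` and `account_key` parameters, so it could be splatted into the call, for example `carbon.integrations.sync_azure_blob_storage(**azure_blob_credentials, sync_source_items: true)`.
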
#### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_azure_blob_storage( account_name: "string_example", account_key: "string_example", sync_source_items: true, ) p result ``` #### βš™οΈ Parameters ##### account_name: `String` ##### account_key: `String` ##### sync_source_items: `Boolean` #### πŸ”„ Return [OrganizationUserDataSourceAPI](./lib/carbon_ruby_sdk/models/organization_user_data_source_api.rb) #### 🌐 Endpoint `/integrations/azure_blob_storage` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_confluence` ![Deprecated](https://img.shields.io/badge/deprecated-yellow) This endpoint has been deprecated. Use /integrations/files/sync instead. After listing pages in a user's Confluence account, the set of selected page `ids` and the connected account's `data_source_id` can be passed into this endpoint to sync them into Carbon. Additional parameters listed below can be used to associate data to the selected pages or alter the behavior of the sync. 
#### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_confluence( data_source_id: 1, ids: [ "string_example" ], tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, incremental_sync: false, file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` ##### ids: [`IdsProperty`](./lib/carbon_ruby_sdk/models/ids_property.rb) ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGeneratorsNullable`](./lib/carbon_ruby_sdk/models/embedding_generators_nullable.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### max_items_per_chunk: `Integer` Number of objects per chunk. For csv, tsv, xlsx, and json files only. ##### set_page_as_boundary: `Boolean` ##### request_id: `String` ##### use_ocr: `Boolean` ##### parse_pdf_tables_with_ocr: `Boolean` ##### incremental_sync: `Boolean` Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources. 
##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb)

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/integrations/confluence/sync` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.sync_data_source_items`

Sync Data Source Items

#### 🛠️ Usage

```ruby
result = carbon.integrations.sync_data_source_items(
  data_source_id: 1,
)
p result
```

#### ⚙️ Parameters

##### data_source_id: `Integer`

#### 🔄 Return

[OrganizationUserDataSourceAPI](./lib/carbon_ruby_sdk/models/organization_user_data_source_api.rb)

#### 🌐 Endpoint

`/integrations/items/sync` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.sync_files`

After listing files and folders via /integrations/items/sync and /integrations/items/list, use the selected items' external IDs as the `ids` in this endpoint to sync them into Carbon. SharePoint items take an additional parameter, root_id, which identifies the drive the file or folder is in and is stored in root_external_id. This parameter is optional; if it is omitted, the sync assumes the item is stored in the default Documents drive.
#### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_files( data_source_id: 1, ids: [ "string_example" ], tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, incremental_sync: false, file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, ) p result ``` #### βš™οΈ Parameters ##### data_source_id: `Integer` ##### ids: [`IdsProperty`](./lib/carbon_ruby_sdk/models/ids_property.rb) ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGeneratorsNullable`](./lib/carbon_ruby_sdk/models/embedding_generators_nullable.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### max_items_per_chunk: `Integer` Number of objects per chunk. For csv, tsv, xlsx, and json files only. ##### set_page_as_boundary: `Boolean` ##### request_id: `String` ##### use_ocr: `Boolean` ##### parse_pdf_tables_with_ocr: `Boolean` ##### incremental_sync: `Boolean` Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources. 
##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb)

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/integrations/files/sync` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.sync_git_hub`

Refer to this article to obtain an access token: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens. Make sure that your access token has permission to read content from your desired repos. Note that if your access token expires, you will need to update it manually through this endpoint.

#### 🛠️ Usage

```ruby
result = carbon.integrations.sync_git_hub(
  username: "string_example",
  access_token: "string_example",
  sync_source_items: false,
)
p result
```

#### ⚙️ Parameters

##### username: `String`

##### access_token: `String`

##### sync_source_items: `Boolean`

Enabling this flag will fetch all available content from the source to be listed via the list items endpoint.

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/integrations/github` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.sync_gitbook`

You can sync up to 20 Gitbook spaces at a time using this endpoint. Additional parameters below can be used to associate data with the synced pages or modify the sync behavior.
#### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_gitbook( space_ids: [ "string_example" ], data_source_id: 1, tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, request_id: "string_example", file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, ) p result ``` #### βš™οΈ Parameters ##### space_ids: Array<`String`> ##### data_source_id: `Integer` ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### request_id: `String` ##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb) #### 🌐 Endpoint `/integrations/gitbook/sync` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_gmail` Once you have successfully connected your gmail account, you can choose which emails to sync with us using the filters parameter. Filters is a JSON object with key value pairs. It also supports AND and OR operations. For now, we support a limited set of keys listed below. label: Inbuilt Gmail labels, for example "Important" or a custom label you created. after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date. You can also use them in combination to get emails from a certain period. 
is: Can have the following values - starred, important, snoozed, and unread from: Email address of the sender to: Email address of the recipient in: Can have the following values - sent (sync emails sent by the user) has: Can have the following values - attachment (sync emails that have attachments) Using keys or values outside of the specified values can lead to unexpected behaviour. An example of a basic query with filters can be ```json { "filters": { "key": "label", "value": "Test" } } ``` Which will list all emails that have the label "Test". You can use AND and OR operation in the following way: ```json { "filters": { "AND": [ { "key": "after", "value": "2024/01/07" }, { "OR": [ { "key": "label", "value": "Personal" }, { "key": "is", "value": "starred" } ] } ] } } ``` This will return emails after 7th of Jan that are either starred or have the label "Personal". Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example. 
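
The same filter objects can be built as plain Ruby hashes and passed as the `filters` argument of `sync_gmail`. The sketch below is one way to do that, under the assumption that the argument takes the inner object of the JSON examples above. The helper names `filter`, `all_of`, and `any_of` are hypothetical and not part of the SDK; only keys and values listed above are used.

```ruby
require "json"

# Hypothetical helpers (not part of the SDK) for building filter hashes.
def filter(key, value)
  { "key" => key, "value" => value }
end

def all_of(*clauses)
  { "AND" => clauses }
end

def any_of(*clauses)
  { "OR" => clauses }
end

# Emails after 2024/01/07 that either carry the label "Personal" or are starred.
gmail_filters = all_of(
  filter("after", "2024/01/07"),
  any_of(
    filter("label", "Personal"),
    filter("is", "starred")
  )
)

# Print the structure so it can be compared with the JSON example above.
puts JSON.pretty_generate({ "filters" => gmail_filters })
```

The resulting hash would then be passed along with the other arguments, for example `carbon.integrations.sync_gmail(filters: gmail_filters)`.
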
#### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_gmail( filters: {}, tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, data_source_id: 1, request_id: "string_example", sync_attachments: false, file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, incremental_sync: false, ) p result ``` #### βš™οΈ Parameters ##### filters: `Object` ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### data_source_id: `Integer` ##### request_id: `String` ##### sync_attachments: `Boolean` ##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb) ##### incremental_sync: `Boolean` #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/integrations/gmail/sync` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_outlook` Once you have successfully connected your Outlook account, you can choose which emails to sync with us using the filters and folder parameter. "folder" should be the folder you want to sync from Outlook. By default we get messages from your inbox folder. Filters is a JSON object with key value pairs. It also supports AND and OR operations. For now, we support a limited set of keys listed below. category: Custom categories that you created in Outlook. 
after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date. You can also use them in combination to get emails from a certain period.
is: Can have the following values: flagged
from: Email address of the sender

A basic query with filters looks like this:

```json
{
  "filters": {
    "key": "category",
    "value": "Test"
  }
}
```

This will list all emails that have the category "Test".

To specify a custom folder in the same query:

```json
{
  "folder": "Folder Name",
  "filters": {
    "key": "category",
    "value": "Test"
  }
}
```

You can use AND and OR operations in the following way:

```json
{
  "filters": {
    "AND": [
      {
        "key": "after",
        "value": "2024/01/07"
      },
      {
        "OR": [
          {
            "key": "category",
            "value": "Personal"
          },
          {
            "key": "category",
            "value": "Test"
          }
        ]
      }
    ]
  }
}
```

This will return emails after January 7, 2024 that have either Personal or Test as a category. Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example.
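
Because nesting is limited, it can help to check a filter hash before sending it. The sketch below is a client-side check only: the helper `filter_nesting_depth` is hypothetical, and the limit of two AND/OR levels is an interpretation of the note above rather than documented server behavior.

```ruby
# Hypothetical helper (not part of the SDK): counts how many AND/OR levels
# are nested inside a filter hash.
def filter_nesting_depth(node)
  clauses = node["AND"] || node["OR"]
  return 0 if clauses.nil? # a plain { "key" => ..., "value" => ... } filter

  1 + clauses.map { |clause| filter_nesting_depth(clause) }.max.to_i
end

# Same query as the last JSON example, plus the folder from the one before it.
outlook_args = {
  folder: "Folder Name",
  filters: {
    "AND" => [
      { "key" => "after", "value" => "2024/01/07" },
      {
        "OR" => [
          { "key" => "category", "value" => "Personal" },
          { "key" => "category", "value" => "Test" }
        ]
      }
    ]
  }
}

depth = filter_nesting_depth(outlook_args[:filters])
# Assumed limit: an AND containing an OR (two levels) is the deepest allowed.
raise ArgumentError, "filters are nested too deeply" if depth > 2
```

If the check passes, the hash could be splatted into the call, for example `carbon.integrations.sync_outlook(**outlook_args)`.
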
#### 🛠️ Usage

```ruby
result = carbon.integrations.sync_outlook(
  filters: {},
  tags: {},
  folder: "Inbox",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  data_source_id: 1,
  request_id: "string_example",
  sync_attachments: false,
  file_sync_config: {
    "auto_synced_source_types" => ["ARTICLE"],
    "sync_attachments" => false,
    "detect_audio_language" => false,
    "transcription_service" => "assemblyai",
    "include_speaker_labels" => false,
    "split_rows" => false,
    "generate_chunks_only" => false,
    "skip_file_processing" => false,
  },
  incremental_sync: false,
)
p result
```

#### ⚙️ Parameters

##### filters: `Object`

##### tags: `Object`

##### folder: `String`

##### chunk_size: `Integer`

##### chunk_overlap: `Integer`

##### skip_embedding_generation: `Boolean`

##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb)

##### generate_sparse_vectors: `Boolean`

##### prepend_filename_to_chunks: `Boolean`

##### data_source_id: `Integer`

##### request_id: `String`

##### sync_attachments: `Boolean`

##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb)

##### incremental_sync: `Boolean`

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/integrations/outlook/sync` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.integrations.sync_repos`

You can retrieve the repos your token has access to using /integrations/github/repos and sync their content. You can also pass the full name of any public repository (username/repo-name). This will store the repo content with Carbon, where it can be accessed through the /integrations/items/list endpoint. A maximum of 25 repositories is accepted per request.
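
When there are more repositories than one request accepts, the list can be split into batches first. This is a minimal sketch: `repo_batches` and `MAX_REPOS_PER_REQUEST` are hypothetical names, and the repository names are placeholders.

```ruby
MAX_REPOS_PER_REQUEST = 25 # limit stated in the description above

# Hypothetical helper (not part of the SDK): removes duplicates and splits
# a list of repository full names into request-sized batches.
def repo_batches(repos, batch_size = MAX_REPOS_PER_REQUEST)
  repos.uniq.each_slice(batch_size).to_a
end

repos = (1..60).map { |i| "username/repo-#{i}" } # placeholder names
batches = repo_batches(repos)

batches.each_with_index do |batch, index|
  puts "request #{index + 1}: #{batch.size} repos"
end
```

Each batch would then be sent in its own call, for example `carbon.integrations.sync_repos(repos: batch, data_source_id: 1)`.
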
#### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_repos( repos: [ "string_example" ], data_source_id: 1, ) p result ``` #### βš™οΈ Parameters ##### repos: Array<`String`> ##### data_source_id: `Integer` #### 🌐 Endpoint `/integrations/github/sync_repos` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_rss_feed` Rss Feed #### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_rss_feed( url: "string_example", tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, request_id: "string_example", ) p result ``` #### βš™οΈ Parameters ##### url: `String` ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### request_id: `String` #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/integrations/rss_feed` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_s3_files` After optionally loading the items via /integrations/items/sync and integrations/items/list, use the bucket name and object key as the ID in this endpoint to sync them into Carbon. 
Additional parameters below can associate data with the selected items or modify the sync behavior #### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_s3_files( ids: [ { } ], tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, max_items_per_chunk: 1, set_page_as_boundary: false, data_source_id: 1, request_id: "string_example", use_ocr: false, parse_pdf_tables_with_ocr: false, file_sync_config: { "auto_synced_source_types" => ["ARTICLE"], "sync_attachments" => false, "detect_audio_language" => false, "transcription_service" => "assemblyai", "include_speaker_labels" => false, "split_rows" => false, "generate_chunks_only" => false, "skip_file_processing" => false, }, ) p result ``` #### βš™οΈ Parameters ##### ids: Array<[`S3GetFileInput`](./lib/carbon_ruby_sdk/models/s3_get_file_input.rb)> ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### max_items_per_chunk: `Integer` Number of objects per chunk. For csv, tsv, xlsx, and json files only. ##### set_page_as_boundary: `Boolean` ##### data_source_id: `Integer` ##### request_id: `String` ##### use_ocr: `Boolean` ##### parse_pdf_tables_with_ocr: `Boolean` ##### file_sync_config: [`FileSyncConfigNullable`](./lib/carbon_ruby_sdk/models/file_sync_config_nullable.rb) #### πŸ”„ Return [GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb) #### 🌐 Endpoint `/integrations/s3/files` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.integrations.sync_slack` You can list all conversations using the endpoint /integrations/slack/conversations. 
The ID of conversation will be used as an input for this endpoint with timestamps as optional filters. #### πŸ› οΈ Usage ```ruby result = carbon.integrations.sync_slack( filters: { "conversation_id" => "conversation_id_example", }, tags: {}, chunk_size: 1500, chunk_overlap: 20, skip_embedding_generation: false, embedding_model: "OPENAI", generate_sparse_vectors: false, prepend_filename_to_chunks: false, data_source_id: 1, request_id: "string_example", ) p result ``` #### βš™οΈ Parameters ##### filters: [`SlackFilters`](./lib/carbon_ruby_sdk/models/slack_filters.rb) ##### tags: `Object` ##### chunk_size: `Integer` ##### chunk_overlap: `Integer` ##### skip_embedding_generation: `Boolean` ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### generate_sparse_vectors: `Boolean` ##### prepend_filename_to_chunks: `Boolean` ##### data_source_id: `Integer` ##### request_id: `String` #### 🌐 Endpoint `/integrations/slack/sync` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.organizations.get` Get Organization #### πŸ› οΈ Usage ```ruby result = carbon.organizations.get p result ``` #### πŸ”„ Return [OrganizationResponse](./lib/carbon_ruby_sdk/models/organization_response.rb) #### 🌐 Endpoint `/organization` `GET` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.organizations.update` Update Organization #### πŸ› οΈ Usage ```ruby result = carbon.organizations.update( global_user_config: { }, data_source_configs: { "key": { "allowed_file_formats" => [], }, }, ) p result ``` #### βš™οΈ Parameters ##### global_user_config: [`UserConfigurationNullable`](./lib/carbon_ruby_sdk/models/user_configuration_nullable.rb) ##### data_source_configs: Hash Used to set organization level defaults for configuration related to data sources. 
#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/organization/update` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.organizations.update_stats`

Use this endpoint to reaggregate the statistics for an organization, for example aggregate_file_size. The reaggregation process is asynchronous, so a webhook with the event type FILE_STATISTICS_AGGREGATED will be sent to notify you when the process is complete. After the aggregation is complete, the updated statistics can be retrieved using the /organization endpoint. The response of /organization will also contain a timestamp of the last time the statistics were reaggregated.

#### 🛠️ Usage

```ruby
result = carbon.organizations.update_stats
p result
```

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/organization/statistics` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.users.delete`

Delete Users

#### 🛠️ Usage

```ruby
result = carbon.users.delete(
  customer_ids: [
    "string_example"
  ],
)
p result
```

#### ⚙️ Parameters

##### customer_ids: Array<`String`>

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/delete_users` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.users.get`

User Endpoint

#### 🛠️ Usage

```ruby
result = carbon.users.get(
  customer_id: "string_example",
)
p result
```

#### ⚙️ Parameters

##### customer_id: `String`

#### 🔄 Return

[UserResponse](./lib/carbon_ruby_sdk/models/user_response.rb)

#### 🌐 Endpoint

`/user` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.users.list`

List users within an organization

#### 🛠️ Usage

```ruby
result = carbon.users.list(
  pagination: {
    "limit" => 10,
    "offset" => 0,
  },
  filters: {
  },
  order_by: "created_at",
  order_dir: "asc",
  include_count: false,
)
p result
```

#### ⚙️ Parameters

##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb)

##### filters: [`ListUsersFilters`](./lib/carbon_ruby_sdk/models/list_users_filters.rb)

##### order_by: [`ListUsersOrderByTypes`](./lib/carbon_ruby_sdk/models/list_users_order_by_types.rb)

##### order_dir: [`OrderDirV2`](./lib/carbon_ruby_sdk/models/order_dir_v2.rb)

##### include_count: `Boolean`

#### 🔄 Return

[UserListResponse](./lib/carbon_ruby_sdk/models/user_list_response.rb)

#### 🌐 Endpoint

`/list_users` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.users.toggle_user_features`

![Deprecated](https://img.shields.io/badge/deprecated-yellow)

Toggle User Features

#### 🛠️ Usage

```ruby
result = carbon.users.toggle_user_features(
  configuration_key_name: "string_example",
  value: {},
)
p result
```

#### ⚙️ Parameters

##### configuration_key_name: `String`

##### value: `Object`

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/modify_user_configuration` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.users.update_users`

Update Users

#### 🛠️ Usage

```ruby
result = carbon.users.update_users(
  customer_ids: [
    "string_example"
  ],
  auto_sync_enabled_sources: [
    "string_example"
  ],
  max_files: -1,
  max_files_per_upload: -1,
)
p result
```

#### ⚙️ Parameters

##### customer_ids: Array<`String`>

List of organization supplied user IDs

##### auto_sync_enabled_sources: [`AutoSyncEnabledSourcesProperty`](./lib/carbon_ruby_sdk/models/auto_sync_enabled_sources_property.rb)

##### max_files: `Integer`

Custom file upload limit for the user over *all* user's files across all uploads. If set, then the user will not be allowed to upload more files than this limit. If not set, or if set to -1, then the user will have no limit.
##### max_files_per_upload: `Integer`

Custom file upload limit for the user across a single upload. If set, then the user will not be allowed to upload more files than this limit in a single upload. If not set, or if set to -1, then the user will have no limit.

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/update_users` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.utilities.fetch_urls`

![Deprecated](https://img.shields.io/badge/deprecated-yellow)

Extracts all URLs from a webpage.

Args:
    url (str): URL of the webpage

Returns:
    FetchURLsResponse: A response object with a list of URLs extracted from the webpage and the webpage content.

#### 🛠️ Usage

```ruby
result = carbon.utilities.fetch_urls(
  url: "url_example",
)
p result
```

#### ⚙️ Parameters

##### url: `String`

#### 🔄 Return

[FetchURLsResponse](./lib/carbon_ruby_sdk/models/fetch_urls_response.rb)

#### 🌐 Endpoint

`/fetch_urls` `GET`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.utilities.fetch_webpage`

Fetch Urls V2

#### 🛠️ Usage

```ruby
result = carbon.utilities.fetch_webpage(
  url: "string_example",
)
p result
```

#### ⚙️ Parameters

##### url: `String`

#### 🌐 Endpoint

`/fetch_webpage` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.utilities.fetch_youtube_transcripts`

Fetches English transcripts from YouTube videos.

Args:
    id (str): The ID of the YouTube video.
    raw (bool): Whether to return the raw transcript or not. Defaults to False.

Returns:
    dict: A dictionary with the transcript of the YouTube video.
#### 🛠️ Usage

```ruby
result = carbon.utilities.fetch_youtube_transcripts(
  id: "id_example",
  raw: false,
)
p result
```

#### ⚙️ Parameters

##### id: `String`

##### raw: `Boolean`

#### 🔄 Return

[YoutubeTranscriptResponse](./lib/carbon_ruby_sdk/models/youtube_transcript_response.rb)

#### 🌐 Endpoint

`/fetch_youtube_transcript` `GET`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.utilities.process_sitemap`

Retrieves all URLs from a sitemap, which can subsequently be utilized with our `web_scrape` endpoint.

#### 🛠️ Usage

```ruby
result = carbon.utilities.process_sitemap(
  url: "url_example",
)
p result
```

#### ⚙️ Parameters

##### url: `String`

#### 🌐 Endpoint

`/process_sitemap` `GET`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.utilities.scrape_sitemap`

Extracts all URLs from a sitemap and performs a web scrape on each of them.

Args:
    sitemap_url (str): URL of the sitemap

Returns:
    dict: A response object with the status of the scraping job message.

#### 🛠️ Usage

```ruby
result = carbon.utilities.scrape_sitemap(
  url: "string_example",
  tags: {
    "key": "string_example",
  },
  max_pages_to_scrape: 1,
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  enable_auto_sync: false,
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  html_tags_to_skip: [],
  css_classes_to_skip: [],
  css_selectors_to_skip: [],
  embedding_model: "OPENAI",
  url_paths_to_include: [],
  url_paths_to_exclude: [],
  urls_to_scrape: [],
  download_css_and_media: false,
  generate_chunks_only: false,
)
p result
```

#### ⚙️ Parameters

##### url: `String`

##### tags: Hash

##### max_pages_to_scrape: `Integer`

##### chunk_size: `Integer`

##### chunk_overlap: `Integer`

##### skip_embedding_generation: `Boolean`

##### enable_auto_sync: `Boolean`

##### generate_sparse_vectors: `Boolean`

##### prepend_filename_to_chunks: `Boolean`

##### html_tags_to_skip: Array<`String`>

##### css_classes_to_skip:
Array<`String`> ##### css_selectors_to_skip: Array<`String`> ##### embedding_model: [`EmbeddingGenerators`](./lib/carbon_ruby_sdk/models/embedding_generators.rb) ##### url_paths_to_include: Array<`String`> URL subpaths or directories that you want to include. For example if you want to only include URLs that start with /questions in stackoverflow.com, you will add /questions/ in this input ##### url_paths_to_exclude: Array<`String`> URL subpaths or directories that you want to exclude. For example if you want to exclude URLs that start with /questions in stackoverflow.com, you will add /questions/ in this input ##### urls_to_scrape: Array<`String`> You can submit a subset of URLs from the sitemap that should be scraped. To get the list of URLs, you can check out /process_sitemap endpoint. If left empty, all URLs from the sitemap will be scraped. ##### download_css_and_media: `Boolean` Whether the scraper should download css and media from the page (images, fonts, etc). Scrapes might take longer to finish with this flag enabled, but the success rate is improved. ##### generate_chunks_only: `Boolean` If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag. #### 🌐 Endpoint `/scrape_sitemap` `POST` [πŸ”™ **Back to Table of Contents**](#table-of-contents) --- ### `carbon.utilities.scrape_web` Conduct a web scrape on a given webpage URL. Our web scraper is fully compatible with JavaScript and supports recursion depth, enabling you to efficiently extract all content from the target website. 
#### 🛠️ Usage

```ruby
result = carbon.utilities.scrape_web(
  body: [
    {
      "url" => "url_example",
      "recursion_depth" => 3,
      "max_pages_to_scrape" => 100,
      "chunk_size" => 1500,
      "chunk_overlap" => 20,
      "skip_embedding_generation" => false,
      "enable_auto_sync" => false,
      "generate_sparse_vectors" => false,
      "prepend_filename_to_chunks" => false,
      "html_tags_to_skip" => [],
      "css_classes_to_skip" => [],
      "css_selectors_to_skip" => [],
      "embedding_model" => "OPENAI",
      "url_paths_to_include" => [],
      "download_css_and_media" => false,
      "generate_chunks_only" => false,
    }
  ],
)
p result
```

#### ⚙️ body Array<[`WebscrapeRequest`](./lib/carbon_ruby_sdk/models/webscrape_request.rb)>

#### 🌐 Endpoint

`/web_scrape` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.utilities.search_urls`

Perform a web search and obtain a list of relevant URLs.

As an illustration, when you perform a search for “content related to MRNA,” you will receive a list of links such as the following:

- https://tomrenz.substack.com/p/mrna-and-why-it-matters
- https://www.statnews.com/2020/11/10/the-story-of-mrna-how-a-once-dismissed-idea-became-a-leading-technology-in-the-covid-vaccine-race/
- https://www.statnews.com/2022/11/16/covid-19-vaccines-were-a-success-but-mrna-still-has-a-delivery-problem/
- https://joomi.substack.com/p/were-still-being-misled-about-how

Subsequently, you can submit these links to the web_scrape endpoint in order to retrieve the content of the respective web pages.

Args: query (str): Query to search for

Returns: FetchURLsResponse: A response object with a list of URLs for a given search query.

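
The hand-off from `search_urls` to `scrape_web` described above can be sketched as a small helper that turns a list of links into `scrape_web` request bodies. `build_scrape_requests` is a hypothetical helper and not part of the SDK. The key names are copied from the `scrape_web` usage example, and omitting the remaining keys assumes the API falls back to its own defaults.

```ruby
# Hypothetical helper, not part of carbon_ruby_sdk: builds one scrape_web
# request hash per URL. Key names are copied from the scrape_web usage
# example; leaving the other keys out assumes the API applies defaults.
def build_scrape_requests(urls, recursion_depth: 1, max_pages_to_scrape: 1)
  # Trim whitespace, drop blank entries, and remove duplicate links.
  urls.map(&:strip).reject(&:empty?).uniq.map do |url|
    {
      "url" => url,
      "recursion_depth" => recursion_depth,
      "max_pages_to_scrape" => max_pages_to_scrape,
    }
  end
end

# Stand-in for the list of links returned by a search.
found_urls = [
  "https://tomrenz.substack.com/p/mrna-and-why-it-matters",
  "https://joomi.substack.com/p/were-still-being-misled-about-how",
  "https://joomi.substack.com/p/were-still-being-misled-about-how",
]

body = build_scrape_requests(found_urls)
p body

# With a configured client, the hand-off would then look like:
#   carbon.utilities.scrape_web(body: body)
```

Building the request hashes separately from the API call keeps the deduplication step easy to check before any scrape is started.
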
#### 🛠️ Usage

```ruby
result = carbon.utilities.search_urls(
  query: "query_example",
)
p result
```

#### ⚙️ Parameters

##### query: `String`

#### 🔄 Return

[FetchURLsResponse](./lib/carbon_ruby_sdk/models/fetch_urls_response.rb)

#### 🌐 Endpoint

`/search_urls` `GET`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.utilities.user_webpages`

User Web Pages

#### 🛠️ Usage

```ruby
result = carbon.utilities.user_webpages(
  filters: {},
  pagination: {
    "limit" => 10,
    "offset" => 0,
  },
  order_by: "created_at",
  order_dir: "asc",
)
p result
```

#### ⚙️ Parameters

##### filters: [`UserWebPagesFilters`](./lib/carbon_ruby_sdk/models/user_web_pages_filters.rb)
##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb)
##### order_by: [`UserWebPageOrderByTypes`](./lib/carbon_ruby_sdk/models/user_web_page_order_by_types.rb)
##### order_dir: [`OrderDirV2`](./lib/carbon_ruby_sdk/models/order_dir_v2.rb)

#### 🌐 Endpoint

`/user_webpages` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.webhooks.add_url`

Add Webhook Url

#### 🛠️ Usage

```ruby
result = carbon.webhooks.add_url(
  url: "string_example",
)
p result
```

#### ⚙️ Parameters

##### url: `String`

#### 🔄 Return

[Webhook](./lib/carbon_ruby_sdk/models/webhook.rb)

#### 🌐 Endpoint

`/add_webhook` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.webhooks.delete_url`

Delete Webhook Url

#### 🛠️ Usage

```ruby
result = carbon.webhooks.delete_url(
  webhook_id: 1,
)
p result
```

#### ⚙️ Parameters

##### webhook_id: `Integer`

#### 🔄 Return

[GenericSuccessResponse](./lib/carbon_ruby_sdk/models/generic_success_response.rb)

#### 🌐 Endpoint

`/delete_webhook/{webhook_id}` `DELETE`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

### `carbon.webhooks.urls`

Webhook Urls

#### 🛠️ Usage

```ruby
result = carbon.webhooks.urls(
  pagination: {
    "limit" => 10,
    "offset" => 0,
  },
  order_by: "created_at",
  order_dir: "desc",
  filters: {
    "ids" => [],
  },
)
p result
```

#### ⚙️ Parameters

##### pagination: [`Pagination`](./lib/carbon_ruby_sdk/models/pagination.rb)
##### order_by: [`WebhookOrderByColumns`](./lib/carbon_ruby_sdk/models/webhook_order_by_columns.rb)
##### order_dir: [`OrderDir`](./lib/carbon_ruby_sdk/models/order_dir.rb)
##### filters: [`WebhookFilters`](./lib/carbon_ruby_sdk/models/webhook_filters.rb)

#### 🔄 Return

[WebhookQueryResponse](./lib/carbon_ruby_sdk/models/webhook_query_response.rb)

#### 🌐 Endpoint

`/webhooks` `POST`

[🔙 **Back to Table of Contents**](#table-of-contents)

---

## Author

This Ruby package is automatically generated by [Konfig](https://konfigthis.com)