Resources

Everything you need to get started and build with Arkus — documentation, templates, and insights to support your journey.

DataFrame Operations

DataFrame Operations = default

What and how it can be used:

The DataFrame Operations component applies one selected operation to an incoming DataFrame and outputs the resulting DataFrame. It provides data manipulation capabilities similar to pandas DataFrames in Python or data frames in R: adding, dropping, renaming, and selecting columns; filtering, sorting, and deduplicating rows; replacing values; and previewing the first or last rows of tabular data.

When/how the component should be used:

  • Use when you need to perform complex operations on tabular data.
  • Best for working with structured data in rows and columns.
  • Suited to row and column transformations such as filtering, sorting, deduplication, and value replacement.
  • Create a new flow or use an existing flow.
  • Add a DataFrame Operations component to the flow, and then connect DataFrame output from another component to the DataFrame input.
  • In the Operations field, select the operation you want to perform on the incoming DataFrame. For example, the Filter operation filters the rows based on a specified column and value.
  • Configure the operation’s parameters. The specific parameters depend on the selected operation. For example, if you select the Filter operation, you must define a filter condition using the Column Name, Filter Value, and Filter Operator parameters.
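
The steps above describe a simple contract: a DataFrame comes in, the Operation field picks one operation, that operation's parameter fields configure it, and a new DataFrame goes out. The Python sketch below illustrates that contract under stated assumptions: a DataFrame is modeled as a list of row dictionaries, and `run_operation`, `OPERATIONS`, `filter_equals`, and `head` are hypothetical names, not the Arkus implementation.

```python
from typing import Any, Callable

# Assumption: a DataFrame is modeled as a list of rows, one dict per row.
Table = list[dict[str, Any]]


def filter_equals(df: Table, column: str, value: Any) -> Table:
    # Keep only the rows whose value in `column` equals `value`.
    return [dict(row) for row in df if row[column] == value]


def head(df: Table, n: int = 5) -> Table:
    # Keep the first n rows in their current order.
    return [dict(row) for row in df[:n]]


# Hypothetical registry: the Operation field selects one entry, and the
# remaining fields are passed to it as keyword parameters.
OPERATIONS: dict[str, Callable[..., Table]] = {
    "Filter": filter_equals,
    "Head": head,
}


def run_operation(df: Table, operation: str, **params: Any) -> Table:
    if operation not in OPERATIONS:
        raise ValueError(f"Unknown operation: {operation!r}")
    # Each operation returns a new table; the input is left unchanged.
    return OPERATIONS[operation](df, **params)


people = [
    {"name": "Ada", "team": "core"},
    {"name": "Lin", "team": "docs"},
    {"name": "Sam", "team": "core"},
]
core = run_operation(people, "Filter", column="team", value="core")
```

Copying rows instead of mutating them keeps the upstream DataFrame intact, which matches the "preserves column structure" expectation for every operation.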

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation

Default settings:

  • DataFrame
  • Operation

Control Section:

  • DataFrame
  • Operation

Desired Behaviour:

  • Preserves column structure
  • Applies operations reliably

DataFrame Operations = Add Column

What and how it can be used:

The Add Column operation adds a new column to the DataFrame with a constant value. It creates a new column in the tabular data structure and populates all rows with the same specified value. This component enables adding fixed values, default settings, or static metadata to every row in a dataset.

When/how the component should be used:

  • Use when you need to add a column that holds the same value in every row.
  • Use to attach fixed values, default settings, or static metadata to each record.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • New Column Name
  • New Column Value

Default settings:

  • DataFrame
  • Operation
  • New Column Name
  • New Column Value

Control Section:

  • DataFrame
  • Operation
  • New Column Name
  • New Column Value

Default values:
  • Operation: Add Column

Desired Behaviour:

  • New column added without affecting others
  • Preserve row count
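
A minimal sketch of Add Column in Python, assuming a DataFrame modeled as a list of row dictionaries; `add_column` is a hypothetical helper, not the Arkus API. In pandas, the rough equivalent is `df[name] = value`, which broadcasts a scalar to every row.

```python
from typing import Any


def add_column(df: list[dict[str, Any]], name: str, value: Any) -> list[dict[str, Any]]:
    # Copy every row and append `name` with the same constant value, so the
    # existing columns and the row count are unchanged.
    # Note: if `name` already exists, this overwrites it (as pandas would).
    return [{**row, name: value} for row in df]


orders = [{"id": 1, "total": 30}, {"id": 2, "total": 45}]
tagged = add_column(orders, "source", "web")
```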

DataFrame Operations = Drop Column

What and how it can be used:

The Drop Column operation removes the column specified by Column Name from the DataFrame. It creates a new DataFrame without that column, leaving all other columns and rows intact. This component enables removing unnecessary, redundant, or sensitive fields from datasets.

When/how the component should be used:

  • Use to remove columns from a DataFrame that are not needed downstream.
  • Use to reduce data size and complexity.
  • Use to eliminate sensitive or irrelevant fields.
  • Use before exporting, embedding, routing, or analysis.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Column Name

Default settings:

  • DataFrame
  • Operation
  • Column Name

Control Section:

  • DataFrame
  • Operation
  • Column Name

Default values:
  • Operation: Drop Column

Desired Behaviour:

  • Remove the specified column.
  • Leave remaining data unchanged.
  • Show an error if the column doesn’t exist.
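
The desired behaviour above can be sketched in Python, assuming a list-of-row-dicts model of a DataFrame whose rows all share the same columns; `drop_column` is a hypothetical helper. The closest pandas call is `df.drop(columns=[name])`, which also raises `KeyError` for a missing column by default.

```python
from typing import Any

Table = list[dict[str, Any]]


def drop_column(df: Table, name: str) -> Table:
    # Fail loudly when the column is absent (checked against the first row;
    # assumption: all rows share the same columns).
    if df and name not in df[0]:
        raise KeyError(f"Column {name!r} does not exist")
    # Rebuild each row without the dropped column; other values are untouched.
    return [{k: v for k, v in row.items() if k != name} for row in df]


users = [{"id": 1, "email": "a@example.com", "plan": "pro"}]
public = drop_column(users, "email")
```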

DataFrame Operations = Filter

What and how it can be used:

The Filter operation filters the DataFrame based on a specified condition. The output is a DataFrame containing only the rows that matched the filter condition. This component enables selecting subsets of data by applying logical conditions on column values, returning only rows where the condition evaluates to true.

When/how the component should be used:

  • Use to keep rows that meet specific conditions.
  • Use before analysis, alerts, or reporting.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Column Name
  • Filter Value
  • Filter Operator

Default settings:

  • DataFrame
  • Operation
  • Column Name
  • Filter Value
  • Filter Operator

Control Section:

  • DataFrame
  • Operation
  • Column Name
  • Filter Value
  • Filter Operator

Default values:
  • Operation: Filter
  • Filter Operator: equals

Desired Behaviour:

  • Applies condition row-by-row
  • Outputs only matching rows
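
A Python sketch of row-by-row filtering, assuming a list-of-row-dicts DataFrame. Only the "equals" operator is documented here (as the default); the other operator names in `OPERATORS` and the helper `filter_rows` are hypothetical illustrations, not the component's actual option list.

```python
import operator
from typing import Any, Callable

Table = list[dict[str, Any]]

# Hypothetical operator names: only "equals" is documented (as the default).
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not equals": operator.ne,
    "greater than": operator.gt,
    "less than": operator.lt,
    "contains": lambda cell, target: str(target) in str(cell),
}


def filter_rows(df: Table, column: str, value: Any, op: str = "equals") -> Table:
    test = OPERATORS[op]
    # Evaluate the condition row by row and keep only rows where it holds.
    return [dict(row) for row in df if test(row[column], value)]


tickets = [
    {"id": 1, "status": "open", "priority": 3},
    {"id": 2, "status": "closed", "priority": 1},
    {"id": 3, "status": "open", "priority": 5},
]
open_tickets = filter_rows(tickets, "status", "open")
urgent = filter_rows(tickets, "priority", 4, "greater than")
```

This sketch assumes Filter Value already has the same type as the column; if the UI supplies it as text, a real implementation would need to convert it before numeric comparisons.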

DataFrame Operations = Head

What and how it can be used:

The Head operation retrieves the first n rows of the DataFrame, where n is set in the Number of Rows field (default: 5). This component enables quick inspection of the beginning of a dataset, which is useful for previewing its structure, validating imports, or displaying sample records without outputting the entire dataset.

When/how the component should be used:

  • Use to inspect the first N rows of a table.
  • Use for quick data validation and sanity checks.
  • Use when you want a small sample without processing the full dataset.
  • Use during development, debugging, or exploration.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Number of Rows

Default settings:

  • DataFrame
  • Operation
  • Number of Rows

Control Section:

  • DataFrame
  • Operation
  • Number of Rows

Default values:
  • Operation: Head
  • Number of Rows: 5

Desired Behaviour:

  • Returns the first N rows based on current order
  • Does not change column types or values
  • If fewer than N rows exist, returns all rows
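
The Head behaviour listed above fits in a few lines of Python, assuming a list-of-row-dicts DataFrame; `head` is a hypothetical helper. For non-negative n, pandas' `df.head(n)` behaves the same way.

```python
from typing import Any

Table = list[dict[str, Any]]


def head(df: Table, n: int = 5) -> Table:
    # Assumption: a negative row count is rejected rather than interpreted.
    if n < 0:
        raise ValueError("Number of Rows must be non-negative")
    # Slicing returns all rows when fewer than n exist and keeps the current
    # order; each row is copied so values are not modified.
    return [dict(row) for row in df[:n]]


events = [{"seq": i} for i in range(1, 8)]
```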

DataFrame Operations = Rename Column

What and how it can be used:

The Rename Column operation renames an existing column in the DataFrame. It changes the name of the column specified by Column Name to the value of New Column Name while preserving all data values and the column order. This component enables updating column names to follow naming conventions, improve clarity, or match expected schemas.

When/how the component should be used:

  • Use to standardize column names in a schema.
  • Use when integrating heterogeneous sources.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Column Name
  • New Column Name

Default settings:

  • DataFrame
  • Operation
  • Column Name
  • New Column Name

Control Section:

  • DataFrame
  • Operation
  • Column Name
  • New Column Name

Default values:
  • Operation: Rename Column

Desired Behaviour:

  • Rename column exactly as specified.
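
A Python sketch of an order-preserving rename, assuming a list-of-row-dicts DataFrame; `rename_column` and its collision check are hypothetical. pandas' `df.rename(columns={old: new})` is the closest equivalent, but it ignores unknown column names by default, whereas this sketch raises.

```python
from typing import Any

Table = list[dict[str, Any]]


def rename_column(df: Table, old: str, new: str) -> Table:
    if df and old not in df[0]:
        raise KeyError(f"Column {old!r} does not exist")
    if df and new != old and new in df[0]:
        # Assumption: refuse to silently merge two columns into one name.
        raise ValueError(f"Column {new!r} already exists")
    # Swap the key while rebuilding each row, so column order and all
    # values are preserved.
    return [{(new if k == old else k): v for k, v in row.items()} for row in df]


rows = [{"cust_id": 7, "amt": 12.5}]
renamed = rename_column(rows, "amt", "amount")
```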

DataFrame Operations = Replace Value

What and how it can be used:

The Replace Value operation replaces a target value with a new value in the column specified by Column Name. Every cell in that column that matches Value to Replace is set to Replacement Value. This component enables data cleaning, normalization, and transformation by finding and replacing specific values.

When/how the component should be used:

  • Use to replace specific values in a column.
  • Use to standardize inconsistent values across datasets.
  • Use to clean data before analysis, routing, or storage.
  • Use when missing, placeholder, or invalid values must be corrected.
  • Use when deterministic data cleanup is required (no inference).

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Column Name
  • Value to Replace
  • Replacement Value

Default settings:

  • DataFrame
  • Operation
  • Column Name
  • Value to Replace
  • Replacement Value

Control Section:

  • DataFrame
  • Operation
  • Column Name
  • Value to Replace
  • Replacement Value

Default values:
  • Operation: Replace Value

Desired Behaviour:

  • Replace only exact matches.
  • Leave all other values unchanged.
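
The exact-match rule above is easy to get wrong with substring replacement, so here is a Python sketch assuming a list-of-row-dicts DataFrame; `replace_value` is a hypothetical helper. In pandas, `df[column] = df[column].replace(target, replacement)` likewise matches whole values by default.

```python
from typing import Any

Table = list[dict[str, Any]]


def replace_value(df: Table, column: str, target: Any, replacement: Any) -> Table:
    # Only cells in `column` that are exactly equal to `target` change;
    # substrings and other columns are left as they are.
    return [
        {**row, column: replacement if row[column] == target else row[column]}
        for row in df
    ]


records = [
    {"id": 1, "country": "N/A"},
    {"id": 2, "country": "N/A (pending)"},
    {"id": 3, "country": "FR"},
]
clean = replace_value(records, "country", "N/A", "Unknown")
```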

DataFrame Operations = Select Columns

What and how it can be used:

The Select Columns operation selects one or more specific columns from the DataFrame. It creates a new DataFrame containing only the specified columns, effectively projecting a subset of the data by removing unwanted columns while preserving all rows. This component enables focusing on relevant fields and simplifying data structures.

When/how the component should be used:

  • Use when you need to reduce data to relevant fields.
  • Use to simplify downstream processing.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Columns to Select

Default settings:

  • DataFrame
  • Operation
  • Columns to Select

Control Section:

  • DataFrame
  • Operation
  • Columns to Select

Default values:
  • Operation: Select Columns

Desired Behaviour:

  • Output only specified columns.
  • Preserve row order.
  • Fail clearly if a column is missing.
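
A Python sketch of column projection that fails clearly, assuming a list-of-row-dicts DataFrame; `select_columns` is a hypothetical helper. pandas' `df[columns]` also raises `KeyError` when any requested column is missing.

```python
from typing import Any

Table = list[dict[str, Any]]


def select_columns(df: Table, columns: list[str]) -> Table:
    if df:
        missing = [c for c in columns if c not in df[0]]
        if missing:
            # Fail clearly, naming every missing column at once.
            raise KeyError(f"Columns not found: {missing}")
    # Project each row onto the requested columns; row order is unchanged.
    return [{c: row[c] for c in columns} for row in df]


products = [
    {"sku": "A1", "name": "Lamp", "cost": 9, "supplier": "X"},
    {"sku": "B2", "name": "Desk", "cost": 80, "supplier": "Y"},
]
slim = select_columns(products, ["sku", "name"])
```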

DataFrame Operations = Sort

What and how it can be used:

The Sort operation sorts the DataFrame by the column specified by Column Name, in ascending or descending order as set by Sort Ascending. It arranges rows by that column's values, so data can be ordered numerically, alphabetically, or chronologically. This component supports organizing data for analysis and presentation.

When/how the component should be used:

  • Use when order matters.
  • Use to order data by priority, time, or score.
  • Use before presentation or batch processing.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Column Name
  • Sort Ascending

Default settings:

  • DataFrame
  • Operation
  • Column Name
  • Sort Ascending

Control Section:

  • DataFrame
  • Operation
  • Column Name
  • Sort Ascending

Default values:
  • Operation: Sort
  • Sort Ascending: on

Desired Behaviour:

  • Explicit ascending/descending order
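
A Python sketch of single-column sorting with an explicit direction, assuming a list-of-row-dicts DataFrame whose sort column holds mutually comparable values; `sort_rows` is a hypothetical helper. The pandas counterpart is `df.sort_values(by=column, ascending=ascending)`, whose default algorithm is only guaranteed stable when `kind="stable"` is passed.

```python
from typing import Any

Table = list[dict[str, Any]]


def sort_rows(df: Table, column: str, ascending: bool = True) -> Table:
    # sorted() is stable, so rows with equal values keep their original
    # relative order in both directions. Assumption: the column's values
    # are mutually comparable (e.g. all numbers or all strings).
    return sorted(
        (dict(row) for row in df),
        key=lambda row: row[column],
        reverse=not ascending,
    )


scores = [
    {"team": "a", "score": 7},
    {"team": "b", "score": 9},
    {"team": "c", "score": 7},
]
```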

DataFrame Operations = Tail

What and how it can be used:

The Tail operation retrieves the last n rows of the DataFrame, where n is set in the Number of Rows field (default: 5). This component enables quick inspection of the end of a dataset, which is useful for viewing recent entries, validating data imports, or displaying the most recent records without outputting the entire dataset.

When/how the component should be used:

  • Use to inspect the last N rows of a table.
  • Use to verify recent or final entries in time-ordered data.
  • Use for debugging batch or streaming outputs.
  • Use when order matters (e.g. timestamps).

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Number of Rows

Default settings:

  • DataFrame
  • Operation
  • Number of Rows

Control Section:

  • DataFrame
  • Operation
  • Number of Rows

Default values:
  • Operation: Tail
  • Number of Rows: 5

Desired Behaviour:

  • Return exactly the last N rows (if available).
  • Preserve the existing row order.
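
A Python sketch of Tail, assuming a list-of-row-dicts DataFrame; `tail` is a hypothetical helper, and pandas' `df.tail(n)` is the usual equivalent. The start index is clamped because the tempting shortcut `df[-n:]` returns every row when n is 0.

```python
from typing import Any

Table = list[dict[str, Any]]


def tail(df: Table, n: int = 5) -> Table:
    # Assumption: a negative row count is rejected rather than interpreted.
    if n < 0:
        raise ValueError("Number of Rows must be non-negative")
    # max(..., 0) avoids a negative start index, which would wrap around
    # when fewer than n rows exist; row order is preserved.
    start = max(len(df) - n, 0)
    return [dict(row) for row in df[start:]]


log = [{"seq": i} for i in range(1, 8)]
```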

DataFrame Operations = Drop Duplicates

What and how it can be used:

The Drop Duplicates operation removes duplicate rows from the DataFrame based on the values in the column specified by Column Name. When several rows share the same value in that column, only the first occurrence is kept. This component enables data cleaning by removing redundant entries and ensuring each value in the chosen column is unique.

When/how the component should be used:

  • Use to remove repeated records.
  • Use before storage or reporting.

Connections with other components:

  • ChatOutput
  • Batch Run
  • Parser
  • Save File
  • Smart Function
  • Split Text
  • Type Convert
  • Notify
  • ChromaDB

Configurable settings:

  • DataFrame
  • Operation
  • Column Name

Default settings:

  • DataFrame
  • Operation
  • Column Name

Control Section:

  • DataFrame
  • Operation
  • Column Name

Default values:
  • Operation: Drop Duplicates

Desired Behaviour:

  • Keep only the first occurrence
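
A Python sketch of keep-first deduplication on one column, assuming a list-of-row-dicts DataFrame with hashable cell values; `drop_duplicates` is a hypothetical helper. The pandas equivalent is `df.drop_duplicates(subset=[column], keep="first")`.

```python
from typing import Any

Table = list[dict[str, Any]]


def drop_duplicates(df: Table, column: str) -> Table:
    # Walk rows in order and keep a row only the first time its value in
    # `column` is seen. Assumption: values are hashable (str, int, ...).
    seen: set[Any] = set()
    kept: Table = []
    for row in df:
        key = row[column]
        if key in seen:
            continue
        seen.add(key)
        kept.append(dict(row))
    return kept


signups = [
    {"email": "a@x.io", "day": 1},
    {"email": "b@x.io", "day": 1},
    {"email": "a@x.io", "day": 2},
]
```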
