
A Coding Implementation to Build an AI Agent with Live Python Execution and Automated Validation

In this tutorial, we'll discover how to harness the power of an advanced AI agent, augmented with both Python execution and result-validation capabilities, to tackle complex computational tasks. By integrating LangChain's ReAct agent framework with Anthropic's Claude API, we build an end-to-end solution that generates Python code, executes it live, captures its outputs, maintains execution state, and automatically verifies results against expected properties or test cases. This seamless loop of "write → run → validate" empowers you to develop robust analyses, algorithms, and simple ML pipelines with confidence in every step.

!pip install langchain langchain-anthropic langchain-core anthropic

We install the core LangChain framework along with the Anthropic integration and its core utilities, ensuring you have both the agent orchestration tools (langchain, langchain-core) and the Claude-specific bindings (langchain-anthropic, anthropic) available in your environment.

import os
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import Tool
from langchain_core.prompts import PromptTemplate
from langchain_anthropic import ChatAnthropic
import sys
import io
import re
import json
from typing import Dict, Any, List

We bring together everything needed to build our ReAct-style agent: OS access for environment variables, LangChain's agent constructors (create_react_agent, AgentExecutor), the Tool class for defining custom actions, the PromptTemplate for crafting the chain-of-thought prompt, and Anthropic's ChatAnthropic client for connecting to Claude. Standard Python modules (sys, io, re, json) handle I/O capture, regular expressions, and serialization, while typing provides type hints for clearer, more maintainable code.

class PythonREPLTool:
    def __init__(self):
        self.globals_dict = {
            '__builtins__': __builtins__,
            'json': json,
            're': re
        }
        self.locals_dict = {}
        self.execution_history = []

    def run(self, code: str) -> str:
        try:
            # Redirect stdout/stderr so we capture everything the snippet prints
            old_stdout = sys.stdout
            old_stderr = sys.stderr
            sys.stdout = captured_output = io.StringIO()
            sys.stderr = captured_error = io.StringIO()

            execution_result = None

            # Try to evaluate as an expression first; fall back to exec for statements
            try:
                result = eval(code, self.globals_dict, self.locals_dict)
                execution_result = result
                if result is not None:
                    print(result)
            except SyntaxError:
                exec(code, self.globals_dict, self.locals_dict)

            output = captured_output.getvalue()
            error_output = captured_error.getvalue()

            sys.stdout = old_stdout
            sys.stderr = old_stderr

            self.execution_history.append({
                'code': code,
                'output': output,
                'result': execution_result,
                'error': error_output
            })

            response = f"**Code Executed:**\n```python\n{code}\n```\n\n"
            if error_output:
                response += f"**Errors/Warnings:**\n{error_output}\n\n"
            response += f"**Output:**\n{output if output.strip() else 'No console output'}"

            if execution_result is not None and not output.strip():
                response += f"\n**Return Value:** {execution_result}"

            return response

        except Exception as e:
            sys.stdout = old_stdout
            sys.stderr = old_stderr

            error_info = f"**Code Executed:**\n```python\n{code}\n```\n\n**Runtime Error:**\n{str(e)}\n**Error Type:** {type(e).__name__}"

            self.execution_history.append({
                'code': code,
                'output': '',
                'result': None,
                'error': str(e)
            })

            return error_info

    def get_execution_history(self) -> List[Dict[str, Any]]:
        return self.execution_history

    def clear_history(self):
        self.execution_history = []

This PythonREPLTool encapsulates a stateful, in-process Python REPL: it executes arbitrary code (evaluating expressions or running statements), redirects stdout/stderr to record outputs and errors, and maintains a history of every execution. By returning a formatted summary that includes the executed code, any console output or errors, and return values, it provides clear, reproducible feedback for every snippet run inside our agent.
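As a quick, self-contained sanity check (a sketch that is not part of the original walkthrough, with an illustrative variable name), the snippet below exercises the REPL directly: the first call defines a list, the second reuses it, showing that state persists across executions and that each run is recorded in the history.

# Minimal sketch: exercising PythonREPLTool directly (illustrative, not from the original post)
repl = PythonREPLTool()

# First call is a statement, so it falls through to exec() and stores 'squares' in the REPL's state
print(repl.run("squares = [n**2 for n in range(5)]"))

# Second call is an expression; eval() reuses 'squares' from the earlier execution
print(repl.run("sum(squares)"))

# Both snippets, their outputs, and any errors are kept in the execution history
print(len(repl.get_execution_history()))  # 2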

class ResultValidator:
    def __init__(self, python_repl: PythonREPLTool):
        self.python_repl = python_repl

    def validate_mathematical_result(self, description: str, expected_properties: Dict[str, Any]) -> str:
        """Validate mathematical computations"""
        validation_code = f"""
# Validation for: {description}
validation_results = {{}}

# Get the last execution results
history = {self.python_repl.execution_history}
if history:
    last_execution = history[-1]
    print(f"Last execution output: {{last_execution['output']}}")

    # Extract numbers from the output
    import re
    numbers = re.findall(r'\\d+(?:\\.\\d+)?', last_execution['output'])
    if numbers:
        numbers = [float(n) for n in numbers]
        validation_results['extracted_numbers'] = numbers

        # Validate expected properties
        for prop, expected_value in {expected_properties}.items():
            if prop == 'count':
                actual_count = len(numbers)
                validation_results['count_check'] = actual_count == expected_value
                print(f"Count validation: Expected {{expected_value}}, Got {{actual_count}}")
            elif prop == 'max_value':
                if numbers:
                    max_val = max(numbers)
                    validation_results['max_check'] = max_val <= expected_value
                    print(f"Max value validation: {{max_val}} <= {{expected_value}} = {{max_val <= expected_value}}")
            elif prop == 'min_value':
                if numbers:
                    min_val = min(numbers)
                    validation_results['min_check'] = min_val >= expected_value
                    print(f"Min value validation: {{min_val}} >= {{expected_value}} = {{min_val >= expected_value}}")
            elif prop == 'sum_range':
                if numbers:
                    total = sum(numbers)
                    min_sum, max_sum = expected_value
                    validation_results['sum_check'] = min_sum <= total <= max_sum
                    print(f"Sum validation: {{min_sum}} <= {{total}} <= {{max_sum}} = {{min_sum <= total <= max_sum}}")

print("\\nValidation Summary:")
for key, value in validation_results.items():
    print(f"{{key}}: {{value}}")

validation_results
"""
        return self.python_repl.run(validation_code)

    def validate_data_analysis(self, description: str, expected_structure: Dict[str, Any]) -> str:
        """Validate data analysis results"""
        validation_code = f"""
# Data Analysis Validation for: {description}
validation_results = {{}}

# Check if required variables exist in global scope
required_vars = {list(expected_structure.keys())}
existing_vars = []

for var_name in required_vars:
    if var_name in globals():
        existing_vars.append(var_name)
        var_value = globals()[var_name]
        validation_results[f'{{var_name}}_exists'] = True
        validation_results[f'{{var_name}}_type'] = type(var_value).__name__

        # Type-specific validations
        if isinstance(var_value, (list, tuple)):
            validation_results[f'{{var_name}}_length'] = len(var_value)
        elif isinstance(var_value, dict):
            validation_results[f'{{var_name}}_keys'] = list(var_value.keys())
        elif isinstance(var_value, (int, float)):
            validation_results[f'{{var_name}}_value'] = var_value

        print(f"✓ Variable '{{var_name}}' found: {{type(var_value).__name__}} = {{var_value}}")
    else:
        validation_results[f'{{var_name}}_exists'] = False
        print(f"✗ Variable '{{var_name}}' not found")

print(f"\\nFound {{len(existing_vars)}}/{{len(required_vars)}} required variables")

# Additional structure validation
for var_name, expected_type in {expected_structure}.items():
    if var_name in globals():
        actual_type = type(globals()[var_name]).__name__
        validation_results[f'{{var_name}}_type_match'] = actual_type == expected_type
        print(f"Type check '{{var_name}}': Expected {{expected_type}}, Got {{actual_type}}")

validation_results
"""
        return self.python_repl.run(validation_code)

    def validate_algorithm_correctness(self, description: str, test_cases: List[Dict[str, Any]]) -> str:
        """Validate algorithm implementations with test cases"""
        validation_code = f"""
# Algorithm Validation for: {description}
validation_results = {{}}
test_results = []

test_cases = {test_cases}

for i, test_case in enumerate(test_cases):
    test_name = test_case.get('name', f'Test {{i+1}}')
    input_val = test_case.get('input')
    expected = test_case.get('expected')
    function_name = test_case.get('function')

    print(f"\\nRunning {{test_name}}:")
    print(f"Input: {{input_val}}")
    print(f"Expected: {{expected}}")

    try:
        if function_name and function_name in globals():
            func = globals()[function_name]
            if callable(func):
                if isinstance(input_val, (list, tuple)):
                    result = func(*input_val)
                else:
                    result = func(input_val)

                passed = result == expected
                test_results.append({{
                    'test_name': test_name,
                    'input': input_val,
                    'expected': expected,
                    'actual': result,
                    'passed': passed
                }})

                status = "✓ PASS" if passed else "✗ FAIL"
                print(f"Actual: {{result}}")
                print(f"Status: {{status}}")
            else:
                print(f"✗ ERROR: '{{function_name}}' is not callable")
        else:
            print(f"✗ ERROR: Function '{{function_name}}' not found")

    except Exception as e:
        print(f"✗ ERROR: {{str(e)}}")
        test_results.append({{
            'test_name': test_name,
            'error': str(e),
            'passed': False
        }})

# Summary
passed_tests = sum(1 for test in test_results if test.get('passed', False))
total_tests = len(test_results)
validation_results['tests_passed'] = passed_tests
validation_results['total_tests'] = total_tests
validation_results['success_rate'] = passed_tests / total_tests if total_tests > 0 else 0

print(f"\\n=== VALIDATION SUMMARY ===")
print(f"Tests passed: {{passed_tests}}/{{total_tests}}")
print(f"Success rate: {{validation_results['success_rate']:.1%}}")

test_results
"""
        return self.python_repl.run(validation_code)

This ResultValidator class builds on the PythonREPLTool to automatically generate and run bespoke validation routines, checking numerical properties, verifying data structures, or running algorithm test cases against the agent's execution history. By emitting Python snippets that extract outputs, compare them to expected criteria, and summarize pass/fail outcomes, it closes the loop on "execute → validate" within our agent's workflow.
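To see the validator in isolation before wiring it into the agent, here is a small self-contained sketch (with illustrative values, not taken from the original post): it computes the primes below 20 through the REPL, then checks the printed output against an expected count and maximum.

# Minimal sketch: validating a computation's printed output (illustrative values)
repl = PythonREPLTool()
checker = ResultValidator(repl)

# Run a computation whose console output the validator will parse
repl.run("primes = [p for p in range(2, 20) if all(p % d for d in range(2, p))]\nprint(primes)")

# There are 8 primes below 20, and the largest is 19
report = checker.validate_mathematical_result(
    "primes below 20",
    {"count": 8, "max_value": 19}
)
print(report)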

python_repl = PythonREPLTool()
validator = ResultValidator(python_repl)

Here, we instantiate our interactive Python REPL tool (python_repl) and then create a ResultValidator tied to that same REPL instance. This wiring ensures any code you execute is immediately available for automated validation steps, closing the loop on execution and correctness checking.

python_tool = Tool(
    name="python_repl",
    description="Execute Python code and return both the code and its output. Maintains state between executions.",
    func=python_repl.run
)


validation_tool = Tool(
    name="result_validator",
    description="Validate the results of previous computations with specific test cases and expected properties.",
    func=lambda query: validator.validate_mathematical_result(query, {})
)

Here, we wrap our REPL and validation methods in LangChain Tool objects, assigning them clear names and descriptions. The agent can invoke python_repl to run code and result_validator to automatically check the last execution against your specified criteria.
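Before handing these Tool objects to the agent, you can sanity-check them by calling the wrapped functions directly; func is simply the callable we registered, so this bypasses the agent loop entirely (a quick sketch, not in the original article).

# Minimal sketch: invoking the wrapped tools directly, outside the agent loop
print(python_tool.func("total = sum(range(1, 11))\nprint(total)"))

# The validator tool re-examines the last execution; since this wrapper passes no expected
# properties, it simply reports the numbers it could extract from that output
print(validation_tool.func("sanity check of the 1..10 sum"))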

prompt_template = """You are Claude, an advanced AI assistant with Python execution and result validation capabilities.

You can execute Python code to solve complex problems and then validate your results to ensure accuracy.

Available tools:
{tools}

Use this format:
Question: the input question you must answer
Thought: analyze what needs to be done
Action: {tool_names}
Action Input: [your input]
Observation: [result]
... (repeat Thought/Action/Action Input/Observation as needed)
Thought: I should validate my results
Action: [validation if needed]
Action Input: [validation parameters]
Observation: [validation results]
Thought: I now have the complete answer
Final Answer: [comprehensive answer with validation confirmation]

Question: {input}
{agent_scratchpad}"""


prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["input", "agent_scratchpad"],
    partial_variables={
        "tools": "python_repl - Execute Python code\nresult_validator - Validate computation results",
        "tool_names": "python_repl, result_validator"
    }
)

The prompt template above frames Claude as a dual-capability assistant that first reasons ("Thought"), selects between the python_repl and result_validator tools to run code and check outputs, and then iterates until it has a validated solution. By defining a clear chain-of-thought structure with placeholders for the tool names and their descriptions, it guides the agent to: (1) break down the problem, (2) call python_repl to execute the necessary code, (3) call result_validator to confirm correctness, and finally (4) deliver a self-checked "Final Answer." This scaffolding enforces a disciplined "write → run → validate" workflow.
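To confirm the template is wired up correctly before involving the model, you can render it with placeholder inputs (a small sketch; the question text is illustrative): format() merges the partial variables above with the runtime inputs.

# Minimal sketch: rendering the prompt with placeholder inputs to inspect the final text
rendered = prompt.format(
    input="What is the sum of the first 10 squares?",
    agent_scratchpad=""
)
print(rendered)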

class AdvancedClaudeCodeAgent:
    def __init__(self, anthropic_api_key=None):
        if anthropic_api_key:
            os.environ["ANTHROPIC_API_KEY"] = anthropic_api_key

        self.llm = ChatAnthropic(
            model="claude-3-opus-20240229",
            temperature=0,
            max_tokens=4000
        )

        self.agent = create_react_agent(
            llm=self.llm,
            tools=[python_tool, validation_tool],
            prompt=prompt
        )

        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=[python_tool, validation_tool],
            verbose=True,
            handle_parsing_errors=True,
            max_iterations=8,
            return_intermediate_steps=True
        )

        self.python_repl = python_repl
        self.validator = validator

    def run(self, query: str) -> str:
        try:
            result = self.agent_executor.invoke({"input": query})
            return result["output"]
        except Exception as e:
            return f"Error: {str(e)}"

    def validate_last_result(self, description: str, validation_params: Dict[str, Any]) -> str:
        """Manually validate the last computation result"""
        if 'test_cases' in validation_params:
            return self.validator.validate_algorithm_correctness(description, validation_params['test_cases'])
        elif 'expected_structure' in validation_params:
            return self.validator.validate_data_analysis(description, validation_params['expected_structure'])
        else:
            return self.validator.validate_mathematical_result(description, validation_params)

    def get_execution_summary(self) -> Dict[str, Any]:
        """Get summary of all executions"""
        history = self.python_repl.get_execution_history()
        return {
            'total_executions': len(history),
            'successful_executions': len([h for h in history if not h['error']]),
            'failed_executions': len([h for h in history if h['error']]),
            'execution_details': history
        }

This AdvancedClaudeCodeAgent class wraps everything into a single, easy-to-use interface: it configures the Anthropic Claude client (using your API key), instantiates a ReAct-style agent with our python_repl and result_validator tools and the custom prompt, and sets up an executor that drives iterative "think → code → validate" loops. Its run() method lets you submit natural-language queries and returns Claude's final, self-checked answer; validate_last_result() exposes manual hooks for additional checks; and get_execution_summary() provides a concise report on every code snippet you've executed (how many succeeded, how many failed, and their details).
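As a compact usage sketch (assuming a valid Anthropic API key in the environment; the query and test cases are illustrative and not from the original post), you might drive the agent and then re-check its work manually:

# Minimal sketch: driving the agent and manually validating its work
# (requires a real Anthropic API key; the query and test cases are illustrative)
demo_agent = AdvancedClaudeCodeAgent(anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY"))

answer = demo_agent.run("Define a function fibonacci(n) returning the nth Fibonacci number, then compute fibonacci(10).")
print(answer)

# Re-check the implementation against explicit test cases; whether the function is found
# depends on it having been defined in the shared REPL's scope during the agent run
print(demo_agent.validate_last_result(
    "fibonacci implementation",
    {"test_cases": [
        {"name": "tenth term", "input": 10, "expected": 55, "function": "fibonacci"},
    ]}
))

print(demo_agent.get_execution_summary()["total_executions"])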

if __name__ == "__main__":
    API_KEY = "Use Your Own Key Here"

    agent = AdvancedClaudeCodeAgent(anthropic_api_key=API_KEY)

    print("🚀 Advanced Claude Code Agent with Validation")
    print("=" * 60)

    print("\n🔢 Example 1: Prime Number Analysis with Twin Prime Detection")
    print("-" * 60)
    query1 = """
    Find all prime numbers between 1 and 200, then:
    1. Calculate their sum
    2. Find all twin prime pairs (primes that differ by 2)
    3. Calculate the average gap between consecutive primes
    4. Identify the largest prime gap in this range
    After computation, validate that we found the correct number of primes and that all identified numbers are actually prime.
    """
    result1 = agent.run(query1)
    print(result1)

    print("\n" + "=" * 80 + "\n")

    print("📊 Example 2: Advanced Sales Data Analysis with Statistical Validation")
    print("-" * 60)
    query2 = """
    Create a comprehensive sales analysis:
    1. Generate sales data for 12 products across 24 months with realistic seasonal patterns
    2. Calculate monthly growth rates, yearly totals, and trend analysis
    3. Identify the top 3 performing products and the worst 3 performing products
    4. Perform correlation analysis between different products
    5. Create summary statistics (mean, median, standard deviation, percentiles)
    After the analysis, validate the data structure, ensure all calculations are mathematically correct, and verify the statistical measures.
    """
    result2 = agent.run(query2)
    print(result2)

    print("\n" + "=" * 80 + "\n")

    print("⚙️ Example 3: Advanced Algorithm Implementation with Test Suite")
    print("-" * 60)
    query3 = """
    Implement and validate a comprehensive sorting and searching system:
    1. Implement quicksort, mergesort, and binary search algorithms
    2. Create test data with various edge cases (empty lists, single elements, duplicates, sorted/reverse sorted)
    3. Benchmark the performance of different sorting algorithms
    4. Implement a function to find the kth largest element using different approaches
    5. Test all implementations with comprehensive test cases including edge cases
    After implementation, validate each algorithm with multiple test cases to ensure correctness.
    """
    result3 = agent.run(query3)
    print(result3)

    print("\n" + "=" * 80 + "\n")

    print("🤖 Example 4: Machine Learning Model with Cross-Validation")
    print("-" * 60)
    query4 = """
    Build a complete machine learning pipeline:
    1. Generate a synthetic dataset with features and a target variable (classification problem)
    2. Implement data preprocessing (normalization, feature scaling)
    3. Implement a simple linear classifier from scratch (gradient descent)
    4. Split data into train/validation/test sets
    5. Train the model and evaluate performance (accuracy, precision, recall)
    6. Implement k-fold cross-validation
    7. Compare results with different hyperparameters
    Validate the entire pipeline by ensuring mathematical correctness of gradient descent, proper data splitting, and realistic performance metrics.
    """
    result4 = agent.run(query4)
    print(result4)

    print("\n" + "=" * 80 + "\n")

    print("📋 Execution Summary")
    print("-" * 60)
    summary = agent.get_execution_summary()
    print(f"Total code executions: {summary['total_executions']}")
    print(f"Successful executions: {summary['successful_executions']}")
    print(f"Failed executions: {summary['failed_executions']}")

    if summary['failed_executions'] > 0:
        print("\nFailed executions details:")
        for i, execution in enumerate(summary['execution_details']):
            if execution['error']:
                print(f"  {i+1}. Error: {execution['error']}")

    print(f"\nSuccess rate: {(summary['successful_executions']/summary['total_executions']*100):.1f}%")

Finally, we instantiate the AdvancedClaudeCodeAgent with your Anthropic API key, run four illustrative example queries (covering prime-number analysis, sales data analytics, algorithm implementations, and a simple ML pipeline), and print each validated result. The script then gathers and displays a concise execution summary (total runs, successes, failures, and error details), demonstrating the agent's live "write → run → validate" workflow.

In conclusion, we have developed a versatile AdvancedClaudeCodeAgent capable of seamlessly blending generative reasoning with precise computational control. At its core, this agent doesn't just draft Python snippets; it runs them on the spot and checks their correctness against your specified criteria, closing the feedback loop automatically. Whether you're performing prime-number analyses, statistical data evaluations, algorithm benchmarking, or end-to-end ML workflows, this pattern ensures reliability and reproducibility.



