In this tutorial, we demonstrate how to leverage ScrapeGraph's powerful scraping tools together with Gemini AI to automate the collection, parsing, and analysis of competitor information. By using ScrapeGraph's SmartScraperTool and MarkdownifyTool, users can extract detailed insights about product offerings, pricing strategies, technology stacks, and market presence directly from competitor websites. The tutorial then employs Gemini's advanced language model to synthesize these disparate data points into structured, actionable intelligence. Throughout the process, ScrapeGraph ensures that the raw extraction is both accurate and scalable, allowing analysts to focus on strategic interpretation rather than manual data gathering.
%pip install --quiet -U langchain-scrapegraph langchain-google-genai pandas matplotlib seaborn
We quietly upgrade or install the latest versions of the essential libraries, including langchain-scrapegraph for advanced web scraping and langchain-google-genai for integrating Gemini AI, along with data analysis tools such as pandas, matplotlib, and seaborn, to ensure the environment is ready for seamless competitive intelligence workflows.
import getpass
import os
import json
import pandas as pd
from typing import List, Dict, Any
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
We import the essential Python libraries for setting up a secure, data-driven pipeline: getpass and os manage secrets and environment variables, json handles serialized data, and pandas offers robust DataFrame operations. The typing module provides type hints for better code readability, while datetime records timestamps. Finally, matplotlib.pyplot and seaborn equip us with tools for creating insightful visualizations.
if not os.environ.get("SGAI_API_KEY"):
    os.environ["SGAI_API_KEY"] = getpass.getpass("ScrapeGraph AI API key:\n")
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key for Gemini:\n")
We check whether the SGAI_API_KEY and GOOGLE_API_KEY environment variables are already set; if not, the script securely prompts the user for their ScrapeGraph and Google (Gemini) API keys via getpass and stores them in the environment for subsequent authenticated requests.
from langchain_scrapegraph.tools import (
    SmartScraperTool,
    SearchScraperTool,
    MarkdownifyTool,
    GetCreditsTool,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig, chain
from langchain_core.output_parsers import JsonOutputParser

smartscraper = SmartScraperTool()
searchscraper = SearchScraperTool()
markdownify = MarkdownifyTool()
credit = GetCreditsTool()

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    convert_system_message_to_human=True,
)
Here, we import and instantiate the ScrapeGraph tools (SmartScraperTool, SearchScraperTool, MarkdownifyTool, and GetCreditsTool) for extracting and processing web data, then configure ChatGoogleGenerativeAI with the "gemini-1.5-flash" model, using a low temperature and converting system messages to human messages, to drive our analysis. We also bring in ChatPromptTemplate, RunnableConfig, chain, and JsonOutputParser from langchain_core to structure prompts and parse model outputs.
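Gemini does not always emit strictly valid JSON, which is why the analyzer below falls back to raw text when JsonOutputParser fails. A minimal, standard-library sketch of a more forgiving parser (a hypothetical helper, not part of LangChain) that strips markdown fences and surrounding prose before parsing:

```python
import json
import re

def parse_llm_json(text: str):
    """Best-effort JSON extraction from an LLM reply.

    Models often wrap JSON in ```json ... ``` fences or add
    surrounding prose; strip both before calling json.loads.
    """
    # Prefer the contents of a fenced code block, if present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Otherwise fall back to the first {...} span in the reply
    if not candidate.lstrip().startswith(("{", "[")):
        brace = re.search(r"\{.*\}", candidate, re.DOTALL)
        if brace:
            candidate = brace.group(0)
    return json.loads(candidate)

reply = 'Here is the analysis:\n```json\n{"market_position": "leader"}\n```'
print(parse_llm_json(reply))  # → {'market_position': 'leader'}
```

A helper like this can back up JsonOutputParser when the model adds commentary around its structured answer.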
class CompetitiveAnalyzer:
    def __init__(self):
        self.results = []
        self.analysis_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    def scrape_competitor_data(self, url: str, company_name: str = None) -> Dict[str, Any]:
        """Scrape comprehensive data from a competitor website"""
        extraction_prompt = """
        Extract the following information from this website:
        1. Company name and tagline
        2. Main products/services offered
        3. Pricing information (if available)
        4. Target audience/market
        5. Key features and benefits highlighted
        6. Technology stack mentioned
        7. Contact information
        8. Social media presence
        9. Recent news or announcements
        10. Team size indicators
        11. Funding information (if mentioned)
        12. Customer testimonials or case studies
        13. Partnership information
        14. Geographic presence/markets served

        Return the information in a structured JSON format with clear categorization.
        If information is not available, mark as 'Not Available'.
        """
        try:
            result = smartscraper.invoke({
                "user_prompt": extraction_prompt,
                "website_url": url,
            })
            markdown_content = markdownify.invoke({"website_url": url})
            competitor_data = {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": result,
                "markdown_length": len(markdown_content),
                "analysis_date": self.analysis_timestamp,
                "success": True,
                "error": None,
            }
            return competitor_data
        except Exception as e:
            return {
                "company_name": company_name or "Unknown",
                "url": url,
                "scraped_data": None,
                "error": str(e),
                "success": False,
                "analysis_date": self.analysis_timestamp,
            }

    def analyze_competitor_landscape(self, competitors: List[Dict[str, str]]) -> Dict[str, Any]:
        """Analyze multiple competitors and generate insights"""
        print(f"🔍 Starting competitive analysis for {len(competitors)} companies...")
        for i, competitor in enumerate(competitors, 1):
            print(f"📊 Analyzing {competitor['name']} ({i}/{len(competitors)})...")
            data = self.scrape_competitor_data(
                competitor["url"],
                competitor["name"],
            )
            self.results.append(data)

        analysis_prompt = ChatPromptTemplate.from_messages([
            ("system", """
            You are a senior business analyst specializing in competitive intelligence.
            Analyze the scraped competitor data and provide comprehensive insights including:
            1. Market positioning analysis
            2. Pricing strategy comparison
            3. Feature gap analysis
            4. Target audience overlap
            5. Technology differentiation
            6. Market opportunities
            7. Competitive threats
            8. Strategic recommendations

            Provide actionable insights in JSON format with clear categories and recommendations.
            """),
            ("human", "Analyze this competitive data: {competitor_data}"),
        ])

        clean_data = []
        for result in self.results:
            if result["success"]:
                clean_data.append({
                    "company": result["company_name"],
                    "url": result["url"],
                    "data": result["scraped_data"],
                })

        analysis_chain = analysis_prompt | llm | JsonOutputParser()
        try:
            competitive_analysis = analysis_chain.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })
        except Exception:
            # Fall back to plain text if the model response is not valid JSON
            analysis_chain_text = analysis_prompt | llm
            competitive_analysis = analysis_chain_text.invoke({
                "competitor_data": json.dumps(clean_data, indent=2)
            })

        return {
            "analysis": competitive_analysis,
            "raw_data": self.results,
            "summary_stats": self.generate_summary_stats(),
        }

    def generate_summary_stats(self) -> Dict[str, Any]:
        """Generate summary statistics from the analysis"""
        successful_scrapes = sum(1 for r in self.results if r["success"])
        failed_scrapes = len(self.results) - successful_scrapes
        return {
            "total_companies_analyzed": len(self.results),
            "successful_scrapes": successful_scrapes,
            "failed_scrapes": failed_scrapes,
            "success_rate": f"{(successful_scrapes / len(self.results) * 100):.1f}%" if self.results else "0%",
            "analysis_timestamp": self.analysis_timestamp,
        }

    def export_results(self, filename: str = None):
        """Export results to JSON and CSV files"""
        if not filename:
            filename = f"competitive_analysis_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        with open(f"{filename}.json", "w") as f:
            json.dump({
                "results": self.results,
                "summary": self.generate_summary_stats(),
            }, f, indent=2)

        df_data = []
        for result in self.results:
            if result["success"]:
                df_data.append({
                    "Company": result["company_name"],
                    "URL": result["url"],
                    "Success": result["success"],
                    "Data_Length": len(str(result["scraped_data"])) if result["scraped_data"] else 0,
                    "Analysis_Date": result["analysis_date"],
                })
        if df_data:
            df = pd.DataFrame(df_data)
            df.to_csv(f"{filename}.csv", index=False)
        print(f"✅ Results exported to {filename}.json and {filename}.csv")
The CompetitiveAnalyzer class orchestrates end-to-end competitor research: it scrapes detailed company information using the ScrapeGraph tools, compiles and cleans the results, and then leverages Gemini AI to generate structured competitive insights. It also tracks success rates and timestamps, and provides utility methods to export both raw and summarized data to JSON and CSV for easy downstream reporting and analysis.
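To see how the success-rate bookkeeping behaves without issuing any network calls, here is a quick standalone sketch that mirrors the logic of generate_summary_stats on mocked scrape results (the company names and outcomes are made up for illustration):

```python
# Mocked scrape results standing in for live ScrapeGraph output
mock_results = [
    {"company_name": "Acme", "success": True},
    {"company_name": "Globex", "success": True},
    {"company_name": "Initech", "success": False},
]

successful = sum(1 for r in mock_results if r["success"])
stats = {
    "total_companies_analyzed": len(mock_results),
    "successful_scrapes": successful,
    "failed_scrapes": len(mock_results) - successful,
    "success_rate": f"{successful / len(mock_results) * 100:.1f}%" if mock_results else "0%",
}
print(stats)
# → {'total_companies_analyzed': 3, 'successful_scrapes': 2, 'failed_scrapes': 1, 'success_rate': '66.7%'}
```

The guard on the success-rate expression matters: with an empty result list, the division would raise ZeroDivisionError, so the method reports "0%" instead.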
def run_ai_saas_analysis():
    """Run a comprehensive analysis of AI/SaaS competitors"""
    analyzer = CompetitiveAnalyzer()
    ai_saas_competitors = [
        {"name": "OpenAI", "url": "https://openai.com"},
        {"name": "Anthropic", "url": "https://anthropic.com"},
        {"name": "Hugging Face", "url": "https://huggingface.co"},
        {"name": "Cohere", "url": "https://cohere.ai"},
        {"name": "Scale AI", "url": "https://scale.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ai_saas_competitors)

    print("\n" + "=" * 80)
    print("🎯 COMPETITIVE ANALYSIS RESULTS")
    print("=" * 80)

    print("\n📊 Summary Statistics:")
    stats = results["summary_stats"]
    for key, value in stats.items():
        print(f"   {key.replace('_', ' ').title()}: {value}")

    print("\n🔍 Strategic Analysis:")
    if isinstance(results["analysis"], dict):
        for section, content in results["analysis"].items():
            print(f"\n   {section.replace('_', ' ').title()}:")
            if isinstance(content, list):
                for item in content:
                    print(f"   • {item}")
            else:
                print(f"   {content}")
    else:
        print(results["analysis"])

    analyzer.export_results("ai_saas_competitive_analysis")
    return results
The above function kicks off the competitive analysis by instantiating CompetitiveAnalyzer and defining the key AI/SaaS players to evaluate. It then runs the full scraping-and-insights workflow, prints formatted summary statistics and strategic findings, and finally exports the detailed results to JSON and CSV for further use.
def run_ecommerce_analysis():
    """Analyze e-commerce platform competitors"""
    analyzer = CompetitiveAnalyzer()
    ecommerce_competitors = [
        {"name": "Shopify", "url": "https://shopify.com"},
        {"name": "WooCommerce", "url": "https://woocommerce.com"},
        {"name": "BigCommerce", "url": "https://bigcommerce.com"},
        {"name": "Magento", "url": "https://magento.com"},
    ]
    results = analyzer.analyze_competitor_landscape(ecommerce_competitors)
    analyzer.export_results("ecommerce_competitive_analysis")
    return results
The above function sets up a CompetitiveAnalyzer to evaluate leading e-commerce platforms by scraping details from each site, generating strategic insights, and then exporting the findings to both JSON and CSV files under the name "ecommerce_competitive_analysis".
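The notebook installs matplotlib and seaborn but never actually plots anything; one natural next step is to chart scrape outcomes from the exported CSV. A small sketch under the assumption that export_results produced rows with Company and Data_Length columns (the values below are illustrative, not real scrape output):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for the exported competitive-analysis CSV
df = pd.DataFrame({
    "Company": ["Shopify", "WooCommerce", "BigCommerce", "Magento"],
    "Data_Length": [5200, 4100, 3800, 2900],
})

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(df["Company"], df["Data_Length"])
ax.set_ylabel("Scraped data length (chars)")
ax.set_title("Volume of data extracted per competitor")
fig.tight_layout()
fig.savefig("scrape_volume.png")
```

In a real run you would read the exported file with pd.read_csv("ecommerce_competitive_analysis.csv") instead of hard-coding the frame.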
@chain
def social_media_monitoring_chain(company_urls: List[str], config: RunnableConfig):
    """Monitor social media presence and engagement strategies of competitors"""
    social_media_prompt = ChatPromptTemplate.from_messages([
        ("system", """
        You are a social media strategist. Analyze the social media presence and strategies
        of these companies. Focus on:
        1. Platform presence (LinkedIn, Twitter, Instagram, etc.)
        2. Content strategy patterns
        3. Engagement tactics
        4. Community building approaches
        5. Brand voice and messaging
        6. Posting frequency and timing

        Provide actionable insights for improving social media strategy.
        """),
        ("human", "Analyze social media data for: {urls}"),
    ])
    social_data = []
    for url in company_urls:
        try:
            result = smartscraper.invoke({
                "user_prompt": "Extract all social media links, community engagement features, and social proof elements",
                "website_url": url,
            })
            social_data.append({"url": url, "social_data": result})
        except Exception as e:
            social_data.append({"url": url, "error": str(e)})

    # Named analysis_chain to avoid shadowing the imported @chain decorator
    analysis_chain = social_media_prompt | llm
    analysis = analysis_chain.invoke({"urls": json.dumps(social_data, indent=2)}, config=config)
    return {
        "social_analysis": analysis,
        "raw_social_data": social_data,
    }
Here, this chained function defines a pipeline for gathering and analyzing competitors' social media footprints: it uses ScrapeGraph's smart scraper to extract social media links and engagement elements, then feeds that data into Gemini with a prompt focused on presence, content strategy, and community tactics. Finally, it returns both the raw scraped information and the AI-generated, actionable social media insights in a single structured output.
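Since the smart scraper returns a grab-bag of links, grouping them by platform before prompting the model keeps the input cleaner. A standard-library sketch of that normalization step (a hypothetical helper with made-up example URLs, not part of the tutorial's pipeline):

```python
from urllib.parse import urlparse

# Map known social hosts to platform labels (extend as needed)
PLATFORMS = {
    "linkedin.com": "LinkedIn",
    "twitter.com": "Twitter",
    "x.com": "Twitter",
    "instagram.com": "Instagram",
    "youtube.com": "YouTube",
}

def classify_social_links(links):
    """Group raw URLs by social platform; unknown hosts fall under 'Other'."""
    grouped = {}
    for link in links:
        host = urlparse(link).netloc.lower().removeprefix("www.")
        platform = PLATFORMS.get(host, "Other")
        grouped.setdefault(platform, []).append(link)
    return grouped

links = [
    "https://www.linkedin.com/company/example",
    "https://x.com/example",
    "https://example.com/blog",
]
print(classify_social_links(links))
# → {'LinkedIn': ['https://www.linkedin.com/company/example'],
#    'Twitter': ['https://x.com/example'],
#    'Other': ['https://example.com/blog']}
```

Passing the grouped structure (rather than a flat list) into the social media prompt gives the model the per-platform view it is asked to analyze.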
def check_credits():
    """Check available credits"""
    try:
        credits_info = credit.invoke({})
        print(f"💳 Available Credits: {credits_info}")
        return credits_info
    except Exception as e:
        print(f"⚠️ Could not check credits: {e}")
        return None
The above function calls the GetCreditsTool to retrieve and display your available ScrapeGraph API credits, printing the result (or a warning if the check fails) and returning the credit information, or None on error.
if __name__ == "__main__":
    print("🚀 Advanced Competitive Analysis Tool with Gemini AI")
    print("=" * 60)

    check_credits()

    print("\n🤖 Running AI/SaaS Competitive Analysis...")
    ai_results = run_ai_saas_analysis()

    run_additional = input("\n❓ Run e-commerce analysis as well? (y/n): ").lower().strip()
    if run_additional == "y":
        print("\n🛒 Running E-commerce Platform Analysis...")
        ecom_results = run_ecommerce_analysis()

    print("\n✨ Analysis complete! Check the exported files for detailed results.")
Finally, the last code block serves as the script's entry point: it prints a header, checks API credits, then kicks off the AI/SaaS competitor analysis (and optionally the e-commerce analysis) before signaling that all results have been exported.
In conclusion, integrating ScrapeGraph's scraping capabilities with Gemini AI transforms a traditionally time-consuming competitive intelligence workflow into an efficient, repeatable pipeline. ScrapeGraph handles the heavy lifting of fetching and normalizing web-based information, while Gemini's language understanding turns that raw data into high-level strategic recommendations. As a result, businesses can rapidly assess market positioning, identify feature gaps, and uncover emerging opportunities with minimal manual intervention. By automating these steps, users gain speed and consistency, as well as the flexibility to extend their analysis to new competitors or markets as needed.
Check out the Notebook on GitHub. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.