Making sense of unstructured data with Zapier and ChatGPT

Use Case 1: Structuring raw unstructured website content

1. Process:

2.Code Step to Scrape Raw Website Content from URL

import requests
import re
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)

def extract_text_from_html(html_content):
    """
    Extracts and cleans text from HTML content.
    """
    try:
        # Remove script and style elements
        html_content = re.sub(r'<(script|style).*?>.*?</\1>', '', html_content, flags=re.DOTALL)
        # Remove HTML tags
        text = re.sub(r'<.*?>', ' ', html_content)
        # Collapse whitespace
        text = re.sub(r'\s+', ' ', text)
        return text.strip()
    except Exception as e:
        logging.error(f"Failed to extract text from HTML: {e}")
        return ""

def scrape_website_content(website_url):
    """
    Scrapes raw content from the given website URL.
    """
    try:
        # Ensure the URL is prefixed with http:// or https://
        if not website_url.startswith(('http://', 'https://')):
            website_url = 'http://' + website_url
        response = requests.get(website_url)
        response.raise_for_status()
        raw_text = extract_text_from_html(response.content.decode('utf-8'))
        return raw_text
    except requests.RequestException as e:
        logging.error(f"Failed to fetch URL {website_url}: {e}")
        return ""

# Get the website URL from input data
website_url = input_data.get('websiteUrl')

if website_url:
    raw_content = scrape_website_content(website_url)
    if raw_content:
        output = {'raw_content': raw_content}
    else:
        output = {'error': 'Failed to extract content from the website.'}
else:
    output = {'error': 'No website URL provided.'}

return output

3.ChatGPT Prompt Template for Structuring Raw Unstructured Website Content

Review the raw content here:
----
Website url: {{249095056__fields__websiteUrl}}. # From Zapier Chrome Push Trigger Step
Content: {{249095057__raw_content}}. # From Zapier Chrome Push Trigger Step
----

Extract and structure the data based on the following schema:
---
- 'businessName': Extracted from the business name.
- 'businessStreet': Extracted from business address.
- 'businessCity': Extracted from business address.
- 'businessState': Extracted from business address (state or province).
- 'businessCountry': Extracted from business address.
- 'businessPostalCode': Extracted from business address.
- 'businessPhone': Extracted from the business phone number.
- 'businessEmail': Extracted from the business email address.
- 'websiteUrl': Extracted from the Company website URL.
- 'businessDescription': A detailed and specific description of the business and what they do. Three sentences or less.
- 'businessIndustryName': The business industry (Name only, no code) using NAICS standards.
---

- Once the data is gathered, check your response for accuracy against the provided instructions.
- Unless otherwise told in the schema, all fields are string fields.
- This data will be used to go into a CRM via automation. The data needs to needs to be accurate, clean, and contextual.

++++
Output response in JSON code format with no leading characters.  Your reply will be used as a JSON payload. Don't include (```json```).
++++

  • ChatGPT Settings
    • Model = gpt-4o
    • Memory Key = blank
    • Image = blank
    • User Name = Expert Sales Person
    • Assistant Name = Expert Sales Person
    • Assistant Instructions =
      • You are a helpful Sales assistant that specializes in parsing unstructured content into a structured format.
    • Max Tokens = 1024
    • Temperature = 0.5
    • Top P = 1

4.Parse JSON Payload Code Step

Input Data - payLoad = ChatGPT Response

var obj = JSON.parse(inputData.payLoad);
return obj;

5.Setup Action steps as needed from there

Use Case 2: Structuring raw unstructured email content

1.Process:

2.ChatGPT Prompt Template for Structuring Unstructured Email Content

Review the raw content here:
----
Subject: {{249105134__raw__Subject}}. # From Zapier Email Trigger Step
Body plain: {{249105134__body_plain}}. # From Zapier Email Trigger Step
----

Extract and structure the data based on the following schema:
---
- 'businessName': Extracted from the business name.
- 'businessStreet': Extracted from business address.
- 'businessCity': Extracted from business address.
- 'businessState': Extracted from business address (state or province).
- 'businessCountry': Extracted from business address.
- 'businessPostalCode': Extracted from business address.
- 'businessPhone': Extracted from the business phone number.
- 'businessEmail': Extracted from the business email address.
- 'websiteUrl': Extracted from the URL in the prospects email.
- 'businessDescription': A detailed and specific description of the business and what they do. Three sentences or less.
- 'businessIndustryName': Infer the business industry using NAICS standards.
- 'contactFirstName': Extracted from the contact's name.
- 'contactLastName': Extracted from the contact's name.
- 'contactCellPhone': Extracted from the contact's cell phone number.
- 'contactEmail': Extracted from the contact's email address.
- 'contactTitle': Extracted from the contact's title or position.
- 'budget': Extracted budget information related to the business or project.
- 'prospectInterest': Extracted interest level or area of interest of the prospect.
- 'leadDescription': A brief description of the lead, summarizing the potential opportunity or interest.
- 'whenToContact': Extracted preferred time or date to contact the lead.
---

- Once the data is gathered, check your response for accuracy against the provided instructions.
- Unless otherwise told in the schema, all fields are string fields.
- This data will be used to go into a CRM via automation. The data needs to needs to be accurate, clean, and contextual.

++++
Output response in JSON code format with no leading characters.  Your reply will be used as a JSON payload. Don't include (```json```).
++++

  • ChatGPT Settings
    • Model = gpt-4
    • Memory Key = blank
    • Image = blank
    • User Name = Company Admin
    • Assistant Name = Company Admin Assistant and Email Parser
    • Assistant Instructions =
      • You are a helpful Company assistant that specializes in parsing unstructured content into a structured format.
    • Max Tokens = 1024
    • Temperature = 0.5
    • Top P = 1

4.Parse JSON Payload Code Step

Input Data - payLoad = ChatGPT Response

var obj = JSON.parse(inputData.payLoad);
return obj;

By Aaron LeBlanc, Founder & CEO, Hypelocal

➡️ https://zapier.com/experts/hypelocal