Improving Hex Magic

The best way to think about Magic is the way you'd think about onboarding a new college graduate - they have the core fundamentals and know how to write great SQL or Python, but they lack business context. Faced with an abundance of new information, they can misread it and make incorrect assumptions.

A great proxy for “how do I make AI better?” is to ask what would help this same persona. Prompt Magic with clear, specific direction, as if you’re talking to someone who knows how to code but doesn’t have any business context. Beyond prompting, a great data curation setup is the #1 lever you can pull to augment Magic prompts and return higher quality results.

This tutorial outlines steps that give Magic a focused view of relevant information, leading to more accurate suggestions.

info

Hex Magic is meant as a way to augment—not replace—human insight and judgement. You should not rely on AI-generated code to be accurate, complete, or free from bias.

Data Connection Setup

The first thing to consider is whether your data connections need curation. Often only a small subset of teams in Hex needs access to the full warehouse, while the majority of users need only a handful of databases and schemas. Creating additional data connections with curated data access has benefits:

  1. You can tailor that data connection to only have access to the schemas/databases that the majority of users need to use, thereby giving a focused view of the data.
  2. You can make that data connection the default data connection, directing new users in Hex to use this connection to start.

To create this curated data connection, you can either:

  1. Create a new role/permissioning structure in your warehouse directly. This will prevent users from querying tables outside of the schemas/databases this role has access to.
  2. Create a duplicate data connection but use Hex's schema filter feature - this will only pull in schema details based on regex rules you set. This is a lightweight way to curate your data connection without any extra work in the data warehouse.
caution

Schema filtering is not a security feature; users can still query the underlying tables directly if they have the proper permissions to do so.
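
If you go with the first option, the warehouse-side setup might look roughly like this - a minimal sketch assuming Snowflake-style syntax, with hypothetical database, schema, and role names:

-- Sketch only: Snowflake-style grants; object and role names are hypothetical
CREATE ROLE IF NOT EXISTS hex_analyst;
-- Grant access only to the curated database and schema most users need
GRANT USAGE ON DATABASE analytics TO ROLE hex_analyst;
GRANT USAGE ON SCHEMA analytics.marts TO ROLE hex_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics.marts TO ROLE hex_analyst;
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.marts TO ROLE hex_analyst;

Point the curated Hex data connection at credentials that use this role, and queries through that connection can only reach the curated schemas.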

Once you have your connection set up, configure scheduled refreshes so that Hex regularly resyncs and Magic always has access to up-to-date schema information and metadata.

Giving AI the right context

tip

For more examples, check out our Managing Magic blog post.

Endorsements

Adding an endorsed status to a database, schema, table, or semantic model is the easiest and fastest way to help Magic (and end users!) know what data is “Approved” or “Trusted” by data leaders. Endorsed data will be prioritized for all AI features. Child data objects inherit their parent's Endorsed status and will also be prioritized - so it's easy to endorse an entire database or schema with just one click!

Ask Magic will only generate explorations using warehouse tables and semantic models that have an endorsed status applied. See Optimizing Ask Magic results for more details.

Magic Exclusions

If you want to maintain access in the Data browser to certain databases/tables/schemas but never want Magic to use these tables, you can toggle the “Include/Exclude for Magic” setting. Child data objects inherit their parent's exclusion state, so if you've excluded an entire database, all schemas, tables, and columns contained within will also be excluded from Magic.

Excluding testing schemas from Magic.

Adding Metadata

Table and column descriptions saved in your warehouse get synced with every schema refresh. When available, Magic can use this context to generate better suggestions. If you use dbt, you can take advantage of our dbt Cloud integration - once set up, Hex will sync all your descriptions which Magic will automatically ingest.
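
If your descriptions live in the warehouse rather than dbt, they're typically set with SQL comments. A minimal sketch, assuming Snowflake/Postgres-style COMMENT syntax and hypothetical object names:

-- These descriptions sync to Hex on the next schema refresh (object names are hypothetical)
COMMENT ON TABLE analytics.orders IS 'One row per completed order; use for revenue and order-volume calculations.';
COMMENT ON COLUMN analytics.orders.status IS 'Order state. Enumerated values: pending, shipped, delivered, refunded.';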

What kind of metadata is helpful for Magic?

Below are some guidelines to consider when deciding what metadata to add to help Magic (a short sketch applying them follows the list):

  • For a table:
    • Information about what a table should be used for and what things can be calculated with it
  • For a column:
    • Add enumerated values for all low cardinality columns
    • For text values, add an example in the description and describe a pattern it might follow:
      • Example: State: Text column following the pattern: CA, AZ, NV
  • Other considerations:
    • Jargon - Companies often have terms or phrases that are very specific to them and that LLMs might not be familiar with. Try to explain what specific words or phrases mean to give Magic the best chance of understanding this terminology.
      • Example: The term “S0” is used to refer to a pipeline opportunity that's in stage 0
    • Synonyms (‼️) - Companies sometimes use multiple words to describe the same concept. Giving this context to Magic helps it provide suggestions more consistently when folks describe things with different verbiage.
      • Example: The words “organization” and “workspace” are used interchangeably, but “organization” is how it's represented across all data tables
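
Putting these guidelines together, here's what a few guideline-following descriptions could look like when set as warehouse comments - a sketch using Snowflake/Postgres-style syntax, with hypothetical table and column names:

-- Hypothetical descriptions applying the guidelines above (enumerated values, jargon, synonyms)
COMMENT ON COLUMN sales.opportunities.state IS 'Text column following the pattern: CA, AZ, NV.';
COMMENT ON COLUMN sales.opportunities.stage IS 'Pipeline stage. "S0" refers to a pipeline opportunity in stage 0.';
COMMENT ON TABLE product.organizations IS 'One row per organization. "organization" and "workspace" are used interchangeably in conversation, but "organization" is how the concept appears across data tables.';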
An excerpt from our data team's Coalesce talk on the best documentation you can add.
tip

Above is an excerpt from our data team's Coalesce talk on documentation best practices - you can listen in here!

How do I know if my descriptions are helpful?

If you want to prototype descriptions and see how Magic does with them, you can directly edit the description fields in Hex within the Data browser UI. You can then ask Magic your question and observe the results, updating the descriptions as needed!

Adding column descriptions to give more context to Magic.

If you want Magic to use a specific table, you can @mention the table in the prompt bar.

Target specific tables with Magic by @mentioning them.

Our suggestion is to prioritize the top 10 tables you expect folks to use, and to filter out or exclude STAGING/DEV/RAW schemas. If you endorse and document those 10 tables, you should see a large increase in the accuracy and performance of Magic!

Workspace Rules: Best practices

Workspace rules teach the AI agent your organization's unique context—the tribal knowledge that distinguishes your data team from others. The agent already knows SQL, Python, and analytics best practices. What it needs from you is your business logic, your data architecture, and your specific ways of working.

Think of rules as onboarding a brilliant new analyst who needs to learn your company's specifics, not how to write code. This context is passed at the beginning of every agent interaction, so use it for "universal truths" about your data and business. Context about specific tables belongs in the table documentation instead.

The best rules file reads like a knowledgeable colleague's handover notes—specific, practical, and immediately useful.

High-Impact Rule Categories

1. Business Definitions & Logic

Define what metrics mean at YOUR company. The agent can calculate CAC, but it doesn't know you exclude free trials from the denominator.

Good:

- Active user: Logged in 3+ times in rolling 30 days (not just once)
- Churn: No revenue-generating activity for 60 days
- MRR excludes paused accounts but includes discounted plans

Avoid:

- Calculate averages properly (too generic)
- Use appropriate metrics (too vague)
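
A definition this specific maps directly onto a query. Here's a sketch of the "Active user" rule above, assuming a hypothetical login-events table and Postgres-style date arithmetic:

-- Active users: 3+ logins in a rolling 30-day window (table/column names are hypothetical)
SELECT user_id
FROM analytics.login_events
WHERE logged_in_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
HAVING COUNT(*) >= 3;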

2. Data Governance

Specify your compliance requirements and data handling policies.

Good:

- EU customer data: Apply GDPR masking to any output
- Financial reports: Include audit_timestamp field
- Never join PII tables with marketing tables without approval flag

Avoid:

- Follow data privacy laws (agent assumes this)
- Be careful with sensitive data (too vague)
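
To make the contrast concrete, a rule like "mask email domains in outputs" (see the complete example below) gives the agent something it can actually apply - a sketch with a hypothetical table and Postgres-style string functions:

-- Masked output per a governance rule: show user@****.com instead of the raw address
SELECT
  user_id,
  split_part(email, '@', 1) || '@****.com' AS masked_email
FROM analytics.eu_customers;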

3. Data Architecture Guidance

Map which tables in your actual data landscape are trusted and which are deprecated. Tip: If you have endorsed tables inside Hex, the agent will automatically prioritize those.

Good:

- Customer truth: analytics.dim_customer_v3 (v2 has sync issues)
- Revenue: finance.recognized_revenue only (not billing.invoices)
- Events: Use events_cleaned, raw_events has 15% bot traffic

Avoid:

- Use the right tables (unhelpful)
- Check data quality (agent does this)

4. Company Standards

Document your specific conventions and requirements that go beyond general best practices.

Good:

- Currency: Always USD, converted at month-end rates
- Cohorts: Define by first_purchase_week starting Sunday
- Charts: Brand colors #0A2540 primary, #00D924 success

Avoid:

- Make nice looking charts (subjective)
- Format numbers properly (agent knows this)
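
A convention like the cohort rule above is specific enough to turn straight into SQL. Here's a sketch using Postgres-style date functions and hypothetical table and column names:

-- Sunday-start cohort week: EXTRACT(dow) is 0 on Sunday, so this rolls each date back to its Sunday
SELECT
  customer_id,
  first_purchase_date - EXTRACT(dow FROM first_purchase_date)::int AS first_purchase_week
FROM analytics.first_purchases;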

5. Domain Intelligence

Share industry context and competitive intelligence the agent couldn't know. Remember to specify the conditions where this information would be useful.

Good:

- Seasonality: 40% of revenue in Q4, plan capacity accordingly
- Benchmarks: Our 15% monthly growth exceeds industry 8%
- Market context: We compete on speed, not price—emphasize time metrics

Avoid:

- Consider business context (too generic)
- Industry matters (obvious)

Writing Principles

  • Be Specific: Replace "appropriate filtering" with "exclude test accounts where is_test=true"
  • Show, Don't Tell: Include actual column names, real thresholds, specific examples
  • Explain Why When Non-Obvious: "Use ship_date not order_date (rev rec policy changed 2024-01)"
  • Front-Load Critical Rules: Data governance and common pitfalls should appear first
  • Keep It Scannable: Use bullet points, consistent formatting, clear sections

Complete Example: SaaS Analytics Rules

# AwesomeCo Analytics Workspace Rules

## Critical Data Governance
- PII Handling: Mask email domains in all outputs (show user@****.com)
- EU Data: Filter WHERE consent_status = 'granted' for GDPR compliance
- Financial Access: Only use finance.* tables for board reports
- Test Accounts: Always filter WHERE is_internal = false AND email NOT LIKE '%@dataco.com'

## Source of Truth Tables
- Customers: warehouse.customers_dim (updates 2am UTC daily)
- Subscriptions: billing.active_subscriptions_v2 (v1 deprecated Nov 2024)
- Usage Events: events.user_actions_cleaned (raw has duplicates)
- Revenue: finance.recognized_revenue (not billing.charges)
- Marketing: marketing.campaigns_attributed (30-day window)

## Business Definitions
- MRR: Sum of monthly_amount WHERE status IN ('active', 'paused')
- Active User: 3+ sessions in rolling 30 days, exclude internal
- Churn Date: First day of month after last payment
- LTV: Revenue until churn / (1 - gross_margin), GM = 0.75
- Qualified Lead: Score > 50 AND company_size > 10

## Calculations & Logic
- Fiscal Year: Feb 1 - Jan 31 (Q1 starts February)
- Cohorts: By signup_week, weeks start Monday
- Attribution: First-touch for acquisition, multi-touch for expansion
- Exchange Rates: Use finance.exchange_rates for month-end conversion
- Growth Rates: Month-over-month using compound formula, not simple

## Required Analysis Patterns
- Time Comparisons: Always show YoY and MoM together
- Segments: Break down by plan_type (Starter/Pro/Enterprise)
- Statistical Significance: Min 100 samples for A/B tests
- Forecasting: Use 13-week rolling average for baseline
- Anomaly Detection: Flag if metric deviates >2 std from 30-day mean

## Data Quality Warnings
- orders table: 2% duplicate rows, use orders_deduped
- user_attributes: NULL company_size for 30% of records pre-2024
- revenue_forecast: Only reliable 30 days out
- marketing_spend: Delayed by 3 days, note in analyses

## Stakeholder Preferences
- Executive Reports: Focus on trends not absolute numbers
- Sales Team: Include pipeline_stage breakdown
- Product Team: Segment by feature_flag exposure
- Finance: Show both GAAP and billing revenue

## Industry Context
- Benchmarks: Target 125% NDR (industry avg: 110%)
- Seasonality: Q4 -20% new sales, Q1 +35% (budget cycles)
- Competition: We're 2x pricing of alternatives, emphasize ROI
- Market Position: #3 by revenue, #1 by NPS in our segment

Anti-Patterns to Avoid

Don't State the Obvious

  • ❌ "Write efficient SQL"
  • ❌ "Handle errors properly"
  • ❌ "Consider performance"

Don't Overspecify Technical Details

  • ❌ Full function implementations
  • ❌ Detailed algorithm specifications
  • ❌ Complex regex patterns (unless critical)

Don't Contradict Core Behaviors

  • ❌ Rules about response formatting
  • ❌ General coding standards
  • ❌ Tool usage instructions

Don't Include Secrets

  • ❌ API keys or passwords
  • ❌ Individual customer names
  • ❌ Confidential financial targets

Testing Your Rules

  1. Specificity Test: Can a new analyst follow this rule without asking clarifying questions?
  2. Value Test: Does this rule contain information unique to our organization?
  3. Usage Test: Run typical queries and verify the agent follows your rules
  4. Maintenance Test: Flag rules that reference specific dates, versions, or temporary conditions for regular review