Interactive Dashboard - Advanced Guide
This guide covers technical architecture, customization, performance optimization, and development topics for the Google-Go Interactive Dashboard.
Technical Architecture
Technology Stack
Core Framework:
- Dash (Plotly): Web application framework for Python
- Plotly: Interactive visualization library
- Dash Bootstrap Components: UI component library
Data Processing:
- Pandas: Data manipulation and analysis
- NumPy: Numerical computing
- Parquet/CSV: Data storage formats
Deployment:
- Flask: WSGI framework underlying Dash (its built-in development server handles local serving)
- Gunicorn: Production WSGI server (optional)
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ Browser (User Interface) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Tab 1 │ │ Tab 2 │ │ Tab 3 │ │ Tab 4 │ ... │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└────────────────────────┬────────────────────────────────────┘
│ HTTP/WebSocket
┌────────────────────────┴────────────────────────────────────┐
│ Dash Application (app.py) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Callback Management Layer │ │
│ │ • User input handlers │ │
│ │ • Plot generation │ │
│ │ • Data filtering │ │
│ └────────┬─────────────────────────────────────┬────────┘ │
└───────────┼─────────────────────────────────────┼───────────┘
│ │
┌───────────┴───────────┐ ┌────────────┴────────────┐
│ Layout Components │ │ Utility Modules │
│ (layouts/) │ │ (utils/) │
│ │ │ │
│ • single_scenario │ │ • DataLoader │
│ • cross_scenario │ │ - CSV/Parquet I/O │
│ • deadzone │ │ - Caching │
│ • timeseries │ │ - Data filtering │
│ • insights │ │ │
└───────────────────────┘ │ • ColorMapper │
│ - Carrier colors │
│ - Consistent themes │
└────────┬────────────────┘
│
┌────────────────────────────────────────────┴────────────────┐
│ Data Layer (results/) │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ results.csv (Consolidated) │ │
│ │ • Multi-level headers (year, scenario, scope) │ │
│ │ • Multi-level index (metric, y-label, carrier) │ │
│ │ • ~145 rows × ~400 columns │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ results_frontier.csv │ │
│ │ • Frontier analysis data │ │
│ │ • Multiple scenarios, years, countries │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ results_time_series.parquet │ │
│ │ • Hourly data (8,760 hours/year) │ │
│ │ • ~millions of data points │ │
│ │ • Chunked loading with caching │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ colors.csv │ │
│ │ • Carrier color mappings │ │
│ │ • Ensures visual consistency │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Data Flow
1. Startup: DataLoader loads all consolidated results into memory
2. User Selection: User selects year, scenario, and metric via dropdowns
3. Callback Trigger: Dash detects input changes and fires registered callbacks
4. Data Filtering: DataLoader filters data based on user selections
5. Plot Generation: Callback creates a Plotly figure from the filtered data
6. Color Mapping: ColorMapper applies consistent colors to carriers
7. Rendering: Plotly renders the interactive visualization in the browser
File Structure
dashboard/
├── app.py # Main application entry point
├── callbacks.py # All callback functions (17 callbacks)
├── layouts/ # Tab-specific layouts
│ ├── __init__.py
│ ├── single_scenario_layout.py # Single scenario analysis
│ ├── cross_scenario_layout.py # Multi-scenario comparison
│ ├── deadzone_layout.py # Frontier analysis
│ ├── timeseries_layout.py # Hourly timeseries
│ └── insights_layout.py # Statistical insights (static)
├── utils/ # Utility modules
│ ├── __init__.py
│ ├── data_loader.py # Data loading and caching
│ └── colors.py # Color mapping utilities
└── assets/ # Static assets (CSS, images)
└── custom.css # Custom styling
Callback Architecture
The dashboard uses Dash's callback system for interactivity:
@app.callback(
    Output('plot-id', 'figure'),        # What to update
    [Input('dropdown-id', 'value')]     # What triggers the update
)
def update_plot(selected_value):
    # Filter data based on input
    data = data_loader.get_data(selected_value)
    # Generate plot
    fig = create_plot(data)
    return fig
17 callbacks handle all dashboard interactivity:
- 5 for Single Scenario Analysis
- 1 for Cross-Scenario Comparison
- 3 for Dead Zone Analysis
- 5 for Timeseries Exploration
- 3 for dynamic dropdown population
Callback Best Practices
- Keep callbacks focused: Each callback should handle one specific UI update
- Use caching: Data is cached in DataLoader to avoid redundant I/O
- Handle errors gracefully: Return empty figures with error messages
- Validate inputs: Check for None/invalid values before processing
- Minimize data transfer: Only send necessary data to browser
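The "validate inputs" and "handle errors gracefully" practices above can be sketched as a small helper. Note that `safe_figure` and the `lookup` parameter are illustrative names, not part of the dashboard's actual code; the dict payload works because Plotly figures are plain dicts under the hood:

```python
def safe_figure(selected_value, lookup):
    """Validate a dropdown value before plotting; return an empty figure
    with an error message (a plain Plotly figure dict) on bad input."""
    if selected_value is None or selected_value not in lookup:
        # Graceful fallback instead of raising inside the callback
        return {"data": [],
                "layout": {"title": f"No data for selection: {selected_value!r}"}}
    return {"data": [{"type": "bar", "y": lookup[selected_value]}],
            "layout": {"title": str(selected_value)}}
```

Returning a figure dict keeps the browser-side plot in a defined state even when the selection is invalid.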
Multi-Index Data Handling
Pandas MultiIndex is used extensively for efficient data organization:
# Column MultiIndex
columns = pd.MultiIndex.from_tuples([
    (2025, 'baseline', 'system'),
    (2025, 'energy-match-25', 'system'),
    # ...
], names=['year', 'scenario', 'scope'])

# Row MultiIndex
index = pd.MultiIndex.from_tuples([
    ('(a) Energy mix', 'Net generation (TWh)', 'solar'),
    ('(a) Energy mix', 'Net generation (TWh)', 'onwind'),
    # ...
], names=['Results', 'y_label', 'carrier'])

# Fast filtering with IndexSlice
idx = pd.IndexSlice
data = df.loc[idx['(a) Energy mix', :, :], idx[2025, 'baseline', :]]
Why MultiIndex?
- Fast filtering: O(log n) lookups vs O(n) for single index
- Natural hierarchy: Reflects data structure (metric → label → carrier)
- Memory efficient: Shared index objects reduce memory overhead
- Pandas optimized: Built-in support for aggregations and operations
Performance Optimization
Memory Management
Typical memory usage:
- Dashboard base: ~100-200 MB
- Consolidated results: ~50-100 MB
- Frontier data: ~10-20 MB
- Timeseries cache: ~500 MB (50 queries)
- Total: ~1-2 GB for typical usage
For large datasets:
- Use Parquet format (10-20x compression vs CSV)
- Limit timeseries cache size
- Use time range filtering instead of loading full year
- Deploy with adequate RAM (4GB+ recommended)
Loading Speed
Startup time:
- Consolidated results.csv: ~1-2 seconds
- Frontier data: ~0.5 seconds
- Timeseries metadata: ~5-10 seconds (parquet), ~30-60 seconds (CSV)
Query response time:
- Aggregated plots: ~0.1-0.5 seconds
- Timeseries plots (cached): ~0.2-0.5 seconds
- Timeseries plots (uncached): ~2-10 seconds (parquet), ~10-30 seconds (CSV)
Optimization Recommendations
1. Convert timeseries to Parquet:
import pandas as pd
df = pd.read_csv('results_time_series.csv')
df.to_parquet('results_time_series.parquet', compression='snappy')
2. Increase cache size for repeated queries (edit data_loader.py):
if len(self.timeseries_cache) > 100:  # Increase from 50 to 100
    self.timeseries_cache.pop(next(iter(self.timeseries_cache)))
3. Use shorter time ranges for exploratory analysis
4. Deploy with SSD for faster I/O
Advanced Customization
Modifying Plot Types
To add new visualizations, edit dashboard/callbacks.py:
def create_plot(data, metric, plot_type, color_mapper):
    if plot_type == 'my_custom_plot':
        fig = go.Figure()
        # Add your custom plot logic
        for carrier in data.index:
            fig.add_trace(go.Scatter(
                x=years,
                y=data.loc[carrier],
                name=carrier,
                marker=dict(color=color_mapper.get_color(metric, carrier))
            ))
        return fig
Then add to dropdown in layouts/single_scenario_layout.py:
options=[
    {'label': 'Bar Chart', 'value': 'bar'},
    {'label': 'My Custom Plot', 'value': 'my_custom_plot'},
    # ...
]
Adding New Tabs
To add a new analysis tab:
1. Create layout in layouts/my_new_tab.py:
def create_my_tab_layout(data_loader):
    return dbc.Container([
        html.H3("My New Analysis"),
        dcc.Graph(id='my-plot'),
        # Add controls...
    ])
2. Register in app.py:
from layouts import my_new_tab

# Add tab
dcc.Tab(label='My Analysis', value='my-tab')

# Add layout
html.Div(my_new_tab.create_my_tab_layout(data_loader),
         id='my-content', style={'display': 'none'})
3. Add callback in callbacks.py:
@app.callback(
    Output('my-plot', 'figure'),
    [Input('my-selector', 'value')]
)
def update_my_plot(value):
    # Your plot logic
    return fig
4. Update the tab visibility callback in app.py
Adjusting Data Caching
Edit dashboard/utils/data_loader.py:
# Change cache size (default: 50 entries)
if len(self.timeseries_cache) > 100:  # Increase from 50 to 100
    self.timeseries_cache.pop(next(iter(self.timeseries_cache)))
Trade-off: Larger cache → more memory usage, faster repeated queries
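The eviction above is first-in-first-out (it pops the oldest insertion). If repeated queries cluster on a few selections, a least-recently-used policy keeps hot entries longer. A sketch using OrderedDict, with `LRUCache` as an illustrative name rather than existing dashboard code:

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least-recently-used entry,
    an alternative to the FIFO pop in data_loader.py."""

    def __init__(self, max_entries=50):
        self.max_entries = max_entries
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def set(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

With FIFO, a frequently reused query can be evicted simply because it was loaded first; LRU avoids that at the cost of a bookkeeping `move_to_end` on every hit.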
API Reference
DataLoader Class
Location: dashboard/utils/data_loader.py
Methods:
# Load all data at startup
data_loader.load_all_data()
# Get summary statistics
stats = data_loader.get_summary_stats()
# Returns: {'years': [...], 'scenarios': [...], 'metrics': [...]}
# Get filtered data
data = data_loader.get_data(
    year=2035,                   # int or None
    scenario_name='baseline',    # str or None
    metric='(a) Energy mix'      # str or None
)
# Get carriers for a metric
carriers = data_loader.get_carriers_for_metric('(a) Energy mix')
# Get frontier data
frontier = data_loader.get_frontier_data(
    year=2035,
    country='EU'
)
# Get frontier countries
countries = data_loader.get_frontier_countries(year=2035)
# Get timeseries metadata
metadata = data_loader.get_timeseries_metadata()
# Load timeseries data
data, timestamps = data_loader.load_timeseries_data(
    year=2035,
    scenarios=['baseline', 'energy-match-25'],
    ts_type='Electricity Balance',
    country='EU',
    carriers=['solar', 'onwind'],
    time_range='week_winter'
)
ColorMapper Class
Location: dashboard/utils/colors.py
Methods:
# Initialize with colors.csv
color_mapper = ColorMapper('../results/colors.csv')
# Get color for carrier in metric
color = color_mapper.get_color('(a) Energy mix', 'solar')
# Get all colors for a metric
colors = color_mapper.get_colors_for_metric('(a) Energy mix')
# Format scenario names for display
display_name = format_scenario_name('hourly-match-50-90')
# Returns: "Hourly 90% (CI 50%)"
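For illustration, here is one plausible way `format_scenario_name` could turn 'hourly-match-50-90' into "Hourly 90% (CI 50%)". This is a hypothetical sketch; the real implementation lives in utils/colors.py and may differ:

```python
def format_scenario_name(name: str) -> str:
    """Hypothetical re-implementation for illustration only."""
    parts = name.split('-')
    if len(parts) == 4 and parts[1] == 'match':
        # e.g. 'hourly-match-50-90' -> 'Hourly 90% (CI 50%)'
        policy, _, ci, target = parts
        return f"{policy.capitalize()} {target}% (CI {ci}%)"
    # Fallback for scenario names that don't fit the pattern
    return name.replace('-', ' ').capitalize()
```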
Development
Code Style
- Follow PEP 8 style guidelines
- Use descriptive variable names
- Add docstrings to all functions
- Comment complex logic
Testing Changes
- Test with full dataset
- Verify all tabs load correctly
- Check all dropdown combinations
- Test edge cases (empty data, single year, etc.)
- Verify timeseries loading with both Parquet and CSV
Debugging Tips
- Check browser console (F12) for JavaScript errors
- Review Python logs in terminal for backend errors
- Use Dash debug mode (enabled by default in development): app.run_server(debug=True)
- Print callback inputs to verify data flow
- Test with minimal data to isolate issues
Contributing
Submitting Updates
- Document all changes in code comments
- Update user guide if adding features
- Test on clean Python environment
- Ensure backward compatibility with existing data
Code Organization
- Layouts: UI components only, no business logic
- Callbacks: Data processing and plot generation
- Utils: Reusable data loading and utility functions
- Assets: Static CSS/images only
Further Resources
- Dash Documentation: https://dash.plotly.com/
- Plotly Python: https://plotly.com/python/
- Pandas MultiIndex: https://pandas.pydata.org/docs/user_guide/advanced.html
- Parquet Format: https://parquet.apache.org/
- Dash Bootstrap Components: https://dash-bootstrap-components.opensource.faculty.ai/
Production Deployment
Using Gunicorn
For production environments:
pip install gunicorn
gunicorn app:server -b 0.0.0.0:8050 --workers 4 --timeout 300 --log-level info
Configuration options:
- --workers 4: Number of worker processes (use 2-4 × CPU cores)
- --timeout 300: Request timeout in seconds (5 minutes for large queries)
- -b 0.0.0.0:8050: Bind address (0.0.0.0 = all interfaces)
- --log-level info: Logging verbosity (debug, info, warning, error)
Using systemd (Linux)
Create service file /etc/systemd/system/dashboard.service:
[Unit]
Description=Google-Go Dashboard
After=network.target
[Service]
User=www-data
WorkingDirectory=/path/to/google-go/dashboard
ExecStart=/usr/bin/gunicorn app:server -b 0.0.0.0:8050 --workers 4 --timeout 300
Restart=always
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl enable dashboard
sudo systemctl start dashboard
Using Docker
Create Dockerfile:
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY dashboard/ dashboard/
COPY results/ results/
WORKDIR /app/dashboard
EXPOSE 8050
CMD ["gunicorn", "app:server", "-b", "0.0.0.0:8050", "--workers", "4", "--timeout", "300"]
Build and run:
docker build -t dashboard .
docker run -p 8050:8050 dashboard
Nginx Reverse Proxy
For production deployments behind Nginx:
server {
    listen 80;
    server_name dashboard.example.com;

    location / {
        proxy_pass http://127.0.0.1:8050;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Timeout settings
        proxy_connect_timeout 300;
        proxy_send_timeout 300;
        proxy_read_timeout 300;
    }
}
Security Considerations
Data Access
- Dashboard serves data to all users - ensure data is not confidential
- No authentication by default - add if needed
- Consider IP whitelisting for internal deployments
Adding Authentication
Use dash-auth for basic authentication:
import dash_auth
# Add after creating app
VALID_USERNAME_PASSWORD_PAIRS = {
    'username': 'password'
}
auth = dash_auth.BasicAuth(
    app,
    VALID_USERNAME_PASSWORD_PAIRS
)
For production, use proper authentication (OAuth, LDAP, etc.)
HTTPS
Always use HTTPS in production:
- Terminate SSL at Nginx/load balancer
- Use Let's Encrypt for free certificates
- Redirect HTTP to HTTPS
Monitoring and Logging
Application Logging
Add logging to app.py:
import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('dashboard.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)
Performance Monitoring
Monitor key metrics:
- Memory usage: ps aux | grep python
- Request latency: Log callback execution time
- Error rates: Count exceptions in logs
- Active users: Track concurrent connections
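Logging callback execution time (the latency metric above) is easy with a small decorator. Note that `timed` is an illustrative helper, not existing dashboard code:

```python
import functools
import logging
import time

def timed(func):
    """Log how long a wrapped callback takes to execute."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            logging.getLogger(func.__module__).info(
                "%s took %.3fs", func.__name__,
                time.perf_counter() - start)
    return wrapper
```

Place @timed below @app.callback so Dash registers the wrapped function and every invocation is logged.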
Health Checks
Add health check endpoint in app.py:
from flask import jsonify

@app.server.route('/health')
def health():
    return jsonify({"status": "healthy"}), 200
Troubleshooting Advanced Issues
High Memory Usage
Symptoms: Dashboard consumes >4GB RAM
Solutions:
- Reduce timeseries cache size
- Clear cache periodically: (see below)
- Use time-based cache eviction
- Deploy with more RAM
# Add to data_loader.py
def clear_cache(self):
    self.timeseries_cache.clear()
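The time-based cache eviction suggested above could look like the following TTL wrapper. This is a sketch with illustrative names (`TimedCache`, `ttl`), not existing data_loader.py code:

```python
import time

class TimedCache:
    """Cache whose entries expire after ttl seconds."""

    def __init__(self, ttl=600):
        self.ttl = ttl
        self._store = {}  # key -> (insert_timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        ts, value = entry
        if time.time() - ts > self.ttl:
            del self._store[key]  # expired; drop and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.time(), value)
```

Unlike a size cap, a TTL bounds how stale cached timeseries can get and releases memory during idle periods, at the cost of occasional re-reads.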
Slow Callback Execution
Symptoms: Plots take >5 seconds to render
Solutions:
- Profile callbacks: (see below)
- Optimize data filtering queries
- Pre-compute expensive calculations
- Use Parquet for timeseries
import time
start = time.time()
# ... callback code ...
print(f"Callback took {time.time() - start:.2f}s")
Concurrent User Issues
Symptoms: Dashboard slows with multiple users
Solutions:
- Increase Gunicorn workers
- Use Redis for shared caching
- Deploy multiple instances with load balancer
- Consider stateless architecture
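One related mitigation: each Gunicorn worker normally loads its own copy of the data at startup. Gunicorn's --preload flag loads the application once in the master process before forking, so workers share read-only memory pages via copy-on-write. A sketch, assuming the Gunicorn setup from the deployment section:

```shell
# Load the app before forking so workers share read-only data pages
gunicorn app:server -b 0.0.0.0:8050 --workers 4 --preload --timeout 300
```

Note that --preload trades memory savings for losing per-worker reload on code changes, so keep it out of development setups.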
Support and Community
For technical questions or contributions:
- Review this advanced guide thoroughly
- Check the main user guide for usage questions
- Search GitHub issues for similar problems
- Contact the development team with specific technical questions
Dashboard Status: Production-ready, actively maintained