This article walks through building a working health data pipeline that automatically syncs your Apple Health data to a PostgreSQL database using FastAPI. No fluff, just the code that actually works.
The Problem: Health Data are Trapped
I've been wearing an Apple Watch for years, collecting thousands of data points about my heart rate, sleep, workouts, and more. But here's the thing—all that valuable data just sits there in the Health app, basically useless for any real analysis.
As a data engineer, this drives me nuts. I want to run SQL queries against my health data. I want to build dashboards. I want to correlate my stress levels with my sleep quality. But Apple keeps it all locked up in their walled garden.
Enter the Auto Export iOS app. This little gem automatically exports your Apple Health data and can send it via webhook to any API endpoint. Perfect! Now I just need somewhere to catch that data and store it properly.
What We're Building
Here's what our pipeline looks like:
- Auto Export iOS App → Automatically exports health data from Apple Health
- FastAPI Server → Receives webhooks, validates data with Pydantic models
- PostgreSQL Database → Stores everything in a properly normalized schema
The best part? This is production code that I'm actually running. No toy examples here.
Setting Up Auto Export (The Easy Part)
First, grab the Auto Export app from the App Store. It's $2.99 and worth every penny.
Configuration Steps
- Grant Health Access: The app needs permission to read your Health data. Important note here: many apps collect tons of data about you. Auto Export does not. Try contrasting the App Store's privacy warnings for Facebook Messenger and Auto Export. 😂
- Choose Your Data: Select which metrics you want exported (I export everything)
- Set Up Webhook: Point it to https://your-domain.com/api/v1/sync
- Schedule Exports: I have mine set to sync every hour
The app will send JSON payloads that look like this:
{
  "data": {
    "metrics": [
      {
        "name": "heart_rate",
        "units": "bpm",
        "data": [
          {
            "date": "2024-01-01T12:00:00+00:00",
            "avg": 72,
            "min": 68,
            "max": 85
          }
        ]
      }
    ],
    "workouts": []
  }
}
Pretty clean! Now let's build something to catch it.
FastAPI Server: The Real Work Begins
Here's where things get interesting. Health data are messy. Different metrics have completely different structures. Heart rate has min/avg/max values. Blood pressure has systolic/diastolic. Sleep analysis has start and end times. How do we handle all this variety?
Project Structure
app/
├── __init__.py
├── main.py
├── dependencies.py
├── api/
│ ├── models.py # Pydantic models for validation
│ └── v1/
│ └── endpoints.py # API routes
└── db/
├── database.py # Database connection
├── session.py # Session management
├── models.py # SQLAlchemy models
├── insert_logic.py # Data insertion logic
└── schema.sql # Database schema
Pydantic Models: Taming the Data Beast
The heart of our validation system lives in app/api/models.py. Here's the base model:
from datetime import datetime
from pydantic import BaseModel, ConfigDict

class TZBaseModel(BaseModel):
    # populate_by_name lets fields be set by name or alias; Pydantic v2 parses ISO 8601 datetimes natively
    model_config = ConfigDict(populate_by_name=True)

    def get_date(self) -> datetime:
        return getattr(self, "date", getattr(self, "timestamp", None))
This handles timezone-aware datetime parsing automatically.
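As a quick illustration (this subclass is mine, not part of the project's models), any model built on the base class accepts ISO 8601 strings and produces timezone-aware datetimes:

# Illustrative only: a minimal subclass to show the datetime parsing behavior
class Reading(TZBaseModel):
    date: datetime
    qty: float

r = Reading(date="2024-01-01T12:00:00+00:00", qty=42.0)
assert r.date.tzinfo is not None   # timezone-aware datetime
print(r.get_date())                # 2024-01-01 12:00:00+00:00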
Recently, I was discussing the importance of data validation with a front-end engineer. Their argument was that it was "not too important". To be clear, they saw it as important, just not "very" important. That was a problem for me, so I explained my perspective.
Data are what an application gains. As a user interacts with an application, data are generated. Those data spare the user from having to remember everything they've done, because the database can tell them they bought lingerie on October 12th, 2018. But only if those data are validated when they are inserted into the database. Otherwise, users may end up making a lot of purchases on 1970-01-01T00:00:00Z.
Always validate your data. 💯
Specialized Models for Different Health Metrics
Different health metrics need different validation. Here's how we handle that:
class BloodPressure(TZBaseModel):
    date: datetime
    systolic: float
    diastolic: float

class HeartRate(TZBaseModel):
    date: datetime
    min: Optional[float] = Field(None, alias="Min")
    avg: Optional[float] = Field(None, alias="Avg")
    max: Optional[float] = Field(None, alias="Max")
    context: Optional[str] = None
    source: Optional[str] = None

class SleepAnalysis(TZBaseModel):
    start_date: datetime = Field(..., alias="startDate")
    end_date: datetime = Field(..., alias="endDate")
    value: Optional[str] = None
    qty: Optional[float] = None
    source: Optional[str] = None
Notice how we use Field aliases to handle the inconsistent naming from the iOS app. Some fields are Min, others are min. Pydantic handles all of this gracefully. And after syncing 10 years of Apple Health data several times, I can say Health data can be a wee bit messy. The sort of flexibility SQLAlchemy and Pydantic provide is invaluable.
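For example (an illustrative snippet, not from the post's codebase), the same HeartRate model accepts either casing:

# Both validate: the aliases cover "Min"/"Avg"/"Max", populate_by_name=True covers the lowercase names
HeartRate(**{"date": "2024-01-01T12:00:00+00:00", "Min": 62, "Avg": 71, "Max": 88})
HeartRate(**{"date": "2024-01-01T12:00:00+00:00", "min": 62, "avg": 71, "max": 88})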
The Dispatcher Pattern
Here's where it gets clever. We use a dispatcher to route different metric types to their specialized models:
SPECIALIZED_METRICS = {
    "blood_pressure": BloodPressure,
    "heart_rate": HeartRate,
    "sleep_analysis": SleepAnalysis,
    "blood_glucose": BloodGlucose,
    # ... more metrics
}

def parse_metric(metric: HealthMetric) -> List[TZBaseModel]:
    model_cls = SPECIALIZED_METRICS.get(metric.name.lower())
    parsed = []
    for entry in metric.data:
        try:
            # Use specialized model if available, otherwise default
            model = model_cls(**entry) if model_cls else QuantityTimestamp(**entry)
            parsed.append(model)
        except Exception as e:
            logger.warning(f"Skipping invalid entry: {e}")
    return parsed
This means when we get heart rate data, it automatically uses the HeartRate model. Blood pressure uses BloodPressure. Anything we don't have a specialized model for falls back to the generic QuantityTimestamp.
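The QuantityTimestamp model itself isn't shown in this post; a minimal sketch of what it might look like, assuming the same qty/date/source shape as the fallback table:

# Hypothetical sketch of the generic fallback model
class QuantityTimestamp(TZBaseModel):
    date: datetime
    qty: Optional[float] = None
    source: Optional[str] = None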
Handling DateTime Chaos
iOS datetime formatting is... inconsistent. Sometimes you get 2024-01-01 12:00:00 +0000, sometimes 2024-01-01T12:00:00+00:00. Our normalization function fixes this:
def normalize_datetime_strings(payload: dict) -> dict:
    def fix_datetime_string(dt_str: str) -> str:
        dt_str = dt_str.strip()
        # Convert space to T: "2024-01-01 12:00:00" -> "2024-01-01T12:00:00"
        dt_str = re.sub(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})", r"\1T\2", dt_str)
        # Fix timezone: "+0000" -> "+00:00"
        dt_str = re.sub(r"([+-]\d{2})(\d{2})$", r"\1:\2", dt_str)
        # Remove extra spaces
        dt_str = re.sub(r"(T\d{2}:\d{2}:\d{2})\s+([+-]\d{2}:\d{2})", r"\1\2", dt_str)
        return dt_str

    def walk(obj):
        if isinstance(obj, dict):
            return {k: walk(v) for k, v in obj.items()}
        elif isinstance(obj, list):
            return [walk(i) for i in obj]
        elif isinstance(obj, str):
            if re.search(r"\d{2}:\d{2}:\d{2}", obj) and re.search(r"[+-]\d{2}:?\d{2}$", obj):
                return fix_datetime_string(obj)
        return obj

    return walk(payload)
This recursively walks through the entire payload and fixes any datetime strings it finds. Clean! Though it may not be the most efficient approach.
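A quick before/after, just to make the behavior concrete:

raw = {"date": "2024-01-01 08:00:00 +0000", "qty": 512}
print(normalize_datetime_strings(raw))
# {'date': '2024-01-01T08:00:00+00:00', 'qty': 512}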
The Main API Endpoint
Here's our webhook endpoint that receives the data:
@router.post("/sync", status_code=status.HTTP_201_CREATED)
async def receive_health_data(request: Request, db: AsyncSession = Depends(get_db)):
    try:
        raw = await request.json()
        normalized = normalize_datetime_strings(raw)
        parsed = api_models.parse_payload(normalized["data"])
        print(f"Received {len(parsed.metrics)} metrics and {len(parsed.workouts)} workouts")
        await insert_health_data(parsed, db)
        return {"message": "Data received and stored successfully"}
    except Exception as e:
        logger.exception("Failed to process /sync payload")
        return JSONResponse(status_code=400, content={"error": str(e)})
Simple and clean. Get the JSON, normalize the datetimes, parse with Pydantic, insert into the database.
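You can smoke-test the endpoint locally with curl before pointing the app at it (the payload below is a made-up minimal example, not real Auto Export output):

curl -X POST http://localhost:8000/api/v1/sync \
  -H "Content-Type: application/json" \
  -d '{"data": {"metrics": [{"name": "step_count", "units": "count", "data": [{"date": "2024-01-01 08:00:00 +0000", "qty": 512}]}], "workouts": []}}'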
PostgreSQL Schema: Where the Data Lives
Our database schema is designed around the hierarchical nature of the health data. Here's the core structure:
Core Tables
-- A batch of health data (one webhook call)
CREATE TABLE apple_health.health_payload (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    received_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);

-- Individual metrics within a payload
CREATE TABLE apple_health.health_metric (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    payload_id UUID REFERENCES apple_health.health_payload(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    units TEXT NOT NULL
);

-- Default storage for simple quantity+timestamp data
CREATE TABLE apple_health.quantity_timestamp (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    metric_id UUID REFERENCES apple_health.health_metric(id) ON DELETE CASCADE,
    date TIMESTAMP WITH TIME ZONE NOT NULL,
    qty DOUBLE PRECISION NOT NULL,
    source TEXT
);
Specialized Tables
For complex metrics, we have dedicated tables:
-- Blood pressure readings
CREATE TABLE apple_health.blood_pressure (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    metric_id UUID REFERENCES apple_health.health_metric(id) ON DELETE CASCADE,
    date TIMESTAMP WITH TIME ZONE NOT NULL,
    systolic DOUBLE PRECISION NOT NULL,
    diastolic DOUBLE PRECISION NOT NULL
);

-- Heart rate with min/avg/max
CREATE TABLE apple_health.heart_rate (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    metric_id UUID REFERENCES apple_health.health_metric(id) ON DELETE CASCADE,
    date TIMESTAMP WITH TIME ZONE NOT NULL,
    min DOUBLE PRECISION,
    avg DOUBLE PRECISION,
    max DOUBLE PRECISION,
    context TEXT,
    source TEXT
);

-- Sleep analysis with start/end times
CREATE TABLE apple_health.sleep_analysis (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    metric_id UUID REFERENCES apple_health.health_metric(id) ON DELETE CASCADE,
    start_date TIMESTAMPTZ NOT NULL,
    end_date TIMESTAMPTZ NOT NULL,
    qty DOUBLE PRECISION,
    value TEXT,
    source TEXT
);
The beauty of this design is that simple metrics (like step count) go into the generic quantity_timestamp table, while complex metrics get their own specialized tables with proper typing.
SQLAlchemy Models
Our SQLAlchemy models mirror the database structure:
class HealthPayload(Base, AppleHealthMixin):
    __tablename__ = "health_payload"

    id = Column(UUID, primary_key=True, server_default=func.gen_random_uuid())
    received_at = Column(DateTime(timezone=True), server_default=func.now())

    metrics = relationship("HealthMetric", back_populates="payload", cascade="all, delete-orphan")

class HealthMetric(Base, AppleHealthMixin):
    __tablename__ = "health_metric"

    id = Column(UUID, primary_key=True, server_default=func.gen_random_uuid())
    payload_id = Column(UUID, ForeignKey("apple_health.health_payload.id", ondelete="CASCADE"))
    name = Column(Text, nullable=False)
    units = Column(Text, nullable=False)

    payload = relationship("HealthPayload", back_populates="metrics")
    quantity_data = relationship("QuantityTimestamp", back_populates="metric", cascade="all, delete-orphan")
    blood_pressure = relationship("BloodPressure", backref="metric", cascade="all, delete-orphan")
    heart_rate = relationship("HeartRate", backref="metric", cascade="all, delete-orphan")
Notice the AppleHealthMixin that sets up the schema and UUID generation automatically.
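The mixin itself isn't shown in this post; a plausible minimal sketch, assuming its main job is pinning every table to the apple_health schema:

# Hypothetical sketch of the mixin referenced above
class AppleHealthMixin:
    __table_args__ = {"schema": "apple_health"}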
Data Insertion: Making It All Work
The insertion logic is where everything comes together. Here's how we handle the variety of metric types:
async def insert_health_data(payload: HealthPayload, db: AsyncSession):
    # Create the main payload record
    payload_obj = HealthPayload()
    db.add(payload_obj)
    await db.flush()  # Get the ID

    for metric in payload.metrics:
        # Create metric record
        metric_obj = HealthMetric(
            id=uuid4(),
            name=metric.name,
            units=metric.units,
            payload=payload_obj,
        )
        db.add(metric_obj)

        if not metric.data:
            continue

        # Determine which database model to use
        metric_type = metric.name.lower()
        db_model_cls = SPECIALIZED_DB_MODELS.get(metric_type)

        try:
            if db_model_cls is None:
                # Default to QuantityTimestamp
                db_objs = [
                    QuantityTimestamp(
                        metric=metric_obj,
                        date=entry.get_date(),
                        qty=entry.qty,
                        source=entry.source,
                    )
                    for entry in metric.data
                ]
            else:
                # Use specialized model
                db_objs = [
                    db_model_cls(
                        metric=metric_obj,
                        **entry.model_dump(exclude={"id"}, exclude_unset=True),
                    )
                    for entry in metric.data
                ]
            db.add_all(db_objs)
            await db.flush()
        except Exception as e:
            logger.warning(f"Skipping metric '{metric.name}' due to error: {e}")

    await db.commit()
This creates the payload, then for each metric determines whether to use a specialized table or the default quantity_timestamp table.
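SPECIALIZED_DB_MODELS mirrors the earlier SPECIALIZED_METRICS dispatcher, but maps metric names to SQLAlchemy models instead of Pydantic ones. A sketch of what it likely looks like:

# Hypothetical sketch; these are the SQLAlchemy models from app/db/models.py
SPECIALIZED_DB_MODELS = {
    "blood_pressure": BloodPressure,
    "heart_rate": HeartRate,
    "sleep_analysis": SleepAnalysis,
    # ... more metrics
}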
FastAPI Setup and Dependencies
The main application setup is straightforward:
# app/main.py
from fastapi import FastAPI
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Run database initialization on startup
    await db.create_tables(engine)
    yield

app = FastAPI(lifespan=lifespan)
app.include_router(api_router, prefix="/api/v1")
The database dependency uses async sessions:
# app/dependencies.py
async def get_db() -> AsyncGenerator[AsyncSession, None]:
    async with AsyncSessionLocal() as session:
        yield session
Clean and simple!
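For completeness, here's a minimal sketch of what app/db/session.py might contain; the exact settings are assumptions, not the project's actual file:

# app/db/session.py (sketch)
import os
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

DATABASE_URL = os.environ["DATABASE_URL"]  # e.g. postgresql+asyncpg://...

engine = create_async_engine(DATABASE_URL)
AsyncSessionLocal = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)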
Running the System
Development Setup
Install dependencies:
pip install poetry
poetry install
Set up your environment variables in .env:
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/health_db
Run the development server:
poetry run poe dev
This starts uvicorn with hot reloading on port 8000.
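If you're not using poe, the underlying command is presumably something like:

poetry run uvicorn app.main:app --reload --port 8000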
What Actually Happens
Once everything is running:
- Auto Export sends a webhook every hour with your latest health data
- FastAPI receives the JSON payload
- Pydantic validates and parses the data into typed models
- The data gets inserted into the appropriate PostgreSQL tables
- You can now query your health data with SQL!
Querying Your Health Data
Now comes the fun part! With all your health data in PostgreSQL, you can run queries like:
-- Average heart rate by day
SELECT
    DATE(date) as day,
    AVG(avg) as avg_heart_rate
FROM apple_health.heart_rate
WHERE date > NOW() - INTERVAL '30 days'
GROUP BY DATE(date)
ORDER BY day;

-- Sleep duration trends
SELECT
    DATE(start_date) as night,
    EXTRACT(EPOCH FROM (end_date - start_date))/3600 as hours_slept
FROM apple_health.sleep_analysis
WHERE start_date > NOW() - INTERVAL '90 days'
ORDER BY night;
This is the payoff! Your health data are now queryable, joinable, and ready for analysis.
Production Deployment: The Real-World Gotchas
Getting this running locally is one thing. Getting it running in production? That's where you hit the fun stuff that nobody tells you about in tutorials.
SSL Certificates: iOS is Picky
Here's something that cost me a few hours of debugging: iOS doesn't like all SSL certificates equally. I initially tried using Cloudflare Edge certificates, and the Auto Export app would just fail silently. No error messages, no logs, just... nothing.
The fix? Use Certbot with Let's Encrypt certificates instead:
sudo apt update
sudo apt install certbot python3-certbot-nginx
# Get a certificate for your domain
sudo certbot --nginx -d your-api-domain.com
iOS is apparently pickier about certificate issuers than browsers are. Who knew? Save yourself the debugging time and just use Let's Encrypt from the start.
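Certbot also sets up automatic renewal; a dry run is a cheap way to confirm it will keep working:

sudo certbot renew --dry-run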
Nginx Configuration: Handle Large Payloads
The Auto Export app can send pretty hefty JSON payloads, especially if you're exporting workout data with GPS routes. The default nginx configuration will reject these with a 413 error.
Here's the nginx config that actually works:
server {
    listen 443 ssl;
    server_name your-api-domain.com;

    client_max_body_size 2000M;  # 2GB - health data can get big!

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
After updating your nginx config, always test and reload:
sudo nginx -t && sudo systemctl reload nginx
That nginx -t command has saved me from taking down production more times than I'd like to admit.
The Auto Export App Configuration
Once your server is running with proper SSL, configure the Auto Export app:
- Enabled: Make sure the webhook is enabled if you want it syncing in the background. Though, you might leave this off until you've verified everything is working.
- Automation Type: Set to REST API
- Webhook URL: https://your-api-domain.com/api/v1/sync
- Select Health Metrics: I select everything, but you can choose specific metrics if you want to limit the data
- Time Interval: I set it to 1 hour, but you can go as low as 5 minutes if you want more frequent updates
- Content Type: application/json (this may not be needed according to the documentation)
- Date Range: I set this to "Today" for now
- Batch Requests: This one is important. Most likely, you have a lot of Health data. Turning on batching allows the Auto Export app to send your health data in chunks, which prevents your iOS device from running out of memory and crashing.
- Export Format: JSON (the app supports CSV, but JSON is more flexible for our needs)
- Aggregate Data: I leave this unchecked, as I want raw data for analysis
- Export Schedule: I use every hour, but you can go as frequent as every 5 minutes
- Sync Cadence: I set this to 5 minutes, but you can adjust based on how often you want new data
- Interval: I set this to minutes
The first successful sync is magical. You'll see your FastAPI logs light up with incoming data, and suddenly your database starts filling with actual health metrics.
Manual Export
After you've set up the sync job, let's test it out. Go to Manual Export and tap it.
This brings up the export screen. Ensure your FastAPI endpoint is ready. Then, set the date range to start and end on the same date. Lastly, hit the Begin Export button. This will begin sending your Apple Health data to your FastAPI endpoint. 🎉
Expectations
Apple Health data are high-volume data. The initial sync may take several days. My recommendation is the following:
- Before going to sleep, plug in your phone, ensure it is connected to the internet, bring the Auto Export app to the foreground, and then tap Manual Export. Leave it overnight.
- Try to sync chunks of data by date, using a date range that's easy to remember, to prevent redundant syncing.
- The REST API server and Postgres database will ensure duplicate data are not inserted, so you can safely re-run the sync if needed. Idempotent, they call it. Or so I'm told.
I'm exploring ways to make this faster on the REST API side; however, it appears the Auto Export app takes a while to query the data, form them into JSON, and post them to our endpoint. Sadly, we can't do much about that unless we want to write our own iOS app.
What's Next?
This system is working great for me, but there are plenty of directions to take it:
- Add data visualization with something like Lightdash
- Build automated health reports
- Set up alerts for concerning trends
- Add machine learning for health predictions
- Integrate with other health data sources
The foundation is solid and extensible. Adding new metric types is as simple as creating a new Pydantic model and database table.
The Real Win
The best part about this system? It just works. I set it up months ago, and it's been quietly collecting and organizing my health data ever since. No maintenance, no babysitting, just reliable data collection.
As data engineers, we often get caught up in complex architectures and fancy technologies. But sometimes the best solution is just clean code, good validation, and a solid database. This pipeline proves that with the right tools—FastAPI, Pydantic, and PostgreSQL—you can build something robust and maintainable that solves a real problem.
Now I can finally answer questions like "How does my sleep quality correlate with my heart rate variability?" and "What's my stress pattern throughout the week?" All because we took the time to build a proper data pipeline.
Your health data don't have to stay trapped in the iOS Health app. With a little code and the right architecture, you can set them free and start generating real insights about your well-being.