Process Historian vs Time-Series Database

Two ways to store plant data, compared by storage cost, query, integration, and fit — and how to tell which one your operation actually needs.

Every plant that runs long enough ends up with the same argument in the control room. You have years of process data piling up at a few hundred or a few thousand values a second, and two camps with two answers about where it should live. One says install a process historian, the way plants always have. The other points at the open-source time-series databases that have grown up around web and infrastructure monitoring and asks why you'd pay historian money for the same job. Both are right about something. Neither is right about everything.

The labels don't help, because a historian is a time-series database. It stores values stamped with a time and lets you read them back in order. The useful question isn't which one is the "real" database. It's which one fits the data you actually generate, the people who have to query it, and the network it has to survive on. So let's compare them the way you'd compare any two pieces of plant kit: by what they cost to run, how fast they answer, how they fail, and how much maintenance they demand.

What a process historian actually is

A process historian is purpose-built software for capturing operational measurements off control systems and keeping them for years. The category was shaped by tools like OSIsoft's PI System, and the design assumptions show. Data arrives as tags (one named signal per instrument or calculation), each tag carries a value, a timestamp, and a quality flag, and the server is tuned to swallow that firehose continuously without dropping points.

The part that defines a historian is what it does before it writes. PI and its peers apply two filters in sequence: exception, then compression. Exception throws away readings that haven't moved beyond a deadband at the interface, so a steady temperature doesn't generate a million identical points. Compression then applies a swinging-door algorithm that keeps only the points needed to reconstruct the trend within a set tolerance, discarding the ones a straight line between neighbors would already recover, per OSIsoft's PI Server documentation on exception and compression. The result is that an analog signal noisy enough to look like grass on a trend gets stored as a handful of points an hour, with the shape preserved. That's not a side feature. It's the reason a historian can keep decades of plant history on disk you can afford.
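To make the two filters concrete, here is a simplified Python sketch of both stages. It is not OSIsoft's code: PI's real exception and compression settings include knobs this ignores (maximum times, and passing the previous value along with an exception), and the function names and tolerances are illustrative. Exception is a plain deadband against the last kept value. Compression follows the classic swinging-door idea: two "doors" pivot on points one deviation above and below the last archived value, and when they swing past parallel the previous point is archived.

```python
def exception_filter(points, deadband):
    """Deadband filter: keep a reading only if it moved more than `deadband`
    from the last reading that was kept. `points` is a list of (t, value)."""
    kept = []
    for t, v in points:
        if not kept or abs(v - kept[-1][1]) > deadband:
            kept.append((t, v))
    return kept


def swinging_door(points, deviation):
    """Simplified swinging-door compression over (t, value) pairs with
    strictly increasing t. Returns the points that would be archived."""
    if len(points) <= 2:
        return list(points)
    archived = [points[0]]
    anchor_t, anchor_v = points[0]
    # Steepest slope from the upper pivot (anchor_v + deviation) to any point so far.
    upper = float("-inf")
    # Shallowest slope from the lower pivot (anchor_v - deviation) to any point so far.
    lower = float("inf")
    prev = points[0]
    for t, v in points[1:]:
        dt = t - anchor_t
        upper = max(upper, (v - anchor_v - deviation) / dt)
        lower = min(lower, (v - anchor_v + deviation) / dt)
        if upper > lower:
            # The doors have swung past parallel: no single line from the anchor
            # fits every point within the deviation, so archive the previous
            # point and reopen the doors from it.
            archived.append(prev)
            anchor_t, anchor_v = prev
            dt = t - anchor_t
            upper = (v - anchor_v - deviation) / dt
            lower = (v - anchor_v + deviation) / dt
        prev = (t, v)
    archived.append(prev)  # the newest point is always kept
    return archived
```

For example, a ramp sampled at `(0, 0), (1, 1), ..., (4, 4)` with a deviation of 0.1 archives only `(0, 0)` and `(4, 4)`, while a step from 0 to 5 keeps the points on either side of the jump, which is exactly the "shape preserved, points discarded" behavior described above.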

Historians also assume they live in an industrial stack. They speak to controllers and SCADA through OPC, and they expect to sit at Level 3 of the ISA-95 hierarchy, the manufacturing-operations layer that the ISA-95 / IEC 62264 standard defines as the bridge between Level 2 control and Level 4 business systems. Pulling history back out has its own standardized path: OPC UA Historical Access, Part 11 of IEC 62541, defines how a client reads stored time-stamped data and aggregates from any conforming server. So a historian arrives already knowing what a tag, a quality code, and an asset model are. That built-in context is most of what you're buying, and it's also what makes the category feel heavy to anyone coming from general databases.

What a time-series database is

A time-series database (TSDB) is the more general animal. It's a database optimized for data indexed by time, and the modern crop came out of monitoring servers, networks, and applications rather than turbines and boilers. The category has been the fastest-growing one in databases for years. Back in 2016 the DB-Engines popularity index, which tracks more than 300 database systems, already showed time-series systems up 26.7% over twelve months, more than double the next category. That growth bought a lot of engineering, and plants now benefit from it.

The data model is different in a way that matters. Take InfluxDB, one of the common ones. A point has a measurement (think table), a set of tags, a set of fields, and a timestamp. Tags are indexed metadata and fields are the actual values, and that split is the whole performance story: per InfluxData's documentation, queries that filter on tags are fast because tags are indexed, while filtering on a field forces a scan of every matching value. Data goes in over a plain text line protocol, which makes writing from a script or a gateway trivial. Other TSDBs make other bets. TimescaleDB is a PostgreSQL extension that partitions a table into time-based chunks behind a "hypertable" abstraction, so you keep full SQL and the Postgres ecosystem. OpenTSDB stores its series on top of HBase and Hadoop and scales writes by adding nodes; its own docs cite a daemon handling around 2,000 new points per second per core on a 2006-era dual-core Xeon, which tells you the design target is horizontal scale, not single-box tuning.
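Because the line protocol is plain text, a collector can build points with ordinary string formatting. A minimal Python sketch, assuming InfluxDB 1.x-style line protocol with nanosecond timestamps; the measurement, tag, and field names (`boiler`, `TI-101`, `quality`) are hypothetical. Tags sit before the first space and are indexed; fields sit after it and are not, so what you filter on usually belongs in the tags.

```python
import math


def _escape_key(s):
    # Tag keys, tag values and field keys must backslash-escape commas, equals signs and spaces.
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _field_value(v):
    if isinstance(v, bool):          # check bool before int: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"               # integers carry an 'i' suffix
    if isinstance(v, float):
        if not math.isfinite(v):
            raise ValueError("line protocol has no NaN or infinity")
        return repr(v)
    s = str(v).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{s}"'                  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, ts_ns):
    """measurement,tag=v,... field=v,... timestamp"""
    m = measurement.replace(",", r"\,").replace(" ", r"\ ")
    # Sorting tags by key is the ordering InfluxData recommends for write performance.
    tag_part = "".join(f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape_key(k)}={_field_value(v)}" for k, v in fields.items())
    return f"{m}{tag_part} {field_part} {ts_ns}"


line = to_line_protocol("boiler", {"unit": "B1", "tag": "TI-101"},
                        {"value": 71.5, "quality": "Good"}, 1556668800000000000)
# boiler,tag=TI-101,unit=B1 value=71.5,quality="Good" 1556668800000000000
```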

What a general TSDB usually doesn't ship with is the plant context. There's no built-in notion of a quality code, no asset framework, no OPC server waiting to talk to your PLCs. You get a fast, queryable store and a blank schema. Whether that's freedom or a gap depends entirely on what you're building.

Write path and storage cost

Here's where the two diverge first, and it drives the cost line on your budget. A historian's exception-and-compression pipeline is lossy by design, and that's the point: it decides at ingest which points are worth keeping so the archive stays small for the same span of history. A general TSDB, by default, keeps what you send it. That's honest and reversible (you never lose a sample you might want later), but it means raw plant data at full instrument resolution lands on disk at full size.
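The size of that difference is simple arithmetic. A rough Python sketch, assuming 16 bytes per stored point (an 8-byte timestamp plus an 8-byte double, ignoring tags, quality, indexes, and any on-disk compression the database applies) and a 365-day year. The 5,000 tags at 1 Hz and the 5% kept fraction are hypothetical placeholders, not measured figures; real compression ratios depend on the signal and the deviation settings.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; ignores leap years


def raw_bytes_per_year(tags, samples_per_sec, bytes_per_point):
    """Bytes written per year if every sample of every tag is kept at full resolution."""
    return tags * samples_per_sec * bytes_per_point * SECONDS_PER_YEAR


def stored_bytes_per_year(tags, samples_per_sec, bytes_per_point, kept_fraction):
    """The same, after an ingest-time filter keeps only `kept_fraction` of the samples."""
    return raw_bytes_per_year(tags, samples_per_sec, bytes_per_point) * kept_fraction


raw = raw_bytes_per_year(5_000, 1, 16)
print(f"raw: {raw / 1e12:.2f} TB/year")  # → raw: 2.52 TB/year
kept = stored_bytes_per_year(5_000, 1, 16, 0.05)  # hypothetical 5% kept
```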

So the storage question splits into two different jobs. If you want every raw sample retained forever, a TSDB does that without complaint, and you size your disks accordingly. If you want decades of trend you can still read at a glance without a storage budget that climbs with every new instrument, the historian's compression is doing real work you'd otherwise have to build. TSDBs answer the same need with retention and downsampling instead of ingest-time compression. In InfluxDB you attach a retention policy that expires raw data after, say, two hours, and a continuous query that rolls it into a coarser summary kept for a year, a pattern InfluxData spells out in its downsample-and-retain guide. The default retention policy keeps data forever until you say otherwise, so the discipline is on you to configure it. The historian decides what to throw away at the door; the TSDB decides what to throw away on a schedule. Same goal, opposite end of the pipe.
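InfluxDB configures this pattern in its own query language, but the logic underneath is easy to sketch in Python: average raw points into fixed windows, then expire raw points older than the retention window. A minimal sketch with timestamps in seconds; the 30-minute window and two-hour retention are illustrative. Real InfluxDB retention drops whole shard groups rather than individual points, so its expiry is coarser than shown here.

```python
from collections import defaultdict
from statistics import mean

RAW_RETENTION_S = 2 * 3600   # keep raw points for two hours (illustrative)
BUCKET_S = 30 * 60           # roll them into 30-minute means (illustrative)


def downsample(points, bucket_s):
    """Roll (ts, value) points into epoch-aligned buckets, keeping each bucket's mean."""
    buckets = defaultdict(list)
    for ts, v in points:
        buckets[ts - ts % bucket_s].append(v)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def expire(points, now, retention_s):
    """Drop points older than the retention window, like a retention policy's duration."""
    return [(ts, v) for ts, v in points if ts >= now - retention_s]


raw = [(0, 10.0), (600, 12.0), (1800, 20.0), (2400, 22.0)]
summary = downsample(raw, BUCKET_S)              # kept long-term
raw = expire(raw, now=9000, retention_s=RAW_RETENTION_S)
```

The ordering matters: the summary has to be written before the raw points age out, which is why the scheduled rollup and the retention policy are configured as a pair.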

One practical consequence: a historian's compression assumes the signal is a physical process with a recoverable shape. Feed it event data, counters, or discrete states and the swinging door has nothing useful to do. A TSDB doesn't assume anything about your signal, which is why the same store happily holds analog tags, batch events, and lab results side by side.

Querying what you stored

Storing data is easy. Getting answers back is where engineers either thank you or curse you, so weigh this heavily. And the people asking the questions matter as much as the questions: an operator at a trend wall and a data scientist in a notebook want very different doors into the same archive.

So who actually queries your plant history? If the honest answer is "operators and process engineers, through a vendor's trend client," the historian's narrow path is fine. If it's "whoever in the company has a question and a SQL prompt," the answer changes.

Historians are built around the time-series questions an operator asks: give me this tag's interpolated value at 02:00 last Tuesday, the hourly average across that shift, the time-weighted total over the batch. OPC UA Historical Access lets a client request these aggregates, which are defined in Part 13 of IEC 62541, from any conforming server in the same way. The catch is that the query languages tend to be tool-specific. You learn the historian's calculation engine and its client, and analytics outside that ecosystem reach the data through OPC or a connector rather than a query you'd write by hand.
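The two most common of those questions, an interpolated value and a time-weighted average, can be sketched in a few lines of Python. This is in the spirit of OPC UA's interpolative and time-average aggregates, not the standard's exact definitions, which also handle quality codes, bounding values, and partial intervals. It assumes points sorted by time and linear (sloped) interpolation between them.

```python
from bisect import bisect_right


def interpolate(points, t):
    """Linearly interpolate the value at time t from sorted (t, value) points."""
    times = [p[0] for p in points]
    if not points or t < times[0] or t > times[-1]:
        raise ValueError("t is outside the stored range")
    i = bisect_right(times, t)
    if i == len(points):          # t is exactly the last stored timestamp
        return points[-1][1]
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def time_weighted_average(points, start, end):
    """Area under the piecewise-linear trend over [start, end], divided by the duration."""
    inner = [(t, v) for t, v in points if start < t < end]
    series = [(start, interpolate(points, start))] + inner + [(end, interpolate(points, end))]
    # Trapezoid rule over each segment of the clipped series.
    area = sum((t1 - t0) * (v0 + v1) / 2 for (t0, v0), (t1, v1) in zip(series, series[1:]))
    return area / (end - start)


pts = [(0, 0.0), (10, 10.0), (40, 10.0)]
twa = time_weighted_average(pts, 0, 40)   # 8.75
```

The arithmetic mean of those three samples is about 6.67, but the signal sat at 10 for three-quarters of the window, so the time-weighted answer is 8.75. That gap is why historians weight by time rather than by sample count.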

TSDBs win on openness and lose on plant smarts. TimescaleDB gives you ordinary SQL, which means anyone who can query Postgres can query your process data, and any BI or data-science tool that speaks SQL is already connected. InfluxDB ships its own query language tuned for time math, with functions for windowing and aggregation that map onto the questions you actually ask of a trend. The cost is that the time-weighted, quality-aware aggregates a historian gives you for free are sometimes yours to assemble. If your data scientists live in SQL and Python, a TSDB removes a wall. If your value comes from operators slicing trends in a tool built for exactly that, the historian's narrower query path is a feature, not a limit.
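The "ordinary SQL" point is easy to see. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere; on TimescaleDB you would bucket with its `time_bucket('1 hour', ts)` function instead of `strftime`. The table layout, the tag `TI-101`, and the `'Good'` quality convention are hypothetical. Note what you had to decide yourself: the quality filter is a column you defined, and `avg()` is a plain arithmetic mean, not a time-weighted aggregate.

```python
import sqlite3


def make_demo_db():
    """In-memory table of hypothetical readings, including one bad-quality spike."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts TEXT, tag TEXT, value REAL, quality TEXT)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        [
            ("2019-05-01 02:00:00", "TI-101", 70.0, "Good"),
            ("2019-05-01 02:30:00", "TI-101", 72.0, "Good"),
            ("2019-05-01 02:45:00", "TI-101", 999.0, "Bad"),
            ("2019-05-01 03:10:00", "TI-101", 74.0, "Good"),
        ],
    )
    return conn


HOURLY_GOOD_AVG = """
SELECT strftime('%Y-%m-%d %H:00', ts) AS hour, avg(value)
FROM readings
WHERE tag = ? AND quality = 'Good'   -- quality-awareness is your own WHERE clause
GROUP BY hour
ORDER BY hour
"""


def hourly_good_average(conn, tag):
    return conn.execute(HOURLY_GOOD_AVG, (tag,)).fetchall()


conn = make_demo_db()
rows = hourly_good_average(conn, "TI-101")
```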

Interoperability and fit in the stack

This is the criterion that quietly decides most real projects. A historian arrives already fluent in the plant. It connects to control systems over OPC, it understands tags and qualities, and it sits where ISA-95 says the operations layer belongs, feeding Level 4 systems without you writing the glue. If your problem is "I have a DCS and a SCADA system and I need their history in one trustworthy place that the rest of the plant already knows how to talk to," that fluency is most of the value.

A general TSDB starts from zero on all of that. There's no native OPC server in most of them; you bring the data in through a gateway, an MQTT bridge, or a custom collector, and you model tags and quality yourself. For a greenfield deployment with edge gateways already publishing clean data, that's not a burden, it's a clean slate. For a brownfield plant full of legacy controllers, it's a project. The honest read: the closer your data sits to raw control equipment, the more a historian's built-in interoperability earns its keep; the more your data already passes through modern edge infrastructure, the less that built-in plumbing matters and the more a TSDB's openness pays off.
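Modelling tags and quality yourself mostly means choosing a record shape and normalizing every collector's payload into it. A minimal Python sketch, assuming a gateway that publishes JSON shaped like `{"tag": ..., "v": ..., "q": ..., "ts": ...}` (a hypothetical schema; real gateways vary) and forwards classic OPC DA quality codes, where bits 7 and 6 mean Good (11), Uncertain (01), or Bad (00).

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    tag: str
    value: float
    quality: str
    ts: datetime


def quality_from_opc_da(q):
    """Map an OPC DA quality code to a coarse label using its two major-quality bits."""
    major = q & 0xC0
    # 0x80 is not a defined major quality in OPC DA, so treat it conservatively as Bad.
    return {0xC0: "Good", 0x40: "Uncertain", 0x00: "Bad"}.get(major, "Bad")


def parse_payload(raw):
    """Normalize one hypothetical gateway message into a Sample with a UTC-aware timestamp."""
    msg = json.loads(raw)
    ts = datetime.fromisoformat(msg["ts"].replace("Z", "+00:00"))
    return Sample(
        tag=msg["tag"],
        value=float(msg["v"]),
        quality=quality_from_opc_da(int(msg["q"])),
        ts=ts,
    )


s = parse_payload('{"tag": "FT-204", "v": 12.5, "q": 192, "ts": "2019-05-01T02:00:00Z"}')
```

Once every source lands in the same `Sample` shape, writing it to the TSDB (as tags plus fields) is the easy part; the modelling decisions live in this layer.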

Reliability and maintenance

Both classes of system can be made reliable. They ask for different work to get there.

A commercial historian is a supported product. You get store-and-forward buffering at the interfaces so a network drop doesn't lose data, an asset model that's maintained for you, and a vendor to call at 3 a.m. The maintenance burden is mostly configuration and licensing, not architecture. The flip side is that you operate inside the vendor's assumptions, and scaling or integrating beyond them can be slow.

A TSDB, especially an open-source one, hands you flexibility and the operational load that comes with it. You're responsible for the cluster, the backups, the retention jobs, the upgrades, and the buffering layer that store-and-forward gave you for free. (OpenTSDB's reliance on an HBase/Hadoop cluster is the clearest example: enormous scale, real operational weight.) Managed cloud offerings move that load off your team for a recurring fee. The trade is the usual one between a product that does it your way and a product that does it the vendor's way.
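If you own the buffering layer, the core of store-and-forward is a queue that holds samples while the uplink is down and drains them in order once it returns. A minimal in-memory Python sketch; `send` is a hypothetical callable standing in for your TSDB write, and a production buffer would persist to disk so a gateway reboot during an outage doesn't lose the backlog.

```python
from collections import deque


class StoreAndForward:
    """Buffer samples while the uplink is down; forward them in arrival order when it's back."""

    def __init__(self, send, max_buffer=100_000):
        self.send = send  # callable(sample) -> bool, True if upstream accepted the sample
        # Bounded queue: if an outage outlasts the buffer, the oldest samples are dropped.
        self.buffer = deque(maxlen=max_buffer)

    def publish(self, sample):
        self.buffer.append(sample)
        return self.flush()

    def flush(self):
        while self.buffer:
            if not self.send(self.buffer[0]):
                return False  # uplink still down: keep the backlog and retry later
            self.buffer.popleft()
        return True


received, uplink = [], {"up": False}


def send(sample):
    if uplink["up"]:
        received.append(sample)
        return True
    return False


saf = StoreAndForward(send)
saf.publish(1)
saf.publish(2)       # outage: both samples wait in the buffer
uplink["up"] = True
saf.publish(3)       # uplink back: 1, 2 and 3 arrive in order
```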

Security and compliance

Wherever the data lands, it's still operational-technology data, and the security model has to match that. NIST's Big Data definitions frame the challenge plainly: this is high-volume, high-velocity data, and in an OT setting high velocity means the store sits close to live control. Whatever you connect to the historian or TSDB widens the path between IT and the process. The controls that matter are the standard OT ones: network segmentation into IEC 62443 zones and conduits, authenticated and encrypted transport, least-privilege access, and an architecture that follows NIST's Guide to Industrial Control Systems Security.

The two classes start from different defaults. Established historians ship with mature authentication, auditing, and role models, and they're designed to be the system of record, so the compliance story is largely bought. A self-hosted open-source TSDB gives you whatever you configure and nothing you don't, which is more rope: you can build a tighter system than an off-the-shelf historian, or a far looser one, depending on the discipline of whoever stands it up. The risk isn't the database. It's an analytics box quietly bridged across a segmentation boundary because it was easier than doing it right.

Cost and licensing

Money usually decides, so be clear-eyed about it. But "free" and "cheap" aren't the same word, and the license fee is rarely where the real money goes. A commercial historian is a licensed product, typically priced by tag count or server, plus annual support. You're paying for the compression engine, the OPC connectivity, the asset model, the support contract, and the decades of plant-specific engineering baked in. For a plant that needs all of that, it's often cheaper than rebuilding it.

Open-source TSDBs change the shape of the cost, not necessarily the total. TimescaleDB's core is offered under the Apache 2.0 license, with some features under Timescale's own license; InfluxDB and OpenTSDB have free open-source cores too. The license is free, but you pay in engineering: the integration, the schema, the retention strategy, the operations, and the on-call. A managed cloud TSDB swaps that for a usage-based bill. The right comparison is total cost of ownership over the life of the data, not the sticker on day one. A two-person team standing up a few thousand tags from existing edge gateways will likely come out ahead with a TSDB; a plant replacing the trust layer for its entire control system rarely does.
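Total cost of ownership reduces to simple arithmetic once engineering time gets a number. An undiscounted Python sketch; every input is a placeholder to replace with your own quotes and staffing estimates, and the example values exist only to exercise the formula, not to suggest real prices.

```python
def total_cost(years, upfront, annual_fees, annual_eng_hours, hourly_rate):
    """Undiscounted total cost of ownership: one-off cost plus yearly fees and engineering time."""
    return upfront + years * (annual_fees + annual_eng_hours * hourly_rate)


# Placeholder inputs only: substitute your own license quotes, support fees and staffing.
licensed = total_cost(years=10, upfront=50_000, annual_fees=10_000,
                      annual_eng_hours=100, hourly_rate=100)
self_run = total_cost(years=10, upfront=0, annual_fees=0,
                      annual_eng_hours=200, hourly_rate=100)
```

Small changes to the engineering-hours estimate can flip which option wins, which is the point: the license line is only one term in the sum.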

Side by side

| Criterion | Process historian | Time-series database |
| --- | --- | --- |
| Origin | Industrial control (PI System and peers) | IT/infrastructure monitoring, now general |
| Storage approach | Lossy by design: exception + swinging-door compression at ingest | Keeps what you send; trims via retention + downsampling |
| Data model | Tag, value, quality, asset framework built in | Generic measurement/series; you model plant context |
| Query path | Tool-specific engine; OPC UA Historical Access aggregates | SQL (TimescaleDB) or time-tuned query language; open tooling |
| Plant integration | Native OPC, fits ISA-95 Level 3 | Via gateways/MQTT/collectors; build it yourself |
| Maintenance | Vendor-supported product; config-heavy | You run the cluster, or pay for managed |
| Licensing | Per-tag/server license + support | Open-source core free; cost moves to engineering/ops |

Which fits your plant

Start from the data and the people, not the brochure. If your signals come straight off a DCS and SCADA, if operators are the main consumers and they live in trend tools, and if you need a system of record the whole plant already trusts, a process historian is the lower-risk answer. Its compression, OPC fluency, and ISA-95 fit are exactly the work you'd otherwise have to fund yourself, and a vendor carries the support.

If your data already passes through modern edge gateways, if your main consumers are engineers and data scientists who want SQL and open tooling, and if you're mixing process tags with events, lab results, and model outputs in one place, a general time-series database fits better. You trade built-in plant context for openness and control, and you take on the operations in exchange.

In our work instrumenting processing plants, the split that holds up best isn't either-or. The edge captures and buffers full-resolution data, a TSDB serves the engineering and analytics layer where flexibility and open queries pay off, and a historian remains the operational system of record where one is already trusted and the OPC and asset model matter. That's the thinking behind our edge telemetry and analytics platform, and it's why "historian versus database" is usually the wrong frame. The real question is which store owns which job. Answer that, and the kit follows.

References

  1. NIST Big Data Interoperability Framework: Volume 1, Definitions (SP 1500-1r1)
  2. Time Series DBMS are the database category with the fastest increase in popularity
  3. InfluxDB Key Concepts
  4. InfluxDB: Downsample and retain data
  5. PI Server — exception and compression
  6. OpenTSDB FAQ and documentation
  7. TimescaleDB (project repository)
  8. OPC UA (IEC 62541), Historical Access (Part 11)
  9. ISA-95 / IEC 62264, Enterprise-Control System Integration
  10. Guide to Industrial Control Systems (ICS) Security (SP 800-82 Rev. 2)

Reuse & license

This article is published by Zoniax Innovations LLC under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. You are free to share and adapt it for any purpose, including commercially, as long as you give appropriate credit to Zoniax and link back to the original article.

Disclaimer

These Field Notes are general technical information, published as-is for industry peers. They are not professional, engineering, safety, legal, or financial advice, and nothing here is a recommendation to buy, sell, or act. Figures are cited from public sources believed reliable but are not independently guaranteed — verify them against the primary sources and your own plant conditions before acting. Zoniax Innovations LLC and the author accept no liability for decisions made from this content. Naming a standard, product, or vendor is not an endorsement.

Cite this article

Nõmm, A. (2019). Process Historian vs Time-Series Database. Zoniax. https://zoniax.com/blog/posts/time-series-database-vs-historian