A Time series file format for the Fluksometer [solved]

In https://www.flukso.net/files/presentations/flukso.20140425.pdf the FLM grand master introduces and discusses a time series storage concept that shall be implemented efficiently in the FLM world. "Efficiently" here denotes especially the capability to compress the data and to access it easily later on.
I'd like to jump on this train and provide some further requirements right away, specifically dealing with the MQTT capabilities of the FLM, which allow sensor data from different sources to be mashed up.
For "my" persistence in a database I use the following table layout:

  sensor CHAR(32),
  timestamp CHAR(10), -- alternatively: timestamp TIMESTAMP
  value CHAR(5),
  unit CHAR(5),
  UNIQUE KEY (sensor, timestamp)

This layout allows storing the sensor data (value AND unit) together with the sensor's ID and the timestamp of event occurrence. There shall be a unit column because, with water and gas, there are already different units to consider besides Watts alone; temperatures (°C, °F) are further candidates.
Alternatively, sensor metadata could be stored separately (sensor ID with description, unit, ...), with a "foreign key" relation introduced based on the sensor ID.
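As a minimal sketch of this layout, here is how it might look in Python with the built-in sqlite3 module (table name and sample values are placeholders only):

  import sqlite3

  con = sqlite3.connect("flm.db")
  con.execute("""
      CREATE TABLE IF NOT EXISTS readings (
          sensor    CHAR(32),
          timestamp CHAR(10),  -- alternatively a native TIMESTAMP type
          value     CHAR(5),
          unit      CHAR(5),   -- W, L/day, °C, ...
          UNIQUE (sensor, timestamp)
      )
  """)
  # one reading: sensor id, unix timestamp, value, unit
  con.execute("INSERT OR IGNORE INTO readings VALUES (?, ?, ?, ?)",
              ("0123456789abcdef0123456789abcdef", "1398441600", "453", "W"))
  con.commit()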
What else can you imagine? Or might an existing format be used, for example SML, the Smart Message Language?

Note: The solution to this thread is TMPO.

petur's picture

I have two remarks:
- avoid storing data that never changes with every datapoint, so use a config table which points to a dataset table
- have the ability to cope with hardware replacement (the FLM breaks or is upgraded; we want the measurements of the new unit to pick up where the old ones stopped and to visualize seamlessly). Either allow multiple unique sensors per dataset by retiring them (but keeping their history), or force the old unique keys into the new FLM.

I'm not that familiar with the current capabilities, so the above points may be moot.

gebhardm's picture

With a closer look at the pdf I see the ingenuity... Encoding is key to bringing down the data volume, of course - so my table approach is rather naive... metadata, being unchanged, is then kept separate...

gebhardm's picture

REQ: Any time interval of sensor data shall be selectable without having to decode all data from the very first set... (or I didn't understand the encoding...)

icarus75's picture

Well, that didn't take very long. :)

Thanks for the feedback. Some more details not present in the presentation:
1/ Besides the "t"ime and "v"alue keys in the Tmpo JSON object, there will also be a "h"eader entry containing sensor metadata, e.g. name, uid, unit, etc.
2/ The FLM will store blocks locally in 2^8 = 256 secs intervals, denoted as a block8 in the code. A compaction job running in the background will group 16x block8's into a single block12, 16x block12's into a block16 and 16x block16's into a block20. The latter will contain a sensor's data of the last ±12 days (= 2^20 secs). This will allow the FLM to run offline for quite a while, whilst storing its sensor readings persistently and in highest resolution. No downsampling.
3/ There will be a "garbage" (better: "block") collector running in the background when the FLM's allocated flash blocks exceed a certain threshold. I'm thinking of 80%-ish. The oldest level20 blocks will be collected first.
4/ Gzipped blocks will be published for each sensor on MQTT topic /sensor/[id]/tmpo/[level]/[bid], where level = {8, 12, 16, 20} and bid = block id = the unix timestamp of the start of the respective block interval (see the sketch below).
5/ @petur Good second point. That should be taken care of at a higher level. The FLM itself has a fixed amount of pre-allocated unique sensor id's. They are never retired.
6/ @gebhardm We're using domain knowledge to format the data before offering it to gzip, hence the high compression ratio (1/100). And JSON within a gzip can be served directly to the browser, e.g. via ngx_http_gzip_static_module. No database to hit for the raw timeseries, and no data conversion to JSON needed either; it's already sitting there in a couple of pre-compressed files. Furthermore, the JSON decoding engines inside modern browsers are very fast. And then we'll probably throw in a bit of SPDY as well, so that we can fire off multiple tmpo GETs asynchronously over a single TCP connection.
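To make the block and topic arithmetic of points 2/ and 4/ concrete, here is a minimal Python sketch (the helper names are illustrative, not part of the firmware):

  import time

  def block_id(ts, level):
      # a block's id is the unix timestamp of the start of its
      # 2^level-second interval, i.e. ts rounded down to that boundary
      return ts - (ts % 2**level)

  def topic(sensor, ts, level):
      # topic layout as described in point 4/
      return "/sensor/%s/tmpo/%d/%d" % (sensor, level, block_id(ts, level))

  now = int(time.time())
  print(topic("0123456789abcdef0123456789abcdef", now, 8))   # block8: 256 s
  print(topic("0123456789abcdef0123456789abcdef", now, 20))  # block20: ~12 days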

Cheers
/Bart

mozreactor's picture

Bump.

Any update on this work in progress? (Beta firmware, etc?)

gebhardm's picture

With firmware 2.4.4 the tmpo daemon is active and persists readings (see the source repository for details); still missing is the retrieve/query functionality that would allow programmatic access without sending requests to the file system.

icarus75's picture

The latest r24x firmware establishes an MQTT bridge to the flukso.net server. Syncing logic makes sure the tmpo blocks are uploaded to the server. Tmpo blocks can be accessed through an HTTPS/REST API. Have a look at tmpo-py, which wraps this API and stores new tmpo blocks in a local SQLite DB on your computer. Tmpo-py also makes it straightforward to load a sensor's timeseries into a Pandas dataframe for further analysis.
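A minimal usage sketch; the Session/add/sync/series entry points follow the tmpo-py README as far as I recall, so verify them against the repository (sensor id and token are placeholders):

  import tmpo

  session = tmpo.Session()
  # register the sensor with its access token, then pull its tmpo
  # blocks into the local SQLite cache
  session.add("0123456789abcdef0123456789abcdef",
              "abcdef0123456789abcdef0123456789")
  session.sync()
  # return the sensor's readings as a Pandas time series
  ts = session.series("0123456789abcdef0123456789abcdef")
  print(ts.head())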

gebhardm's picture

tmpo-py (yet another programming language?) downloads via the flukso-api; will there be a local variant available, as the corresponding "own" data is stored on the local FLM? Thx, Markus

icarus75's picture

Every time a tmpo block is generated by the tmpod, it will be published on the MQTT broker with a /sensor/sid/tmpo topic.

The first binding against the tmpo REST API was written in Python because besides being a popular general-purpose language, Python has also gained a lot of traction lately as a data analysis language with the advent of the Numpy and Pandas packages. What the tmpo-py lib does is wrap all the HTTP stuff, cache the tmpo blocks locally on your computer in a SQLite DB for fast future retrieval, and return the data as a Pandas time series or data frame. So you're basically all set to start analyzing your time series. Which is what we're working on in the opengrid project.

gebhardm's picture

Thanks - tmpo data is stored on the FLM and the MQTT messages are published with QoS 0 and no retention (topic /sensor/+/tmpo/#; what the payload looks like I still have to check, as mosquitto_sub shows just a '?'). So when a message is not received ad hoc, there is no chance to get it again; one has to wait for the next one or connect to the flukso.api - which in a pure LAN environment (which the FLM is capable of) is not really convenient.
So, again the question: will there be a method to retrieve the locally stored tmpo data directly? (I am still not convinced by Python and the huge numpy/pandas stack for dealing with simple time series - even though, with the current python.org download packages and pip, installing numpy and pandas is "quite easy" after G**gling how to get it - documentation is for the dumb, isn't it?)
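For reference, the '?' shown by mosquitto_sub is consistent with a gzipped JSON payload (cf. point 4/ above). A minimal Python sketch to capture and decode such blocks, assuming the paho-mqtt client and the FLM's default broker address:

  import gzip, json
  import paho.mqtt.client as mqtt

  def on_message(client, userdata, msg):
      # the payload is assumed to be a gzip-compressed JSON tmpo block
      block = json.loads(gzip.decompress(msg.payload))
      print(msg.topic, block.get("h"))  # "h"eader entry carries the metadata

  client = mqtt.Client()
  client.on_message = on_message
  client.connect("192.168.255.1", 1883)  # the FLM's own broker
  client.subscribe("/sensor/+/tmpo/#")
  client.loop_forever()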

icarus75's picture

Tmpo blocks are stored directly in the FLM's flash file system. So a bash one-liner will do the job:

  ssh root@192.168.255.1 "tar -C /usr/share -c tmpo" | tar -x

gebhardm's picture

I regard a tar from the FLM as rather not web-like. To quote https://www.flukso.net/files/presentations/flukso.20140425.pdf: "serve as a plain text file from web server to browser" and "no server side translation required for rendering". Thus I state the following requirement:

  Requirement TMPO-WS-BROWSER: Local tmpo data shall be accessible/queryable from its source via the built-in FLM web server. Note: Preferably there is a channel utilizing the existing web-socket infrastructure.

As I have proven, real-time visualization can already be served from the FLM itself with complete front-end computation; I regard the same as feasible for the persisted tmpo data, without the Python overhead of "analytical frameworks". For this, just "some feasible querying" of the tmpo data is necessary (I assume that I pass a time interval and get the nearest tmpo block(s) available); the rest can be done in JavaScript in the displayed HTML page...
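As an illustration of such querying, a hypothetical Python sketch mapping a requested time interval to the block ids that cover it (the function is mine, not the tmpo daemon's):

  def blocks_covering(t1, t2, level):
      # block ids at one level whose 2^level-second intervals
      # overlap the queried interval [t1, t2]
      width = 2**level
      first = t1 - (t1 % width)
      return list(range(first, t2 + 1, width))

  # e.g. all block8 ids needed for a 10-minute window
  print(blocks_covering(1398441600, 1398442200, 8))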

gebhardm's picture

Learn about tmpo with a simple JavaScript script... - see https://github.com/gebhardm/flmdisplay/tree/master/tmpo

Next step is convincing the tmpo daemon to publish queried tmpo blocks on request directly from the local FLM, thus getting rid of the flukso.net storage and this bulky Python thingy (sorry, Bart, you haven't convinced me to like it)...

gebhardm's picture

end-of-thread
The TMPO storage format provides a sufficient solution.
It can be utilized conveniently; see https://www.flukso.net/content/querying-local-tmpo-data for a PoC that even uses the data directly from the local FLM.