Fast SFTP transfers with KDE / kio-extras / Dolphin

For a few years now, I’ve used Dolphin / Konqueror / Krusader for transferring files to or from my home server via SFTP. However, when there were larger amounts to transfer, I opened a shell and used rsync instead, because SFTP in these file managers always felt somewhat slower. I blamed it on the weak file server, on the network, or whatever.

Recently I wanted to clean up my desktop’s home directory again, where a bunch of raw and edited videos were sitting – ~250 GiB all in all. Because I wanted to sort the videos into different sub-directories on the server, I used Dolphin and SFTP once again. While watching the transfer, I began to doubt whether the network between my desktop and the home server was actually Gigabit, because I only saw 10-15 MiB/s. A quick check with rsync and scp: nope, 80+ MB/s was possible, just not with Dolphin.

That’s when I decided to get to the bottom of the issue. I discovered KDE bug 296526 – “Dolphin is too slow when upload a file on a SSH server”. It fit exactly what I was observing, even though it’s from 2012. That’s 12 years! I went on to read all the comments, and it became apparent that none of the users who had commented had taken the time to do a proper side-by-side comparison and document it. So I took up that task and posted comment 40, where I wrote pretty much what I wrote here, plus the results of some tests that I had done with libssh. That’s the underlying library which kio-extras/sftp uses to interact with SSH/SFTP servers. And Dolphin / Konqueror / Krusader in turn use kio-extras/sftp to do SFTP transfers.

These tests showed very promising results: I could actually saturate my Gigabit link and transfer >750 Mbps. What I didn’t know at the time (September): I had tested a brand-new version of libssh (0.11.0), released in August, that came with major changes – namely, a new async I/O API. The transfers with Dolphin, however, had still used libssh 0.10.x.

Upon learning about these important changes, I opened a version bump request for libssh 0.11.0 in Gentoo’s bug tracker to make the Gentoo devs aware of this new libssh version. Fast forward two months: libssh 0.11.1 was available in Gentoo. I then erroneously tested with kio-extras linked against libssh 0.10.x (and thus with the new async I/O API disabled), even though I had 0.11.1 on my system. The reason was that I hadn’t rebuilt kio-extras. That resulted in comment 45. A few minutes later I realized my error, rebuilt kio-extras (actually most of KDE, because an update was coming in anyway), and voilà: ~230 MiB/s or ~1840 Mbps (with peaks going >2 Gbps)! Hurray!

Side note: I had upgraded my network from Gigabit to 2.5 Gigabit in the meantime, otherwise that would obviously not have been possible. But with the old libssh, only ~90 MiB/s or ~720 Mbps were possible over the same network – 2.5x slower than with the new libssh 0.11.x.

So, if you find that your SFTP transfers in KDE are slower than they should be, check which libssh version your distribution ships. If it’s <0.11.0, you know you need to upgrade. With non-rolling binary distributions, you’ll probably have to wait a bit and then upgrade your whole distribution; for example, Ubuntu will only get libssh 0.11.1 in “Plucky” aka version 25.04. For rolling distributions like Gentoo or Arch, it’s already available.
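
By the way, a quick way to check – without digging through package manager output – is to ask the library itself. Here’s a small sketch; the soname libssh.so.4 is an assumption that should hold on most current distributions:

Python sketch to print the installed libssh version
import ctypes

# Load libssh by its usual soname (adjust if your distribution differs)
libssh = ctypes.CDLL("libssh.so.4")
libssh.ssh_version.restype = ctypes.c_char_p
libssh.ssh_version.argtypes = [ctypes.c_int]

# ssh_version(0) returns the full version string, e.g. "0.11.1/openssl/zlib"
print(libssh.ssh_version(0).decode())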

This is probably the biggest single improvement to my Linux on the Desktop experience of the last years… hence this blog post 🙂

Heating the room for Science

Before I can get into the actual topic, I have to give some background.
TL;DR: I have some free electric power that I wanted to put to good use. You can skip to How to put the power to good use?

Background: Fuel cell produces more power than we need at times

We have a pretty fancy heating system that doesn’t just burn domestic gas to generate heat, as most heating systems do; instead it reforms the gas to hydrogen and uses it in a fuel cell to generate ~750 watts of electric power. The fuel cell’s “waste heat” is what warms the house and the domestic hot water most of the time. On cold days, when the 1.1 kW of thermal power aren’t enough, there’s also a traditional gas condensing boiler built in, which just burns as much gas as necessary. Another important component of the system is a 220-liter hot water tank which serves as a kind of energy storage over time. If you’re interested in this technology, here are some links:

Now 750 watts doesn’t sound like a lot, and often we need much more than that for short periods of time, e.g. while cooking or baking, when the washing machine is running, etc. But then again, there are sometimes many hours where the whole house needs less than 300 watts:

The blue graph is power production, yellow is what we don’t use, red is what’s drawn from the grid. I’ve highlighted the area (green) where power was available from the fuel cell that we didn’t use.

The price we get for power exported to the grid (yellow graph) is very low – from an economic perspective it makes no sense to “waste” power by sending it to the grid. That’s weird, given that everyone is talking about the “Energiewende” (“turnaround in energy policy”), decentralizing power production and so on. But that’s how it is right now, and there’s not much I can do about it. Hopefully policies will improve soon and make it more attractive to feed power into the grid, so that more coal-fired plants can be switched off.

How to put the power to good use?

So I was thinking about how I could put the electric power to good use, instead of “wasting” it. First I thought of a big battery, but those are rather expensive, heavy, degrade over time, and would probably never pay off. A big battery in the form of a battery electric vehicle would make a lot of sense, but that’s also not an option right now. Then, one evening, sitting next to my beast of a computer and playing a 3D game, I noticed that the room was getting pretty warm, even though the floor heating was switched off for that room. That’s when it occurred to me: why don’t I turn the surplus power into heat, and let the computer do something useful while generating it?

I remembered how – at the beginning of the COVID-19 pandemic – I had donated my old laptop’s meager CPU + GPU power to the search for a vaccine by running the BOINC client. What is BOINC, you ask? It stands for “Berkeley Open Infrastructure for Network Computing”, and it basically turns your computer into part of a distributed supercomputer that scientists can use to solve computation-intensive problems. See this Wikipedia article for more info.

So I installed BOINC on my desktop machine, connected it to Science United, and the machine started to hum. However, when I checked the power meter, the house was drawing power from the grid, even though it was late evening and no big power consumers were running. Turns out this beast – when all 12 cores and the GPU are crunching numbers as hard as they can – draws about 550 watts. Add the fridges and other infrastructure (home server, WiFi router/APs, switches, smart home stuff etc.) on top, and the ~750 watts from the fuel cell weren’t enough. So I had to come up with a plan to limit the computer’s power usage somehow. I wanted to regulate the power consumption in such a way that BOINC would only run when it made sense. I came up with the following criteria for running BOINC:

  • Currently unused fuel cell power must be greater than what the computer needs when it runs BOINC without the GPU (more about the GPU later), i.e. >180 watts
  • Outside temperature must be “cold” (I defined that as less than ~10 degrees Celsius for now) – for the simple reason that otherwise the small room would get uncomfortably warm and I would have to open a window, which I would consider a waste of energy.

I put these in code and tried it out. But something wasn’t right: BOINC would always run for two to four minutes, then stop, only to start up again after two to four minutes. Thinking back to how much power the machine was actually using at full CPU+GPU load, I realized I would have to regulate the power consumption in a more fine-grained way. As the GPU alone can use up to 350 watts, that’s where I saw the biggest leverage. I then (re)discovered the nvidia-smi tool, which allows setting a limit on how much power the GPU may use. So I enhanced my program to first start BOINC on CPU only. If after the next cycle there was still sufficient fuel cell power, I would switch on the GPU at its lowest possible wattage (100 watts). Then, every cycle, if there were still more than 50 watts available, I would increase the GPU power limit further. I designed these control cycles to be two minutes long. The data the decisions are based on comes from InfluxDB and is averaged (mean) over those two minutes, so that short bursts of power consumption are flattened out. Whenever the available fuel cell power hit 0 watts, the GPU would first be suspended, and in the next cycle the CPU as well.
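
To make this more concrete, here’s a heavily simplified sketch of such a control loop – not my actual program: the bucket and field names match the logging script shown in the InfluxDB article below, but the thresholds and the exact boinccmd / nvidia-smi invocations are assumptions you’d have to adapt to your own setup (the outside-temperature check is omitted for brevity).

Python sketch of the control loop
import subprocess
import time
from influxdb_client import InfluxDBClient

CYCLE = 120          # control cycle length in seconds
CPU_DRAW = 180       # watts the machine needs for CPU-only BOINC
GPU_MIN_LIMIT = 100  # lowest power limit the GPU accepts, in watts
GPU_MAX_LIMIT = 350  # maximum GPU power limit, in watts
HEADROOM = 50        # surplus required before raising the GPU limit

client = InfluxDBClient(url="http://localhost:8086", token="...", org="...")

def surplus_watts():
    # Mean unused fuel cell power over the last cycle; the field name is
    # what my logging script writes (see the InfluxDB article below)
    flux = '''
        from(bucket: "smarthome")
          |> range(start: -2m)
          |> filter(fn: (r) => r._measurement == "heating")
          |> filter(fn: (r) => r._field == "FuelCellPowerSoldCurrent")
          |> mean()
    '''
    tables = client.query_api().query(flux)
    return tables[0].records[0].get_value() if tables else 0.0

cpu_on = False
gpu_limit = 0  # 0 means GPU crunching is suspended

while True:
    surplus = surplus_watts()
    if not cpu_on:
        if surplus > CPU_DRAW:  # enough headroom: start CPU-only crunching
            subprocess.run(["boinccmd", "--set_run_mode", "auto"])
            cpu_on = True
    elif surplus <= 0:
        if gpu_limit > 0:       # first suspend the GPU ...
            subprocess.run(["boinccmd", "--set_gpu_mode", "never"])
            gpu_limit = 0
        else:                   # ... and in the next cycle the CPU
            subprocess.run(["boinccmd", "--set_run_mode", "never"])
            cpu_on = False
    elif surplus > HEADROOM and gpu_limit < GPU_MAX_LIMIT:
        gpu_limit = min(GPU_MAX_LIMIT, max(GPU_MIN_LIMIT, gpu_limit + HEADROOM))
        subprocess.run(["sudo", "nvidia-smi", "-pl", str(gpu_limit)])  # needs root
        if gpu_limit == GPU_MIN_LIMIT:
            subprocess.run(["boinccmd", "--set_gpu_mode", "auto"])
    time.sleep(CYCLE)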

There are a few more elaborate details I added over the following days, but I can say it all runs rather smoothly now. The room is warm and I’m contributing valuable computing power to science – at zero cost 😄

InfluxDB newcomer’s impressions

Three weeks ago I decided to use InfluxDB for some smart home data. Here is a short article about my impressions.

Getting it to run on my home server was pretty easy:

$ docker run --rm --name influxdb -d -p 8086:8086 \
  --volume /data/influxdb/influxdb2:/var/lib/influxdb2 \
  --volume /data/influxdb/config.yml:/etc/influxdb2/config.yml \
  influxdb

Of course I first had to figure out these arguments, and how to get that config.yml, but it’s all rather well-documented here.

Once it was running, I started feeding in data from a Python script. That was also really easy, with the guided example directly from the InfluxDB web GUI. I’ve added the necessary code into a script that I already had running every minute – the relevant portions are:

Python code to publish data points into InfluxDB
import os
import traceback
import urllib3
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

influxdb_token = os.getenv('InfluxDB_token', '')
influxdb_org = os.getenv('InfluxDB_org', '')
influxdb_url = os.getenv('InfluxDB_URL', 'http://localhost:8086')
logger_version = "1"

# 'device' is populated by the PyViCare library that accesses
# my heating system's API - something for another blog post ;)
t = device.asFuelCell()

# Current readings to be stored, one field each
info = {
    "FuelCellPowerProductionCurrent": float(t.getFuelCellPowerProductionCurrent()),
    "FuelCellPowerPurchaseCurrent": float(t.getFuelCellPowerPurchaseCurrent()),
    "FuelCellPowerSoldCurrent": float(t.getFuelCellPowerSoldCurrent()),
    "HotWaterStorageTemperatureBottom": float(t.getHotWaterStorageTemperatureBottom()),
    "HotWaterStorageTemperatureTop": float(t.getHotWaterStorageTemperatureTop())
}

client = InfluxDBClient(url=influxdb_url, token=influxdb_token, org=influxdb_org)
bucket = "smarthome"
write_api = client.write_api(write_options=SYNCHRONOUS)

# Write one point per reading, tagged with the logger version
for field in info:
    point = (
        Point("heating")
        .tag("logger_version", logger_version)
        .field(field, info[field])
    )
    try:
        write_api.write(bucket=bucket, org=influxdb_org, record=point)
        print('- ' + field)
    except urllib3.exceptions.NewConnectionError as e:
        # InfluxDB unreachable - no point in trying the remaining fields
        print('\nWARNING: InfluxDB seems to be down:')
        print(''.join(traceback.format_exception(None, e, e.__traceback__)))
        break
    except Exception as e:
        print('\nWARNING: Could not submit ' + field + ' to InfluxDB:')
        print(''.join(traceback.format_exception(None, e, e.__traceback__)))

Et voilà, my first queries with Data Explorer within InfluxDB returned results:

Using InfluxDB’s Data Explorer

After a bit of customizing and then saving the query on a dashboard, I have very beautiful graphs of the data, and so far it has been running very stably and reliably.

However, there are two things that have started to bother me:

  • There is no way to access the dashboards on mobile. I can log in alright, but the dashboard is just empty. Looks like a major bug / missing functionality. If I switch on “desktop site” in the browser (Firefox on Android, but I also tried Chrome), I get the same view as on my desktop. While that somewhat works, it’s not a good experience: for example, when you zoom in you can no longer scroll, because touching a diagram/cell interacts with it, e.g. moving the cell to another place instead of performing the desired scroll action. I’ve also searched for an app that would let me connect to my InfluxDB and offer a proper mobile experience, but couldn’t find one.
  • Downsampling of data doesn’t happen automatically for longer time ranges, and there is no “easy” way to get this working. I absolutely expected a time series database with integrated visualization to be able to do this out of the box. Because as soon as you have more than a “handful” of data points in the database, queries over longer time ranges make the client (i.e. the machine running the browser that accesses InfluxDB) work really hard after executing a query. That’s because all the data points in that range are sent from the database to the graphing/rendering engine. So even when I do a “30 days” query for my 5 data points, which I sample once a minute (= 216,000 data point entries), InfluxDB won’t reduce that to a more reasonable number. It actually transfers 216,000 data point entries to the browser, which then tries to graph them in the diagram. Depending on what hardware the browser runs on, this can take a while. It would be better if there was a way to dynamically downsample the data, so its resolution matches the use case at hand.
    Example: If a graph is rendered on a 1080p screen and covers 50% of the screen width (1920/2 = 960 pixels), I would assume that 960 data point entries per graph line would suffice. That would reduce the 216,000 points to a mere 4,800. But in InfluxDB, you get 45 data points per pixel – who needs that? And that was only a one-month query. Once I have a year’s worth of data points, I’ll probably have to wait minutes for the browser to finish its number crunching.
    The proposed way is to set up a “cron task” that does the downsampling and copies the values into a separate bucket. Of course I could make this work (see the sketch below), but seriously?!
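
For what it’s worth, such a job doesn’t need much code. Here’s a minimal sketch, using the same Python client as above, that could be run e.g. hourly from cron – the destination bucket name is made up, and that bucket has to exist already:

Python sketch of a cron-driven downsampling job
import os
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url=os.getenv('InfluxDB_URL', 'http://localhost:8086'),
                        token=os.getenv('InfluxDB_token', ''),
                        org=os.getenv('InfluxDB_org', ''))

# Aggregate the last hour of raw points into 5-minute means and copy
# them into a (pre-created) separate bucket
flux = '''
    from(bucket: "smarthome")
      |> range(start: -1h)
      |> filter(fn: (r) => r._measurement == "heating")
      |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
      |> to(bucket: "smarthome_downsampled")
'''
client.query_api().query(flux)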

Having said that, it’s still definitely a great tool, and I hope these two issues will be resolved in the near future (or someone tells me what I’ve been doing wrong, which is always a possibility 😉).

Edit (2022-12-12): Fixed downsampling example math, had forgotten that the 216,000 points are for five data points, and would thus be rendered as five graphs, not one.