create results file only when fully finished

The idea behind this trick is simple yet powerful. It protects you from unexpected consequences when your process keeps appending to a results file for a long time.

Say we have this script, which writes one line of data per second:

[root@linux ~]# cat long_running_script.sh
#!/bin/bash

for i in $(seq 1 10); do
  echo "$(date "+%Y-%m-%d %H:%M:%S") ${i}"
  sleep 1
done

exit 0
[root@linux ~]#

Now imagine you run it and redirect the output to some “results” file. If you are the only one who uses that file, there is no problem: just wait for the script to finish and then use the file, e.g.:

[root@linux ~]# ./long_running_script.sh > long_running_results.txt
[root@linux ~]# cat long_running_results.txt
2023-12-07 12:03:08 1
2023-12-07 12:03:09 2
2023-12-07 12:03:10 3
2023-12-07 12:03:11 4
2023-12-07 12:03:12 5
2023-12-07 12:03:13 6
2023-12-07 12:03:14 7
2023-12-07 12:03:15 8
2023-12-07 12:03:16 9
2023-12-07 12:03:17 10
[root@linux ~]#

But what if some other process periodically reads that results file? What if you can’t guarantee that it won’t read the file at a moment when, say, only half of it has been written? That is especially bad if the results file has a structure (e.g. a JSON file), because the incomplete file would then be treated as broken. The solution is to write to a temporary file and move it to the desired results file only after the script has finished, like:

[root@linux ~]# ./long_running_script.sh > long_running_results.txt.tmp && mv -f long_running_results.txt.tmp long_running_results.txt
[root@linux ~]#
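
To see the difference between the two approaches, here is a small experiment you could run. It is only a sketch: slow_writer is a hypothetical stand-in for long_running_script.sh (3 lines, 1 second apart), the file names direct.txt and atomic.txt are made up for the demo, and it assumes a sleep that accepts fractional seconds (as GNU sleep does).

```shell
#!/bin/bash
# Hypothetical stand-in for long_running_script.sh: 3 lines, 1 second apart.
slow_writer() {
  local i
  for i in 1 2 3; do
    echo "line ${i}"
    sleep 1
  done
}

demo_dir="$(mktemp -d)"

# Writer A redirects straight into its results file.
slow_writer > "${demo_dir}/direct.txt" &

# Writer B writes to a temporary file and renames it only on success.
( slow_writer > "${demo_dir}/atomic.txt.tmp" \
    && mv -f "${demo_dir}/atomic.txt.tmp" "${demo_dir}/atomic.txt" ) &

# Play the role of the reader while both writers are still running.
sleep 1.5
lines_seen_midway="$(wc -l < "${demo_dir}/direct.txt")"
if [ -e "${demo_dir}/atomic.txt" ]; then
  atomic_visible_midway=yes
else
  atomic_visible_midway=no
fi

wait  # let both writers finish
lines_in_final="$(wc -l < "${demo_dir}/atomic.txt")"

echo "direct.txt midway: ${lines_seen_midway} line(s)"
echo "atomic.txt visible midway: ${atomic_visible_midway}"
echo "atomic.txt at the end: ${lines_in_final} line(s)"
```

Midway through, the reader should find direct.txt already present but incomplete, while atomic.txt does not exist yet; it shows up only at the end, with all of its lines.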

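This way the results file is in place and ready to use only once the process has fully finished. Because the temporary file sits in the same directory as the results file, mv is a plain rename within one filesystem, so a reader sees either the previous complete file or the new complete one, never a half-written one. The && also means that a failed run does not replace the previous results. If both processes (the writer and the reader) run automatically, with this approach you don’t have to care when each of them runs: the reader always gets something correct to read, since new data appears only when it is fully ready.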
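
If you use this pattern in several places, it can be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: the name write_atomically and the ".tmp.<PID>" suffix are my own assumptions. It also removes the temporary file when the command fails, which the one-liner above does not do.

```shell
#!/bin/bash
# Hypothetical helper: run a command and publish its output in one step.
# Usage: write_atomically <results_file> <command> [args...]
write_atomically() {
  local target="$1"
  shift
  # Keep the temporary file next to the target so that mv is a rename
  # within one filesystem rather than a copy. The PID suffix ($$) is an
  # assumption to avoid clashes between concurrent runs of the script.
  local tmp="${target}.tmp.$$"
  local status

  if "$@" > "${tmp}"; then
    # The command succeeded: replace the results file.
    mv -f "${tmp}" "${target}"
  else
    # The command failed: keep the old results and clean up.
    status=$?
    rm -f "${tmp}"
    return "${status}"
  fi
}
```

With the script from this post it would be called as: write_atomically long_running_results.txt ./long_running_script.sh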