So, in summary so far, there seem to be two main workflows proposed, each with variations:
1. The way most people seem to do it, using two separate recording devices and two separate editors for audio and video, as described by Jay Metcalf in one of his videos above. This is the most flexible and powerful method, but clunky.
a. Record audio using the usual DAW and microphone (including backing track if used) on a computer, and record video simultaneously but separately using a phone or a tablet.
b. Add sound effects to the audio, mix as usual and export the combined audio-plus-backing as a single audio file.
c. Import the audio and the video into a video editor, synchronise them (Final Cut Pro can do this automatically), and replace the audio on the video track with the "improved" audio track.
d. Clip the combined media (e.g. top and tail and splice as necessary) and add any video effects.
e. Export.
A variation on this method is first to record the audio and then to record the video afterwards by playing along to the audio track.
2. @BigMartin's suggestion using Reaper. This is a much simpler workflow, but does not allow adding video effects as far as I know. (I may be wrong about this.)
a. Record audio and video on a single device, such as a phone, possibly using an external microphone to get better sound quality.
b. Import into Reaper. (On a Mac, Logic Pro can be used instead of Reaper.)
c. Add the backing track and synchronise.
d. Add sound effects and mix as usual.
e. Clip the combined media (e.g. top and tail and splice as necessary).
f. Export.
Both methods can work on Windows and Macintosh.
ScreenFlow, as proposed by @Guenne, can combine the tasks of audio capture, video capture, and editing in one piece of software, thus simplifying the workflow.
Correction: And it allows audio editing too - see post below.
Final Cut Pro (Macintosh video editor) allows the user to add audio effects to a video, but the consensus seems to be that it is very fiddly.
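For anyone curious what the "synchronise" step in both workflows involves under the hood, here is a rough Python sketch of one common approach: slide one recording against the other and pick the shift where the two waveforms match best (cross-correlation). This is only my illustration of the general idea. I do not know how Final Cut Pro or Reaper actually do it, and the function name `estimate_offset` and the toy sample values are made up for the example.

```python
def estimate_offset(reference, other, max_lag):
    """Estimate how many samples `other` is delayed relative to `reference`.

    Both inputs are plain lists of audio sample values. A positive result
    means the shared sound occurs that many samples later in `other`.
    Hypothetical helper for illustration only, not any editor's real method.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum of products of overlapping samples at this trial shift.
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data: the same short "clap" appears two samples later in the phone audio.
daw_audio = [0, 0, 1, 3, -2, 0, 0, 0]
phone_audio = [0, 0, 0, 0, 1, 3, -2, 0]

lag = estimate_offset(daw_audio, phone_audio, max_lag=3)
print(lag)
```

Dividing the result by the sample rate (e.g. 48000 samples per second) gives the offset in seconds, which is the amount you would slide one track by in the editor. A sharp sound such as a hand clap at the start of the take makes the match much easier to find, whether by software or by eye.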