Integrating audio information with Mapillary image handling and OSM input (OpenStreetMap diary)

I run a Mapillary-supplied BlackVue dashcam and upload my take via mapillary_tools in two “stabs”. The first is usually the same day, for road segments driven below 55 kph; the second follows up to 2-3 months later. This gives other mappers coverage of town/village streets and remote road intersections and signs first. I also take 1 FPS rear-left (webcam) and right-angle-left (Samsung phone) images at less than 40 kph for immediate geotagging, processing and upload (after manually culling private images).

Waiting for images to appear on the Mapillary site before doing my own OSM work degrades the result. For some time I had processed and geotagged locally, so that I could scroll through the (1 FPS) images at the end of every day. Unfortunately the camera doesn’t always capture sufficient detail, and scrolling through thousands of images takes time!
I would use Geeqie (image viewer) and a “side” text file (gpscorrelate -m) of image filenames and lat/lon, then copy/paste these directly into the OSM iD editor search box.
Some months ago I started talking into the camera whilst driving, e.g. street addresses, business names, features in rest areas etc. The aim is a rough cut that removes “no data”, so that I can find feature additions/changes quickly and accurately. My process is now:
Process/geotag images as before, but also extract a (1 minute) wav file (ffmpeg) from each BlackVue mp4. The wav follows the same file naming standard, so it is indirectly geotagged via the associated image(s).
Move all images and mp4s that show no movement (speed=0) for an entire 1 minute block out of future processing.
Run the wav files through a sox bandpass (sinc) filter to reduce road noise.
Run each wav through sox vad and remove those that have no voice on them. I have also modified certain timing constraints to reduce false detects (currently vad -t 12 -l 1000 -L 1000 -h 300 -H 300 -T 1).
Move the already tagged images in with the matching wavs for their one minute block.
The directory now contains only 1 minute blocks of audio (wav) and images (jpg) where there is voice and the vehicle was moving.
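The audio steps above can be sketched as command builders. This is a minimal sketch, not the actual bash script: the function names, the mono downmix, the 300-3000 Hz band and the 0.5 second threshold are all assumptions for illustration. Only the vad arguments are taken from the text. It also assumes that sox vad leaves an (almost) empty file when it never triggers, so the trimmed duration (e.g. from soxi -D) can decide whether a block is kept.

```python
from pathlib import Path

# vad arguments exactly as quoted in the text above.
VAD_ARGS = ["vad", "-t", "12", "-l", "1000", "-L", "1000",
            "-h", "300", "-H", "300", "-T", "1"]


def extract_wav_cmd(mp4: Path) -> list[str]:
    """ffmpeg command: drop the video (-vn), write mono audio (-ac 1).

    The wav keeps the mp4's base name, so it stays matched to its block.
    """
    wav = mp4.with_suffix(".wav")
    return ["ffmpeg", "-y", "-i", str(mp4), "-vn", "-ac", "1", str(wav)]


def bandpass_cmd(wav: Path, out: Path,
                 low_hz: int = 300, high_hz: int = 3000) -> list[str]:
    """sox sinc band-pass; the cut-off frequencies are assumed values."""
    return ["sox", str(wav), str(out), "sinc", f"{low_hz}-{high_hz}"]


def vad_cmd(wav: Path, out: Path) -> list[str]:
    """sox vad trims everything before the first detected voice."""
    return ["sox", str(wav), str(out), *VAD_ARGS]


def has_voice(trimmed_seconds: float, min_seconds: float = 0.5) -> bool:
    """Keep a block only if enough audio survives the vad trim.

    min_seconds is a hypothetical threshold, not a value from the text.
    """
    return trimmed_seconds >= min_seconds
```

Each list can be passed to subprocess.run(cmd, check=True). Building the commands as lists keeps filenames with spaces safe and makes the steps easy to check without touching any files.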
All of the above steps are launched by a single bash script on the laptop near the BlackVue. I only need to connect to the BlackVue WiFi and run it. It starts by downloading from the camera (curl/wget), and 1-2 hours later the action directory, gpx data and moving mp4s are ready. This data is rsync’d to another laptop for the OSM processing.
Whilst the above is running I often do some OSM/iD work on the previous day’s data (2nd laptop), and deal with the side camera culling and uploading.
I have set up 3 plugins (toolbar icons) in Geeqie: launch the 1 minute wav file associated with the current image in Audacity, copy the lat/lon to the clipboard, and move the entire 1 minute block of images and wav out of the active/processing directory.
When playing/viewing the 60 sec wav in Audacity (via the Geeqie icon), the 5 second tick marks correlate with the image filenames. E.g. 20220504_142256_044.jpg will be at the 44 second mark. If there is only one voice peak on the Audacity display I can quickly scroll/roll to it, without having to view the intervening images. If the audio track is very dense, like when driving past a long row of shops, I can pause the audio and scroll/roll as needed. Passing vehicles and UHF FM chatter look very different on the Audacity display, so these can be ignored.
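The filename-to-timeline mapping can be sketched as below. This is a minimal sketch that assumes every image is named date_time_seconds.jpg like the example above, and that the matching wav is named date_time.wav; the wav naming and both function names are assumptions.

```python
import re
from pathlib import Path

# YYYYMMDD_HHMMSS_SSS, e.g. 20220504_142256_044
NAME_RE = re.compile(r"^(\d{8})_(\d{6})_(\d{3})$")


def audio_offset_seconds(image_name: str) -> int:
    """Seconds into the 1 minute wav at which this image was taken."""
    match = NAME_RE.match(Path(image_name).stem)
    if match is None:
        raise ValueError(f"unexpected image name: {image_name}")
    return int(match.group(3))


def block_wav_name(image_name: str) -> str:
    """Name of the wav for the image's 1 minute block (assumed naming)."""
    match = NAME_RE.match(Path(image_name).stem)
    if match is None:
        raise ValueError(f"unexpected image name: {image_name}")
    return f"{match.group(1)}_{match.group(2)}.wav"
```

For 20220504_142256_044.jpg this gives an offset of 44 seconds into 20220504_142256.wav.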
The side camera files are similarly named, so they can be viewed for editor input. The raw >40 kph take is also available, but is often too widely spaced in time or blurred. It does however work very well on shopfront signs. These image files have also been geotagged, so the Geeqie copy lat/lon button also works.
Clicking on the lat/lon icon puts the current image position onto the clipboard, which then gets pasted into the iD editor search box.
I suspect that I could save a further few seconds by URL launching “remote” on the browser, but I have not done so yet!
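One way the URL launch might look is sketched below. It is an untried assumption, not part of the current workflow: it builds an openstreetmap.org edit URL that opens iD centred on the image position. The function names and the zoom level of 19 are hypothetical choices.

```python
import webbrowser


def id_editor_url(lat: float, lon: float, zoom: int = 19) -> str:
    """openstreetmap.org edit URL that opens iD at the given position."""
    return ("https://www.openstreetmap.org/edit?editor=id"
            f"#map={zoom}/{lat:.6f}/{lon:.6f}")


def open_in_id(lat: float, lon: float) -> None:
    """Open the position in the default browser (not called here)."""
    webbrowser.open(id_editor_url(lat, lon))
```

A Geeqie plugin could pass the image's lat/lon to open_in_id instead of copying it to the clipboard, which would skip the paste into the search box.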
When I have completed the feature in iD I click on the Geeqie “move” icon and the entire 1 minute image and wav file set vanishes!

Also of additional interest:

Voice detection isn’t perfect, but sox vad can be adjusted. Currently I select on sound peaks and get some false detects from the two way radio and passing vehicle slipstreams.

I have a GPS unit in the top left of the windscreen, and the (centred) BlackVue creates *.gps files that can be converted to nmea. I can therefore do a credible street alignment check on OSM by driving up and down, cutting the associated 4 tracks with Viking, then combining them (gpsbabel) for an OSM private trace upload. It is easy to see any accuracy issues with so many detailed tracks superimposed.
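The gpsbabel conversion and combining steps might look like the following. This is a minimal sketch: the function names and filenames are hypothetical, and the cutting in Viking is assumed to have already produced one gpx file per track.

```python
def nmea_to_gpx_cmd(nmea_file: str, gpx_file: str) -> list[str]:
    """gpsbabel command converting one nmea log to gpx."""
    return ["gpsbabel", "-i", "nmea", "-f", nmea_file,
            "-o", "gpx", "-F", gpx_file]


def merge_gpx_cmd(gpx_files: list[str], merged_file: str) -> list[str]:
    """gpsbabel command combining several gpx tracks into one file."""
    cmd = ["gpsbabel", "-i", "gpx"]
    for gpx_file in gpx_files:
        cmd += ["-f", gpx_file]  # one -f per input file
    return cmd + ["-o", "gpx", "-F", merged_file]
```

Each command can be run with subprocess.run(cmd, check=True); the merged file is then the one uploaded as a private trace.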

It appears that gpx files created (gpsbabel) from the BlackVue *.gps (nmea) file don’t have complete speed information. I therefore tag (exiftool) with the nmea data instead. As an interesting extra, the BlackVue can be without a GPS lock for the first 1-3 minutes, so the nmea data from the second GPS unit can be used to fill the gap.
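The gap filling could be sketched as below. This is a minimal sketch under several assumptions: only RMC sentences are used, status A marks a valid fix, both logs cover the same UTC day, and the function names are hypothetical. Lines may carry a prefix before the $ (which is assumed here for the BlackVue *.gps files); it is ignored when parsing.

```python
from typing import List, Optional, Tuple


def parse_rmc(line: str) -> Optional[Tuple[float, bool]]:
    """Return (hhmmss time as float, fix valid) for an RMC sentence."""
    start = line.find("$")
    if start < 0:
        return None
    fields = line[start:].strip().split(",")
    if len(fields) < 10 or not fields[0].endswith("RMC") or not fields[1]:
        return None
    try:
        return float(fields[1]), fields[2] == "A"
    except ValueError:
        return None


def fill_unlocked_start(primary: List[str], secondary: List[str]) -> List[str]:
    """Use secondary fixes until the primary unit gets its first lock."""
    first_lock = float("inf")  # primary never locked: use all of secondary
    for line in primary:
        rmc = parse_rmc(line)
        if rmc and rmc[1]:
            first_lock = rmc[0]
            break
    head = [l for l in secondary
            if (r := parse_rmc(l)) and r[1] and r[0] < first_lock]
    tail = [l for l in primary if (r := parse_rmc(l)) and r[1]]
    return head + tail
```

The returned lines can be written to a single nmea file and used for tagging in place of the BlackVue data alone.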

I also do a rough cut of the BlackVue mp4s to remove those where the vehicle was not moving. This is handy if I stop for lunch and keep the camera running. I only want to eventually upload mp4s that have movement.
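The “not moving” test for a 1 minute block can be sketched from the nmea speed field. This is a minimal sketch: it assumes the block's lines are RMC sentences (speed over ground in knots in the eighth field), possibly with a prefix before the $. The function names and the max_kph parameter are hypothetical; 0 matches the speed=0 rule described earlier.

```python
from typing import Iterable, Optional

KNOTS_TO_KPH = 1.852


def rmc_speed_kph(line: str) -> Optional[float]:
    """Speed in kph from a valid RMC sentence, else None."""
    start = line.find("$")
    if start < 0:
        return None
    fields = line[start:].strip().split(",")
    if (len(fields) < 8 or not fields[0].endswith("RMC")
            or fields[2] != "A" or not fields[7]):
        return None
    try:
        return float(fields[7]) * KNOTS_TO_KPH
    except ValueError:
        return None


def block_is_stationary(lines: Iterable[str], max_kph: float = 0.0) -> bool:
    """True only if every valid fix in the block is at or below max_kph.

    A block with no valid fixes is kept (returns False), to be safe.
    """
    speeds = [s for s in map(rmc_speed_kph, lines) if s is not None]
    return bool(speeds) and all(s <= max_kph for s in speeds)
```

A small positive max_kph may be worth trying, since a parked GPS unit can report slight drift rather than exactly zero.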

Always evolving…
