Redis + Audio Analysis for near-realtime fun

(Audio Analysis | redis ) == <3

Redilysis sends audio analysis to a redis server.

The idea is to share a single audio analysis with many Visual Jockey (VJ) filters; in our case, for lasers.

Redis Keys and Contents for end users

Each heading below is a key which you can query the redis server for. Example:
$ redis-cli get spectrum_10
"[2.21, 0.56, 0.51, 0.32, 0.27, 0.21, 0.18, 0.17, 0.18, 0.23]"
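The same lookup can be done from a script. Below is a minimal sketch, assuming a redis-py style client (any object with a compatible get method) and assuming the values are JSON-compatible strings like the one above. The helper name get_json_key is hypothetical and not part of redilysis.

```python
import json


def get_json_key(client, key):
    """Fetch a key and decode it, returning None if the key is absent.

    `client` is assumed to expose get(key) like redis-py's Redis class.
    """
    raw = client.get(key)
    if raw is None:
        return None
    # Works for values such as "0.12" or "[2.21, 0.56, ...]".
    return json.loads(raw)


# Possible usage with redis-py (assumed to be installed separately):
#   import redis
#   client = redis.Redis(host="127.0.0.1", port=6379, decode_responses=True)
#   spectrum = get_json_key(client, "spectrum_10")
```

Passing the client in as a parameter keeps the helper usable with a stub object when no redis server is running.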

rms

  • Mode spectrum
  • Type float number
  • Length scalar(1)
  • Meaning Represents the root-mean-square (a mean value) for all frequencies between C0 and C9, i.e. between 12Hz and 8,372Hz.
  • Use Basic information about the audio volume of the scene.
  • Example
    • "0.12"
    • The audio volume for the scene is pretty low.
    • It is obtained by averaging the RMS of every audio frame during the capture.
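The averaging described above can be sketched as follows. This is an illustration only, assuming each frame is a plain list of samples; the function names are hypothetical, and redilysis itself may rely on librosa for this step.

```python
import math


def frame_rms(frame):
    """Root-mean-square of one audio frame (a sequence of samples)."""
    return math.sqrt(sum(x * x for x in frame) / float(len(frame)))


def capture_rms(frames):
    """Average the per-frame RMS over every frame of the capture."""
    return sum(frame_rms(f) for f in frames) / float(len(frames))
```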

spectrum_10

  • Mode spectrum
  • Type array of float numbers (0.0-10.0)
  • Length 10
  • Meaning Represents the audio volume for the 10 octaves between C0 and C9, i.e. between 12Hz and 8,372Hz.
  • Use A simple and useful way to get a global idea of the sound landscape.
  • Example
    • "[2.21, 0.56, 0.51, 0.32, 0.27, 0.21, 0.18, 0.17, 0.18, 0.23]"
    • The audio volume for the C4 octave is spectrum_10[4].
    • That value being 0.27 is pretty low, meaning almost no audio volume for that octave.
    • It is calculated by averaging the volume of the octave's notes, i.e. C4, C#4, D4, D#4, E4, F4, F#4, G4, G#4, A4, A#4, B4.

spectrum_120

  • Mode spectrum
  • Type array of float numbers (0.0-10.0)
  • Length 120
  • Meaning Represents the audio volume for the 120 notes between C0 and C9, i.e. between 12Hz and 8,372Hz.
  • Use More detailed than spectrum_10, it lets you find the notes that stand out in the audio landscape.
  • Example
    • "[5.55, 2.61, 2.49, 1.79, 2.09, 4.35, 1.99, 1.57, 1.47, 0.77, 0.91, 0.89, 0.85, 0.56, 0.53, 0.73, 0.53, 0.46, 0.43, 0.44, 0.27, 0.45, 0.7, 0.81, 0.98, 0.7, 0.71, 0.6, 0.83, 0.51, 0.32, 0.31, 0.33, 0.24, 0.25, 0.33, 0.39, 0.43, 0.51, 0.28, 0.27, 0.25, 0.38, 0.25, 0.27, 0.3, 0.2, 0.27, 0.35, 0.29, 0.34, 0.3, 0.27, 0.27, 0.22, 0.21, 0.21, 0.29, 0.22, 0.28, 0.18, 0.19, 0.25, 0.26, 0.25, 0.24, 0.2, 0.21, 0.19, 0.18, 0.19, 0.17, 0.2, 0.17, 0.18, 0.17, 0.15, 0.17, 0.19, 0.18, 0.21, 0.16, 0.16, 0.18, 0.15, 0.13, 0.14, 0.16, 0.2, 0.17, 0.17, 0.2, 0.18, 0.16, 0.18, 0.15, 0.15, 0.16, 0.16, 0.19, 0.19, 0.19, 0.17, 0.18, 0.17, 0.19, 0.23, 0.23, 0.2, 0.23, 0.24, 0.36, 0.34, 0.23, 0.22, 0.2, 0.19, 0.18, 0.21, 0.21]"
    • The audio volume for the C2 note is spectrum_120[24] (12 x 2), since indexes start at 0 with C0.
    • That value being 0.98 is average, meaning there is some audio volume for that note.
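The relation between the two spectrum keys can be sketched in a few lines. This assumes notes are stored in ascending order with C0 at index 0, which is consistent with the example values: the first twelve values of the spectrum_120 example average to 2.21, the first value of the spectrum_10 example. The helper names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_index(note):
    """Index in spectrum_120 of a note such as 'C2' or 'F#4' (octaves 0-9)."""
    name, octave = note[:-1], int(note[-1])
    return 12 * octave + NOTE_NAMES.index(name)


def octave_means(notes):
    """Average each run of 12 notes, rounded to 2 decimals like the examples."""
    return [round(sum(notes[i:i + 12]) / 12.0, 2)
            for i in range(0, len(notes), 12)]
```

Feeding a full 120-value spectrum_120 list to octave_means yields a 10-value list comparable to spectrum_10.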

bpm_sample_interval

  • Mode bpm
  • Type float
  • Length scalar(1)
  • Meaning Represents the interval, in milliseconds, at which beat detection samples are taken. Beat detection requires a longer sampling duration than spectrum analysis: the former needs intervals longer than 1s, while a 0.1s interval is sufficient for the latter.
  • Use This is useful only if you try to guess future beats.
  • Example
    • "3000.0"
    • Each audio sample used to detect the beats is 3s long.

bpm_delay

  • Mode bpm
  • Type float
  • Length scalar(1)
  • Meaning Represents the time, in milliseconds, taken to capture the sample and analyze it, excluding the time spent saving it to redis. In other words, it is (bpm_sample_interval + bpm processing time).
  • Use This is useful only if you try to guess future beats.
  • Example
    • "3197.49093056"
    • The capture + detection time for this tempo detection was 3.20s.
    • If the bpm_sample_interval is 3.0s, it took 0.20s to analyze and detect beats in the sample.

bpm

  • Mode bpm
  • Type float
  • Length scalar(1)
  • Meaning Represents the tempo of the audio landscape, in Beats Per Minute (BPM).
  • Use A simple way to know how fast the music goes, in musical environments.
  • Expiration This is the only key with an expiration, set in milliseconds using the Redis PEXPIREAT command.
    • Each time this key is saved to redis, it is set to expire in 2 * bpm_sample_interval.
      • For example, the key will be set to expire in 6,000ms if the audio sample lasts for 3s.
    • This lets you work out how long the key has been in redis by using the Redis PTTL command.
      • For example, suppose the command redis-cli PTTL bpm returns "3648" while bpm_sample_interval is equal to 3000.0.
      • It means the bpm key was saved to redis 2 * 3000 - 3648 = 2352 milliseconds ago, plus the TCP transaction time of the redis query.
  • Example
    • "126.05"
    • There are ~126 beats per minute.
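The age computation described under Expiration can be sketched as below. The function name is hypothetical. The negative return codes are standard Redis PTTL behaviour: -2 when the key does not exist and -1 when it has no expiry.

```python
def bpm_age_ms(pttl_ms, bpm_sample_interval_ms):
    """Milliseconds since the bpm key was saved, or None if unknown.

    The key is saved with an expiry of 2 * bpm_sample_interval, so its
    age is that initial lifetime minus the remaining time to live.
    """
    if pttl_ms < 0:
        # -2: key missing (expired or never set); -1: no expiry set.
        return None
    return 2 * bpm_sample_interval_ms - pttl_ms
```

For instance, with a 3000.0 ms interval and a PTTL of 4500, the key is 1500 ms old. Network latency is not accounted for here.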

beats

  • Mode bpm
  • Type list
  • Length variable
  • Meaning Represents the beat positions in the sample, in seconds from the start of the capture.
  • Use This is useful only if you try to guess future beats. And in this case, this is the key information.
  • Example
    • "[0.34829932 0.81269841 1.20743764 1.60217687 2.00852608 2.48453515]"
    • After the audio capture started, the first beat was detected at 0.35 seconds, the second beat at 0.81, and so on until the last beat at 2.48.
    • Based on the BPM and this information, we can project future beats around 2.96, 3.44, 3.91, etc. (one beat every 0.476 seconds at 126.05 BPM).
    • See below for beat projection.
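Note that the example value has no commas between numbers, which looks like the string form of a NumPy array rather than JSON, so a JSON parser would reject it. A tolerant parser could look like this sketch; the function name is hypothetical.

```python
def parse_beats(raw):
    """Turn '[0.348 0.812 ...]' into a list of floats (seconds)."""
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")
    body = raw.strip().strip("[]")
    # Accept both whitespace and comma separators.
    return [float(tok) for tok in body.replace(",", " ").split()]
```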

Calculating the next beats at user times

This computation requires the following values

  • bpm_sample_interval
  • bpm_delay
  • bpm
  • pttl_delta : twice the bpm_sample_interval minus the PTTL value of the bpm key in redis, i.e. the time in milliseconds since the bpm key was saved
  • last_beat : time of the last beat in the sample, in seconds (the last value of the beats key)

Redis latency is not considered here, but could be.

Examples are given based on previous values for redis keys.

  1. How many seconds per beat?
    • seconds_per_beat = 60 / bpm
    • 60 / 126.05 =~ 0.4760
  2. When did the capture start (in milliseconds)?
    • total_delay = bpm_delay + pttl_delta
    • 3197.49093056 + 2352 =~ 5549.5, i.e. the capture started 5.5 seconds ago
  3. When was the last beat (in milliseconds)?
    • last_beat_delay = bpm_delay - last_beat * 1000 + pttl_delta
    • 3197.49093056 - (2.48453515 * 1000) + 2352 =~ 3064.96, i.e. the last beat was 3.1 seconds ago
  4. How many beats were there between the last beat and the redis get bpm query?
    • count_past_beats = floor( (last_beat_delay / 1000) / seconds_per_beat )
    • floor( (3064.96 / 1000) / 0.4760 ) = floor(6.439) = 6, i.e. 6 full beats have elapsed
  5. When are the next beats relative to the redis bpm key retrieval time (in milliseconds)?
    • next_beats = f(i){ i * seconds_per_beat * 1000 - last_beat_delay } where i >= count_past_beats
    • f(i){ i * 0.4760 * 1000 - 3064.96 } where i >= 6
    • f(6) = 6 * 0.4760 * 1000 - 3064.96 = -208.9600
      • A negative value means this beat already happened, 209 milliseconds before we got the redis key
    • f(7) = 7 * 0.4760 * 1000 - 3064.96 = 267.0400
      • The next beat is 267 milliseconds after the time we got the redis key
    • f(8) = 8 * 0.4760 * 1000 - 3064.96 = 743.0400
    • etc. until you have f(i) > bpm_sample_interval
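The five steps can be combined into one function. This is a sketch under the same assumptions as the steps (redis latency ignored); the function and parameter names are hypothetical, and pttl_delta is expected to be computed beforehand as described in the list of required values.

```python
import math


def project_next_beats(bpm, bpm_delay, bpm_sample_interval, pttl_delta, last_beat):
    """Return upcoming beat offsets in ms, relative to the redis query time.

    bpm_delay, bpm_sample_interval and pttl_delta are in milliseconds;
    last_beat is in seconds, as stored in the beats key.
    """
    ms_per_beat = 60000.0 / bpm                                    # step 1
    last_beat_delay = bpm_delay - last_beat * 1000.0 + pttl_delta  # step 3
    count_past_beats = int(math.floor(last_beat_delay / ms_per_beat))  # step 4
    upcoming = []
    i = count_past_beats + 1  # f(count_past_beats) is never in the future
    while True:
        offset = i * ms_per_beat - last_beat_delay                 # step 5
        if offset > bpm_sample_interval:
            break
        upcoming.append(offset)
        i += 1
    return upcoming


beats = project_next_beats(126.05, 3197.49093056, 3000.0, 2352, 2.48453515)
print([int(round(b)) for b in beats])
```

With the example values used in the steps, the first projected beats land about 267 ms and 743 ms after the query. Step 2 (total_delay) is not needed for the projection itself, so it is left out.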

Requirements and installation

  • python 2.7
  • audio card
  • redis server

To be honest, in my experience installation on Debian 9, 10 and 11 is a mess, because compiling the numba library for librosa requires a specific LLVM version.

sudo apt install python-pyaudio python
git clone https://git.interhacker.space/tmplab/redilysis.git
cd redilysis
pip install -r requirements.txt
python redilysis.py --help

Running redilysis: Common parameters

Two modes are available (see below for SPECTRUM and BPM), so you might need to run two processes for full analysis.

Here are the common parameters for both modes.

Get the help

python redilysis.py -h

Run with debug info

python redilysis.py -v

Get a list of audio devices

python redilysis.py -L

Run with a given audio device

python redilysis.py -v -d 5

Run with a sampling interval of 0.5s

python redilysis.py -v -s 0.5

Connect to redis on address 192.168.2.20 and port 6379

python redilysis.py -v -i 192.168.2.20 -p 6379

Change the internals of capture: run at 22000Hz with 2200 frames per buffer and 2 channels

python redilysis.py -v -r 22000 -f 2200 -c 2

Running redilysis in Spectrum Mode

Choosing the spectrum mode

python redilysis.py -v -m spectrum -s 0.1

This is the default mode.

It performs frequency analysis (Fast Fourier Transform) to detect "energy" across the range of human hearing.

It reports whether there is sound, and at which frequencies.

It can run at sub-second intervals (100ms) with no problem.

It reports realistic data: spectrum analysis is the easy part.

Running redilysis in BPM Mode

Choosing the BPM mode

python redilysis.py -v -m bpm -s 3

Choosing a minimum and maximum BPM

python redilysis.py -v -m bpm -s 3 --bpm-min 100 --bpm-max 200

This mode is less reliable than the spectrum mode.

It must run with an interval of several seconds to work well. Three seconds is a reasonable minimum.

It attempts to detect beats based on audio "jumps" in intensity and energy.

To correct a well-known error called the "octave error", where the detected tempo is twice/half or thrice/a third of the real tempo, you can use the Min/Max BPM. When the calculated tempo is outside of the range, redilysis attempts to find a more plausible value.
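One simple way to apply such a range is sketched below. It is an illustration of the idea only and not necessarily what redilysis does internally; the function name is hypothetical.

```python
def fold_bpm(bpm, bpm_min, bpm_max):
    """Try x2, /2, x3 and /3 to bring a tempo inside [bpm_min, bpm_max].

    Returns the first candidate inside the range, or the original
    value unchanged if none fits.
    """
    if bpm_min <= bpm <= bpm_max:
        return bpm
    for candidate in (bpm * 2.0, bpm / 2.0, bpm * 3.0, bpm / 3.0):
        if bpm_min <= candidate <= bpm_max:
            return candidate
    return bpm
```

For example, with a range of 100 to 200, a detected tempo of 63 would be folded to 126.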