Sensible audio data compression in MCUs?

Started by anotherjim, October 14, 2015, 11:54:52 AM

Previous topic - Next topic

anotherjim

Well, searching for ideas on the topic hasn't given me much.

So, an MCU with 10- or 12-bit converters, and you want it to sample audio and do some basic mixing. Given that the chip has some speed, around 16 MIPS, but with on-board RAM at a premium, what's an efficient way of storing non-byte-sized samples?
Storing to 16 bits is easy, but wasteful. I think compressing the data to 8 bits would be a good solution, but the maths involved seems like it might be too much if the sampling rate is up around 32kHz.

Anyone got any ideas?

Digital Larry

Not a compression expert, but I looked at u-law (mu-law) and A-law. Both require you to calculate a few logs per data point or keep a massive transform table fixed in memory. Or you could just go linear and not worry about it. I'd be really tempted to do the latter.
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

anotherjim

Well, I did find 8-bit ADPCM stuff, but I struggle to understand it. I can't find anything about it in plain language.


samhay

While this might be hypothetical for the moment, what type of MCU do you have in mind - 8-bit (probably not), 16-bit, 32-bit...?
I'm a refugee of the great dropbox purge of '17.
Project details (schematics, layouts, etc) are slowly being added here: http://samdump.wordpress.com

anotherjim

8-bit MCU.
There's plenty of ADPCM info/code for both AVR and PIC, but it's aimed at low-rate speech storage with high code overhead - 300+ cycles. I'm thinking of a multi-tap delay, probably in a 16MHz Mega328, with at least 3 taps. At a 32kHz sample rate, that's only about 500 cycles per sample to do one encode and three decodes! Mixing would be external analog, so it won't have much else to do apart from managing one cyclic buffer.

Even the info I'm finding for u-law/A-law log encoding doesn't make sense to me! There will be plenty of program flash for look-up tables, but I don't see how using a byte to point into the tables (which miss intermediate values) can be any better than simply losing LSBs and going 8-bit. There has to be something clever going on that I'm missing.
I suppose if you have 256 word-sized table values, but they progressively have larger steps between them with increasing amplitude, then you achieve compression by making higher-amplitude samples lose their LSBs while the lower-amplitude ones keep most of theirs. If that's all there is to it, then it shouldn't be hard to implement.



potul


If I understood correctly, the idea of u-law/A-law is that you get better resolution in the region where the data is most relevant.
So if you take your 12-bit signal, apply u-law and then truncate to 8 bits (just dropping LSBs), you get a better representation of the signal than dropping LSBs directly.
Of course, you will need to decode the signal later on to undo the u-law...

anotherjim

First encode!
This text is from a TI app-note...
"A-Law and mu-Law Companding Implementations Using the TMS320C54x

During compression, the least significant bits of large amplitude
values are discarded. The number of insignificant bits deleted is
encoded into a special field of the compressed code format, called
the chord. Each chord of the piece-wise linear approximation is
divided into equally sized quantization intervals called steps. The
step size between adjacent codewords is doubled in each
succeeding chord. Also encoded is the sign of the original integer.
The polarity bit is set to 1 for positive integer values. Thus, an 8 bit
m-255 codeword is composed of 1 polarity bit concatenated with a
3-bit chord concatenated with a 4-bit step.
Before chord determination, the sign of the original integer is
removed and a bias of 33 is added to the absolute value of the
integer. Due to the bias, the magnitude of the largest valid sample
is reduced to 8159 and the minimum step size is reduced to
2/8159. The added bias enables the endpoints of each chord to
become powers of two, which in turn simplifies the determination
of the chord and step. Chord determination may be reduced to
finding the most significant 1 bit of the binary representation of the
biased integer value, while the step equals the four bits following
the most significant 1."

...is the nearest I've yet found to plain speaking - and it reads to me as something completely different from the others on the same subject!
After reading through it at least 5 times, I think I might be getting it!

ElectricDruid

Quote from: anotherjim on October 16, 2015, 02:29:45 PM
First encode!
This text is from a TI app-note...
[snipped]

At the risk of confusing you more, another way to think of these exponentially-scaled encodings is as a floating-point representation. Whereas most floating-point representations use loads of bits, here we're only using eight. So encode positive/negative into one bit, encode the power into another three bits, and then encode the value in four bits. Going back to linear at the other end is easy, because the power bits basically tell you how many places left to shift the four value bits before you output them.

I'm still thinking about TI's "add offset 33" trick. I'll post again if I see what they're on about there. They're thinking about the exponential curve as being approximated by a piecewise linear function, and then each linear section is approximated by the sixteen steps of the four-bit value. It's all the same thing, but that's *another* way to look at it.

HTH,
Tom

anotherjim

#8
Yep, I make it an exponential and no lookup tables required.
How I make it thus far...
The 3 exponent bits are not the value of the exponent, but an offset that points to the position of the most significant set bit in the most significant 8 bits of the sample (after the sign bit). The 4 "data" bits are the 4 bits immediately after that most significant bit. Any lesser bits are discarded.

The added "bias" value is the minimum needed to ensure there will be a set bit in the 8-bit exponent field, and may help retain the "weight" of some lesser bits by shifting them up in value. It is subtracted at decode.
They also appear to set the next least significant bit after the data bits at encode, presumably to ensure no underflow when the bias is subtracted?

Edit - this bit may be just a delimiter added during decode to aid the process and have no value in the result.

I think the need for the above actions may depend on sample size. I'm going to have a play (on paper) with simplifying it for 12-bit samples. It might even work better in this case to have a 2:5 split instead of 3:4.

Practical...
The Mega328 ADC is only 12bit AND does not have a signed result mode (which surprised me when I checked the datasheet). Compression has to have a signed value to work from. Conversion between unsigned and signed is trivial, but would have to be reversed to use PWM output.
Of course, I could use an external codec - but that's not the point ;)

g_u_e_s_t

Quote from: anotherjim on October 19, 2015, 07:08:21 AM
Practical...
The Mega328 ADC is only 12bit AND does not have a signed result mode (which surprised me when I checked the datasheet). Compression has to have a signed value to work from. Conversion between unsigned and signed is trivial, but would have to be reversed to use PWM output.
Of course, I could use an external codec - but that's not the point ;)

the mega328 is only 10b in theory, and closer to 9.5bits in practice (at 32ksps).  signed integers are much better for doing math on, but for things like delays, where its mostly just adds/subtracts, unsigned is fine.  but, when you do compression, you throw away information, so your really small signals end up being mid-range values when you use unsigned values (0 input = 1/2Vcc = 512).  this will confuse the compression algorithm.

the system you describe will work, but reduces your resolution to 5b across the board (or 6b for a 10b input).  lookup tables are extremely fast, and with only 10b input, thats just 1k of program memory, which is nothing for the m328.  so you could just do a 1:1 mapping of values and be done with it.  another possibility is to just pack your 10b values into memory as efficiently as possible.  each time you load/grab a value from memory, you grab 2 values, and you keep track of which of those 16b are the 10b you want.

g_u_e_s_t

just thought id add that there would be another 8b lookup table to do the uncompression (512B of memory if you didnt pack it).

ElectricDruid

Quote from: g_u_e_s_t on October 19, 2015, 04:27:44 PM
just thought id add that there would be another 8b lookup table to do the uncompression (512B of memory if you didnt pack it).

They're right, you know. It's a K-and-a-half of program memory, which is nothing. There's no question - do it with lookups in both directions.

Tom

anotherjim

Yes, tables would be sooo much easier.

But which table? I know the maths for sine tables - so how about sine? I'd get a table that's very nearly linear around the zero crossing, with most compression at the peaks, where the signal will rarely go.

Packing uncompressed 10-bit data is probably practical. I can imagine having 2 bits from each of four samples in a byte in a smaller buffer, rotating them through in step with the 8-bit part in the main buffer, but that's using more RAM = less delay. As it is, I'm expecting a maximum tap at 60ms, which is room-sized.


ElectricDruid

Quote from: anotherjim on October 20, 2015, 12:05:51 PM
Yes, tables would be sooo much easier.

But which table?

Well, you were trying an exponential-based floating-point scheme, so why not try that? The beauty of tables is that it wouldn't actually be that hard to write the code and then try out different flavours of tables afterwards. Some of them will preserve lower volumes better and reduce noise on trailing decays; others will reduce distortion on peaks. It's going to be a trade-off, as ever. And at 8-bit accuracy, it's going to be a pretty frickin' big trade-off, let's make no mistake about that!

But that's not to say it isn't worth doing. My own experience with 8-bit oscillators made me think that actually 8-bit sound doesn't sound "bad", it just has "character"! So try it, and naysayers be damned!

Tom