How we sing and how we listen?

Started by 12afael, April 12, 2006, 02:20:03 PM


12afael

my first post on this side of the forum.

I'm working on a pitch correction algorithm (like Auto-Tune) for my degree project. I'm at the stage of gathering information, and I'm having some trouble with the physiological part.

I need info about:
how well can we distinguish an out-of-tune voice?
how many hertz of difference can we hear?
does this depend on the pitch of the tone?
does this depend on the spectrum of the tone (a sine wave vs. a square wave)?

how much time do we need to recognize the pitch of a note? Does the resolution/time trade-off of spectrum analysis (DSP) apply to the human ear?

I need some info about frequency masking.

If anyone has some info, comments are welcome!

Peter Snowberg

Pitch modification without artifacts is already close enough to voodoo.

I don't mean to be short, but the answers to the questions you ask require several years of study just to understand the problem and possible ways of approaching a solution.

This one is not easy.
Eschew paradigm obfuscation

12afael

Quote
Pitch modification without artifacts is already close enough to voodoo.

I don't understand what you mean by artifacts, but pitch shifting is already being done. Even a Whammy does the job.

There is a lot of software (plug-ins) that does pitch correction: Auto-Tune, Yamaha Pitch Correction, TC-Helicon, etc. There are even a couple of rack processors that can do the job.

The problem is not how, but how much.

I'm doing some experiments generating tones with Cool Edit Pro. I found that we can discriminate differences of less than 0.05 Hz.
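To put a figure in hertz on a musical scale, it helps to convert it to cents (hundredths of an equal-tempered semitone). The sketch below is only an illustration; the function name `cents` and the example frequencies are assumptions, not something from the experiments described here. It shows that the same difference in hertz is a larger musical interval at low pitches than at high ones.

```python
import math


def cents(f_ref, f):
    """Interval from f_ref to f in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(f / f_ref)


# The same 0.05 Hz offset, measured at three (arbitrary) reference pitches.
for f_ref in (110.0, 440.0, 1760.0):
    print(f_ref, "Hz + 0.05 Hz =", round(cents(f_ref, f_ref + 0.05), 3), "cents")
```

Working in cents rather than hertz makes thresholds comparable across the range of a voice.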

And a sine wave is more difficult to discriminate than a square wave. That means that if the spectrum is more complex, our ear works better.
That could mean that the resolution/time trade-off of a spectrum analyzer applies to our ear.
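For reference, the resolution/time trade-off of a DFT-based analyzer is simple arithmetic: the bin spacing is the sample rate divided by the frame length, so a frame must last about 1/spacing seconds. The sketch below only works out that arithmetic; the function names are hypothetical, and whether the ear is bound by the same limit is exactly the open question here.

```python
def bin_spacing_hz(sample_rate, frame_len):
    """Distance in Hz between adjacent DFT bins for a frame of frame_len samples."""
    return sample_rate / frame_len


def frame_seconds_for_spacing(spacing_hz):
    """Frame duration in seconds needed for a given DFT bin spacing."""
    return 1.0 / spacing_hz


for spacing in (10.0, 1.0, 0.05):
    print(spacing, "Hz bin spacing needs a frame of about",
          frame_seconds_for_spacing(spacing), "s")
```

A 0.05 Hz bin spacing would need a frame of about 20 seconds, which is far longer than a sung note usually lasts.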

About masking, I found that we need some time to hear a difference of a few hertz: if we turn the tuning peg of a guitar, we need a fraction of a second to hear the change.

It is a lot of work to get a complete picture of what I need. Maybe a doctor or someone with more DSP experience could have more hints.
All these things have already been studied, but where?

SeanCostello

A Whammy is not voodoo, but it IS hard. And patented, for that matter.

For both the pitch correction and the pitch shifting algorithm, you will need a good pitch detector. Autocorrelation has been found to work well. You might want to try some center clipping of the waveform, as well as lowpass filtering. Don't forget that autocorrelation can be performed by taking a segment of the signal, generating two FFTs of it (one of the straight signal, one of the time-reversed version of the signal), multiplying the FFTs together, and taking the inverse FFT of the product.
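A minimal sketch of that recipe in plain Python, not taken from any product or from the books below: center clipping, then autocorrelation computed from the FFT of the frame and the FFT of its time-reversed copy. The function names, the 0.3 clipping ratio, the 70 to 1000 Hz search range and the small recursive FFT are all assumptions made for illustration. The lowpass filter is left out, and a real detector would also need voiced/unvoiced decisions and protection against octave errors.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT built from the forward FFT via conjugation."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def center_clip(frame, ratio=0.3):
    """Zero samples below ratio * peak and pull the rest toward zero."""
    thr = ratio * max(abs(s) for s in frame)
    return [s - thr if s > thr else s + thr if s < -thr else 0.0 for s in frame]


def autocorrelation(frame):
    """Autocorrelation via two FFTs, zero-padded so lags do not wrap around."""
    n = len(frame)
    size = 1
    while size < 2 * n:
        size *= 2
    padded = list(frame) + [0.0] * (size - n)
    # Circular time reversal: reversed_copy[m] == padded[-m mod size].
    reversed_copy = [padded[0]] + padded[:0:-1]
    product = [a * b for a, b in zip(fft(padded), fft(reversed_copy))]
    return [v.real for v in ifft(product)[:n]]


def detect_pitch(frame, sample_rate, fmin=70.0, fmax=1000.0):
    """Return the frequency whose lag has the largest autocorrelation."""
    r = autocorrelation(center_clip(frame))
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(r) - 1)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag


# Synthetic test tone: 200 Hz sine at 8 kHz, 1024 samples.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(1024)]
print("estimated pitch:", detect_pitch(tone, rate), "Hz")
```

Because the lag is an integer number of samples, the estimate is quantized; interpolating around the autocorrelation peak is a common refinement.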

Once you have the pitch detector, PSOLA is the algorithm of choice for pitch shifting.

As for how much pitch correction is needed, it all depends on the vibrato, the intonation, etc. Simply quantizing everything to an equal tempered scale will sound like Cher in that "Believe" song, as that is EXACTLY how that song was produced.
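To make the "simply quantizing" case concrete, here is a small sketch that snaps a detected frequency to the nearest equal-tempered note, plus a `strength` parameter that applies only part of the correction. The names, the A4 = 440 Hz reference and the `strength` parameter are assumptions for illustration; this is not how any commercial pitch corrector is known to work, and it ignores vibrato and note transitions entirely.

```python
import math


def nearest_note_hz(f, a4=440.0):
    """Frequency of the equal-tempered note closest to f."""
    semitones = round(12 * math.log2(f / a4))
    return a4 * 2 ** (semitones / 12)


def corrected_hz(f, strength=1.0, a4=440.0):
    """Move f toward the nearest note; strength=1.0 is hard quantization."""
    target = nearest_note_hz(f, a4)
    # Interpolate on a logarithmic (cents) scale, not in hertz.
    return f * (target / f) ** strength


for strength in (0.0, 0.5, 1.0):
    print(strength, corrected_hz(445.0, strength))
```

With `strength=1.0` applied to every frame, every pitch lands exactly on the grid, which is the hard-quantized sound described above.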

A few books you should have:

DAFX: Digital Audio Effects, edited by Udo Zölzer. Expensive, but really good. MATLAB code for the algorithms you wish to perform. It goes pretty heavy into the math, but there is no dodging this for what you want to do.

Digital Processing of Speech Signals, by L.R. Rabiner and R.W. Schafer. Old-school speech processing. This has lots of neat tricks for preconditioning a waveform for pitch detection.

Quatieri has a good modern speech processing book that would be useful, but I would start with DAFX, and go from there.

Sean Costello

David

Quote from: 12afael on April 14, 2006, 04:13:38 PM
Quote
Pitch modification without artifacts is already close enough to voodoo.

I don't understand what you mean by artifacts, but pitch shifting is already being done. Even a Whammy does the job.

All these things have already been studied, but where?

"Artifacts" are unwanted tonalities induced into your signal by the processing algorithm.  This is why the Whammy and the other pitch-modifying devices are expensive and complicated.  You're correct that these things have been studied.  The results are in patented and encrypted machine code in these devices.

Transmogrifox

Quote
You're correct that these things have been studied.  The results are in patented and encrypted machine code in these devices.

Even so, they're grainy and unnatural sounding.  The pitch correction devices for small deviations such as fixing an out-of-tune vocalist are not bad, but pitch shifters and electronic whammies are very sterile sounding to me.

I think that's why this was compared to voodoo.  It is not really a hobby project, it's a career.
trans·mog·ri·fy
tr.v. trans·mog·ri·fied, trans·mog·ri·fy·ing, trans·mog·ri·fies To change into a different shape or form, especially one that is fantastic or bizarre.

Peter Snowberg

Quote from: Transmogrifox
It is not really a hobby project, it's a career.

Exactly my feelings.

Maybe in several years this will be a simple hobby project, but right now it's a very complex problem that takes quite a bit of knowledge and a lot of processing power to accomplish; thus the voodoo analogy.

Quote from: Arthur C. Clarke
Any sufficiently advanced technology is indistinguishable from magic.


There is a good reason why Eventide products cost what they do.

http://en.wikipedia.org/wiki/PSOLA

I wrote a simple shifter a bunch of years ago using the simplest method of them all, resampling. It was pretty useless for guitar but I did find out that in The Chipmunks, Alvin is Dave and Dave is Alvin.  :icon_biggrin: 
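For anyone wondering what the resampling method amounts to, here is a minimal sketch (not the shifter mentioned above; the function name and the linear interpolation are assumptions): read through the recording at a different speed. Pitch and duration change together, which is why it is of little use on a live guitar signal.

```python
def resample(signal, ratio):
    """Read through signal at `ratio` times normal speed.

    ratio=2.0 plays an octave up in half the time; ratio=0.5 plays an
    octave down in twice the time. Uses linear interpolation.
    """
    out = []
    pos = 0.0
    while pos < len(signal) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
        pos += ratio
    return out


ramp = [float(n) for n in range(9)]
print(len(ramp), "samples in,", len(resample(ramp, 2.0)), "out at double speed")
```

Linear interpolation is the crudest choice; a proper resampler would band-limit the signal to avoid aliasing.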

PS: We're VERY lucky to have somebody as knowledgeable as Sean Costello on these topics! 8) 8) 8)

12afael

I know the problem of the artifacts; you can hear it in Auto-Tune. It is probably like simulating a tube amp: we will never get a real tube sound from a digital simulation, but a close sound is enough for a lot of applications.

An analog pitch shift is produced in some radio receivers when the local oscillator is out of tune. It has artifacts too, but if the pitch change is small, the artificial sound is small too.

Thanks Sean. I read the article on www.dspdimension.com about using the DFT for pitch detection. I think using wavelets could be better here. I will look at the autocorrelation algorithm.
I didn't know the PSOLA algorithm; I will check it out.

Quote
It is not really a hobby project, it's a career.
Yeah, you are right. I have 8 months to make it work. It's not a hobby; making stompboxes is more fun...  :icon_mrgreen:

I don't want to create a new technology or even make something better than the Antares or Eventide systems; I just need something that works moderately well. It seems to me, nevertheless, that the correction signal is a problem where it is possible to make progress. How to detect a vibrato or prevent the Cher "Believe" effect is more important to me than a good sound; future development could make the artifact problem obsolete in a few years.

Years ago, if somebody had told me that in the future anybody could sing in tune, I probably would have doubted it. Now we can even imagine that in a few more years voice modeling will allow us to sing like Elvis, for example.



Peter Snowberg

With 8 months at least you have time.  :icon_biggrin:

What DSP or signal processing environment are you using?

Best wishes! 8) 8) 8)

12afael

My goal is not to use DSP hardware, so real-time processing is not necessary.
Processes that used to take days are now done in a couple of seconds thanks to advances in computers.
My original idea is to write a VST plug-in in C++, but that will depend on how the algorithm progresses. I will probably write some code in MATLAB.
For a long time I have wanted to get into audio plug-ins; I believe there are several audio plug-ins that could be improved.

There are too many analog DIYers; I have to specialize in other things to make some money. :icon_confused:

puretube

Quote
There are too many analog DIYers; I have to specialize in other things to make some money.

Oops... I'm in the wrong boat  :icon_question:  :icon_eek:




The Tone God

I must resist the urge to rant. Must...Resist...Urge...To...Rant. MUST...RESIST...RANT!!!

Andrew

SeanCostello

Quote from: 12afael on April 16, 2006, 05:58:31 PM
There are too many analog DIYers; I have to specialize in other things to make some money. :icon_confused:

Excuse me a second...(lights cigar with burning $100 bill, adjusts monocle, straightens waistcoat, settles back in overstuffed leather chair)...

There IS money in DSP, but it is a pretty difficult market. The plugin market is tough: you have lots of free plugins to compete with, and piracy makes earning a profit on your plugins difficult. Many plugin developers are moving to hardware solutions, and a lot of them admit that this is due to piracy (a PCI card with a DSP or embedded x86 CPU is a great dongle). Pedals will not have the piracy issue, but read further down in this forum for some of the issues in creating a stand-alone DSP platform - especially the FCC testing.

As far as a realistic sounding pitch corrector: Have you really tested AutoTune and the like to make sure that they can't do what you are talking about? We've all heard songs on the radio where we can identify AutoTune, but I have a sneaking suspicion that there are far more songs where it is used well, such that we can't hear it. Being able to draw in pitch contours is a very powerful feature. A year or so ago, there was some Shania Twain song (disclaimer: my wife was flipping around the channels and turned on CMT), where her voice was doing some really weird thing, and I am convinced that her hubby (the Def Leppard producer guy) was drawing in some really strange pitch scoop and used that as a hook.

In my opinion, the pitch detection is the most difficult part. Once you have that, the PSOLA is basically a simple granular synthesis algorithm working off of a delay buffer. Identifying vibrato, note scoops, etc., would go under the field of "Feature Detection." Xavier Serra's group in Barcelona has done a lot of work on that.
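A heavily simplified sketch of the granular idea, under loud assumptions: the input has a constant, known period in whole samples (so the pitch marks are simply every `period` samples), each grain is a Hann window two periods long, and the output is normalized by the summed windows. The function name and these choices are illustrative only. A usable PSOLA needs pitch marks from the detector, a period that varies over time, and handling of unvoiced sounds.

```python
import math


def psola_shift(signal, period, ratio):
    """Shift pitch by `ratio` (>1 raises it) while keeping the duration.

    Assumes a constant pitch period of `period` samples in the input.
    """
    n = len(signal)
    out = [0.0] * n
    norm = [0.0] * n
    # Hann window covering two input periods, peaking at the pitch mark.
    window = [0.5 - 0.5 * math.cos(math.pi * i / period) for i in range(2 * period)]
    t = float(period)  # position of the next output (synthesis) mark
    while t < n - period:
        mark = int(round(t / period)) * period  # nearest input pitch mark
        center = int(round(t))
        for i in range(2 * period):
            src = mark - period + i
            dst = center - period + i
            if 0 <= src < n and 0 <= dst < n:
                out[dst] += window[i] * signal[src]
                norm[dst] += window[i]
        t += period / ratio  # output marks are spaced at the new period
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]


# Harmonic-rich test tone with a 40-sample period, raised by a ratio of 1.25.
source = [sum(math.sin(2 * math.pi * h * k / 40) / h for h in range(1, 6))
          for k in range(2000)]
shifted = psola_shift(source, 40, 1.25)
```

The grains keep their original shape and only their spacing changes, which is why this approach tends to preserve the character of a voice better than resampling. A pure sine wave is a poor test signal for it, since it has no energy at the new harmonic positions.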

Sean Costello

12afael

Sorry for replying to this old thread, but I must say thanks to Mr. Sean Costello. After studying different options, your advice about autocorrelation and PSOLA was what I finally used.
The algorithm works well, apart from some small details, and it was good enough for my teachers, so finally I AM an electronic engineer.
It took me a lot of years, a lot of pain, sweat and some tears.

I must also say thanks to all the electronics and audio DIY communities, because they were an active part of my training as an engineer.

So thanks to:
diystompboxes
groupdiy
ampage
plexilandia
and all the others

Raise a beer for me!
And thanks a lot again, Sean!



Rafael Castillo
Electronic Engineer ;D

bioroids

Eramos tan pobres! (We were so poor!)

CGDARK

¡Felicidades! (Congratulations!)

CG ;D