help on matlab algorithm -pitch detection-

Started by 12afael, September 08, 2006, 06:13:00 PM

Previous topic - Next topic

12afael


well, some months back I began with this project to create a pitch correction algorithm  for vocals.
http://www.diystompboxes.com/smfforum/index.php?topic=44241.0

I´m on the stage of detect pitch. it work well but too slow, so if someone with more experience on matlab can give me a hand, here is a description of how it work.

first I take a window of a multiple of the smaller frequency 50hz. then I do an autocorrelation of it, I select the positive delays and take a window within which are the interest frequencies(50 - 1260 Hz). Then I use the max instruction to get the maximum of that window (which is the period of the fundamental frequency). in order to have a better precision I apply spline around the maximum and again apply max . finally I take the period to get a new window for the next loop.
I use the amplitude returned of the max function to discard no periodic waves(using a threshold value).

I wonder if some of the functions that I use are slow or I have a trouble on other place.
any help/comment are welcome.

Quote% get a section of vowel
%a diferencia de la vercion anterior este algoritmo solo aplica la
%interpolacion spline a los valores mas cercanos a el maximo entontrado a
%sampleo normal. esto para tratar de reducir el tiempo de proceso.



[c,fs]=wavread('150.wav');  % use here a short wav
ms20=fs/50                  % minimum speech Fx at 50hz 882 a 44100=0,02seg
ventana=3*ms20;             % este valor indica el largo de la ventana de analisis de frecuencia
presicion=8                 % este valor mejora en este factor la presicion en la deteccion aumentando la resolucion por spline.
contador =1;                % este valor entrega el numero de ciclos.
threshold=0.1               % este valor indica que niveles de la amplitud de la autocorrelacion se tomara como señal no periodica (entre 0 y 1)

largo = length(c);
frecuenci=0;                % se definen para las matrices resultados
posicion=[0];
posicionabsoluta=0;
num=1;
a=1;

while    largo-posicionabsoluta > ventana             % ************ inicio del loop

x=c(posicionabsoluta(contador)+1:posicionabsoluta(contador)+ventana);        %selecciona trozo
contador= contador+1;           %incrementa contador
%plot waveform
t=(0:length(c)-1)/fs;           %time of sampling instants
subplot(3,1,1);

plot(t,c)
legend('waveform');
xlabel('time (s)');
ylabel('amplitude');

% calcula autocorrelacion
r=xcorr(x,ms20,'coeff');        % en este caso ms20 es la frec minima

%plot autocorrelacion
d=(-ms20:ms20)/fs;  % times of delay da el vector para el grafico debe ser de largo ms20 que aparece en r   
       
subplot(3,1,2);
plot(d,r);
legend('autocorrelacion');
xlabel('delay (s)');
ylabel('correlation coeff')

%deteccion de peaks
ms2=fs/1260;                 % 1100 maximum speech fx at 500hz ahora a 1100hz oooooojjjjjoooo debe ser entero para acelerar el proceso
ms20=fs/50;                  % minimum speech fx at 50hz

%just look at region corresponding to positive delays

r=r(ms20+1:(2*ms20+1));     % toma estos valores de la autocorrelacion (ms20+1 es el centro???)
tr=r(ms2:ms20);             % seccion seleccionada entre frecuencias
[pmax,px]=max(tr);          % este maximo de la correlacion original es necesario para calcular la posicion en forma presisa a la frecuencia de sampleo original.
        u=(px-2:px+2);%/fs;   entrega los valores(samples numero) cercanos al punto maximo
        tr=tr(px-2:px+2);
        xx =px-2:1/presicion:px+2; %px-2/fs:1/(fs*presicion):px+2/fs; %el factor central es en cuanto se divide las muestras en este caso(0.25) entre 4
        yy = spline(u,tr,xx);
        %plot(xx,yy,'.')

[rmax,tx]=max(yy);
frecuencia=1/((((ms2+(px-2))-1)/fs)+((tx-1)/(presicion*fs))); %el 2 de px-2 depende de cuantos elementos se tomaron para el spline   


%Threshold, este control elimina los errores por letras sin frecuencias
%como la S . usando el valor de amplitud del maximo de la autocorrelacion. si el valor es menos al valor de threshold la frecuencia sera 0 
if pmax<threshold
frecuencia=0
pk=ms2

end
% matriz de resultados
frecuenci=[frecuenci,frecuencia];   % esta es la matriz de frecuencias
posicion=[posicion,(px)];              % px es la matriz de posicion(px es la posicion en resolucion fs
num=[num,contador];                 % esta es la matriz de posicion
posicionabsoluta=[posicionabsoluta,sum(posicion)];

end                                 %********************* fin del while

subplot(3,1,3);
plot(posicionabsoluta,frecuenci,'.')

sorry for the spanish notes

12afael

haha I´m a newie on this :icon_redface: :icon_rolleyes:, I do two plots on each loop now it take the 10% of the time :icon_mrgreen:

SeanCostello

Hi:

I'm not super skilled with MATLAB, but I do know that if you can express things as a vector or matrix operation, as opposed to using a for or while loop, things go MUCH faster.

Sean Costello

Transmogrifox

One up there on Sean's advice.

Additionally, MATLAB code is not known for its hardware efficiency.  This would best be done in C++ at the sacrifice of the simplicity of MATLAB functions.  However, using matrix and vector forms will greatly improve the efficiency of MATLAB since MATLAB "thinks" in matrices.  MATLAB means "Matrix Laboratory".

I personally would use the FFT function from the signal processing toolbox.  You will have to do some reading to learn how to interpret the output of the FFT, but it will be of great value to you.  Then you can create a gaussian or a raised cosine windowing vector, multiply by a section of the signal vector (same length segment), then perform the FFT.

You'll have to play with the windowing length to optimize the response to a reasonable interval for this application.  Too wide, response will be too slow, too narrow, frequency resolution will be lacking. This is not a technique I would use in real time, but for processing a wav file it is a very powerful tool.

Another approach that may work is this:
Create a matrix where each row is a sine wave of a desired frequency.  If you want to correct pitch to A440, then base the frequencies on that.   Window each of those functions with a guassian or raised cosine envelope, then go through a table and multiply each with the signal.  Search for the largest DC level (average the function over the entire windowing period).  This gives you the fundamental.  This type of approach is similar in principle to how an analog spectrum analyzer works.

trans·mog·ri·fy
tr.v. trans·mog·ri·fied, trans·mog·ri·fy·ing, trans·mog·ri·fies To change into a different shape or form, especially one that is fantastic or bizarre.

12afael

actually the algorithm work with matrix and vectors. the while loop is just to process every period separately.

the main advantage to use autocorrelation on time domain algoritm is that you don`t need a large sample to get enougth frequency resolution.

I can get the prequency of each period just using 2 or 3 periods  , that is I need just 30ms to get the frequency of a 100Hz wave and with a better precision than the human ear. you can`t do that with FFT. 

now the algorithm use a lot less time , still I can compile the m file to get more speed.

another thing that help to get speed is use the matlab operations and not try to re implement them.
an autocorrelation on matlab is a precompiled operation and take less time.

Gabriel Simoes

Well, but applying Fourier Transform for time domain singals may give you more than enought resolution with small windows too.
There are thons of interpolations algorythms outhere that perform well and uses minimum computation ..

In my graduation project (a software for automatic transcription of monophonic melodies to partiture using Lilypond description code) I used FFT plus Grandke interpolation method with the Hann windowinhg and it gives you less than 3% error.

50 Hz in a 44100 frequency sampling force you to use truncate and window a minimum of 1764 samples and since FFTs works best with root of 2 numbers, 2048 samples, and for sure an overlap of 50% is enought work.

There are more precise interpolations you can use, even with other windowing methods, and other algorithms properly designed for singing voice or that may be adapted from monophonic transcription.

About the computational time it takes to work ... well, welcome to matlab.
Maybe you can port it to c or c++ and optimize your code using simd instructions.
Nice work,
Gabriel Simões

Gabriel Simoes

Well, I'd like to ask you sorry for my bad english,
I was in a hurry yesterday and typed my ideas too quick, and then you can read the result in the post before.
I don't know why but the system is not letting me edit it so all I can do is apoligize and say english is not my country's main language.
Anyway, if any doubt appears I can write it again ...

12afael

Hi Gabriel, Do you think is posible to get the frequency just with two cycles? 20mS large window for a 100Hz signal. I think you need a very high sampling frequency for reduce the error and using a interpolation algorithm is just useful if your originals point have a small error.

I don´t know the grandke interpolation, maybe that´s the magic tool that breaks the theorem of the uncertainty .

I´m not an expert I will try to study your approach to the problem .
thanks

12afael

Gabriel Simoes

Well,

It depends on the precision you are looking for ...
Grandke´s interpolation method gives you a 3% error aproximation wich is more than enougth for things like automatic transcription (my main research topic). The idea behind using 2*T as the window size is because real signals oscilate and using a window of T size may not be enougth for certain moments when the lower frequency can go a little down 1/T, but in thesys T should be the minimum, but it takes you to a poor adjacent bins resolution, etc ...

There are better interpolation methods that require a little more computational resources like Quinn´s or Marchand´s .... you just need to take a look, but well, if you are going fine with autocorrelation, it may be the best path for you.
An other thing you could look for is for an article written by a brazilian student called Adriano Mitre, about precise frequency estimation using fft .... where he shows a method of re-estimating the f0 of a monophonic melody by using Klapuri´s psycoacoustic motivate critical band function.
What I can tell you is that if you´re going well with autocorrelation, maybe this is the way you shoul keep on going .... all I wanted here was showing you that there are other paths to archive you main objective.

I working on my next project, an evolution of my first automatic transcrtiption system to work integrated with Lilypond since it uses lilypond´s code description for transcribing the melodies to partiture.
Good luck and keep working on it =)
Gabriel