OT - Pitchshifting Question for the DSP guys (Peter?)

Started by travissk, July 24, 2004, 03:10:31 PM


travissk

Here’s my understanding of a “quick and dirty” pitch shifting algorithm: you read samples into a circular buffer, which has write and read pointers that move at different speeds. The write pointer moves at the device’s sampling rate, and the read pointer moves at a speed that’s either faster (pitch up) or slower (pitch down). This means they will cross each other one way or another, but it provides the easiest way to “resample” a live signal.

I know there are a couple of catches, the main one being that two read pointers are used instead of one, because you can get a pretty bad noise (a pop or click) when the read and write pointers cross. I’m not sure of the actual formulas, but I know that you don’t want to use the “read head” that’s colliding with the “write head”, so the mix is 100%/0% at that point. Another catch is that the formants are shifted along with the pitch (since it’s just resampling), creating the chipmunk effect. From what I’ve heard from the Whammy patches on my RP-12 (and in recordings), that seems to be what’s going on, as the timbre of my guitar gets mutilated.
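That 100%/0% mix rule can be written as a tiny gain law. Here’s a sketch (the buffer length and the triangular shape are my assumptions, not anything from a real unit): each read head’s gain falls to zero as it approaches the write pointer, and two heads spaced half a buffer apart always sum to unity:

```python
BUF_LEN = 4096  # hypothetical delay-line length

def head_gain(read_pos, write_pos, buf_len=BUF_LEN):
    """Crossfade weight for one read head: 0.0 when the head sits on
    the write pointer (where the splice click would be), 1.0 when it
    is half a buffer away."""
    dist = (read_pos - write_pos) % buf_len           # 0 .. buf_len
    return min(dist, buf_len - dist) / (buf_len / 2)  # triangular law

# Two heads half a buffer apart sum to unity gain:
g1 = head_gain(1000.0, 0.0)
g2 = head_gain(1000.0 + BUF_LEN / 2, 0.0)
```

Both positions move every sample, of course, but only the relative distance between a head and the write pointer matters.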

So I guess my question is: is that quasi-resampling all there is to it? Something tells me there’s more going on, especially since everyone talks about “tracking” and shifters often behave differently before or after distortion. I read a paper that dealt with voice shifting using physical modeling, which might be too much for a small DSP system, and I’m still not sure whether any effort is made to move the formants back to their original place, but there must be extra stuff you can do that differentiates a $400+ Whammy I (or an Eventide) from one of ten effects on the $69.99 XV-amp.

I guess while I’m asking DSP questions: what method of pitch detection is commonly used in a realtime hardware app? From what I remember, you find harmonic frequencies, calculate the distance between them, and then use a heuristic to pick out the fundamental (the loudest partial isn’t always the right one). To do this, do you use FFTs (which is my guess), autocorrelation, or something else? I’ve seen math for both, but I’m not sure what to use in a realtime environment.
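For comparison’s sake, the autocorrelation route is cheap enough to sketch in a few lines of NumPy. A rough sketch (the 70–1000 Hz search band is an arbitrary guitar-ish assumption): the lag of the first strong autocorrelation peak gives the period, and the sample rate divided by that lag gives the fundamental.

```python
import numpy as np

def detect_pitch_autocorr(x, fs, fmin=70.0, fmax=1000.0):
    """Rough fundamental estimate from the strongest autocorrelation
    peak inside the fmin..fmax search band."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])                    # best period in band
    return fs / lag

# 220 Hz sine sampled at 44.1 kHz:
fs = 44100
t = np.arange(2048) / fs
f0 = detect_pitch_autocorr(np.sin(2 * np.pi * 220 * t), fs)
```

With only 2048 samples the estimate is quantized to whole-sample lags, so expect it to land within a Hz or two of the true pitch rather than exactly on it.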

Thanks again; I’m just curious. While I’d like to get into DIY hardware DSP eventually, I doubt I’ll have the time for a while--I’m not even nearly caught up on the analog side of things :). I also have plans to get through the (free!) dspguide.com book one of these days, so if the answer is detailed in there, feel free to say “go look it up” :D

Thanks again,
-Travis

downweverything

I think your idea is basically the same thing as changing the sampling rate on the output, but I'm not sure it would work, as the output would fall further and further behind the input, making the delay longer and longer.  And I'm not exactly sure how you can make the read pointer move faster than the write pointer unless it can magically predict what's coming next.

Maybe there is some kind of algorithm that amplitude modulates (multiplies by another waveform) to shift the frequency spectrum in a way that matches how we hear pitch (logarithmically).  I know it's as simple as multiplying by a cosine to shift a digital frequency spectrum, but that shift is linear, and it shifts in both the positive and negative directions too.  You could maybe devise a simple filter to get rid of one of the directions, or just change the math.  Just some ideas.
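The both-directions behavior of multiplying by a cosine is easy to demonstrate numerically. In this sketch (the frequencies are arbitrary choices), a 440 Hz tone multiplied by a 100 Hz cosine lands at 340 Hz and 540 Hz: a fixed linear offset in both directions, rather than the musical (ratio) shift a pitch shifter needs.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                      # exactly one second
x = np.sin(2 * np.pi * 440 * t)             # 440 Hz input
shifted = x * np.cos(2 * np.pi * 100 * t)   # ring modulation by 100 Hz

# With a 1 s window, FFT bin index == frequency in Hz
spec = np.abs(np.fft.rfft(shifted))
peaks = np.argsort(spec)[-2:]               # the two strongest bins
```

This is exactly the product-to-sum identity sin(a)·cos(b) = ½[sin(a+b) + sin(a−b)], which is why energy appears at both the sum and difference frequencies.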

Of course you can always do it the Fourier way, which is probably what the "tracking" devices do (I really don't know, just speculation).  You take the Fourier transform, find where the pitch is, do a bunch of math to calculate the old/new harmonics and move them to a new fundamental, then do the inverse transform to get back to the time domain.  However, this probably won't work too well with a bunch of instruments at once, whereas something along the lines of amplitude modulation would.

Let me repeat, I have no idea what they do in commercial effects, these are just some ideas on how to maybe achieve this after taking a few DSP courses.  Cheers.

travissk

Thanks for your suggestions - I've had a couple of DSP courses, but they were textbook- and math-based, giving you the theory so that when you got into DSP hardware you'd understand what's going on. No code was given; it was all math. This was in the EE department; our music department has a signal processing course that's a little dumbed-down but deals with programming and realtime considerations. I was just wondering which approach is fastest/best given today's low-to-mid-end DSP technology - the sort you'd find in a DIY box.

The way the easy approach works is sort of a "resampling hack". You're right that you can't predict the future, so this method gets around that by using some information from the past. Here's the empty buffer; W is the write pointer:
(W----------)
now, first the buffer needs to fill up, then the read head can start:
(========W--)
(R=======W--)
The write head advances one step, the read head goes two (octave-up)
(==R======W-)
Same thing
(====R=====W)
And again, but the write head loops back
(W=====R====)
(#W======R==)
(##W=======R)
(#R#W=======)
(###RW======)
and on the next step the write pointer runs into the read pointer; this is why you'd use two read pointers, to avoid some ugly noise. It continues as you'd expect:
(#####X=====)
(######WR===)
(#######W=R=)
(R#######W==)
(##R######W=)
(####R#####W)
(W#####R####)
($W######R##)
($$W#######R)
etc

Essentially, you're using a little bit of the past as your resampling material.
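The walkthrough above can be turned into a small loop. This is a minimal NumPy sketch under my own assumptions (buffer length, ratio, a triangular crossfade between the two heads, and no interpolation between samples) - not how any commercial unit does it:

```python
import numpy as np

def shift_block(x, ratio=2.0, buf_len=1024):
    """Quick-and-dirty shifter: one write pointer at the sample rate,
    two read pointers advancing at `ratio` and spaced half a buffer
    apart, crossfaded so the head nearest the write pointer is silent."""
    buf = np.zeros(buf_len)
    out = np.empty(len(x))
    r = 0.0                                    # first read head position
    for n, s in enumerate(x):
        w = n % buf_len
        buf[w] = s                             # write head: 1 sample/step
        acc = 0.0
        for k in (0.0, buf_len / 2):           # two heads, half a buffer apart
            pos = (r + k) % buf_len
            dist = (pos - w) % buf_len
            gain = min(dist, buf_len - dist) / (buf_len / 2)  # 0 at collision
            acc += gain * buf[int(pos)]        # no interpolation, just truncate
        out[n] = acc
        r = (r + ratio) % buf_len              # read heads: `ratio`/step
    return out

# Octave-up on a 220 Hz sine: the output is dominated by a tone near
# 440 Hz, plus the splicing artifacts discussed above.
fs = 22050
t = np.arange(fs) / fs
y = shift_block(np.sin(2 * np.pi * 220 * t), ratio=2.0)
```

The crossfade is exactly the 100%/0% mix rule from the first post: whichever head is about to collide with the write pointer is faded to zero. The periodic splices also smear the output slightly off the exact octave, which is part of the characteristic "cheap shifter" sound.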

I read some articles this morning, and it seems that with some better DSP hardware the better methods are indeed possible:
Phase Vocoder (frequency-domain processing)
Time Domain Harmonic Scaling (involves fundamental pitch detection)

I'm now guessing one of those two approaches is used, because it would explain why cheap shifters go crazy when placed after distortion, or when you play multiple notes. It would also explain that "shimmering" error that can sound sort of cool (e.g. on "Subterranean Homesick Alien" by Radiohead).

At this point, I'm guessing you use phase vocoding on granular overlapping windows of sound, crossfading the results.
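For what it's worth, here's a minimal textbook-style phase vocoder sketch in NumPy. The window size, hop, and the stretch-then-resample structure are my choices, not anything from a real pedal: time-stretch via STFT phase propagation over overlapping windows, then resample back to the original duration so the pitch moves instead of the length.

```python
import numpy as np

def pv_stretch(x, stretch, n_fft=1024, hop=256):
    """Minimal phase-vocoder time stretch: analysis hop `hop`,
    synthesis hop `hop * stretch`, classic per-bin phase propagation."""
    win = np.hanning(n_fft)
    syn_hop = int(round(hop * stretch))
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft, hop)]
    spec = np.array([np.fft.rfft(f) for f in frames])
    # expected per-hop phase advance of each bin's center frequency
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft
    out = np.zeros(syn_hop * len(spec) + n_fft)
    phase = np.angle(spec[0])
    for i, S in enumerate(spec):
        if i > 0:
            dphi = np.angle(S) - np.angle(spec[i - 1]) - omega
            dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))   # wrap to +/- pi
            phase += (omega + dphi) * syn_hop / hop            # scaled advance
        frame = np.fft.irfft(np.abs(S) * np.exp(1j * phase))
        out[i * syn_hop:i * syn_hop + n_fft] += frame * win    # overlap-add
    return out

def pv_pitch_shift(x, ratio):
    """Pitch up by `ratio`: stretch time by `ratio`, then read the
    result back at `ratio` times the speed (linear interpolation)."""
    y = pv_stretch(x, ratio)
    idx = np.arange(0, len(y) - 1, ratio)
    return np.interp(idx, np.arange(len(y)), y)

# Octave-up on a 220 Hz sine:
fs = 22050
t = np.arange(fs) / fs
y = pv_pitch_shift(np.sin(2 * np.pi * 220 * t), 2.0)
```

This naive version ignores output gain normalization and the "phasiness" fixes real implementations need, but the shifted tone lands on the exact interval rather than drifting the way the splicing method does.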

christian

These are all done with weird Fourier transforms that make the algorithm sound so wacky you'd never think it would bend the pitch.
But you could check out the analog version on Mark Hammers page (URL?)
You could do that digitally too. Just use two memory banks in parallel, each with a read head that's "modulated" so one sweeps faster while the other sweeps slower. Then switch to whichever one is faster, so at one output you get pitch up; the other output switches to the slower one, giving pitch down.

I've been thinking of another method for a few days now (weird coincidence, this thread, BTW!).
Basically, when you are sampling the input, you always look at the previous sample and the current sample and calculate the difference between them - just the voltage difference. You take this voltage difference and amplify it by the factor you want to bend the pitch by, then add it to an output voltage that is initialized to 0.
Then you have a funky system that checks whether the voltage difference is positive or negative, compares it to the last difference, and checks whether the waveform has turned around - in that case you have a peak. Save this peak voltage, and now you have a "reset" voltage for the output: once the output voltage goes over this voltage, you change its direction. I thought of this for DSP'ing, but I think it's actually possible to do in analog (at least semi-analog)..
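Out of curiosity, here's a literal transcription of that idea in Python, with no claim that it shifts pitch cleanly. The peak "qualifier" is reduced to a simple slope-sign test, and the initial reset level of 1.0 is my assumption:

```python
def slope_shift(x, factor):
    """Literal sketch of the slope-scaling idea: amplify each
    sample-to-sample difference by `factor`, accumulate it into the
    output, and reverse the output's direction whenever it crosses
    the most recently detected input peak level."""
    out = [0.0]
    prev_diff = 0.0
    peak = 1.0            # current "reset" voltage (assumed start value)
    direction = 1.0
    for n in range(1, len(x)):
        diff = x[n] - x[n - 1]
        if prev_diff != 0.0 and prev_diff * diff < 0:
            peak = abs(x[n - 1])          # slope flipped sign: save the peak
        prev_diff = diff
        y = out[-1] + direction * factor * abs(diff)
        if abs(y) >= peak:                # overshot the reset level:
            direction = -direction        # turn the output waveform around
            y = out[-1] + direction * factor * abs(diff)
        out.append(y)
    return out

import math
x = [math.sin(2 * math.pi * 20 * n / 2000) for n in range(2000)]
y = slope_shift(x, 2.0)
```

On a sine it produces a roughly triangular wave that turns around more often than the input, but the output period depends on both the factor and the signal shape, so it's more of a curiosity than a usable shifter.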

Check this page out if you're into DSP:
Music-DSP archive
who loves rain?

Christ.

downweverything

I'm not getting your buffer demonstration - certain parts of the buffer are getting read twice before being written to again.  It seems like you would have some REALLY abnormal distortions in your read waveform.  I think some sort of amplitude modulation would be the way to go, but then again you said "quick and dirty", and your idea would probably do just that.  I'm interested in hearing the result... maybe I'll throw the idea into Matlab when I have some spare time.  I'll let you know if I do.

travissk

Yes, that's correct--some parts get read twice, and some parts never get read at all, depending on what pitch you're bending to.

I was interested in hearing what this sounds like as well, specifically whether any strange artifacts are created. I'm not at school, so I don't have my computer with Matlab on it (that'd be the easiest way), but it would also be easy to create in Pd (the open-source cousin of Max/MSP) or as a VST. I'll post here if I get around to creating it.

Thanks for the link; I mentioned it above, but I'll repeat here that there is a huge book available for free at www.dspguide.com. I've been meaning to read it and see if it's any better than my DSP textbooks, but hey, you can't argue with free :)

btw, Mark's site is at http://hammer.ampage.org/

christian

I saw that book! Looks promising, but I feel way too stupid these days to understand even a bit of it  :shock:

I doubt that any pitch-shifter actually detects the fundamental frequency of the incoming signal, but here's a sort of lo-fi approach. Zero-crossing detection on digital samples is easy as anything, but guitars especially don't have a steady waveform - it wanders a lot - so you wouldn't get a decent frequency out of it. Unless you used a peak detector (which is easy too) and a little system that "qualifies" these peaks as actual peaks of the waveform. Then you'd calculate the frequency from peaks/sec (or peaks per n samples), and you'd get a value which you could multiply or divide and use to drive a waveform generator.
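The peak-spacing idea is simple enough to sketch. A toy version (the +/-40-sample "qualifier" window is an arbitrary choice that only suits periods longer than 40 samples): a sample qualifies as a peak if it's the largest value in its neighborhood, and the average spacing between qualified peaks gives the period.

```python
def pitch_from_peaks(x, fs, min_period=40):
    """Peak-counting pitch estimate: keep local maxima that dominate a
    +/- min_period neighborhood, then average the spacing between them."""
    peaks = [i for i in range(min_period, len(x) - min_period)
             if x[i] == max(x[i - min_period:i + min_period + 1])]
    if len(peaks) < 2:
        return 0.0
    avg_gap = (peaks[-1] - peaks[0]) / (len(peaks) - 1)  # samples per cycle
    return fs / avg_gap

import math
fs = 22050
x = [math.sin(2 * math.pi * 220 * n / fs) for n in range(4096)]
f0 = pitch_from_peaks(x, fs)
```

On a clean sine this is dead-on; on a real guitar signal the "qualifier" would need to be much smarter, since string harmonics can put several local maxima inside one period.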
who loves rain?

Christ.