
Recording Compressed to 1,000 Times MP3 Rate Could Be the Future of Music Playback

The University of Rochester has just devised a way of reproducing music in a file that's 1,000 times smaller than an MP3 file. The way they do it, physically modeling an instrument in a computer and then feeding it input variables (breath, tongue, fingers) in order to generate the output tone, seems super obvious. People were making music with MOD files by recording one tone and generating different notes with it back in the '90s. But actually reproducing the instrument wholesale? That's amazing.

Instead of recording music like we do now, we can just model the instrument the performer uses and what they do with their hands, mouths, and feet. This way you can get a (theoretically) 1:1 reproduction of music even years after the original recording is gone. And why stop at instruments? Why not model a guy's vocal cords, allowing Sinatra to croon on about how it's tough to find love when you're stuck in a casket in the year 2525? Putting words into his mouth, in essence. Well, not his, since he's not around to model, but you get the point.

The processing power needed to play this is going to be pretty intimidating, but this is what we see happening for iPods and other playback devices in a few decades. So says Gizmodo. [Eurekalert via Hypebot via Tech Digest]

3:20 PM on Thu Apr 3 2008
By Jason Chen
9,953 views
55 comments

Comments

  • The real question is, when can we translate this to porn. My 1.5TB external hard drive is close to getting full.

  • Sweet! More compression! We hate dynamic range!

  • Just in time for alien archeologists before the Death Star blows us out of the face of Earth:
    [www.space.com]


  • nutbastard at 03:28 PM on 04/03/08

    How is this different than MIDI?

    Seems like the same thing, just with a bit more flexibility.

    The problem is that they'll have a hell of a time replicating complex electric guitar, and vocals have yet to be synthesized in a reasonable way.

  • @ILikeMacsWhatAboutIT: Easy, just think of the porn stars as instruments: one for the girl's vocal cords, one for the wang (going in and out), and bam, you have it. But you still have to deal with that messy video stuff...

  • If I hear them "reproduce" Sinatra singing "Touch My Body" I'm going to lose it!

  • @nutbastard: Exactly... I was really excited then I was like "oh wait this is already done... it's called a midi file".

    The way they do it-physically modeling an instrument in a computer and then feeding it input variables (breath, tongue, fingers) in order to generate the output tone

    Yeah those are called VSTi (virtual instruments) AU (audio units) DXi (direct-x instruments). This is not new--this is how keyboards have been working for decades.

  • Hartmann has been doing something similar with some pricey synthesizers for a while. You load samples of an instrument into a PC that creates a model that you load into the synth. Authentic sounds can be reproduced and altered in dramatic ways. Hans Zimmer (composer for a lot of the Bruckheimer flicks) is a fan of their Neuron tech.

    [www.hartmann-music.com]

  • I'm no audiophile by any means, and I try to keep my music around 156kbps. At least I've noticed that if I keep it there I don't hear anything being lost, as opposed to iTunes 90-something faint "air noise". So with this compression, will music sound flat and like it's in a wind tunnel?

  • @rainfever: I guess this is why some of the greatest minds are using their talents for sex robots....SELF-CLEANING sex robots.

  • I agree with everyone that says this sounds *a lot* like MIDI 2.0.

  • @Joseph: true-dat. This sounds like it's a midi-type playback, not "compressed" audio.

  • It would seem to me that the only thing "smaller" would be the "instruction set," not the total file. The paper roll didn't have a lot of information, but it was worthless without the player piano. Are we going to be able to fit an entire "player orchestra" into the firmware of our iPod? I guess eventually we will, but this doesn't seem too exciting (or new, for that matter). And what about vocals? Will every singer need to be computer modeled, and will we need to carry all of those around with us, too? Yikes!

  • @avconsumer: This is compression of data, not dynamic compression processing. Two different things.

    Also, this method does sound like something akin to MOD, MIDI or VSTi data, but if they're talking compression to "1000 times MP3 rate," it would probably have to be a procedurally generated model with a number of control and input variables. Something like this is actually being done in the demoscene already: Remember kkrieger, the 96 kilobyte FPS game? All the assets for that were procedurally generated, including the music, and I think that's what they're talking about here. And yes, that takes a lot of processing power to perform well in realtime.

  • Wouldn't that mean that every MP3K (or MIDI 2.0) player would need a model of every instrument and every possible human vocal cord shape in order to interpret the file? And you think WinAmp is bloatware now ...

  • "... [I]n a few decades," the article states. Well, in a few decades we would not need a 1000-to-1 compression ratio for audio, since we will have wearable computers with many petabytes of storage. Now, unless we are going to turn into a society that completely devalues music to the point where people feel the need to carry literally hundreds of thousands of albums in their 14th gen iPods, I just don't see the big deal. And it's not like this technology can be applied to video compression or data compression in general, making it even more useless.

    I think I'll stick to my reliable 192kbps cbr mp3 encoder, thank you very much.

  • Storage is so cheap (and getting cheaper so fast) that by the time this becomes possible to encode and decode at a reasonable rate our 22nd gen iPod Touch Micro Implant will have 100 Exabytes of ultra-flash memory and a petabit always-on wifi connection to Wikimusic.org, the distributed repository of all music ever created, lossless.

  • @Brock: No, most likely it would mean that the instructions to create each instrument model are encoded within the file itself, and they are procedurally generated either at the beginning of the track and fed control data from that point forward, or the entire track is procedurally generated as it plays.

    From my experience with MOD and MIDI files, this would have some interesting challenges in things like jumping smoothly to the midpoint of a track.

  • @Joseph: Not only that, but clarinets, electric guitars, and the human voice have on the order of a bajillion different variables...

    Take the guitar, for instance:
    01. how hard is the note plucked?
    02. is the note plucked by pick or finger or fingernail?
    03. is the pluck subsequently muted by a finger [and how long a delay before muting]?
    04. is the pick scraped against the strings [and how far and how fast]?
    05. how hard is the string pressed against the fretboard?
    06. is the note bent [and how far]?
    07. is a whammy bar pressed [and how far]?
    08. is a whammy bar pulled [and how far]?
    09. does the finger pressing against the fretboard vibrate [and how far and how fast]?
    10. does the finger on the fretboard do a hammer-on?
    11. does the finger on the fretboard do a pull-off?
    12. does the finger on the fretboard slide from one note to another [and how far and how fast]?
    13. if a pick is used, what thickness?
    14. what gauge strings?
    15. what alloy strings?
    + amplifier settings
    + which [of the hundreds of available] stomp boxes are used & all their settings

    And those are just off the top of my head [and from normal playing, not including oddball stuff like using a bow, pressing the string above the nut, picking with your teeth, or setting your instrument on fire]...

    Creating a simulation of an entire universe requires a simulation the size of a universe.

  • Oh, and plus...

    Does encoding music via this system mean we need to throw out the last hundred twenty years of recorded music and start fresh?

    [Can anyone go back in time and model Robert Johnson's guitar?]

  • @ideaman2020: In spite of what everything else I've said in the thread might imply, I wholeheartedly agree. Some instruments can be *adequately* (meaning, not with 100% precision, but with enough that they sound fitting and accurate) synthesized, but wind instruments, horns, guitars, the human voice, and many other physical instruments have such a large variety of sounds, and such an immense number of variables shaping those sounds, that I doubt we'll be able to make truly decent-sounding models of them for quite some time - perhaps never at all. Either way, I would much prefer a 10MB recording of Tom Waits's actual voice to a 10KB synthesized version of it, no matter how precise the synthesis might be. (Even though it would be hilarious to hear Tom Waits singing "Still Alive.")

  • This is different from compression. In compression, you remove information that is not audible and save the compressed signal. From the description, it sounds like they are doing some music transcription. The difference between this and MIDI is that MIDI files are created either from sheet music or from MIDI events passed in through a MIDI device. Or possibly by a music transcription program. I'm unfamiliar with MIDI 2.0, but from what I understand, MIDI (1.0?) pretty much keeps track of time, intensity and duration per note. Modeling the process that generates the music can probably create more realistic sounds. And unlike compression, you can regenerate the signal (to be similar to the original, if the model works well).

    As for the porno guys...if someone comes up with a way to take some snaps of video and determine the 3D objects in the shot, I'm sure they can build a generative model for video as well.

  • @ideaman2020: Oh, and furthermore...

    All that does not take into account the room size, shape, and echoiness [vs. dampened].

    Keith Richards once noted that he can't read music, but he once had someone play one of his songs from a piece of sheet music and it didn't sound at all [to Keith] like what he wrote. I imagine a similar kind of translation [and associated loss therein] would be happening with this method.

  • With storage getting cheaper all the time, I'm inclined to think that music compression is a technology which will fade away completely long before this approach becomes a reality. I could store my entire CD collection (about 800 CDs) in raw WAV format at CD sampling rates on less than a terabyte of storage today. I'd guess a TB drive will drop below $100 within two years, and similar drops in price and increases in capacity can be expected for flash memory SSDs, so why bother with complex, CPU-intensive compression schemes at all?
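The "800 CDs in raw WAV on less than a terabyte" figure above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the CD audio parameters (44.1 kHz, 16-bit, stereo) are standard, but the assumption that every disc is a full 74 minutes is mine, not the commenter's, so the result is closer to an upper bound than an exact number.

```python
# Rough storage estimate for a CD collection kept as uncompressed PCM (WAV).
SAMPLE_RATE_HZ = 44_100      # CD audio sampling rate
CHANNELS = 2                 # stereo
BYTES_PER_SAMPLE = 2         # 16-bit samples
MINUTES_PER_DISC = 74        # ASSUMPTION: every disc is full; real discs are often shorter
DISC_COUNT = 800             # "about 800 CDs" from the comment

bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
bytes_per_disc = bytes_per_second * MINUTES_PER_DISC * 60
total_bytes = bytes_per_disc * DISC_COUNT
total_terabytes = total_bytes / 1e12  # decimal terabytes, as drive vendors count them

print(f"{bytes_per_second} bytes/s, {bytes_per_disc / 1e6:.0f} MB per disc, "
      f"{total_terabytes:.2f} TB total")
```

Under these assumptions the collection comes to roughly 0.63 TB, so the claim holds with room to spare.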

  • Two players playing the same horn will make different sounds. The same player playing the same horn on two different nights will make different sounds. This technology may be a reasonable way to produce a performance, but it ain't going to reproduce anyone else's performance

  • @Ubik2501:

    Either way, I would much prefer a 10MB recording of Tom Waits's actual voice to a 10KB synthesized version of it, no matter how precise the synthesis might be. (Even though it would be hilarious to hear Tom Waits singing "Still Alive.")

    I dunno. It might be very funny in a bizarro humor way. Like this story: [www.songpoemmusic.com]

  • As some others have pointed out, this is just glorified midi and there is no way to playback real music with it.

    The amount of information you would need for each of the variables they mention to acceptably model all the subtle things musicians do when playing music would be a huge amount of information. There are subtle changes occurring in breath, tongue, fingering and more throughout an entire song. It's not just a few pieces of information for each note. Real music has constant changes to all the variables. This could not really capture the true sound and feel of real music without having huge amounts of data.

    Anything done with this method where the file is 1000 times smaller than an mp3 would be very sterile sounding and would have absolutely no feeling. It would be like listening to a computer-generated voice. It wouldn't be worth listening to.

    Also, how would you really record what a musician is doing with all his fingers, breath, tongue, etc.? Just capturing that data, or trying to convert it from recorded sound, would be an almost impossible task. And it still wouldn't come out even close to sounding right even if you tried to capture all the subtlety, and trying to do so would probably generate a file bigger than the sound recording itself.

    This is completely useless in terms of real music playback.

  • @nikster: you make a good point. Top iPods now hold more than 30 times what they did when they were first introduced, and storage equivalent to the original iPod now easily fits on something smaller than a postage stamp. Re-synthesized music of this sort won't ever sound like real music, more like the musical equivalent of a text-to-speech engine.

  • "this is what we see happening for iPods and other playback devices in a few decades. So says Gizmodo."

    Honestly, Giz, April Fools' ended 2 days ago. You are joking, right?

  • better for movies and porn. who can listen to over 1 million albums in a lifetime and actually say they like all of them? I don't think so.

  • What a waste of a university's resources. I can make a program that plays a MIDI and an MP3 file synchronously. Great job inventing the wheel again.

  • THIS ALREADY EXISTS! At an even higher ratio.
    I guess you want me to prove it... think outside the box!
    So you want to archive Sinatra? Here, I've already done this for you:

    www.youtube.com/results?search_query=Sinatra&search_type=
    or
    www.google.ro/search?hl=ro&q=sinatra&meta=

    Hope to not be banned for the links...

  • One of the only real uses I see for this technology would be to recreate musical styles and/or artists that are no longer extant, in an economical fashion. You could put the computer into "Big Band Mode," and record a new song without hiring a 30-piece band. Musicians and studio time will continue to get more expensive, and at some point this tech will be cheaper, and virtually indistinguishable. The big breakthrough will be when the computer can analyze "extinct" styles and recreate them. Someone as untalented as me could just come up with no more than a simple tune and end up with a symphony. I don't necessarily think this is a good thing.

  • All of you are missing the REAL market for highly compressed music ... (wait for it)...... (patient, now)...GREETING CARDS!

  • so does this mean my Super Mario Bros. soundtrack can be remixed, remastered and re-released on my NES.... God can't wait for that....NOT

  • frigg at 05:55 PM on 04/03/08

    @nutbastard: MIDI is a simple control language to trigger synthesized or sampled (i.e. recorded) sounds. The modeling described in this article is a form of synthesis that replicates sounds through elegant little algorithms instead of large files of audio data. Therefore, you'd use a MIDI controller or sequencer to trigger sounds created by these algorithms.

    However, there are so many variables in reproducing actual instruments that, while this approach is an inevitable replacement for sampled sound libraries as everything gets better, faster, cheaper, smaller, it's still got a long way to go... before it passes a musical Turing Test.

    It's also nothing new. There are a number of algorithmic musical models already in the wild.

    For example, Jason speculates about replicating Sinatra's voice in the future. Yamaha actually makes something called Vocaloid [www.vocaloid.com] that models male and female voices today... just add lyrics. Synful [www.synful.com] is a collection of orchestral algorithms that does exactly what this story is talking about, and you can also download it today. It's OK, but still cheesy compared to even a modest sample library which is still cheesy compared to the real thing (although some samples come pretty close).

    Actually, acoustic music is so complex, it's hard to imagine anything short of HAL-level chaos-based computing systems convincingly replicating instruments through algorithms.
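The split frigg describes, a small stream of control data driving a synthesis algorithm rather than a stored recording, can be illustrated with a toy example. The sketch below uses the Karplus-Strong plucked-string algorithm, a classic and very simple physical model. It is not the Rochester method (the article does not describe that in detail), and the `pluck` function, the `score` list, and the damping value are all hypothetical names and numbers chosen for illustration.

```python
import random

def pluck(frequency_hz, duration_s, sample_rate=44_100, damping=0.996, seed=0):
    """Karplus-Strong string: a noise-filled delay line that is repeatedly averaged."""
    rng = random.Random(seed)
    period = int(sample_rate / frequency_hz)          # delay-line length sets the pitch
    buffer = [rng.uniform(-1.0, 1.0) for _ in range(period)]  # the "pluck" is a noise burst
    samples = []
    for i in range(int(duration_s * sample_rate)):
        idx = i % period
        current = buffer[idx]
        following = buffer[(idx + 1) % period]
        samples.append(current)
        # Averaging adjacent samples acts as a low-pass filter, so the tone decays
        # and mellows over time the way a real string does.
        buffer[idx] = damping * 0.5 * (current + following)
    return samples

# Hypothetical "score": the only data that would need to be stored or transmitted.
score = [(329.63, 0.5), (392.00, 0.5), (440.00, 1.0)]  # (frequency_hz, duration_s)

audio = []
for frequency_hz, duration_s in score:
    audio.extend(pluck(frequency_hz, duration_s))

control_values = sum(len(event) for event in score)
print(f"{control_values} control values expanded into {len(audio)} audio samples")
```

Six numbers turn into two seconds of audio here, which is where the dramatic size ratio comes from. The catch raised throughout the thread still applies: this toy string exposes only pitch and duration, and every extra nuance of a real performance would need its own control stream.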

  • frigg at 05:59 PM on 04/03/08

    @Joseph: "Yeah those are called VSTi (virtual instruments) AU (audio units) DXi (direct-x instruments). This is not new--this is how keyboards have been working for decades."

    VST, AU, and that sort of thing are just plugin formats. Sampling libraries or synthesis engines (like this one) could be packaged in whatever plugin format the developer uses.

  • frigg at 06:11 PM on 04/03/08

    @ideaman2020: "Not only that, but clarinets, electric guitars, and the human voice have on the order of a bajillion different variables..."

    I agree. Given all the genuine variables, sympathetic vibrations, harmonic interaction, the vagaries of human performance, etc., bajillion is an understatement.

    "Creating a simulation of an entire universe requires a simulation the size of a universe."

    ...which is why one theory of the universe is that it is a simulation of the universe, a giant computer in which everything from dark matter to the graphic design department of The Gap is part of an ongoing (massively multithreaded) application.

  • so they over-engineered sheet music. impressive? no.

  • @escargot:
    What exactly would you touch to control these iPod Touch implants?

  • Alright, so the EXTREMELY close parallel between this "new" technology and MIDI or VST has been pretty much beaten to death. Despite this, I find the concepts addressed here to be very interesting from a broader view. Within the realm of innovation, be it industry R&D or academia, there is a distinct divide between the sciences and the arts. This work strives to bridge that divide. Herein rests the real problem.

    Technology of this sort is (and likely always will be) developed by engineers (computer scientists, electrical engineers, etc)... many of whom have little music training beyond 8th grade band and/or their iTunes collection. While I believe in the power of technology like this, I'm convinced that it will never be acceptable until these two camps are able to work closely together to develop a quality piece of work from both an engineering and musical standpoint.

    PS: I'm both an engineering researcher and a musician, with a strong interest in VST instrumentation (a la Vienna Symphonic Library).

  • Great, so now every saxophone or violin or piano will sound exactly the same... reminds me of - oh yeah - MIDI.

  • @Ubik2501: If they were encoded in the file, there is no way it could be 1/1000th of 4 megs, or 4 kilobytes. Brock is right: whatever algorithms would be used to generate the audio would need to be embedded as a standard in every audio device that would support this type of playback.

    This would be awesome cuz then you could make your own favorite song with your favorite artists with your own made up lyrics without having to pay em for it!

    @ideaman2020: Yeah, I could see the device turning into a major headache. I mean, I have 156 Gigabytes of the East West Quantum Leap Sound Library. It's sampled from a real orchestra in 5.1 surround sound, and that is just 4 libraries. To recreate every single sound, there would have to be some ingenious procedural formula that would create a single plugin or synth that could create every sound.
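The "1/1000th of 4 megs" arithmetic in the comment above also implies a data rate, which shows how little room such a file would leave for embedded instrument models. A rough sketch, assuming decimal megabytes and a four-minute song (the song length is my assumption; the comment gives only the file size):

```python
# Back-of-envelope budget for a file 1,000 times smaller than a typical MP3.
MP3_SIZE_BYTES = 4_000_000   # the "4 megs" from the comment, taken as decimal megabytes
COMPRESSION_RATIO = 1_000    # the headline claim
SONG_SECONDS = 4 * 60        # ASSUMPTION: a four-minute song

compressed_bytes = MP3_SIZE_BYTES // COMPRESSION_RATIO
bytes_per_second = compressed_bytes / SONG_SECONDS
bits_per_second = bytes_per_second * 8

print(f"{compressed_bytes} bytes total, about {bytes_per_second:.1f} bytes/s "
      f"({bits_per_second:.0f} bit/s)")
```

That works out to about 17 bytes of control data per second of music under these assumptions, which supports the point that any sizeable instrument model would have to live in the player and not in the file.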

  • Voice reproductions will make one more thing for record companies to copyright and control (in addition to the recording and arrangement).

  • I often work with Garritan Personal Orchestra with a reverb plugin, and just a small orchestra with one reverb plugin is really taxing on my CPU (albeit a modest one: a 2.4 GHz Pentium 4 Prescott), to the point that it sometimes can't keep up and skips after a while. I'd hate to see how much processing power generating the sound itself would take in real-time.

    And like it was mentioned, there are way too many variables and constant changes in these variables. It'd be an insanely large amount of information. Furthermore, how would one capture all of this data?

  • johnnyabnormal at 09:13 PM on 04/03/08

    @ideaman2020: Seems so much easier to just record a damn guitar, right? :)

  • johnnyabnormal at 09:22 PM on 04/03/08

    @Joseph: I have that library too... I think my sound library is quickly approaching 1.5 TB. I have friends who are in the 90 TB range too. I have yet to hear a software instrument that can replicate an orchestra as well as a live performance, no matter how well it was programmed/sequenced or mixed. It's getting REALLY close though.

  • @nutbastard: agreed.

    midi revisited, anyone?

  • I agree, this is MIDI revisited! Also, what about when someone comes up with a new sound, and how will ambient noise (e.g. crowds, noise, applause) be recorded in this scenario? MIDI sucked at that, plus a whole lotta other stuff.