When Qualcomm unveiled their new Snapdragon 855 mobile platform, they touted substantial improvements in all aspects of mobile computing. For the average user, it’s easy to understand the end user benefits behind the CPU and GPU improvements that Qualcomm made in the Snapdragon 855. Better CPU performance translates to apps loading faster and better GPU performance translates to better framerates when gaming. What’s less intuitive for users to understand is AI (artificial intelligence), and specifically, machine learning techniques that apps and services increasingly adopt such as artificial neural networks. Qualcomm made huge gains in AI workload performance with the Snapdragon 855 thanks to improvements in computing across the board, and especially due to the revamped Hexagon 690 DSP. We sat down with Gary Brotman, Head of AI and Machine Learning Strategy and Product Planning at Qualcomm, and Ziad Asghar, Vice President of Snapdragon Roadmap Planning and AI, XR, Competitive Strategy at Qualcomm, to learn more about the improvements Qualcomm made in AI workloads.
Mario Serrafero: “So, the new DSP. Last year, I asked you about the attack strategy with respect to how Qualcomm pushed, promoted, marketed, and communicated the DSP and HVX, in particular. At the time, as an AI block, it was still relatively new to most readers and consumers. So we’re wondering how you’ve seen this evolve since then with the further promotion of the 845.”
Gary Brotman: “First and foremost, when we started doing this back with the 820, it was still very CPU and GPU centric, and leveraging the DSP and the vector processing capabilities for that really came about because of where Google is trying to go with TensorFlow and 8-bit math. So that’s where we really stretched our legs in DSP, or let’s say the vector processors. Given the maturity of the vector processor that we have in Hexagon and the way we were able to advance that roadmap so quickly in the next two generations, and the use cases that we saw, which at the time, basic classification networks were pretty straightforward with not a lot of heft. They can run fine with 8-bit math. A dedicated accelerator, even last year, was a risk for basically allocating area to something that may not get used. The confluence for use cases, and it’s anything from your standard single camera, super resolution, or segmentation in real time. These things happening in some cases, concurrently, the demand for having at least some level of dedicated acceleration you can wall off and still read cycles on the vector processor or even the GPU. It was the right time.
It’s definitely something we had to plan for much earlier than when we talked last time, but I think everybody in this business is placing a bet that they know exactly, or close to exactly, what those workloads are going to be. What kind of precision needs to be necessary, and if you did or didn’t budget enough compute to satisfy that confluence of use cases that are coming. We’re pretty deliberate in that (Qualcomm’s always been use case centric) and we didn’t want to run the risk of having dedicated acceleration that wouldn’t be used because it could be outdated in the last cycle. We see enough in terms of general convolution alone that a dedicated accelerator can do a fantastic job of. Again, freeing up the cycles elsewhere. In terms of the strategy that we have with this new accelerator: It’s dedicated, it’s a new architecture. It’s not a Hexagon derivative. But if you think about a net today, there are certain nonlinearity functions that don’t run well on some of the dedicated acceleration -”
Mario Serrafero: “Yeah, sigmoid, ReLU -”
Gary Brotman: “Exactly, Softmax. And you have to punt them elsewhere, or to the CPU. But in our case, the way that we’ve kind of engineered this under the hood, the DSP is actually the control. It determines where the net runs and where the layers run and can decide if there’s certain things that should run on the DSP as a fallback versus run on the tensor processor. So that pairing actually made a lot of sense to us. But that doesn’t detract from our beliefs and our strategy that every primary core in our SoC has a role, so we optimize across the board, yet there’s still a lot of variability and that’s going to continue.”
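Editor’s note: the arrangement Brotman describes, where a controller decides layer by layer whether work goes to the tensor accelerator or falls back to the DSP, can be pictured with a minimal Python sketch. This is an illustration only, not Qualcomm’s scheduler: the `TENSOR_OPS` set, the `Layer` type, the backend names, and the `assign_backends` function are all hypothetical and chosen just to show the fallback idea.

```python
from dataclasses import dataclass

# Hypothetical: ops we assume a dedicated tensor accelerator handles well.
# The real supported-op list is not given in the interview.
TENSOR_OPS = {"conv2d", "depthwise_conv2d", "fully_connected"}


@dataclass
class Layer:
    name: str
    op: str


def assign_backends(layers):
    """Pick a backend per layer: the tensor accelerator when the op is
    supported, otherwise fall back to the DSP vector units."""
    plan = []
    for layer in layers:
        backend = "tensor" if layer.op in TENSOR_OPS else "dsp_vector"
        plan.append((layer.name, backend))
    return plan


# A toy network: convolution, nonlinearity, dense layer, softmax.
net = [
    Layer("conv1", "conv2d"),
    Layer("relu1", "relu"),
    Layer("fc", "fully_connected"),
    Layer("prob", "softmax"),
]

for name, backend in assign_backends(net):
    print(f"{name:>6} -> {backend}")
```

In this toy plan the convolution and dense layers land on the accelerator, while the nonlinearities (ReLU, softmax) stay on the DSP instead of being punted to the CPU.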
Mario Serrafero: “Another topic that we want to talk about is use cases. Like you said, Qualcomm is very use case centric, we’ve seen AI come to mobile in three main areas: speech recognition, sequence prediction like with strings and typing, and obviously computer vision like AI filters, [and object recognition]. Computer vision exploded, now you see it everywhere. I’ve seen with speech recognition, everyone’s got their own AI assistant, everyone’s got their own assistant. Now, all that can be done at the edge with small latency and perfect security. But what’s next for use cases of machine learning, and are all those use cases going to be developed by the big companies in the world – all the Snapchats in the world, the Facebooks out there? How do you see that rolling?”
Gary Brotman: “I don’t think I can point out a killer use case. But the capabilities allow for more computational complexity and in the case of vision, the input resolution can be higher. You’re not working on low resolution images to do bokeh. There was a discussion earlier in the other interview we had around 4K streaming as an example. I’m not going to predict that that’s possible, but the developers that we work with, whether it’s big companies like Google or our software development partners who are actually building the algorithms that are driving a lot of these mobile features, they just want to push more. They want to go farther. If there’s anything that I would see in terms of next steps, it would probably be less about what’s happening above the line or at the app level, and more about what’s happening in the system like improving the way the product works, power management, and even in the camera pipeline, not just on top of it. You mentioned audio, and how many keywords you’re going to support or if you could do noise cancellation on-device. The keyword thing is interesting because it’s not easy to build up the library—you’re memory constrained. So there’s still going to be a balance between what’s local and what’s going to happen in the cloud.”
Ziad Asghar: “I can add a little. So at least the two domains where it’s kind of growing a lot are audio and imaging, today. We can see it having a lot of use cases. Jack talked about it from a camera perspective, we’ve had the AI engine where you can leverage a lot of that for imaging use cases. Some of the ones that were shown today. And then if you look at audio, we didn’t talk as much about it, but we actually added some audio capabilities to the audio block as well. We’re able to do better voice activation in more noisy environments. We’re able to do better noise cancellation [in imaging]. All of those abilities are basically already happening. There are the partners that Gary showed today for the ISP, there are a lot more of those coming. So I think those are the two dimensions that we are more focused on today.”
Gary Brotman: “And then the next step—I’m not going to forecast when this happens—is there is enough compute now where on-device learning and experimentation around actual learning on the device will likely happen in this next cycle.”
Mario Serrafero: “This is probably a topic that’s more fun to discuss, and it’s the fact that Qualcomm is sticking with the Hexagon DSP moniker and HVX while other companies are opting for “neural” so and so. How does Qualcomm see this discrepancy and these different strategies and approaches with mainly the marketing, but we can go into a bit later about the heterogeneous compute versus specific block bits as well.”
Gary Brotman: “Because Hexagon already has equity built up in DSP, one would immediately gravitate towards thinking that we’re just extending our DSP strategy. Actually on brand, if you look at all three processors, your scalar, your vector, and now your dedicated tensor accelerator, they’re not all DSP. Hexagon is really a higher level brand than just DSP. There is a handful of DSPs. I think the marketing questions are probably a little bit more difficult to answer because each region is different. China’s very NPU-centric because that is a moniker that had been introduced last year, and that seems to be one that has taken root. I wouldn’t say that that’s worked elsewhere around the globe. Google has a tensor processor, and tensor seems to resonate.”
Mario Serrafero: “A lot of people have their own different names.”
Gary Brotman: “Ultimately, it comes down to what the OEM wants to do. If that matters to their customers, then it’s incumbent upon them to figure out how they can leverage that processing capability and differentiate upon it in terms of capabilities. Our engine, and I think a great deal of the processing capability that we have, would still be very vector and tensor-centric in terms of the overall mix. The dedicated processing itself, the way it does matrix multiplication, it’s the same sort of dedicated processor that an NPU would be [using]. The marketing question is an interesting one, and I forget, what was Keith’s answer?”
Ziad Asghar: “His answer was, ‘you can call it whatever you want to, to be able to sell more product.’”
Gary Brotman: “That was pretty much it; that was right, it was a very blunt answer.”
Ziad Asghar: “I think Gary covered it really well. Some of the people are using that moniker as a term in a way that almost states or implies that it’s only limiting it to that block. But what we see is that this whole heterogeneous approach of being able to use the CPU, or a GPU, or a Hexagon tensor vector, gives you different trade-offs in a whole spectrum of precision on power and performance, and that’s what you need today. Because we don’t know what application requires what degree of precision, what requires sustained performance, or what doesn’t require it. So we believe in that full, overall solution because that’s how you get the best experience.”
Gary Brotman: “And that’s never changed in any of our conversations, even with a dedicated accelerator. It’s an addition, it’s not a replacement.”
Mario Serrafero: “Yeah, I think it was Keith last year who said, ‘where there’s compute, there’ll be AI.’ And now there’s more compute.”
Gary Brotman: “More compute in every block, that’s exactly right.”
Mario Serrafero: “Now that we are on the subject, we’ve heard many comparisons with a “mysterious” 7nm competitor on Android. Yeah, we still have no idea who that is.” (spoken in jest)
Gary Brotman: “No idea.” (spoken in jest)
Mario Serrafero: “But, could you clue us in on these comparisons? How were they measured? What caveats are worth considering? Any other comments that maybe you guys didn’t have time to expand upon in the slides or in the Q&A? I know it’s kind of hard to measure [and communicate] because of the variety of models, so I think it’s an interesting subject to expand upon to let people know why it’s not that easy to make those comparisons.”
Gary Brotman: “It’s actually quite simple. I’ll give you a very simple answer on one specific metric; we’re going to be doing more benchmarking in January. We’ll talk more about the different nets that are used to measure the numbers that we’re basing it on, and that would be standard Inception v3. That’s where we’re deriving that performance and our understanding of where the competition ranks. But in terms of the one that has announced and is out with products in the market, that’s where the 2x and the 3x comes from—well the 3x was against what we had in 845, while the 2x is their measure of performance and state of performance relative to ours.”
Ziad Asghar: “You have devices available, you can actually acquire them and do some of that testing yourself. But I think the only thing I would guard against, it is kind of a Wild West of benchmarking AI. Some people are using some very generalized terms, or mixes of networks which might benefit them in a particular way or not. “Will that align well with a modal workload?” is not something that people are taking into consideration. Some of the benchmarks that have been floating around do a lot more of that, and we’re very close so we know there are people who are making these benchmarks sway one way or another depending on what favors them. That’s why it’s a lot more about actual use cases. It’s also a lot more about the best-in-class performance for that use case, and then it’s about getting it done fastest. I think these are all the factors that we look at. But I think it will become better, it will converge. Right now, there’s a lot of different options out there. I think you’ll have certain benchmarks stay that make more sense. Today, maybe you could argue Inception v3 is relatively better at this point in time.”
Gary Brotman: “In terms of networks, there’s a handful. There’s ResNet, VGG, segmentation nets, super resolution nets—raw performance you could measure these with. The point to take away in terms of benchmarks like companies or entities that are doing AI benchmarking, and they have mixtures of precisions, networks, and formulas that are variable, they’re so variable the results change week-to-week. That’s where it’s truly the Wild West, and we’re keeping an arm’s length. We’re not placing our bets anywhere, because there is so much variability when it comes to the actual performance by some of these networks that are used in use cases, we feel confident that we’re still definitely ranking up there in terms of performance relative to the competition. I should say not ranking but the doubling that we talked about, raw performance.”
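Editor’s note: the point about benchmark mixes swaying results can be shown with a small Python sketch. Every number below is invented purely for illustration; none is a measurement of any real chip, and `chip_a`, `chip_b`, `weighted_geomean`, and `winner` are hypothetical names. The sketch only shows that a composite score built from a weighted mix of networks can name a different “winner” when the weights change, even though the per-network results stay the same.

```python
import math

# Hypothetical inferences-per-second figures, made up for this example.
scores = {
    "chip_a": {"inception_v3": 120.0, "resnet50": 60.0, "vgg16": 20.0},
    "chip_b": {"inception_v3": 80.0, "resnet50": 70.0, "vgg16": 40.0},
}


def weighted_geomean(per_net, weights):
    """Weighted geometric mean of per-network throughput."""
    total = sum(weights.values())
    log_sum = sum(w * math.log(per_net[net]) for net, w in weights.items())
    return math.exp(log_sum / total)


def winner(weights):
    """Chip with the highest composite score under the given network mix."""
    return max(scores, key=lambda chip: weighted_geomean(scores[chip], weights))


# Two benchmark "formulas" that differ only in how they weight the networks.
mix_1 = {"inception_v3": 3, "resnet50": 1, "vgg16": 1}
mix_2 = {"inception_v3": 1, "resnet50": 1, "vgg16": 3}

print("mix_1 winner:", winner(mix_1))
print("mix_2 winner:", winner(mix_2))
```

With these made-up figures the first mix favors `chip_a` and the second favors `chip_b`, which is why a single well-understood network such as Inception v3 is easier to reason about than an opaque composite.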
Mario Serrafero: “One of the subjects that we are interested in as a site primarily for developers is the democratization of machine learning. Obviously, we have open source libraries which are great, everyone’s offering these amazing SDKs as well, and there’s plenty of education. And now Android NN is available and Google just released ML Kit which simplifies the process. You just call an API, feed it your input, they use a trained model, you don’t have to worry about it, you don’t have to think about it, you don’t have to know any stats or any vector calculus. How do you see that the landscape has evolved in this regard in making it more accessible, simplifying the API, simplifying the documentation, the SDKs, and promoting the inclusion of third-party developers, not just big companies?”
Gary Brotman: “It’s funny when we actually focus on big companies, it’s helping the smaller developers as well. We started off with more of a proprietary stack when it came to programming for Snapdragon, specifically for running AI. But over time, and in the last couple of generations, we’ve added more tools. We are trying to strike a balance between high-level abstraction and ease of use, and lower-level access, which requires somebody to be much more savvy especially when it comes to dealing with some of our proprietary cores like the vector processor or the NPU. We see it evolving from a democratization standpoint. We have the basic building blocks like Hexagon and Qualcomm math libraries, but maybe a slightly higher level API that abstracts at least some of that heavy lifting, but gives enough of a flexibility to the developer to be able to use their own custom operators, or be able to tweak a little bit in terms of performance at the lower level. So the portfolio will continue to involve more tools, and certainly things like NN API where ONNX is an example for being able to basically say “here’s what you’re programming, what you’re expressing your network in.” As long as the hardware supports it, you’re good.
As I mentioned in our presentation, we’re responsible for a multi-OS landscape. There’s Windows, there’s Linux, there’s Android, so it’s not just about Android. When we look at this, if we’re going to construct some sort of an API that’s going to be SoC, cross-SoC, or cross-platform from an OS standpoint, we have to look and see how to find commonality in what we build under the hood. The stack with libraries and operator support and having that be able to plug into NN API or Windows ML, for example. But certainly, we’ve gone from the pendulum being over here where nobody really knows what to do, like really, not knowing. “I don’t know what framework to use. Do I use TensorFlow, or should I use Caffe or Torch?” And not knowing what to do to optimize at the lower level. So everybody’s happy with an API call. Now, within just a matter of a couple of years, it’s easy to go deeper. So the tools are there, whether they’re common open source tools, or even in a portfolio like we offer or competitors offer, these tools are becoming more easily accessible and easier to use.”
Mario Serrafero: “Speaking of developer communities. Last time we had mentioned one of the most mature communities that we have is the gaming community, and Qualcomm’s pretty well embedded in that. Now, we see that more than ever with the partnerships with the game engines that are being promoted and marketed. So we were talking about that in the context of AI and how it’s emerging there.”
Mishaal Rahman: “You were talking about how you wanted to invest more over the next 12 months. This was back during the last time we were here.”
Mario Serrafero: “In specifically the gaming developer community, kind of expanding upon that and what we see today.”
Gary Brotman: “I don’t remember the specific comment about investing in the gaming community, but if you look at a category that we saw driving the need for dedicated acceleration, and gaming is a component of this, but it’s not the primary use case necessarily—VR as an example. In a rich, immersive VR experience, every core is basically leveraged. You’re doing graphics processing on the GPU, visual processing on the vector processor, and the need to take one or many nets and run them separately on a dedicated accelerator without the worry of concurrency impact. That’s one of the reasons that drove us down the path of having dedicated acceleration. I don’t have a lot of information with respect to how AI is being leveraged in games today. There’s a lot of work with agents—developing agents to combat against or teach you.”
Mario Serrafero: “Like the traditional AI in games.”
Gary Brotman: “Exactly, right. But being more neural network based.”
Mario Serrafero: “Yeah, not the Minimax.”
Gary Brotman: “Part of Ziad’s responsibility too is driving XR strategy.”
Ziad Asghar: “XR wise, if you look at it today, we have launched new devices that are all-in-one HMDs with full 6DOF enablement. Devices like the Oculus Quest that launched actually with the Snapdragon 835, so we’re starting to get to a very good point in terms of actually harnessing the full capability of XR devices. In the past, some of the devices were not really giving that pristine experience because some people haven’t gotten the best experience out of it. I think XR is now doing great. What we’re also looking at in the future as it combines with 5G, is it allows you to now be able to take your device that’s actually much more mobile which means you can envision that you’re actually walking on a street. And then having a link like 5G means that like the demo that Gary showed of Google Lens. Now imagine that if you were wearing some sort of Google Glasses or something like that and you’re able to actually bring in information all towards what you’re looking at through your eyes, now you have a use case that really could be very compelling. I think that’s where the long-term investment that you’re talking about, that’s kind of the direction that it goes.
But right now, we feel we’re in a very good state in terms of XR and all the different companies that have launched with XR. Oculus Go is also based on Snapdragon 820, so I think we’re starting to get to a very good point where people are picking it up and doing a lot of things with it. And the next stage like I mentioned is we start to bring in 5G connectivity which we’ll do and then beyond that of course AR and some things that will also require much more in terms of performance, but limited on power. And that’s going to be extremely challenging, and I think with what we talked about today, Qualcomm is probably the best in terms of doing any of these use cases in terms of power. If you look at graphics, if you benchmark any of the competitors you’ll see our performance-per-unit-power is best-in-class. And as a consequence of that, the thermals, the sustained performance is what matters in XR, and in that regard we’re really ahead; that’s the reason why people use us for XR.”
Mario Serrafero: “Since last year, we’ve seen the Hexagon 685 DSP finally hit the premium mid-range with the 710 and the proper mid-range with the 670 and 675. So now we’re getting the Hexagon Vector Extensions making their way downstream whereas other competitors are not quite doing that with their neural processing units. How do you see that extending the reach of these experiences, and I wanted to ask whether, in the past, you saw the performance discrepancies in AI make a difference at all? Because we still kind of are in the early adoption of AI.”
Ziad Asghar: “I look at the overall roadmap. If you’re looking for the pristine best-in-class performance, it’s going to be in the premium tier. What we’re doing is we’re selectively taking some of the Hexagon capabilities and bringing it lower. The first AI engine, or the first Hexagon, was started with the Snapdragon 820. So we’ve brought it down to the Snapdragon 660 and into 670, and 710 has it also. So, our plan is to see how it breaks into the potential experiences.
As an AI engine, we have the basic, existing components: CPU, GPUs, Hexagon tensor, Hexagon vector, and scalar. What we do is we selectively bring parts of that further down into the roadmap as we see that these abilities are coming down and going into lower tier headsets. You will see actually, as we go further in the year, you’ll see we’ll do more of that. We launched Snapdragon 675 at the 4G/5G Summit. We talked about that coming down with the 675, and what you will see is, as these use cases are becoming more prevalent, as we showed with ArcSoft and all these other guys today, we’ll actually bring these capabilities lower. In the lower tier you will be able to run that use case, but to be able to get that right power profile like I talked about earlier, if you want to have that sustained performance, you want that particular block to be coming lower. So again, best-in-class performance will be up top, but as you go lower there will be a great degradation or gradation of…”
Mario Serrafero: “Gradient descent, you could say.” (spoken in jest)
Ziad Asghar: “Yeah, exactly. That’s kind of how we do with other technologies also on the roadmap and AI is not going to be very different in that sense. There is probably one difference, perhaps where you’re coming from, in that it is probably coming down faster than other technologies that we have brought down in the roadmap, so that observation I’d agree with.”
If you’re interested in learning more about AI in Qualcomm’s mobile platforms, we recommend reading our interview from last year with Gary Brotman. You can start with part 1 of our interview or go to part 2.
All graphics shown in this interview are sourced from Gary Brotman’s presentation during the Snapdragon Tech Summit. You can view the slides here.