Production-Ready NPU Architecture.
Not a Research Project.
34,726 lines of verified RTL. 6 architecture phases complete. Power characterized at 0.21 pJ/MAC.
Everything we claim is measured, not modeled.
The Problem We Solved
Most NPU IP falls into two categories: academic research that needs 18+ months of productization work, or vendor IP that locks you into their ecosystem with royalties and minimum commitments.
We built something different: an NPU architecture that's actually ready for production, with licensing terms that make sense for teams that want to own their silicon destiny.
What we prioritized:
- Verification completeness over feature count
- Measured performance over projected specs
- Integration simplicity over architectural elegance
- Honest documentation over marketing materials
The Result:
An NPU that can go from license signing to synthesis in weeks, not months. With support from the engineer who designed it, not a ticket queue.
Measured Results
Results from synthesis and PrimeTime PX power analysis, not datasheet estimates. The one projected figure in the table is labeled as a projection.
| Metric | NeuraEdge NPU | Context |
|---|---|---|
| PE Energy Efficiency | 0.21 pJ/MAC (8-bit) | At 0.9 V, TT corner, 1 GHz target |
| Energy per Operation | 0.21 pJ | Single MAC operation, register-to-register |
| System-Level Projection | < 2 TOPS/W | Conservative estimate including memory hierarchy |
| Peak Throughput | 1 TOPS per tile | 32×32 PE array at 1 GHz, 8-bit operations |
| Sparsity Acceleration | Up to 4× effective throughput | With 75% weight sparsity, typical for pruned models |
| Power Reduction | 75%+ dynamic power savings | During sparse operations via clock gating |
Important Note: These numbers are from our characterization at TSMC 28 nm. Your results will vary based on your target process, operating conditions, and integration choices. We'll help you develop realistic projections for your specific situation during technical discussions.
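The headline figures in the table can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the deliverables: the function names are invented for this example, and it assumes that one MAC counts as one operation (as the "Energy per Operation" row does) and that every PE issues one MAC per cycle.

```python
# Back-of-the-envelope check of the table figures (illustrative sketch).
# Assumptions: 1 MAC == 1 operation, one MAC per PE per clock cycle.

PE_ROWS = 32
PE_COLS = 32
CLOCK_HZ = 1e9                 # 1 GHz target, from the table
ENERGY_PER_MAC_J = 0.21e-12    # 0.21 pJ/MAC, from the table


def peak_tops(rows: int, cols: int, clock_hz: float) -> float:
    """Peak dense throughput of one tile in tera-operations per second."""
    return rows * cols * clock_hz / 1e12


def pe_tops_per_watt(energy_per_mac_j: float) -> float:
    """PE-only efficiency (operations per joule), expressed in TOPS/W."""
    return 1.0 / energy_per_mac_j / 1e12


def pe_array_power_w(rows: int, cols: int, clock_hz: float,
                     energy_per_mac_j: float) -> float:
    """Dynamic MAC power of a fully busy tile, excluding memory and control."""
    return rows * cols * clock_hz * energy_per_mac_j


def ideal_sparse_speedup(weight_sparsity: float) -> float:
    """Upper bound on speedup if every zero-weight MAC is skipped at no cost."""
    if not 0.0 <= weight_sparsity < 1.0:
        raise ValueError("weight_sparsity must be in [0, 1)")
    return 1.0 / (1.0 - weight_sparsity)


if __name__ == "__main__":
    print(f"Peak throughput : {peak_tops(PE_ROWS, PE_COLS, CLOCK_HZ):.3f} TOPS")
    print(f"PE-only TOPS/W  : {pe_tops_per_watt(ENERGY_PER_MAC_J):.2f}")
    print(f"Busy-tile power : "
          f"{pe_array_power_w(PE_ROWS, PE_COLS, CLOCK_HZ, ENERGY_PER_MAC_J):.3f} W")
    print(f"Ideal speedup at 75% sparsity: {ideal_sparse_speedup(0.75):.1f}x")
```

The PE-only efficiency that falls out of 0.21 pJ/MAC is higher than the system-level projection, which is expected: the system figure also pays for the memory hierarchy and control logic.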
Engineering Rigor
The difference between IP that works in simulation and IP that tapes out successfully.
| Aspect | NeuraEdge Approach | Industry Typical |
|---|---|---|
| Verification Approach | Formal property checking + simulation | Simulation only |
| Coverage Target | > 95% functional coverage | "Good enough" coverage |
| Power Analysis | PrimeTime PX with real switching activity | Estimated from synthesis |
| Timing Closure | Multi-corner, multi-mode signoff | Single corner, hope for the best |
| Documentation | Complete integration guides, register maps | README and "ask if you have questions" |
Architecture Philosophy
Design decisions optimized for production, not publications.
Tile-Based Processing Element Array
32×32 PE array with local accumulation, designed for efficient matrix operations with minimal data movement.
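To make "local accumulation with minimal data movement" concrete, here is a minimal software sketch of a tiled matrix multiply. It is a reference model under our own assumptions, not the RTL dataflow: the `tiled_matmul` name and the loop order are hypothetical, and the only figure taken from the architecture is the 32×32 tile size.

```python
TILE = 32  # matches the 32x32 PE array; the software mapping is an assumption


def tiled_matmul(a, b, tile=TILE):
    """Multiply a (M x K) by b (K x N) one output tile at a time.

    Partial sums for each output tile stay in a local accumulator until
    every K-slice has been consumed, so each output is written exactly once.
    """
    m, k = len(a), len(b)
    n = len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")

    out = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            rows = min(tile, m - i0)
            cols = min(tile, n - j0)
            acc = [[0] * cols for _ in range(rows)]  # local accumulator
            for k0 in range(0, k, tile):             # stream K-slices through
                k1 = min(k0 + tile, k)
                for i in range(rows):
                    a_row = a[i0 + i]
                    for j in range(cols):
                        partial = 0
                        for kk in range(k0, k1):
                            partial += a_row[kk] * b[kk][j0 + j]
                        acc[i][j] += partial
            for i in range(rows):                    # single write-back per tile
                out[i0 + i][j0:j0 + cols] = acc[i]
    return out
```

The design point this illustrates is that intermediate sums never leave the tile; only finished outputs move to the next level of the memory hierarchy.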
Hardware-Accelerated Sparsity
Zero-skip logic verified across 1,000+ test patterns, with up to 4× effective throughput at 75% weight sparsity.
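The relationship between weight sparsity and effective throughput can be sketched with a simple software model. This is a simplification under stated assumptions, not a description of the hardware: the `dot_with_zero_skip` function is hypothetical, and it assumes a skipped MAC costs no cycles at all, which is the ideal case.

```python
def dot_with_zero_skip(weights, activations):
    """Dot product that issues a MAC only for non-zero weights.

    Returns (result, macs_issued). Idealized model: a skipped MAC is free.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0
    macs_issued = 0
    for w, x in zip(weights, activations):
        if w == 0:
            continue  # zero weight: contributes nothing, so no MAC is issued
        acc += w * x
        macs_issued += 1
    return acc, macs_issued


if __name__ == "__main__":
    weights = [3, 0, 0, 0, -2, 0, 0, 0]      # 75% of the weights are zero
    activations = [1, 2, 3, 4, 5, 6, 7, 8]
    result, issued = dot_with_zero_skip(weights, activations)
    dense_macs = len(weights)
    print(f"result={result}, MACs issued={issued} of {dense_macs}, "
          f"ideal speedup={dense_macs / issued:.1f}x")
```

With 75% of the weights zero, only one MAC in four is issued, which is where the 4× upper bound comes from. Real workloads with unevenly distributed zeros may see less.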
Deterministic Interconnect
Weight distribution and result collection networks with guaranteed timing, not statistical arbitration.
Production Control Plane
APB interface with full register access, interrupt management, and runtime reconfiguration.
Complete Memory Subsystem
Three-tier hierarchy with unified buffer, weight buffer, and output accumulation. No external memory controller required for inference.
Competitive Position
Honest assessment of where we fit.
| Category | NeuraEdge | Academic IP | Big Vendor IP |
|---|---|---|---|
| Architecture Maturity | Production-ready RTL | Research prototype | Battle-tested |
| Licensing Terms | Perpetual, no royalties | Often unclear | Royalties + minimums |
| Support Model | Direct engineer access | Best effort | Tiered support tickets |
| Customization | Full source + modification rights | Usually available | Configuration only |
| Integration Risk | Proven methodology | High (you figure it out) | Low (if you follow their rules) |
What Customers Receive
Complete package for production integration.
| Deliverable | Details |
|---|---|
| Complete RTL | All Verilog/SystemVerilog source files, synthesizable and verified |
| Verification Suite | UVM testbenches, coverage models, regression scripts |
| Synthesis Scripts | Synopsys DC scripts with proven constraints and optimization settings |
| Integration Package | Example SoC integration, bus adapters, interrupt controllers |
| Technical Documentation | Architecture specification, integration guide, register reference |
| Engineering Support | Direct access to the architect who designed it (see Support page for tiers) |