Intel® Math Kernel Library (Intel® MKL) 10.3 Release Notes
This document provides a general summary of new features and important notes
about the Intel® Math Kernel Library (Intel® MKL) software product.
Please see the following links to the online resources and documents for the
latest information regarding Intel MKL:
Links to documentation, help, and code
samples can be found on the main Intel MKL product
page. For technical support visit the Intel
MKL technical support forum and review the articles in the Intel
MKL knowledgebase.
Please register your
product using your preferred email address. This helps Intel recognize
you as a valued customer in the support forum and ensures that you will be
notified of product updates. You can read Intel's
Online Privacy Notice Summary if you have any questions regarding the
use of your email address for software product registration.
What's New in Intel® MKL 10.3 update 8
- Data Fitting component: Added a set of new data fitting functions covering
one-dimensional algorithms for vector spline construction, cell or bin search,
and evaluation, differentiation, and integration of the spline interpolants.
Includes support for:
- Linear, quadratic, cubic, step-wise const, and user-defined splines
- Cell search with configuration parameters for optimal performance
- User-defined interpolation and extrapolation
- Vector-valued functions
- Column- and row-major storage formats
- Sparse BLAS: Improved compressed sparse row matrix-vector multiply
(?CSRMV) performance for very sparse matrices on high core counts supporting
Intel Advanced Vector Extensions (AVX)
- FFTs: Improved the performance of the 1D double precision FFTs on systems
supporting Intel AVX
- Statistics functions: Improved the performance and scalability for
computing the Variance-Covariance and Correlation matrices (FAST method) on
Intel® Core processors
- Added a Microsoft* Visual Studio* project tool for building custom DLLs
from static library files
- Bug fixes
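As a rough illustration of what the Data Fitting bullets in update 8 describe (cell search, then evaluation and integration of a spline), here is a minimal pure-Python sketch for the linear case. It does not use or mirror the Intel MKL Data Fitting API; the function names `find_cell`, `linear_spline_eval`, and `linear_spline_integrate` are hypothetical, and clamping out-of-range sites to the first or last cell is an assumption made for the example.

```python
from bisect import bisect_right


def find_cell(breakpoints, x):
    """Return i such that breakpoints[i] <= x < breakpoints[i + 1].

    Sites outside the partition are clamped to the first or last cell
    (an assumption of this sketch, giving linear extrapolation).
    """
    i = bisect_right(breakpoints, x) - 1
    return min(max(i, 0), len(breakpoints) - 2)


def linear_spline_eval(breakpoints, values, sites):
    """Evaluate the piecewise-linear interpolant at each site."""
    result = []
    for x in sites:
        i = find_cell(breakpoints, x)
        x0, x1 = breakpoints[i], breakpoints[i + 1]
        t = (x - x0) / (x1 - x0)
        result.append(values[i] + t * (values[i + 1] - values[i]))
    return result


def linear_spline_integrate(breakpoints, values):
    """Integrate the interpolant over the whole partition, cell by cell."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (breakpoints[i + 1] - breakpoints[i])
        for i in range(len(breakpoints) - 1)
    )
```

The cell search is a binary search over sorted breakpoints, which is the step the release notes mention as configurable for performance.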
What's New in Intel® MKL 10.3 update 7
- BLAS: Improved DSYRK/SSYRK threaded performance for small output matrices
and large outer products (i.e., rectangular input matrices), on all recent
Intel® Xeon® processors
- BLAS: Improved ?GEMM performance for small problems (<10) where beta = 1
on all recent Intel Xeon processors
- BLAS: Improved DSCAL performance for small problems and for cases where
INCX=1 on 32-bit programs running on Intel Xeon processors 5500, 5600, and
7500 series
- BLAS-like extensions: Improved threading and cache utilization of in-place
transposition of square matrices
- PARDISO: Introduced an independent threading control for PARDISO; use
MKL_DOMAIN_PARDISO with the mkl_domain_set_num_threads() function
- Poisson Library: Added support for 2D and 3D periodic boundary conditions
- Included the Link Line Advisor in the documentation directory
- Added a command line link tool for use with scripting tools such as
libtool
- Added C header files with stdcall prototypes for functions in the
following components: BLAS, Sparse BLAS, LAPACK, PARDISO/DSS, RCI Iterative
Solvers, Vector Mathematical Functions, Vector Statistical Functions, and the
support functions
- Changed the names of constants used to specify the domain in the
mkl_domain_set_num_threads() function (e.g., MKL_BLAS has become
MKL_DOMAIN_BLAS); the old names still exist with the exception of MKL_PARDISO
- Bug fixes
What's New in Intel® MKL 10.3 update 6
- Sparse BLAS: Added a new option to the mkl_?csrbsr converter function
allowing detection and removal of zero elements when converting from the BSR
format to the CSR format
- Changed DLL loading behavior on Windows*: Intel MKL DLLs can no longer be
in separate directories on the PATH—they must all be in the same directory
with the executable or in a directory specified in the PATH environment
variable
- Bug fixes
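The BSR-to-CSR conversion with zero removal mentioned in update 6 can be pictured with the following pure-Python sketch. It is not the mkl_?csrbsr routine and does not reproduce its argument list; the function name `bsr_to_csr`, zero-based indexing, and row-major storage of each dense block are assumptions made for the illustration.

```python
def bsr_to_csr(block_size, values, block_cols, block_row_ptr, drop_zeros=True):
    """Convert a BSR matrix to CSR, optionally dropping explicit zeros.

    values        -- flat list; block k occupies values[k*b*b:(k+1)*b*b],
                     stored row-major within the block (assumption)
    block_cols    -- block-column index of each block
    block_row_ptr -- start of each block row in block_cols, plus end sentinel
    """
    b = block_size
    csr_vals, csr_cols, csr_row_ptr = [], [], [0]
    for bi in range(len(block_row_ptr) - 1):
        for r in range(b):  # scalar row inside the current block row
            for k in range(block_row_ptr[bi], block_row_ptr[bi + 1]):
                base = k * b * b + r * b
                for c in range(b):
                    v = values[base + c]
                    if drop_zeros and v == 0:
                        continue  # padding zero stored only because of blocking
                    csr_vals.append(v)
                    csr_cols.append(block_cols[k] * b + c)
            csr_row_ptr.append(len(csr_vals))
    return csr_vals, csr_cols, csr_row_ptr
```

Without the zero check, every entry of every stored block becomes a CSR entry, so a matrix whose blocks are only partly filled carries many explicit zeros into the CSR arrays.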
What's New in Intel® MKL 10.3 update 5
- BLAS: Improved performance: {S,C,Z}TRSM for processors with Intel®
Advanced Vector Extensions (Intel® AVX); {S,D}GEM2VU for processors with Intel
AVX as well as the Intel® Core™ i7 processor and the Intel® Xeon® processor
5500 series
- BLAS: Improved scaling: ?TRMV for large matrices on all architectures;
DGEMM for odd numbers of threads on Intel® Xeon® processor 5400 series
- LAPACK: Included LAPACK 3.3.1 extensions and the respective LAPACKE
interfaces
- LAPACK: Improved the performance of ?SYGST and ?HEGST used in generalized
eigenvalue problems
- LAPACK: Improved the performance of the inverse of an LU factored matrix
(?GETRI)
- PARDISO: Added transpose and conjugate transpose solve capability
(A^T x = b and A^H x = b); facilitates compressed sparse column
(CSC) format support
- PARDISO: Improved out-of-core PARDISO performance when the memory
requirements slightly exceed available memory, using the
MKL_PARDISO_OOC_MAX_SWAP_SIZE environment variable and in-core PARDISO
- Optimization Solvers: Added Inf and NaN checks in the RCI Trust-Region
solvers
- FFTs: Improved the performance of 3D FFTs on small cubes from 2x2x2 to
10x10x10 for all supported precisions and types on all Intel® processors
supporting Intel® SSE3 and later
- FFT examples: Re-designed example programs to cover common use cases for
Intel MKL DFTI and FFTW
- VSL: Improved the performance of the single precision MT19937 and MT2203
basic random number generators on the Intel® Core™ i7-2600 processor on 64-bit
operating systems
- VSL: Improved the performance of the integer version of the SOBOL
quasi-random number generator on the Intel® Core™ i7-2600 processor and Intel®
Xeon® processor 5400 series
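The PARDISO note in update 5 says that the transpose solve facilitates CSC support. The reason is that the three arrays describing A in CSC format are exactly the arrays describing A^T in CSR format. The sketch below shows this correspondence with a plain CSR matrix-vector product; it makes no PARDISO calls, and the helper name `csr_matvec` and the zero-based indexing are assumptions of the example.

```python
def csr_matvec(n_rows, values, col_idx, row_ptr, x):
    """Compute y = M x for a matrix M stored in zero-based CSR format."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# A = [[1, 2, 0],
#      [0, 3, 0],
#      [4, 0, 5]]  stored column by column (CSC):
csc_values = [1, 4, 2, 3, 5]
csc_row_idx = [0, 2, 0, 1, 2]
csc_col_ptr = [0, 2, 4, 5]

# Handing the CSC arrays to a CSR routine yields products with A^T, not A.
y = csr_matvec(3, csc_values, csc_row_idx, csc_col_ptr, [1, 2, 3])
```

By the same reasoning, a CSR-based solver that is given CSC arrays and asked for the transposed system effectively solves Ax=b for the CSC-stored matrix.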
What's New in Intel® MKL 10.3 update 4
- BLAS: Improved DTRMM performance on Intel® Xeon® processors 5400 and later
- BLAS: Improved DTRSM performance on all 64-bit enabled processors,
especially processors with Intel® Advanced Vector Extensions (Intel® AVX)
- LAPACK: Incorporated bug fixes from the LAPACK 3.3.1 release
- OOC PARDISO: Improved the estimate of the amount of memory needed in
out-of-core operation
- FFT: Improved 1D real FFT scaling through improved threading
- FFT: Updated C and Fortran FFT examples to use the new single dynamic
library linking model
- VML: Improved performance of the single precision Enhanced Performance
version of the real Hypot and complex Abs functions and of the complex Arg,
Div, Mul, MulByConj functions for all accuracy modes on Intel® Xeon®
processors 5600 and 7500 series, and the Intel® Core™ i7-2600 processor
- Service functions: Improvements and additions to the Intel MKL service
functions (see the online
release notes for more information)
- Bug fixes
What's New in Intel® MKL 10.3 update 3
- BLAS: Improved multi-threaded performance of DSYRK, DTRSM, and DGEMM on
Intel® Xeon® processor 5400 series running 32-bit Windows*
- LAPACK: Implemented LAPACK 3.3 from netlib including Cosine-Sine
decomposition, improved linear equations solvers for symmetric and Hermitian
matrices and auxiliary functions
- PARDISO: 0-based permutation vectors are now allowed at input
- PARDISO: Added documentation for the pardisoinit() routine
- PARDISO: Improved performance of serial PARDISO with multiple right-hand
sides (RHS)
- PARDISO: Independent control for parallelism in the solve step for
improved performance on small matrices—see description of iparm(25)
- PARDISO: Reduced backward substitution—allows partial solution computation
for a full RHS—see description of iparm(31)
- FFT: Implemented Real FFT transforms for 3 to 7 dimensions
- FFT: Parallelized multi-dimensional complex transforms using split-complex
data represented as two real arrays
- Cluster FFTs: Extended FORTRAN 90 interface to real-to-complex transforms
and included new examples
- VML: Added new complex Pack/Unpack functions and real Gamma/LGamma
functions
- VML: Improved performance on Intel® Xeon® processor 5600 series and
processors supporting Intel® Advanced Vector Extensions (Intel® AVX) for the
following: all functions when operating on short vectors (<100), all
functions when operating on unaligned input vectors, the sPow2o3 function, and
the enhanced performance (EP) version of complex Add and Sub
- VSL: Functions for saving/restoring random number generator (RNG) streams
to/from memory
- VSL: Added new UniformBits32 and UniformBits64 functions
- VSL: Extended the number of unique streams supported by the MT2203 Basic
RNG from 1024 to 6024
- Bug fixes
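The "reduced backward substitution" item in update 3 rests on a general property of triangular solves: backward substitution fills the solution from the last component upward, so it can stop early when only trailing components are needed. The sketch below illustrates that idea on a small dense upper-triangular system. It is not PARDISO's algorithm, and the function name and the `first_needed` parameter are hypothetical; the exact semantics of iparm(31) should be taken from the reference manual.

```python
def back_substitute(U, y, first_needed=0):
    """Solve U x = y for upper-triangular U, computing only x[first_needed:].

    Components with index below first_needed are left as None, because the
    loop runs from the last row upward and stops before reaching them.
    """
    n = len(y)
    x = [None] * n
    for i in range(n - 1, first_needed - 1, -1):
        s = y[i]
        for j in range(i + 1, n):
            s -= U[i][j] * x[j]
        x[i] = s / U[i][i]
    return x
```

The right-hand side is still full; only the amount of substitution work shrinks with the number of requested components.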
What's New in Intel® MKL 10.3 update 2
- BLAS: Improved performance of transposition functions on the Intel® Xeon®
processor 5600 series
- BLAS: Added examples for transposition routines
- FFT: Added Fortran examples showing how to reduce application footprint by
linking only functions with the desired precision
- FFT: Added check for stride consistency on in-place real transforms with
CCE storage
- FFT: Expanded threading to new cases for multi-dimensional transforms
- VSL: Improved performance of Multivariate Gaussian random number generator
for single- and double-precision on 4-core Intel® Xeon® processors 5500 series
- VML: Improved performance of in-place operation of Add, Mul, and Sub
functions on the Intel® Xeon® processor 5500 series
- Bug fixes
What's New in Intel® MKL 10.3 update 1
- PARDISO/DSS: Added true F90 overloaded API (see the Intel MKL reference
manual for more information)
- PARDISO: Improved the statistical reporting to be more reader friendly
- Sparse BLAS: Improved performance of ?BSRMM functions on Intel® Core™ i7
processors
- FFTs: Added support for negative strides
- FFT examples: Added examples for split-complex FFTs in C and Fortran using
both the DFTI and FFTW3 interfaces
- VML: Improved performance of real in-place Add/Sub/Mul/Sqr functions on
systems supporting SSE2 and SSE3
- Poisson Library: Changed the default behavior of the Poisson library
functions from sequential to threaded operation
- Bug fixes
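To make the "negative strides" item in update 1 concrete, the following sketch shows how a strided layout addresses a logical sequence inside a buffer. It assumes the usual rule that element j lives at `first_index + j * stride`; the function name `gather_strided` is hypothetical and no Intel MKL FFT descriptor is involved.

```python
def gather_strided(buffer, first_index, stride, count):
    """Return the logical sequence whose element j is buffer[first_index + j*stride]."""
    out = []
    for j in range(count):
        pos = first_index + j * stride
        if not 0 <= pos < len(buffer):
            raise IndexError("strided access leaves the buffer at element %d" % j)
        out.append(buffer[pos])
    return out


data = [10, 11, 12, 13, 14, 15]
forward = gather_strided(data, 0, 2, 3)    # every second element, ascending
backward = gather_strided(data, 5, -1, 6)  # whole buffer, walked in reverse
```

With a negative stride the first-element offset has to point at the far end of the data, otherwise later elements would fall before the start of the buffer.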
What's New in Intel® MKL 10.3
- BLAS
- New functions for computing 2 matrix-vector products at once:
[D/S]GEM2VU, [Z/C]GEM2VC
- New functions for computing mixed precision general matrix-vector
products: [DZ/SC]GEMV
- New function for computing the sum of two scaled vectors: *AXPBY
- Intel® AVX optimizations in key functions: SMP LINPACK, level 3 BLAS,
DDOT, DAXPY
- LAPACK
- New C interfaces for LAPACK supporting row-major ordering
- Integrated Netlib LAPACK 3.2.2 including one new computational routine
(*GEQRFP) and two new auxiliary routines (*GEQR2P and *LARFGP) and the
earlier LAPACK 3.2.1 update
- Intel® AVX optimizations in key functions: DGETRF, DPOTRF, DGEQRF
- PARDISO
- Improved performance of factor and solve steps in multi-core
environments
- Introduced the ability to solve for sparse right-hand sides and perform
partial solves—produces partial solution vector
- Improved performance of the out-of-core (OOC) factorization step
- Support for zero-based (C-style) array indexing
- Zeros on the diagonal of the matrix are no longer required in sparse
data structures for symmetric matrices
- New ILP64 PARDISO interface allows the use of both LP64 and ILP64
versions when linked to the LP64 libraries
- The memory required for storing files on the disk in OOC mode can now be
estimated just after reordering
- Sparse BLAS
- Format conversion functions now support all data types (single and
double precision for real and complex data) and can return sorted or
unsorted arrays
- FFTs
- New MPI FFTW 3.3alpha1 wrappers cover new cluster functionality
- Improved load-balancing of cluster FFTs provides improved performance
- Intel AVX optimizations in all 1D/2D/3D FFTs
- Improved performance of 2D and 3D mixed-radix FFTs for single and double
precision data for all systems supporting the SSE4.2 instruction set
- Support for split-complex data represented as two real arrays introduced
for 2D/3D FFTs
- Support for 1D complex-to-complex transforms of large prime lengths
- Introduced Hybrid parallelism (MPI + OpenMP*) on cluster 1D complex
transforms and increased performance on vector lengths which are a multiple
of the number of MPI processes
- VML
- A new function for computing (ax+b)/(cy+d) where a, b, c, and d are
scalars, and x and y are real vectors: v[s/d]LinearFrac()
- Intel AVX optimizations for real functions
- A new mode for setting denormals to zero, overflow support for complex
vectors, and for every VML function a new function with an additional
parameter for setting the accuracy mode
- VSL
- A set of new Summary Statistics functions was added covering basic
statistics, covariance and correlation, pooled, group, partial, and robust
covariance/correlation, quantiles and streaming quantiles, outliers
detection algorithm, and missing values support
- Performance optimized algorithms: MI algorithm for support of missing
values, TBS algorithm for computation of robust covariance, BACON
algorithm for detection of outliers, ZW algorithm for computation of
quantiles (streaming data case), and 1PASS algorithm for computation of
pooled covariance
- Improved performance of SFMT19937 Basic Random Number Generator (BRNG)
- Intel® AVX optimizations: MT19937 and MT2203 BRNGs
- Documentation: Product documentation is available in the Microsoft Help
Viewer* 1.x format that integrates with Microsoft Visual Studio* 2010
- Added runtime dispatching dynamic libraries allowing link to a single
interface library which loads dependent libraries dynamically at runtime
depending on runtime CPU detection and/or library function calls
- The custom dynamic libraries builder now uses the runtime dispatching
dynamic libraries on the Linux* and Mac OS* X operating systems
- A new directory structure has been established to simplify integration of
Intel MKL with the Intel® Parallel Studio XE family of products; directories
formerly designated as "em64t" are now designated by the "intel64" tag
- Intel® Itanium® architecture (IA-64) support is not included in this
release. Intel® MKL 10.2 is the latest release for IA-64
- The sparse solver functionality has been fully integrated into the core
Intel MKL libraries and the libraries with "solver" in the filename have been
removed from the product
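Two of the new functions listed above are defined by simple element-wise formulas: *AXPBY computes the sum of two scaled vectors, and v[s/d]LinearFrac() computes (ax+b)/(cy+d). The reference sketch below spells out those formulas in pure Python. The function names and argument order are hypothetical and do not reproduce the Intel MKL signatures; the sketch also returns new lists rather than writing into an output array.

```python
def axpby(a, x, b, y):
    """Return a*x[i] + b*y[i] for each i (sum of two scaled vectors)."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]


def linear_frac(x, y, a, b, c, d):
    """Return (a*x[i] + b) / (c*y[i] + d) for each i; a, b, c, d are scalars."""
    return [(a * xi + b) / (c * yi + d) for xi, yi in zip(x, y)]


scaled_sum = axpby(2, [1, 2, 3], 3, [1, 1, 1])
ratios = linear_frac([1.0, 2.0], [1.0, 3.0], 2.0, 1.0, 1.0, 1.0)
```

A plain loop like this is useful as a correctness check when comparing against the optimized library routines.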
Notices
- The Intel MKL GNU Multiple Precision* (GMP) function interfaces will be
removed in a future library release.
- The timing function mkl_set_cpu_frequency() is deprecated. Please use
mkl_get_max_cpu_frequency(), mkl_get_clocks_frequency(), and
mkl_get_cpu_frequency() as described in the Intel® MKL Reference Manual.
- The MKL_PARDISO constant defined to specify the PARDISO domain should no
longer be used with the mkl_domain_set_num_threads() function; please use
MKL_DOMAIN_PARDISO instead.
- The static OpenMP* runtime library has been deprecated and there are plans
to remove this in a future version of Intel MKL for Windows*.
Product Contents
The Intel® Math Kernel Library (Intel® MKL) version 10.3 and updates consists
of three installation packages: one package for both IA-32 and Intel® 64
architectures, one for IA-32 only, and one for Intel® 64 architecture only.
Technical Support
If you did not register your Intel software product during installation,
please do so now at the Intel® Software Development
Products Registration Center. Registration entitles you to free
technical support, product updates and upgrades for the duration of the support
term.
For general information about Intel technical support, product updates, user
forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.
Note: If your distributor provides technical support
for this product, please contact them rather than Intel.
For technical information about Intel MKL, including FAQs, tips and tricks,
and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/
and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.
Attributions
As referenced in the End User License Agreement, attribution requires, at a
minimum, prominently displaying the full Intel product name (e.g. "Intel® Math
Kernel Library") and providing a link/URL to the Intel® MKL homepage (http://www.intel.com/software/products/mkl)
in both the product documentation and website.
The original versions of the BLAS from which that part of Intel® MKL was
derived can be obtained from http://www.netlib.org/blas/index.html.
The original versions of LAPACK from which that part of Intel® MKL was
derived can be obtained from http://www.netlib.org/lapack/index.html.
The authors of LAPACK are E. Anderson, Z. Bai, C. Bischof, S. Blackford, J.
Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and
D. Sorensen. Our FORTRAN 90/95 interfaces to LAPACK are similar to those in the
LAPACK95 package at http://www.netlib.org/lapack95/index.html.
All interfaces are provided for pure procedures.
The original versions of ScaLAPACK from which that part of Intel® MKL was
derived can be obtained from http://www.netlib.org/scalapack/index.html.
The authors of ScaLAPACK are L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo,
J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K.
Stanley, D. Walker, and R. C. Whaley.
PARDISO in Intel® MKL is compliant with the 3.2 release of PARDISO that is
freely distributed by the University of Basel. It can be obtained at http://www.pardiso-project.org/.
Some FFT functions in this release of Intel® MKL have been generated by the
SPIRAL software generation system (http://www.spiral.net/) under license from
Carnegie Mellon University. The authors of SPIRAL are Markus Puschel, Jose
Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong,
Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and
Nick Rizzolo.
License Definitions
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL(R)
PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN
INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO
LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY,
RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT
OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT
DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL
PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any
time, without notice. Designers must not rely on the absence or characteristics
of any features or instructions marked "reserved" or "undefined." Intel reserves
these for future definition and shall have no responsibility whatsoever for
conflicts or incompatibilities arising from future changes to them. The
information here is subject to change without notice. Do not finalize a design
with this information.
The products described in this document may contain design defects or errors
known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the
latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this
document, or other Intel literature, may be obtained by calling 1-800-548-4725,
or by visiting Intel's Web Site.
Intel processor numbers are not a measure of performance. Processor numbers
differentiate features within each processor family, not across different
processor families. See http://www.intel.com/products/processor_number for
details.
BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Atom, Centrino Atom
Inside, Centrino Inside, Centrino logo, Core Inside, FlashFile, i960, InstantIP,
Intel, Intel logo, Intel386, Intel486, IntelDX2, IntelDX4, IntelSX2, Intel Atom,
Intel Atom Inside, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap
ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel
NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel
Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus,
OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey
Inside, Viiv Inside, vPro Inside, VTune, Xeon, and Xeon Inside are trademarks of
Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for
non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3
instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel
microprocessors. Certain optimizations not specific to Intel
microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Copyright © 2002-2011, Intel Corporation. All rights
reserved.