====== ifort for linux ======

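  * The Linux version is free for non-commercial use only (**academic use requires an academic license**).
  * Registration is required to have a license issued.
  * MKL is installed along with it.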
 

===== Installation =====
==== Notes from installing on Ubuntu 10.10 (64-bit) ====


  * Installing openjdk-6-jre and ia32-libs beforehand seems to be enough. FIXME
  * Go to [[http://software.intel.com/en-us/articles/non-commercial-software-download/]] (or search Google for something like "ifort non-commercial").
  * Enter an e-mail address if one is required to use the non-commercial version.
  * Download Intel® Fortran Composer XE 2011 for Linux.
<code bash>
tar xzvf l_fcompxe_intel64_2011.1.107.tgz
cd l_fcompxe_intel64_2011.1.107/
sudo ./install.sh 
</code>

bash users should add the following to ~/.bashrc.
<code>
source /opt/intel/bin/compilervars.sh intel64
</code>
==== Notes from installing the Intel Fortran compiler on Ubuntu 9.10 on a Xeon ====

Install openjdk-6-jre and ia32-libs, which the installer needs.
<code bash>
sudo apt-get install openjdk-6-jre ia32-libs
</code>
openjdk-6-jre was an arbitrary choice; Sun's Java works as well.

Install libstdc++5 by following the instructions at
http://bootstrapping.wordpress.com/2009/11/25/missing-libstdc-so-5-in-ubuntu-9-10-karmic/
<code bash> 
wget http://security.ubuntu.com/ubuntu/pool/universe/i/ia32-libs/ia32-libs_2.7ubuntu6.1_amd64.deb
dpkg-deb -x ia32-libs_2.7ubuntu6.1_amd64.deb ia32-libs
sudo cp ia32-libs/usr/lib32/libstdc++.so.5.0.7 /usr/lib32/
cd /usr/lib32
sudo ln -s libstdc++.so.5.0.7 libstdc++.so.5
</code>
Without it, the ifort installer complains partway through with the following message.
<code>
32-bit libraries not found on this system.
This product release requires the presence of 32-bit compatibility libraries
when running on Intel(R) 64 architecture systems. One or more of these libraries
could not be found:
    libstdc++
    libstdc++5
    glibc
    libgcc
Without these libraries, the compiler will not function properly.  Please refer 
to Release Notes for more information.
</code>

Download the installer archive from http://software.intel.com/en-us/articles/non-commercial-software-download/ and, for a Xeon, choose intel64.
<code bash>
tar xzvf l_cprof_p_11.1.064_intel64.tgz
cd l_cprof_p_11.1.064_intel64
sudo ./install.sh
</code>
The default choices are fine.


The installation appeared to succeed, but running ifort then failed with:
<code>
/opt/intel/Compiler/11.1/069/bin/intel64/fortcom: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory
ifort: error #10273: /opt/intel/Compiler/11.1/069/bin/intel64/fortcom の致命的なエラー、0x7f で終了しました。
</code>
This error did not occur on every 9.10 machine, so it may be hardware-dependent. FIXME

As a workaround, reinstall libstdc++5 by a different method, following
http://hsmak.wordpress.com/2009/12/01/how-to-fix-libstdc5-dependency-problem-in-ubuntu-9-10/
<code bash>
sudo rm /usr/lib32/libstdc++.so.5*
wget http://ftp.riken.go.jp/pub/Linux/ubuntu/pool/universe/g/gcc-3.3/libstdc++5_3.3.6-17ubuntu1_amd64.deb
sudo dpkg -i libstdc++5_3.3.6-17ubuntu1_amd64.deb
</code>
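
To confirm that the reinstall fixed the problem, one option is to ask the dynamic linker which libraries the compiler binary still cannot resolve. The following is a minimal sketch, assuming ''ldd'' is available; the helper name ''missing_libs'' is made up for illustration, and the path is the one that appears in the error message above.
<code bash>
# Hypothetical helper: list shared libraries that a binary cannot resolve.
# Prints nothing when every dependency is found (or when ldd is unavailable).
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Path taken from the error message above; adjust it to your installation.
fortcom=/opt/intel/Compiler/11.1/069/bin/intel64/fortcom
if [ -x "$fortcom" ]; then
    missing_libs "$fortcom"
else
    echo "fortcom not found at $fortcom"
fi
</code>
If libstdc++.so.5 is still reported as ''not found'', the library is not on the linker's search path.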

==== Installing the Intel Fortran compiler on Ubuntu 9.04 ====


=== 1. Download the ifort package from the Intel site ===
    * Choose among [[wpjp>IA32]], [[wpjp>INTEL64]], and [[wpjp>IA64]] according to your system. For example, Core 2 is intel64 and Core Solo is ia32. FIXME
    * [[http://software.intel.com/en-us/articles/non-commercial-software-download/]]
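
If it is unclear which package matches the installed system, the machine name reported by ''uname -m'' is a reasonable hint. The sketch below maps it to the architecture names used on this page; the function name ''ifort_arch'' is an assumption made for illustration and is not part of any Intel tool.
<code bash>
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the architecture name used by the Intel packages and scripts.
ifort_arch() {
    case "$1" in
        x86_64)              echo intel64 ;;
        i386|i486|i586|i686) echo ia32 ;;
        ia64)                echo ia64 ;;
        *)                   echo "unknown machine type: $1" >&2; return 1 ;;
    esac
}

# Report the name for the running system (other machine types print a notice).
ifort_arch "$(uname -m)" || true
</code>
Note that ''uname -m'' describes the running OS: a 32-bit Ubuntu on a 64-bit CPU reports i686, for which the ia32 package is the matching one.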
=== 2. Run install.sh ===
Here, version 11.0.074 for intel64 is installed.
<code>
$ tar xzvf l_cprof_p_11.0.074_intel64.tgz
$ cd l_cprof_p_11.0.074_intel64
$ sudo ./install.sh
</code>
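Follow the prompts. Password authentication (activation) is required along the way.
Just before installation, the installer complains about version mismatches for binutils and the like.
Presumably the packages installed on Ubuntu are newer than the versions ifort asks for, so this should not be a problem. FIXME
If packages are missing, a message appears during installation.
Read the message and install what is needed((sun-java6-jre was installed because Java was requested.
Whether that is correct is unknown. FIXME binutils may already be installed by default.)).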
<code>
$ sudo apt-get install libstdc++5 g++ binutils sun-java6-jre
</code>
On a 64-bit system an error about 32-bit libraries may appear, so install ia32-libs as well.
<code>
$ sudo apt-get install ia32-libs
</code>

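=== 3. Set the environment variables ===

===== Setting environment variables =====
First, set the environment variables that put ifort on the command path and its libraries on the library path.
=== bash ===
Before use, enter the following in a terminal, choosing one of the alternatives(([ia32|intel64|ia64] is [[google.jp>正規表現|regular-expression]] notation)).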
<code>
$ . /opt/intel/Compiler/11.0/074/bin/ifortvars.sh [ia32|intel64|ia64]
</code>
Alternatively, append the following as the last line of ~/.bashrc.
<code>
. /opt/intel/Compiler/11.0/074/bin/ifortvars.sh [ia32|intel64|ia64]
</code>
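
A slightly more defensive variant for ~/.bashrc checks that the script exists before sourcing it, so that new shells still start cleanly after the compiler is removed or upgraded. This is only a sketch: the variable name ''ifortvars'' is made up, and the path and the intel64 argument are examples to adjust.
<code bash>
# Sketch for ~/.bashrc: source the Intel script only when it is readable.
# The path and the intel64 argument are examples; adjust both.
ifortvars=/opt/intel/Compiler/11.0/074/bin/ifortvars.sh
if [ -r "$ifortvars" ]; then
    . "$ifortvars" intel64
fi
</code>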
=== csh-family shells ===
<code>
% source /opt/intel/Compiler/11.0/074/bin/ifortvars.csh [ia32|intel64|ia64]
</code>
Either run the above or add it to ~/.cshrc. FIXME
=== zsh ===
<code>
$ . /opt/intel/Compiler/11.0/074/bin/ifortvars.sh [ia32|intel64|ia64]
</code>
Running this in zsh produces an error such as:
<code>
/opt/intel/Compiler/11.0/074/bin/ifortvars.sh:82: = not found
</code>
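The cause is probably a syntax difference between bash and zsh: zsh treats a word that starts with = as a command-name expansion, so == inside a test fails.
Rewriting == as = in the scripts that raise the error makes them work.
For intel64, for example, rewrite both /opt/intel/Compiler/11.0/074/bin/ifortvars.sh and /opt/intel/Compiler/11.0/074/bin/intel64/ifortvars_intel64.sh. If several users share the installation, it is probably better manners to create zsh-specific files, for example by copying *.sh to *.zsh and editing the copies.
In that case, create the following two files.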
  * /opt/intel/Compiler/11.0/074/bin/ifortvars.zsh
  * /opt/intel/Compiler/11.0/074/bin/intel64/ifortvars_intel64.zsh

Something like this, perhaps (for the 32-bit version). FIXME
<code>
$ sudo sh -c "sed 's/==/=/g' /opt/intel/Compiler/11.0/074/bin/ifortvars.sh > /opt/intel/Compiler/11.0/074/bin/ifortvars.zsh"
$ sudo sh -c "sed 's/==/=/g' /opt/intel/Compiler/11.0/074/bin/ia32/ifortvars_ia32.sh > /opt/intel/Compiler/11.0/074/bin/ia32/ifortvars_ia32.zsh"
$ sudo sh -c "sed -i -e 's/\.sh/.zsh/g' /opt/intel/Compiler/11.0/074/bin/ifortvars.zsh"
</code>
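sudo sh -c " " is the usual idiom for redirecting under sudo: a plain redirection would be performed by the calling (unprivileged) shell rather than by the command that sudo runs.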
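
Before touching the files under /opt, the substitutions can be tried on a scratch file. The sketch below is only an illustration: the two sample lines and the /opt/example path are invented, not copied from the real ifortvars.sh. It also shows why the dot is escaped in the second expression: an unescaped ''.sh'' would match any character followed by "sh".
<code bash>
# Try the substitutions on a scratch file instead of the real scripts.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
if [ "$1" == "ia32" ]; then
    . /opt/example/bin/ia32/ifortvars_ia32.sh
fi
EOF

# 1) == -> =   2) .sh -> .zsh (escaped dot: match a literal dot only)
converted=$(sed -e 's/==/=/g' -e 's/\.sh/.zsh/g' "$tmp")
printf '%s\n' "$converted"
rm -f "$tmp"
</code>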

===== Compiler options =====
[[http://accc.riken.jp/HPC/training/text.html]]

[[http://www.k.mei.titech.ac.jp/~stamura/NumericalComputation-Tips.html]]
Compiling with roughly this set of options should catch most errors.
<code bash>
ifort -check all -warn declarations -CB -fpe0 -traceback
</code>
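
Since these checks slow the program down, it can be convenient to keep a debugging flag set and an optimized flag set side by side. The following is a sketch under assumptions: the names ''FFLAGS_DEBUG'', ''FFLAGS_FAST'', and ''ifort_cmd'' are made up, and -O0 -g are added here as typical companions of the checking options. The function only prints the command so that it can be inspected first.
<code bash>
# Hypothetical flag sets; the variable and function names are for illustration.
FFLAGS_DEBUG="-O0 -g -check all -warn declarations -fpe0 -traceback"
FFLAGS_FAST="-O2"

# Print (rather than run) the compile command for the chosen mode.
ifort_cmd() {
    mode=$1; shift
    case "$mode" in
        debug) echo "ifort $FFLAGS_DEBUG $*" ;;
        fast)  echo "ifort $FFLAGS_FAST $*" ;;
        *)     echo "usage: ifort_cmd debug|fast files..." >&2; return 1 ;;
    esac
}

ifort_cmd debug hoge.f90
</code>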

See also [[fortran#コンパイラオプション|the compiler options section of the fortran page]].

===== The default stack size is too small =====
The default stack size is too small: when using -openmp, the program segfaults frequently unless the stack size is increased.
<code fortran>
!$OMP parallel
write(*,*) KMP_GET_STACKSIZE_S()
!$OMP end parallel
</code>
This prints the stack size of each thread. To increase it, place the following before the first !$OMP directive:
<code fortran>
CALL KMP_SET_STACKSIZE_S(size)
</code>
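Here size is an integer variable holding the desired stack size in bytes. (Intel's documentation declares it as INTEGER(KIND=KMP_SIZE_T_KIND).)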
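
The stack sizes can also be set from the shell without recompiling. The sketch below assumes an OpenMP 3.0 capable runtime for ''OMP_STACKSIZE''; ''KMP_STACKSIZE'' is the Intel-specific equivalent. The value of 64 MB is an arbitrary example, not a recommendation.
<code bash>
# Stack of the main thread: governed by the shell limit.
# Raising it can fail if the hard limit is lower, so fall back to reporting it.
ulimit -s unlimited 2>/dev/null || echo "stack limit stays at $(ulimit -s)"

# Stack of each OpenMP worker thread. 64 MB is an arbitrary example value.
export OMP_STACKSIZE=64M   # standard OpenMP environment variable
export KMP_STACKSIZE=64m   # Intel-specific environment variable

# Then start the program from this same shell, e.g. ./a.out
</code>
The shell limit and the OpenMP variables cover different threads, which is why both are set here.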
===== Suppressing line breaks =====
By design, ifort breaks output lines on its own.
To suppress the line breaks, use an explicit format.

The quick-and-dirty way:
<code fortran>
    write(*,'(100f)') a(:)
</code>

The proper way, quoted from 2ch:

[[http://pc12.2ch.net/test/read.cgi/tech/1163319215/532]]
<code>
532 Name: デフォルトの名無しさん [sage]: 2009/03/27 (Fri) 05:59:41
Late reply, but with ifort I recommend <>.
I usually put the first dimension of a multidimensional array in it.
Example
program main
implicit none
integer,parameter :: num = 9
integer :: ii,jj
real :: arry(num,num)
do ii=1,num
do jj = 1,num
arry(ii,jj) = ii*jj
enddo
enddo

write(6,'(<num>F)') arry
end program 
</code>

====== MKL ======

For details, see the
[[http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/Doc/mkl10/doc/userguide.pdf|Intel(R) Math Kernel Library for the Linux* OS User's Guide]].

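===== Overview =====
^Package ^Description ^
|[[ifort#lapack]] |Solves dense and banded systems by a direct method |
|[[ifort#lapack95]] |Solves dense and banded systems by a direct method |
|[[ifort#blas95]]   |Matrix-vector and inner-product operations |
|[[ifort#sparse_blas95]]   |Matrix-vector operations on sparse matrices |
|[[ifort#pardiso]]  |Solves sparse systems by a direct method (suited to problems with a large condition number) |
|[[ifort#fgmres]]   |Solves sparse systems with Flexible GMRES |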


The compiler (link) options for MKL are as follows.
<file>
-L<MKL path> -I<MKL path>
[-lmkl_lapack95] [-lmkl_blas95]
[cluster components]
[{-lmkl_{intel, intel_ilp64, intel_lp64, intel_sp2dp, gf, gf_ilp64, gf_lp64}]
[-lmkl_{intel_thread, sequential}]
[{-lmkl_solver, -lmkl_solver_lp64, -lmkl_solver_ilp64}]
{{[-lmkl_lapack] -lmkl_{ia32, em64t, ipf}},-lmkl_core}}
[{-lguide, -liomp5}] [-lpthread] [-lm]
</file>
[ ] marks an optional item; { } means choose one of the alternatives.
-L<MKL path> -I<MKL path> can be omitted if the paths are already set.

The [[http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/Doc/mkl10/doc/userguide.pdf|Intel(R) Math Kernel Library for the Linux* OS User's Guide]] contains the following passages.
<file>
You are strongly encouraged to dynamically link in Intel® Legacy OpenMP* run-time
library libguide and Intel® Compatibility OpenMP* run-time library libiomp. Linking to
static OpenMP run-time library is not recommended, as it is very easy with layered
software to link in more than one copy of the library. This causes performance problems
(too many threads) and may cause correctness problems if more than one copy is
initialized.
You are advised to link with libguide and libiomp dynamically even if other libraries are
linked statically.
</file>
<file>
The second relevant component is the Compiler Support RTL Layer. Prior to Intel MKL 10.0,
this layer only included the Intel® Legacy OpenMP* run-time compiler library libguide.
Now you have a new choice to use the Intel® Compatibility OpenMP* run-time compiler
library libiomp. The Compatibility library provides support for one additional threading
compiler on Linux (gnu). That is, a program threaded with a gnu compiler can safely be
linked with Intel MKL and libiomp and execute efficiently and effectively.
</file>
Either libguide or libiomp can be chosen, but for parallel code libiomp may be the better choice. FIXME
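
As a concrete reading of the template above, the sketch below prints two library combinations for the intel64 LP64 interface, one threaded (with libiomp5) and one sequential. They mirror link lines used elsewhere on this page; the helper name ''mkl_libs'' is made up, and other MKL versions may need different library names.
<code bash>
# Hypothetical helper: print the MKL libraries for the intel64 LP64 interface.
#   mkl_libs thread      -> threaded MKL linked with libiomp5
#   mkl_libs sequential  -> non-threaded MKL
mkl_libs() {
    case "$1" in
        thread)     echo "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" ;;
        sequential) echo "-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread" ;;
        *)          echo "usage: mkl_libs thread|sequential" >&2; return 1 ;;
    esac
}

# Show the resulting compile command without running it.
echo "ifort hoge.f90 $(mkl_libs thread)"
</code>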


===== lapack =====
  * Installed together with ifort.
  * Once the environment variables are set, no extra paths are needed.

With ifort version 12.0, the following worked. FIXME
<code bash>
ifort hoge.f90 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
</code>

For around version 11:
<code>
ifort file.f90  -lmkl -lmkl_lapack  -lmkl_em64t -lguide -lmkl_solver
</code>
===== lapack95 =====

  * [[http://www.netlib.org/lapack95/|lapack95]]
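  * [[http://www.netlib.org/lapack95/lug95/|lapack95 HTML documentation]]
  * A package for solving linear equations.
  * Eigenvalue computation and singular value decomposition are also available.
  * MKL's lapack95 is apparently a wrapper around lapack90.
  * You have to build it yourself with make.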

==== Building the library ====
  * Adjust libem64t and intel64 to your environment. The make options are also up to you; the defaults seem sufficient, so the options can probably be omitted.
<code>
$ cd /opt/intel/Compiler/11.0/074/mkl/examples/lapack95/
$ sudo gnome-terminal
</code>
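Work in the newly opened terminal((Running sudo make libem64t fails with "ifort: command not found". So a superuser terminal is opened and the paths are set there before building. Opening a terminal with sudo is perhaps bad manners. There is probably also a way to pass the environment variables through to sudo...)).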
<code> 
# . /opt/intel/Compiler/11.0/074/bin/ifortvars.sh intel64
# make libem64t 
# exit
</code>
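
One way to avoid opening a root terminal is to set up the Intel environment inside the shell that sudo starts. This is an untested sketch under assumptions: the variable names are made up, and bash is requested explicitly because /bin/sh on Ubuntu may not pass the intel64 argument to a sourced script.
<code bash>
# Run make as root with the Intel environment set up inside the root shell.
# Paths are the ones used above; adjust them to your installation.
vars=/opt/intel/Compiler/11.0/074/bin/ifortvars.sh
dir=/opt/intel/Compiler/11.0/074/mkl/examples/lapack95
build_cmd=". '$vars' intel64 && cd '$dir' && make libem64t"

if [ -r "$vars" ]; then
    sudo bash -c "$build_cmd"
else
    # Compiler not installed here: only show what would be run.
    echo "would run: sudo bash -c \"$build_cmd\""
fi
</code>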

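==== Sample code ====

Sample code is in /opt/intel/mkl/10.1.0.015/examples/lapack95/source/. The explanation here follows gesv.f90.
=== Compile options (intel64) ===

(Added 2010-06-22) The library location seems to have changed in recent versions. (Or perhaps the wrong files had simply been used until now...)
This is the compile command for using Lapack95 from Fortran 90. Replace file.f90 as appropriate.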
<code bash>
ifort file.f90 -L/opt/intel/Compiler/11.1/069/mkl/lib/em64t/ -I /opt/intel/Compiler/11.1/069/mkl/include/em64t/lp64/ -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lguide -lpthread -O2
</code>

Older information follows.
<code bash>
$ ifort /opt/intel/Compiler/11.0/074/mkl/examples/lapack95/source/gesv.f90 -L/opt/intel/Compiler/11.0/074/mkl/examples/lapack95/lib/em64t/ -I /opt/intel/Compiler/11.0/074/mkl/examples/lapack95/lib/em64t -lmkl_lapack95 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lguide -lpthread
$ ./a.out < /opt/intel/Compiler/11.0/074/mkl/examples/lapack95/data/gesv.d
</code>
=== Compile options (ia32) ===
<code bash>
$ ifort /opt/intel/Compiler/11.0/074/mkl/examples/lapack95/source/gesv.f90 -L/opt/intel/Compiler/11.0/074/mkl/examples/lapack95/lib/32  -I /opt/intel/Compiler/11.0/074/mkl/examples/lapack95/lib/32/ -lmkl_lapack95 -lmkl_intel -lmkl_intel_thread -lmkl_core -lguide -lpthread
$ ./a.out < /opt/intel/Compiler/11.0/074/mkl/examples/lapack95/data/gesv.d
</code>
=== Usage ===
The beginning of /opt/intel/mkl/10.1.0.015/examples/lapack95/source/gesv.f90:
<code fortran>
      USE MKL95_PRECISION, ONLY: WP => SP
      USE MKL95_LAPACK, ONLY: GESV
</code>
  * Use SP for single precision and DP for double precision.
  * List the subroutines to be called after ONLY.

The middle of /opt/intel/mkl/10.1.0.015/examples/lapack95/source/gesv.f90:
<code fortran>
      CALL GESV(  A, B )
</code>
<code fortran>
      CALL GESV(  AA, BB(:,1), IPIV, INFO )
</code>
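   * The number of arguments differs because of Fortran 90's optional attribute.

  * It may be better to add the directories to the LIBRARY_PATH and INCLUDE environment variables, or to move the lib and mod files to suitable locations. (Perhaps this can be configured through some make option; unverified. FIXME)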
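
The environment-variable route could look like the sketch below. It rests on the same unverified assumption as the note above, namely that ifort consults LIBRARY_PATH for libraries and INCLUDE for module files. The helper ''prepend_path'' and the variable ''lapack95_dir'' are made up; the directory is the one from the older intel64 compile command.
<code bash>
# Hypothetical helper: prepend a directory to a colon-separated path variable.
prepend_path() {
    var=$1; dir=$2
    eval "cur=\${$var:-}"
    if [ -n "$cur" ]; then
        eval "export $var=\"\$dir:\$cur\""
    else
        eval "export $var=\"\$dir\""
    fi
}

# Directory from the older intel64 compile command; adjust as needed.
lapack95_dir=/opt/intel/Compiler/11.0/074/mkl/examples/lapack95/lib/em64t
prepend_path LIBRARY_PATH "$lapack95_dir"   # for -lmkl_lapack95
prepend_path INCLUDE "$lapack95_dir"        # for the module files
</code>
If this works as hoped, the -L and -I options can be dropped from the compile command.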

===== blas95 =====

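  * A package for matrix-vector operations and the like.
  * Faster than matmul (probably).
  * You have to build it yourself with make.

==== Building the library ====
  * Adjust libem64t and intel64 to your environment. The make options are also up to you; the defaults seem sufficient, so the options can probably be omitted.
  * The library is built in the same way as lapack95.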
<code>
$ cd /opt/intel/Compiler/11.0/074/mkl/interfaces/blas95
$ sudo gnome-terminal
</code>
<code>
# . /opt/intel/Compiler/11.0/074/bin/ifortvars.sh intel64
# make libem64t 
# exit
</code>

==== Sample code ====
=== Compile options ===
Sample code is in /opt/intel/Compiler/11.0/074/mkl/examples/blas95/source/. The explanation here follows dgemmx.f90.
<code>
$ ifort /opt/intel/Compiler/11.0/074/mkl/include/mkl_blas.f90  /opt/intel/Compiler/11.0/074/mkl/examples/blas95/source/dgemmx.f90   /opt/intel/Compiler/11.0/074/mkl/examples/blas95/source/common_func.f -lmkl_blas95 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
$ ./a.out < /opt/intel/Compiler/11.0/074/mkl/examples/blas/data/dgemmx.d
</code>
  * /opt/intel/Compiler/11.0/074/mkl/include/mkl_blas.f90 is the interface.
  * /opt/intel/Compiler/11.0/074/mkl/examples/blas95/source/dgemmx.f90 is the main program.
  * /opt/intel/Compiler/11.0/074/mkl/examples/blas95/source/common_func.f contains the subprograms (subroutines) used by dgemmx.

=== Code ===
Near the beginning of /opt/intel/Compiler/11.0/074/mkl/examples/blas95/source/dgemmx.f90:
<code fortran>
      use mkl95_precision, only: wp => dp
      use mkl95_blas, only: gemm
</code>
  * As with lapack95, sp selects single precision and dp double precision.
  * gemm is the name of the routine to be called.
===== sparse_blas95 =====

===== pardiso/MKL =====

  * [[http://www.pardiso-project.org]]
  * Solves sparse systems.
  * Uses a direct method, so it can handle problems with a large condition number.
  * [[http://dx.doi.org/10.1016/S0167-739X(00)00076-5|See the paper for the theory]]\\


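==== Compile options ====
  * The compile procedure is the same for Fortran 77 and Fortran 90 (only the file extension differs).
  * When using include, the include path must be set (it is already set if ifortvars.*sh has been sourced).
=== intel64 ===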
<code>
$ ifort hoge.f90 -lmkl_solver_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core   -liomp5 -lpthread
</code>

=== ia32 ===
<code>
$ ifort hoge.f90 -lmkl_solver -lmkl_intel -lmkl_intel_thread -lmkl_core -lguide -lpthread
</code>

=== Comments ===
  * Instead of including mkl_pardiso.f90, it can also be compiled together with your own source.
<code>
$ ifort pardiso.f90 /opt/intel/Compiler/11.0/074/mkl/include/mkl_pardiso.f90 -lmkl_solver -lmkl_intel -lmkl_intel_thread -lmkl_core -lguide -lpthread
</code>


==== Sample code ====
Fortran 77 code is in /opt/intel/Compiler/11.0/074/mkl/examples/solver/source/.

=== fortran90 ===
<code fortran>
include 'mkl_pardiso.f90'

program pardiso_test
  use MKL_PARDISO
  implicit none
  real(8),allocatable::a(:),b(:),x(:)
  integer::maxfct=1,mnum=1,mtype,phase=13,n,nzero,nrhs=1,iparm(64),msglvl=0,error
  type(mkl_pardiso_handle) :: pt(64)
  integer i,j
  integer,allocatable::ia(:),ja(:),perm(:)
!!! this matrix is quoted by slatec document...
!       |11 12  0  0 15|   A: 11 12 15 | 21 22 | 33 35 | 44 | 51 53 55
!       |21 22  0  0  0|  IA:  1       |  4    |  6    |  8 |  9       | 12
!       | 0  0 33  0 35|  JA:  1  2  5 |  1  2 |  3  5 |  4 |  1  3  5 
!       | 0  0  0 44  0|
!       |51  0 53  0 55|
!!! set parameter of pardiso
  pt(:)=mkl_pardiso_handle(0)
  iparm(1)=0
  mtype=11
!!! assign values to variables
  n=5
  nzero=11
  allocate(a(nzero),b(n),x(n),ia(n+1),ja(nzero),perm(n))
  ia(:)=(/1,4,6,8,9,12/)
  a(:)=(/11.d0,12.d0,15.d0,21.d0,22.d0,33.d0,35.d0,44.d0,51.d0,53.d0,55.d0/)
  ja(:)=(/1,2,5,1,2,3,5,4,1,3,5/)
!!! calculate b when x=(/1.d0,2.d0,3.d0,4.d0,5.d0/)
  b(:)=0.d0
  do i=1,5
     do j=ia(i),ia(i+1)-1
        b(i)=b(i)+a(j)*ja(j)
     end do
  end do
!!! solve linear equation by pardiso
  call pardiso(pt,maxfct,mnum,mtype,phase,n,a,ia,ja,perm,nrhs,iparm,msglvl,b,x,error)
  if(error /= 0) then 
     write(*,*) error 
     stop
  end if
  write(*,*) x
end program pardiso_test
</code>

=== fortran77 ===
A rewrite of the Fortran 90 sample.
<code fortran>
      program pardiso_test
      implicit none
      integer lzero,l
      parameter (lzero=11,l=5)
      real*8 a(lzero),b(l),x(l)
      integer maxfct,mnum,mtype,phase,n,nzero,nrhs,iparm(64),
     $     msglvl,error,ia(l+1),ja(lzero),perm(l),i,j
c     pt holds internal pointers, which need 8 bytes each on intel64
      integer*8 pt(64)
      data ia /1,4,6,8,9,12/
      data ja /1,2,5,1,2,3,5,4,1,3,5/
      data a /11.d0,12.d0,15.d0,21.d0,22.d0,33.d0,35.d0,
     $     44.d0,51.d0,53.d0,55.d0/
ccc   this matrix is quoted by slatec document...
c     |11 12  0  0 15|   A: 11 12 15 | 21 22 | 33 35 | 44 | 51 53 55
c     |21 22  0  0  0|  IA:  1       |  4    |  6    |  8 |  9       | 12
c     | 0  0 33  0 35|  JA:  1  2  5 |  1  2 |  3  5 |  4 |  1  3  5 
c     | 0  0  0 44  0|
c     |51  0 53  0 55|
ccc   set parameter of pardiso
      do i=1,64
         pt(i)=0
      end do
      iparm(1)=0
      mtype=11
      maxfct=1
      mnum=1
      phase=13
      nrhs=1
      msglvl=0
ccc   assign values to variables
      n=5
      nzero=11
ccc   calculate b when x=(/1.d0,2.d0,3.d0,4.d0,5.d0/)
      b(:)=0.d0
      do i=1,5
         do j=ia(i),ia(i+1)-1
            b(i)=b(i)+a(j)*ja(j)
         end do
      end do
ccc   solve linear equation by pardiso
      call pardiso(pt,maxfct,mnum,mtype,phase,n,a,ia,ja,perm,nrhs,
     $     iparm,msglvl,b,x,error)
      if(error /= 0) then 
         write(*,*) error 
         stop
      end if
      write(*,*) x
      end program
</code>


===== fgmres =====
A variant of GMRES improved so that preconditioning can be done efficiently; an iterative solver can be used as the preconditioner.

FIXME