Merge branch 'master' of igit.ific.uv.es:alramos/latticegpu.jl

This commit is contained in:
Nicolas Lang 2025-02-25 16:58:16 +01:00
commit 3c59b9251a
41 changed files with 3467 additions and 902 deletions

View file

@ -9,6 +9,13 @@ makedocs(sitename="LatticeGPU", modules=[LatticeGPU], doctest=true,
"LatticeGPU.jl" => "index.md", "LatticeGPU.jl" => "index.md",
"Space-time" => "space.md", "Space-time" => "space.md",
"Groups and algebras" => "groups.md", "Groups and algebras" => "groups.md",
"Fields" => "fields.md" "Fields" => "fields.md",
"Yang-Mills" => "ym.md",
"Gradient flow" => "flow.md",
"Schrödinger Functional" => "sf.md",
"Spinors" => "spinors.md",
"Dirac" => "dirac.md",
"Solvers" => "solvers.md",
"Input Output" => "io.md"
], ],
repo = "https://igit.ific.uv.es/alramos/latticegpu.jl") repo = "https://igit.ific.uv.es/alramos/latticegpu.jl")

107
docs/src/dirac.md Normal file
View file

@ -0,0 +1,107 @@
# Dirac operator
The module `Dirac` has the necessary structures and functions
to simulate non-dynamical 4-dimensional Wilson fermions.
There are two main data structures in this module, the structure [`DiracParam`](@ref)
```@docs
DiracParam
```
and the workspace [`DiracWorkspace`](@ref)
```@docs
DiracWorkspace
```
The workspace stores four fermion fields, namely `.sr`, `.sp`, `.sAp` and `.st`, used
for different purposes. If the representation is either `SU2fund` of `SU3fund`, an extra
field with values in `U2alg`/`U3alg` is created to store the clover, used for the improvement.
## Functions
The functions [`Dw!`](@ref), [`g5Dw!`](@ref) and [`DwdagDw!`](@ref) are all related to the
Wilson-Dirac operator.
The action of the Dirac operator `Dw!` is the following:
```math
D_w\psi (\vec{x} = x_1,x_2,x_3,x_4) = (4 + m_0 + i \mu \gamma_5)\psi(\vec{x}) -
```
```math
- \frac{1}{2}\sum_{\mu = 1}^4 \theta (\mu) (1-\gamma_\mu) U_\mu(\vec{x}) \psi(\vec{x} + \hat{\mu}) + \theta^* (\mu) (1 + \gamma_\mu) U^{-1}_\mu(\vec{x} - \hat{\mu}) \psi(\vec{x} - \hat{\mu})
```
where $$m_0$$ and $$\theta$$ are respectively the values `.m0` and `.th` of [`DiracParam`](@ref).
Note that $$|\theta(\mu)|=1$$ is not built into the code, so it should be imposed explicitly.
Additionally, if |`dpar.csw`| > 1.0E-10, the clover term is assumed to be stored in `ymws.csw`, which
can be done via the [`Csw!`](@ref) function. In this case we have the Sheikholeslami-Wohlert (SW) term
in `Dw!`:
```math
\delta D_w^{sw} = \frac{i}{2}c_{sw} \sum_{\pi = 1}^6 F^{cl}_\pi \sigma_\pi \psi(\vec{x})
```
where the $$\sigma$$ matrices are those described in the `Spinors` module and the index $$\pi$$ runs
as specified in `lp.plidx`.
If the boudary conditions, defined in `lp`, are either `BC_SF_ORBI,D` or `BC_SF_AFWB`, the
improvement term
```math
\delta D_w^{SF} = (c_t -1) (\delta_{x_4,a} \psi(\vec{x}) + \delta_{x_4,T-a} \psi(\vec{x}))
```
is added. Since the time-slice $$t=T$$ is not stored, this accounts for modifying the second
and last time-slice.
Note that the Dirac operator for SF boundary conditions assumes that the value of the field
in the first time-slice is zero. To enforce this, we have the function
```@docs
SF_bndfix!
```
The function [`Csw!`](@ref) is used to store the clover in `dws.csw`. It is computed
according to the expression
```math
F_{\mu,\nu} = \frac{1}{8} (Q_{\mu \nu} - Q_{\nu \mu})
```
where
```math
Q_{\mu\nu} = U_\mu(\vec{x})U_{\nu}(x+\mu)U_{\mu}^{-1}(\vec{x}+\nu)U_{\nu}(\vec{x}) +
U_{\nu}^{-1}(\vec{x}-\nu) U_\mu (\vec{x}-\nu) U_{\nu}(\vec{x} +\mu - \nu) U^{-1}_{\mu}(\vec{x}) +
```
```math
+ U^{-1}_{\mu}(x-\mu)U_\nu^{-1}(\vec{x} - \mu - \nu)U_\mu(\vec{x} - \mu - \nu)U_\nu^{-1}(x-\nu) +
```
```math
+U_{\nu}(\vec{x})U_{\mu}^{-1}(\vec{x} + \nu - \mu)U^{-1}_{\nu}(\vec{x} - \mu)U_\mu(\vec{x}-\mu)
```
The correspondence between the tensor field and the GPU-Array is the following:
```math
F[b,1,r] \to F_{41}(b,r) ,\quad F[b,2,r] \to F_{42}(b,r) ,\quad F[b,3,r] \to F_{43}(b,r)
```
```math
F[b,4,r] \to F_{31}(b,r) ,\quad F[b,5,r] \to F_{32}(b,r) ,\quad F[b,6,r] \to F_{21}(b,r)
```
where $$(b,r)$$ labels the lattice points as explained in the module `Space`
The function [`pfrandomize!`](@ref), userful for stochastic sources, is also present. It
randomizes a fermion field, either in all the space or in a specific time-slice.
The generic interface of these functions reads
```@docs
Dw!
g5Dw!
DwdagDw!
Csw!
pfrandomize!
mtwmdpar
```

View file

@ -1,25 +1,29 @@
# Lattice fields # Lattice fields
The module `Fields` include simple routines to define a few typical The module `Fields` includes simple routines to define a few typical
fields. Fields are simple `CuArray` types with special memory fields. Fields are simple `CuArray` types with special memory
layout. A field always has an associated elemental type (i.e. for layout. A field always has an associated elemental type (i.e. for
gauge fields `SU3`, for scalar fields `Float64`). We have: gauge fields `SU3`, for scalar fields `Float64`). We have:
- scalar fields: One elemental type in each spacetime point. - Scalar fields: One elemental type in each spacetime point.
- vector field: One elemental type at each spacetime point and - Vector field: One elemental type at each spacetime point and
direction. direction.
- `N` scalar fields: `N` elemental types at each spacetime point. - `N` scalar fields: `N` elemental types at each spacetime point.
- Tensor fields: One elemental type at each spacetime point and
plane. They are to be thought of as symmetric tensors.
Fields can have **natural indexing**, where the memory layout follows
the point-in-block and block indices (see
[`SpaceParm`](@ref)). Fields can also have **lexicographic indexing**,
where points are labelled by a D-dimensional index (see [`scalar_field_point`](@ref)).
For all these fields the spacetime point are ordered in memory
according to the point-in-block and block indices (see
[`SpaceParm`](@ref)). An execption is the [`scalar_field_point`](@ref)
fields.
## Initialization ## Initialization
```@docs ```@docs
scalar_field scalar_field
vector_field vector_field
tensor_field
nscalar_field nscalar_field
scalar_field_point scalar_field_point
``` ```

47
docs/src/flow.md Normal file
View file

@ -0,0 +1,47 @@
# Gradient flow
The gradient flow equations can be integrated in two different ways:
1. Using a fixed step-size integrator. In this approach one fixes the
step size $\epsilon$ and the links are evolved from
$V_\mu(t)$ to $V_\mu(t +\epsilon)$ using some integration
scheme.
1. Using an adaptive step-size integrator. In this approach one fixes
the tolerance $h$ and the links are evolved for a time $t_{\rm
end}$ (i.e. from $V_\mu(t)$ to $V_\mu(t +t_{\rm end})$)
with the condition that the maximum error while advancing is not
larger than $h$.
In general adaptive step size integrators are much more efficient, but
one loses the possibility to measure flow quantities at the
intermediate times $\epsilon, 2\epsilon, 3\epsilon,...$. Adaptive
step size integrators are ideal for finite size scaling studies, while
a mix of both integrators is the most efficient approach in scale
setting applications.
## Integration schemes
```@docs
FlowIntr
wfl_euler
zfl_euler
wfl_rk2
zfl_rk2
wfl_rk3
zfl_rk3
```
## Integrating the flow equations
```@docs
flw
flw_adapt
```
## Observables
```@docs
Eoft_plaq
Eoft_clover
Qtop
```

View file

@ -1,7 +1,7 @@
# Groups and Algebras # Groups and Algebras
The module `Groups` contain generic data types to deal with group and The module `Groups` contains generic data types to deal with group and
algebra elements. Group elements $$g\in SU(N)$$ are represented in algebra elements. Group elements $$g\in SU(N)$$ are represented in
some compact notation. For the case $$N=2$$ we use two complex numbers some compact notation. For the case $$N=2$$ we use two complex numbers
(Caley-Dickson representation, i.e. $$g=(z_1,z_2)$$ with (Caley-Dickson representation, i.e. $$g=(z_1,z_2)$$ with
@ -79,7 +79,7 @@ elements. The objective is to get an idea on how group operations
We can generate some random group elements. We can generate some random group elements.
```@repl exs ```@repl exs
# Generate random groups elements, # Generate random groups elements,
# check they are actually from the grup # check they are actually from the group
g = rand(SU2{Float64}) g = rand(SU2{Float64})
println("Are we in a group?: ", isgroup(g)) println("Are we in a group?: ", isgroup(g))
g = rand(SU3{Float64}) g = rand(SU3{Float64})
@ -153,6 +153,10 @@ projalg
## Generic `Algebra` methods ## Generic `Algebra` methods
```@docs ```@docs
dot
norm
norm2
normalize
exp exp
expm expm
alg2mat alg2mat

14
docs/src/io.md Normal file
View file

@ -0,0 +1,14 @@
# Input/Output
## Configurations
Routines to read/write and import gauge configurations.
```@docs
read_cnfg
save_cnfg
import_bsfqcd
import_lex64
import_cern64
```

9
docs/src/sf.md Normal file
View file

@ -0,0 +1,9 @@
# Schödinger Functional
Specific SF observables and routines
```@docs
setbndfield
sfcoupling
```

88
docs/src/solvers.md Normal file
View file

@ -0,0 +1,88 @@
# Solvers
The module `Solvers` contains the functions to invert the Dirac
operator as well as functions to obtain specific propagators.
## CG.jl
The function [`CG!`](@ref) implements the Conjugate gradient
algorith for the operator A
```@docs
CG!
```
where the tolerance is normalized with respect to $$|$$``si``$$|^2$$.
The operator A must have the same input structure as all the Dirac
operators. If the maximum number of iterations `maxiter` is reached,
the function will throw an error. The estimation for $$A^{-1}x$$ is
stored in ``si``, and the number of iterations is returned.
Note that all the fermion field in ``dws`` are used
inside the function and will be modified. In particular, the final residue
is given by $$|$$``dws.sr``$$|^2$$.
## Propagators.jl
In this file, we define some useful functions to obtain certain
propagators.
```@docs
propagator!
```
Note that the indexing in Julia starts at 1, so the first tiime slice is t=1.
Internally, this function solves the equation
```math
D_w^\dagger D_w \psi = \gamma_5 D_w \gamma_5 \eta
```
where $$\eta$$ is either a point-source with the specified color and spin
or a random source in a time-slice and stores the value in ``pro``.
To solve this equation, the [`CG!`](@ref) function is used.
For the case of SF boundary conditions, we have the boundary-to-bulk
propagator, implemented by the function [`bndpropagator!`](@ref)
```@docs
bndpropagator!
```
This propagator is defined by the equation:
```math
D_W S(x) = \frac{c_t}{\sqrt{V}} \delta_{x_0,1} U_0^\dagger(0,\vec{x}) P_+
```
The analog for the other boundary is implemented in the function [`Tbndpropagator!`](@ref)
```@docs
Tbndpropagator!
```
defined by the equation:
```math
D_W R(x) = \frac{c_t}{\sqrt{V}} \delta_{x_0,T-1} U_0(T-1,\vec{x}) P_-
```
Where $$P_\pm = (1 \pm \gamma_0)/2$$. The boundary-to-boundary
propagator
```math
\frac{-c_t}{\sqrt{V}} \sum_{\vec{x}} U_0 ^\dagger (T-1,\vec{x}) P_+ S(T-1,\vec{x})
```
is computed by the function [`bndtobnd`](@ref)
```@docs
bndtobnd
```

View file

@ -3,7 +3,10 @@
D-dimensional lattice points are labeled by two ordered integer D-dimensional lattice points are labeled by two ordered integer
numbers: the point-in-block index ($$b$$ in the figure below) and the numbers: the point-in-block index ($$b$$ in the figure below) and the
block index ($$r$$ in the figure below). The routines [`up`](@ref) and block index ($$r$$ in the figure below). This is called **natural
indexing**, in contrast with the **lexicographic indexing** where points on
the lattice are represented by a D-dimensional `CartesianIndex`.
The routines [`up`](@ref) and
[`dw`](@ref) allow you to displace to the neighboring points of the [`dw`](@ref) allow you to displace to the neighboring points of the
lattice. lattice.
![D dimensional lattice points are labeled by its ![D dimensional lattice points are labeled by its

85
docs/src/spinors.md Normal file
View file

@ -0,0 +1,85 @@
# Spinors
The module Spinors defines the necessary functions for the structure `Spinor{NS,G}`,
which is a NS-tuple with values in G.
The functions `norm`, `norm2`, `dot`, `*`, `/`, `/`, `+`, `-`, `imm` and `mimm`,
if defined for G, are extended to Spinor{NS,G} for general NS.
For the 4D case, where NS = 4, there are some specific functions to implement different
operations with the gamma matrices. The convention for these matrices is
```math
\gamma _4 = \left(
\begin{array}{cccc}
0 & 0 & -1 & 0\\
0 & 0 & 0 & -1\\
-1 & 0 & 0 & 0\\
0 & -1 & 0 & 0\\
\end{array}
\right)
\quad
\gamma_1 = \left(
\begin{array}{cccc}
0 & 0 & 0 & -i\\
0 & 0 & -i & 0\\
0 & i & 0 & 0\\
i & 0 & 0 & 0\\
\end{array}
\right)
```
```math
\gamma _2 = \left(
\begin{array}{cccc}
0 & 0 & 0 & -1\\
0 & 0 & 1 & 0\\
0 & 1 & 0 & 0\\
-1 & 0 & 0 & 0\\
\end{array}
\right)
\quad
\gamma_3 = \left(
\begin{array}{cccc}
0 & 0 & -i & 0\\
0 & 0 & 0 & i\\
i & 0 & 0 & 0\\
0 & -i & 0 & 0\\
\end{array}
\right)
```
The function [`dmul`](@ref) implements the multiplication over the $$\gamma$$ matrices
```@docs
dmul
```
The function [`pmul`](@ref) implements the $$ (1 \pm \gamma_N) $$ proyectors. The functions
[`gpmul`](@ref) and [`gdagpmul`](@ref) do the same and then multiply each element by `g`and
g^-1 repectively.
```@docs
pmul
gpmul
gdagpmul
```
## Some examples
Here we just display some examples for these functions. We display it with `ComplexF64`
instead of `SU3fund` or `SU2fund` for simplicity.
```@setup exs
import Pkg # hide
Pkg.activate("/home/alberto/code/julia/LatticeGPU/") # hide
using LatticeGPU # hide
```
```@repl exs
spin = Spinor{4,Complex{Float64}}((1.0,im*0.5,2.3,0.0))
println(dmul(Gamma{4},spin))
println(pmul(Pgamma{2,-1},spin))
```

59
docs/src/ym.md Normal file
View file

@ -0,0 +1,59 @@
# Simulating Yang-Mills on the lattice
```@docs
GaugeParm
YMworkspace
ztwist
```
## Gauge actions and forces
Routines to compute the gauge action.
```@docs
gauge_action
```
Routines to compute the force derived from gauge actions.
```@docs
force_gauge
```
### Force field refresh
Algebra fields with **natural indexing** can be randomized.
```@docs
randomize!
```
## Basic observables
Some basic observable.
```@docs
plaquette
```
## HMC simulations
### Integrating the EOM
```@docs
IntrScheme
leapfrog
omf2
omf4
MD!
```
### HMC algorithm
```@docs
hamiltonian
HMC!
```

View file

@ -19,20 +19,31 @@ using ..Fields
using ..YM using ..YM
using ..Spinors using ..Spinors
"""
struct DiracParam{T,R}
Stores the parameters of the Dirac operator. It can be generated via the constructor `function DiracParam{T}(::Type{R},m0,csw,th,tm,ct)`. The first argument can be ommited and is taken to be `SU3fund`.
The parameters are:
- `m0::T` : Mass of the fermion
- `csw::T` : Improvement coefficient for the Csw term
- `th{Ntuple{4,Complex{T}}}` : Phase for the fermions included in the boundary conditions, reabsorbed in the Dirac operator.
- `tm` : Twisted mass parameter
- `ct` : Boundary improvement term, only used for Schrödinger Funtional boundary conditions.
"""
struct DiracParam{T,R} struct DiracParam{T,R}
m0::T m0::T
csw::T csw::T
th::NTuple{4,Complex{T}} th::NTuple{4,Complex{T}}
tm::T
ct::T ct::T
function DiracParam{T}(::Type{R},m0,csw,th,tm,ct) where {T,R}
function DiracParam{T}(::Type{R},m0,csw,th,ct) where {T,R} return new{T,R}(m0,csw,th,tm,ct)
return new{T,R}(m0,csw,th,ct)
end end
function DiracParam{T}(m0,csw,th,ct) where {T} function DiracParam{T}(m0,csw,th,tm,ct) where {T}
return new{T,SU3fund}(m0,csw,th,ct) return new{T,SU3fund}(m0,csw,th,tm,ct)
end end
end end
function Base.show(io::IO, dpar::DiracParam{T,R}) where {T,R} function Base.show(io::IO, dpar::DiracParam{T,R}) where {T,R}
@ -40,11 +51,24 @@ function Base.show(io::IO, dpar::DiracParam{T,R}) where {T,R}
println(io, "Wilson fermions in the: ", R, " representation") println(io, "Wilson fermions in the: ", R, " representation")
println(io, " - Bare mass: ", dpar.m0," // Kappa = ",0.5/(dpar.m0+4)) println(io, " - Bare mass: ", dpar.m0," // Kappa = ",0.5/(dpar.m0+4))
println(io, " - Csw : ", dpar.csw) println(io, " - Csw : ", dpar.csw)
println(io, " - c_t: ", dpar.ct)
println(io, " - Theta: ", dpar.th) println(io, " - Theta: ", dpar.th)
println(io, " - Twisted mass: ", dpar.tm)
println(io, " - c_t: ", dpar.ct)
return nothing return nothing
end end
"""
struct DiracWorkspace{T}
Workspace needed to work with fermion fields. It contains four scalar fermion fields and, for the SU2fund and SU3fund, a U(N) field to store the clover term.
It can be created with the constructor `DiracWorkspace(::Type{G}, ::Type{T}, lp::SpaceParm{4,6,B,D})`. For example:
dws = DiracWorkspace(SU2fund,Float64,lp);
dws = DiracWorkspace(SU3fund,Float64,lp);
"""
struct DiracWorkspace{T} struct DiracWorkspace{T}
sr sr
sp sp
@ -81,573 +105,30 @@ struct DiracWorkspace{T}
end end
export DiracWorkspace, DiracParam
""" """
function Csw!(dws, U, gp, lp::SpaceParm) function mtwmdpar(dpar::DiracParam)
Computes the clover and stores it in dws.csw.
Returns `dpar` with oposite value of the twisted mass.
""" """
function Csw!(dws, U, gp, lp::SpaceParm{4,6,B,D}) where {B,D} function mtwmdpar(dpar::DiracParam{P,R}) where {P,R}
return DiracParam{P}(R,dpar.m0,dpar.csw,dpar.th,-dpar.tm,dpar.ct)
@timeit "Csw computation" begin
for i in 1:Int(lp.npls)
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_csw!(dws.csw, U, gp.Ubnd, i, lp)
end
end
end
return nothing
end
function krnl_csw!(csw::AbstractArray{T}, U, Ubnd, ipl, lp::SpaceParm{4,M,B,D}) where {T,M,B,D}
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[4]
id1, id2 = lp.plidx[ipl]
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp)
bd1, rd1 = dw((b, r), id1, lp)
bd2, rd2 = dw((b, r), id2, lp)
bdd, rdd = dw((bd1, rd1), id2, lp)
bud, rud = dw((bu1, ru1), id2, lp)
bdu, rdu = up((bd1, rd1), id2, lp)
if SFBC && (it == lp.iL[end])
gt1 = Ubnd[id2]
gt2 = Ubnd[id2]
else
gt1 = U[bu1,id2,ru1]
gt2 = U[bud,id2,rud]
end
M1 = U[b,id1,r]*gt1/(U[b,id2,r]*U[bu2,id1,ru2])
M2 = (U[bd2,id2,rd2]\(U[bd2,id1,rd2]*gt2))/U[b,id1,r]
M3 = (U[bdd,id2,rdd]*U[bd1,id1,rd1])\(U[bdd,id1,rdd]*U[bd2,id2,rd2])
M4 = (U[b,id2,r]/(U[bd1,id2,rd1]*U[bdu,id1,rdu]))*U[bd1,id1,rd1]
if !(SFBC && (it == 1))
csw[b,ipl,r] = 0.125*(antsym(M1)+antsym(M2)+antsym(M3)+antsym(M4))
end
end
return nothing
end
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
if abs(dpar.csw) > 1.0E-10
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.th, dpar.csw, lp)
end
end
else
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.th, lp)
end
end
end
return nothing
end
function krnl_Dwimpr!(so, U, si, Fcsw, m0, th, csw, lp::SpaceParm{4,6,B,D}) where {B,D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
end
return nothing
end
function krnl_Dw!(so, U, si, m0, th, lp::SpaceParm{4,6,B,D}) where {B,D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
end
return nothing
end
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
if abs(dpar.csw) > 1.0E-10
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.th, dpar.csw, dpar.ct, lp)
end
end
else
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.th, dpar.ct, lp)
end
end
end
return nothing
end
function krnl_Dwimpr!(so, U, si, Fcsw, m0, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
return nothing
end
function krnl_Dw!(so, U, si, m0, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
return nothing
end
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
if abs(dpar.csw) > 1.0E-10
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.th, dpar.csw, lp)
end
end
else
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.th, lp)
end
end
end
return nothing
end
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, th, csw, lp::SpaceParm{4,6,B,D}) where {B,D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
so[b,r] = dmul(Gamma{5}, so[b,r])
end
return nothing
end
function krnl_g5Dw!(so, U, si, m0, th, lp::SpaceParm{4,6,B,D}) where {B,D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
so[b,r] = dmul(Gamma{5}, so[b,r])
end
return nothing
end
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
if abs(dpar.csw) > 1.0E-10
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.th, dpar.csw, dpar.ct, lp)
end
end
else
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.th, dpar.ct, lp)
end
end
end
return nothing
end
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
so[b,r] = dmul(Gamma{5}, so[b,r])
return nothing
end
function krnl_g5Dw!(so, U, si, m0, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
so[b,r] = dmul(Gamma{5}, so[b,r])
return nothing
end
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
if abs(dpar.csw) > 1.0E-10
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.th, dpar.csw, dpar.ct, lp)
end
end
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, dpar.th, dpar.csw, dpar.ct, lp)
end
end
end
else
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.th, dpar.ct, lp)
end
end
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, dpar.th, dpar.ct, lp)
end
end
end
end
return nothing
end
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
if abs(dpar.csw) > 1.0E-10
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.th, dpar.csw, lp)
end
end
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, dpar.th, dpar.csw, lp)
end
end
end
else
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.th, lp)
end
end
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, dpar.th, lp)
end
end
end
end
return nothing
end end
function SF_bndfix!(sp, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D} export DiracWorkspace, DiracParam, mtwmdpar
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_sfbndfix!(sp, lp)
end
return nothing include("Diracfields.jl")
end export SF_bndfix!, Csw!, pfrandomize!
function krnl_sfbndfix!(sp,lp::SpaceParm)
b=Int64(CUDA.threadIdx().x)
r=Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) == 1)
sp[b,r] = 0.0*sp[b,r]
end
return nothing
end
"""
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund / SU2fund {T}}}, lp::SpaceParm, t::Int64 = 0)
Randomizes the SU2fund / SU3fund fermion field. If the argument t is present, it only randomizes that time-slice.
"""
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund{T}}}, lp::SpaceParm, t::Int64 = 0) where {T}
@timeit "Randomize pseudofermion field" begin
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4) # complex generation not suported for Julia 1.5.4
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su3!(f,p,lp,t)
end
end
return nothing
end
function krnl_assign_pf_su3!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
if t == 0
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2],
x[b,3,r,1] + im* x[b,3,r,2]),p))
elseif point_time((b,r),lp) == t
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2],
x[b,3,r,1] + im* x[b,3,r,2]),p))
end
end
return nothing
end
function pfrandomize!(f::AbstractArray{Spinor{4, SU2fund{T}}},lp::SpaceParm, t::Int64=0) where {T}
@timeit "Randomize pseudofermion field" begin
p = ntuple(i->CUDA.randn(T, lp.bsz, 2, lp.rsz,2),4) # complex generation not suported for Julia 1.5.4
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su2!(f,p,lp,t)
end
end
return nothing
end
function krnl_assign_pf_su2!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
if t == 0
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2]),p))
elseif point_time((b,r),lp) == t
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2]),p))
end
end
return nothing
end
export Dw!, g5Dw!, DwdagDw!, SF_bndfix!, Csw!, pfrandomize!
include("Diracoper.jl")
export Dw!, g5Dw!, DwdagDw!
include("DiracIO.jl") include("DiracIO.jl")
export read_prop, save_prop, read_dpar export read_prop, save_prop, read_dpar
include("Diracflow.jl")
export Nablanabla!, Dslash_sq!, flw, backflow
end end

View file

@ -41,7 +41,7 @@ function read_prop(fname::String)
footh = Vector{Float64}(undef, 4) footh = Vector{Float64}(undef, 4)
lp = SpaceParm{ndim}(iL, (4,4,4,4), ibc, ntw) lp = SpaceParm{ndim}(iL, (4,4,4,4), ibc, ntw)
dpar = DiracParam{Float64}(SU3fund,foopars[1],foopars[2],ntuple(i -> footh[i], 4),foopars[3]) dpar = DiracParam{Float64}(SU3fund,foopars[1],foopars[2],ntuple(i -> footh[i], 4),foopars[3],foopars[4])
dtr = (2,3,4,1) dtr = (2,3,4,1)
@ -100,7 +100,7 @@ function save_prop(fname::String, psi, lp::SpaceParm{4,M,B,D}, dpar::DiracParam;
BDIO_write!(fb, [convert(Int32, B)]) BDIO_write!(fb, [convert(Int32, B)])
BDIO_write!(fb, [convert(Int32, lp.iL[i]) for i in 1:4]) BDIO_write!(fb, [convert(Int32, lp.iL[i]) for i in 1:4])
BDIO_write!(fb, [convert(Int32, lp.ntw[i]) for i in 1:M]) BDIO_write!(fb, [convert(Int32, lp.ntw[i]) for i in 1:M])
BDIO_write!(fb, [dpar.m0, dpar.csw, dpar.ct]) BDIO_write!(fb, [dpar.m0, dpar.csw, dpar.tm, dpar.ct])
BDIO_write!(fb, [dpar.th[i] for i in 1:4]) BDIO_write!(fb, [dpar.th[i] for i in 1:4])
end end
BDIO_write_hash!(fb) BDIO_write_hash!(fb)
@ -175,7 +175,7 @@ function read_dpar(fname::String)
footh = Vector{Float64}(undef, 4) footh = Vector{Float64}(undef, 4)
lp = SpaceParm{ndim}(iL, (4,4,4,4), ibc, ntw) lp = SpaceParm{ndim}(iL, (4,4,4,4), ibc, ntw)
dpar = DiracParam{Float64}(SU3fund,foopars[1],foopars[2],ntuple(i -> footh[i], 4),foopars[3]) dpar = DiracParam{Float64}(SU3fund,foopars[1],foopars[2],ntuple(i -> footh[i], 4),foopars[3],foopars[4])
BDIO_close!(fb) BDIO_close!(fb)

211
src/Dirac/Diracfields.jl Normal file
View file

@ -0,0 +1,211 @@
"""
function Csw!(dws, U, gp, lp::SpaceParm)
Computes the clover and stores it in dws.csw.
"""
function Csw!(dws, U, gp, lp::SpaceParm{4,6,B,D}) where {B,D}
@timeit "Csw computation" begin
for i in 1:Int(lp.npls)
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_csw!(dws.csw, U, gp.Ubnd, i, lp)
end
end
end
return nothing
end
function krnl_csw!(csw::AbstractArray{T}, U, Ubnd, ipl, lp::SpaceParm{4,M,B,D}) where {T,M,B,D}
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[4]
id1, id2 = lp.plidx[ipl]
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
OBC = (B == BC_OPEN) && ((it == 1) || (it == lp.iL[end]))
bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp)
bd1, rd1 = dw((b, r), id1, lp)
bd2, rd2 = dw((b, r), id2, lp)
bdd, rdd = dw((bd1, rd1), id2, lp)
bud, rud = dw((bu1, ru1), id2, lp)
bdu, rdu = up((bd1, rd1), id2, lp)
if SFBC && (it == lp.iL[end])
gt1 = Ubnd[id2]
gt2 = Ubnd[id2]
else
gt1 = U[bu1,id2,ru1]
gt2 = U[bud,id2,rud]
end
M1 = U[b,id1,r]*gt1/(U[b,id2,r]*U[bu2,id1,ru2])
M2 = (U[bd2,id2,rd2]\(U[bd2,id1,rd2]*gt2))/U[b,id1,r]
M3 = (U[bdd,id2,rdd]*U[bd1,id1,rd1])\(U[bdd,id1,rdd]*U[bd2,id2,rd2])
M4 = (U[b,id2,r]/(U[bd1,id2,rd1]*U[bdu,id1,rdu]))*U[bd1,id1,rd1]
if !(SFBC && (it == 1)) && !OBC
csw[b,ipl,r] = 0.125*(antsym(M1)+antsym(M2)+antsym(M3)+antsym(M4))
end
end
return nothing
end
"""
SF_bndfix!(sp, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}})
Sets all the values of `sp` in the first time slice to zero.
"""
function SF_bndfix!(sp, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
@timeit "SF boundary fix" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_sfbndfix!(sp, lp)
end
end
return nothing
end
function krnl_sfbndfix!(sp,lp::SpaceParm)
b=Int64(CUDA.threadIdx().x)
r=Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) == 1)
sp[b,r] = 0.0*sp[b,r]
end
return nothing
end
"""
SF_bndfix!(sp, lp::SpaceParm{4,6,BC_OPEN,D})
Sets all the values of `sp` in the first and last time slice to zero.
"""
function SF_bndfix!(sp, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
@timeit "SF boundary fix" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_opbndfix!(sp, lp)
end
end
return nothing
end
function krnl_opbndfix!(sp,lp::SpaceParm)
b=Int64(CUDA.threadIdx().x)
r=Int64(CUDA.blockIdx().x)
if ((point_time((b,r),lp) == 1) || (point_time((b,r),lp) == lp.iL[end]))
sp[b,r] = 0.0*sp[b,r]
end
return nothing
end
"""
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund / SU2fund {T}}}, lp::SpaceParm, t::Int64 = 0)
Randomizes the SU2fund / SU3fund fermion field. If the argument t is present, it only randomizes that time-slice.
"""
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund{T}}}, lp::SpaceParm{4,6,BC_PERIODIC,D}, t::Int64 = 0) where {T,D}
@timeit "Randomize pseudofermion field" begin
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su3!(f,p,lp,t)
end
end
return nothing
end
function pfrandomize!(f::AbstractArray{Spinor{4, SU3fund{T}}}, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D},SpaceParm{4,6,BC_OPEN,D}}, t::Int64 = 0) where {T,D}
@timeit "Randomize pseudofermion field" begin
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su3!(f,p,lp,t)
end
end
SF_bndfix!(f,lp)
return nothing
end
function krnl_assign_pf_su3!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
if t == 0
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2],
x[b,3,r,1] + im* x[b,3,r,2]),p))
elseif point_time((b,r),lp) == t
f[b,r] = Spinor(map(x->SU3fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2],
x[b,3,r,1] + im* x[b,3,r,2]),p))
end
end
return nothing
end
function pfrandomize!(f::AbstractArray{Spinor{4, SU2fund{T}}}, lp::SpaceParm{4,6,BC_PERIODIC,D}, t::Int64 = 0) where {T,D}
@timeit "Randomize pseudofermion field" begin
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su2!(f,p,lp,t)
end
end
return nothing
end
function pfrandomize!(f::AbstractArray{Spinor{4, SU2fund{T}}}, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D},SpaceParm{4,6,BC_OPEN,D}}, t::Int64 = 0) where {T,D}
@timeit "Randomize pseudofermion field" begin
p = ntuple(i->CUDA.randn(T, lp.bsz, 3, lp.rsz,2),4)./sqrt(2) # complex generation not suported for Julia 1.5.4
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_assign_pf_su2!(f,p,lp,t)
end
end
SF_bndfix!(f,lp)
return nothing
end
function krnl_assign_pf_su2!(f::AbstractArray, p , lp::SpaceParm, t::Int64)
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
if t == 0
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2]),p))
elseif point_time((b,r),lp) == t
f[b,r] = Spinor(map(x->SU2fund(x[b,1,r,1] + im* x[b,1,r,2],
x[b,2,r,1] + im* x[b,2,r,2]),p))
end
end
return nothing
end

456
src/Dirac/Diracflow.jl Normal file
View file

@ -0,0 +1,456 @@
import ..YM.flw, ..YM.force_gauge, ..YM.flw_adapt
function flw(U, psi, int::FlowIntr{NI,T}, ns::Int64, eps, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T}
@timeit "Integrating flow equations" begin
for i in 1:ns
force_gauge(ymws, U, int.c0, 1, gp, lp)
if int.add_zth
add_zth_term(ymws::YMworkspace, U, lp)
end
Nablanabla!(dws.sAp, U, psi, dpar, dws, lp)
psi .= psi + 2*int.r*eps*dws.sAp
ymws.mom .= ymws.frc1
U .= expm.(U, ymws.mom, 2*eps*int.r)
for k in 1:NI
force_gauge(ymws, U, int.c0, 1, gp, lp)
if int.add_zth
add_zth_term(ymws::YMworkspace, U, lp)
end
Nablanabla!(dws.sp, U, psi, dpar, dws, lp)
dws.sAp .= int.e0[k].*dws.sAp .+ int.e1[k].*dws.sp
psi .= psi + 2*eps*dws.sAp
ymws.mom .= int.e0[k].*ymws.mom .+ int.e1[k].*ymws.frc1
U .= expm.(U, ymws.mom, 2*eps)
end
end
end
return nothing
end
flw(U, psi, int::FlowIntr{NI,T}, ns::Int64, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T} = flw(U, psi, int::FlowIntr{NI,T}, ns::Int64, int.eps, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
"""
function backflow(psi, U, Dt, nsave::Int64, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
Performs one step back in flow time for the fermion field, according to 1302.5246. The fermion field must me that of the time-slice Dt and is flowed back to the first time-slice
nsave is the total number of gauge fields saved in the process
"""
function backflow(psi, U, Dt, maxnsave::Int64, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
int = wfl_rk3(Float64,0.01,1.0) # Default integrator, it has to be order 3 rk but in can be zfl
@timeit "Backflow integration" begin
@timeit "GPU to CPU" U0 = Array(U)
nt,eps_all = flw_adapt(U, int, Dt, gp, lp, ymws)
nsave = min(maxnsave,nt)
nsave != 0 ? dsave = Int64(floor(nt/nsave)) : dsave = nt
Usave = Vector{typeof(U0)}(undef,nsave)
@timeit "CPU to GPU" copyto!(U,U0)
for i in 1:(dsave*nsave)
flw(U, int, 1, eps_all[i], gp, lp, ymws)
(i%dsave)==0 ? Usave[Int64(i/dsave)] = Array(U) : nothing
end
for j in (nt%nsave):-1:1
@timeit "CPU to GPU" copyto!(U,Usave[end])
for k in 1:j-1
flw(U, int, 1, eps_all[nsave*dsave + k], gp, lp, ymws)
end
bflw_step!(psi, U, eps_all[nsave*dsave + j], int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
end
for i in (nsave-1):-1:1
for j in dsave:-1:1
@timeit "CPU to GPU" copyto!(U,Usave[i])
for k in 1:j-1
flw(U, int, 1, eps_all[i*dsave + k], gp, lp, ymws)
end
bflw_step!(psi, U, eps_all[i*dsave + j], int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
end
end
@timeit "CPU to GPU" copyto!(U,U0)
for j in dsave:-1:1
@timeit "CPU to GPU" copyto!(U,U0)
for k in 1:j-1
flw(U, int, 1, eps_all[k], gp, lp, ymws)
end
bflw_step!(psi, U, eps_all[j], int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
end
@timeit "CPU to GPU" copyto!(U,U0)
end
return nothing
end
"""
function bflw_step!(U, psi, eps, int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
Performs ONE backstep in psi, from t to t-\eps. U is supposed to be the one in t-\eps and is left unchanged. So far, int has to be rk4
"""
function bflw_step!(psi, U, eps, int::FlowIntr, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace)
@timeit "Backflow step" begin
V = copy(U)
V .= U
force_gauge(ymws, U, int.c0, 1, gp, lp)
if int.add_zth
add_zth_term(ymws::YMworkspace, U, lp)
end
ymws.mom .= ymws.frc1
U .= expm.(U, ymws.mom, 2*eps*int.r)
force_gauge(ymws, U, int.c0, 1, gp, lp)
if int.add_zth
add_zth_term(ymws::YMworkspace, U, lp)
end
ymws.mom .= int.e0[1].*ymws.mom .+ int.e1[1].*ymws.frc1
U .= expm.(U, ymws.mom, 2*eps)
Nablanabla!(dws.sp, U, 0.75*2*eps*psi, dpar, dws, lp)
U .= V
force_gauge(ymws, U, int.c0, 1, gp, lp)
if int.add_zth
add_zth_term(ymws::YMworkspace, U, lp)
end
U .= expm.(U, ymws.frc1, 2*eps*int.r)
Nablanabla!(dws.sAp, U, 2*eps*dws.sp, dpar, dws, lp)
dws.sAp .= psi + (8/9)*dws.sAp
U .= V
Nablanabla!(psi, U, 2*eps*(dws.sAp - (8/9)*dws.sp), dpar, dws, lp)
psi .= (1/4)*psi + dws.sp + dws.sAp
end
return nothing
end
function flw_adapt(U, psi, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T}
eps = epsini
dt = tend
nstp = 0
eps_all = Vector{T}(undef,0)
while true
ns = convert(Int64, floor(dt/eps))
if ns > 10
flw(U, psi, int, 9, eps, gp, dpar, lp, ymws, dws)
ymws.U1 .= U
flw(U, psi, int, 1, eps, gp, dpar, lp, ymws, dws)
flw(ymws.U1, int, 2, eps/2, gp, lp, ymws)
dt = dt - 10*eps
nstp = nstp + 10
push!(eps_all,ntuple(i->eps,10)...)
# adjust step size
ymws.U1 .= ymws.U1 ./ U
maxd = CUDA.mapreduce(dev_one, max, ymws.U1, init=zero(tend))
eps = min(int.max_eps, 2*eps, int.sft_fac*eps*(int.tol/maxd)^(one(tend)/3))
else
flw(U, psi, int, ns, eps, gp, dpar, lp, ymws, dws)
dt = dt - ns*eps
push!(eps_all,ntuple(i->eps,ns)...)
push!(eps_all,dt)
flw(U, psi, int, 1, dt, gp, dpar, lp, ymws, dws)
dt = zero(tend)
nstp = nstp + ns + 1
end
if dt == zero(tend)
break
end
end
return nstp, eps_all
end
flw_adapt(U, psi, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, dpar::DiracParam, lp::SpaceParm, ymws::YMworkspace, dws::DiracWorkspace) where {NI,T} = flw_adapt(U, psi, int, tend, int.eps_ini, gp, dpar, lp, ymws, dws)
"""
function Nablanabla!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
Computes /`/` \\nabla^* \\nabla /`/` `si` and stores it in `si`.
"""
function Nablanabla!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
@timeit "Laplacian" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Nablanabla(so, U, si, dpar.th, lp)
end
end
return nothing
end
function Nablanabla!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D},SpaceParm{4,6,BC_OPEN,D}}) where {D}
SF_bndfix!(si,lp)
@timeit "Laplacian" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Nablanabla(so, U, si, dpar.th, lp)
end
end
SF_bndfix!(so,lp)
return nothing
end
function krnl_Nablanabla(so, U, si, th, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
@inbounds begin
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
so[b,r] = -4*si[b,r]
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
so[b,r] += 0.5*( th[1] * (U[b,1,r]*si[bu1,ru1]) +conj(th[1]) * (U[bd1,1,rd1]\si[bd1,rd1]) +
th[2] * (U[b,2,r]*si[bu2,ru2]) +conj(th[2]) * (U[bd2,2,rd2]\si[bd2,rd2]) +
th[3] * (U[b,3,r]*si[bu3,ru3]) +conj(th[3]) * (U[bd3,3,rd3]\si[bd3,rd3]) +
th[4] * (U[b,4,r]*si[bu4,ru4]) +conj(th[4]) * (U[bd4,4,rd4]\si[bd4,rd4]) )
end
end
return nothing
end
function krnl_Nablanabla(so, U, si, th, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
@inbounds begin
so[b,r] = -4*si[b,r]
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
so[b,r] += 0.5*( th[1] * (U[b,1,r]*si[bu1,ru1]) +conj(th[1]) * (U[bd1,1,rd1]\si[bd1,rd1]) +
th[2] * (U[b,2,r]*si[bu2,ru2]) +conj(th[2]) * (U[bd2,2,rd2]\si[bd2,rd2]) +
th[3] * (U[b,3,r]*si[bu3,ru3]) +conj(th[3]) * (U[bd3,3,rd3]\si[bd3,rd3]) +
th[4] * (U[b,4,r]*si[bu4,ru4]) +conj(th[4]) * (U[bd4,4,rd4]\si[bd4,rd4]) )
end
return nothing
end
function krnl_Nablanabla(so, U, si, th, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
@inbounds begin
if (point_time((b,r),lp) != 1)
so[b,r] = -4*si[b,r]
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
so[b,r] += 0.5*( th[1] * (U[b,1,r]*si[bu1,ru1]) +conj(th[1]) * (U[bd1,1,rd1]\si[bd1,rd1]) +
th[2] * (U[b,2,r]*si[bu2,ru2]) +conj(th[2]) * (U[bd2,2,rd2]\si[bd2,rd2]) +
th[3] * (U[b,3,r]*si[bu3,ru3]) +conj(th[3]) * (U[bd3,3,rd3]\si[bd3,rd3]) +
th[4] * (U[b,4,r]*si[bu4,ru4]) +conj(th[4]) * (U[bd4,4,rd4]\si[bd4,rd4]) )
end
end
return nothing
end
export Nablanabla!, flw, backflow, flw_adapt, bflw_step!
"""
function Dslash_sq!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
Computes /`/` //slashed{D}^2 si /`/` ans stores it in `si`.
"""
function Dslash_sq!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D}) where {B,D}
@timeit "DwdagDw" begin
@timeit "g5Dslsh" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh!(dws.st, U, si, dpar.th, lp)
end
end
if abs(dpar.csw) > 1.0E-10
@timeit "Dw_improvement" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh_impr!(dws.st, dws.csw, dpar.csw, si, lp)
end
end
end
@timeit "g5Dslsh" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh!(so, U, dws.st, dpar.th, lp)
end
end
if abs(dpar.csw) > 1.0E-10
@timeit "Dw_improvement" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dslsh_impr!(so, dws.csw, dpar.csw, dws.st, lp)
end
end
end
end
return nothing
end
function krnl_g5Dslsh!(so, U, si, th, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
@inbounds begin
so[b,r] = 4*si[b,r]
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
so[b,r] = dmul(Gamma{5}, so[b,r])
end
end
return nothing
end
function krnl_g5Dslsh!(so, U, si, th, lp::SpaceParm{4,6,B,D}) where {D,B}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
@inbounds begin
so[b,r] = 4*si[b,r]
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
so[b,r] = dmul(Gamma{5}, so[b,r])
end
return nothing
end
function krnl_g5Dslsh_impr!(so, Fcsw, csw, si, lp::SpaceParm{4,6,B,D}) where {B,D}
@inbounds begin
b = Int64(CUDA.threadIdx().x);
r = Int64(CUDA.blockIdx().x)
so[b,r] += 0.5*csw*im*dmul(Gamma{5},( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
-Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) - Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) - Fcsw[b,6,r]*dmul(Gamma{13},si[b,r])))
end
return nothing
end
function krnl_g5Dslsh_impr!(so, Fcsw, csw, si, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
@inbounds begin
b = Int64(CUDA.threadIdx().x);
r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
so[b,r] += 0.5*csw*im*dmul(Gamma{5},( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
-Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) - Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) - Fcsw[b,6,r]*dmul(Gamma{13},si[b,r])))
end
return nothing
end
end

667
src/Dirac/Diracoper.jl Normal file
View file

@ -0,0 +1,667 @@
## OPEN
"""
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
Computes the Dirac operator (with the Wilson term) `\`\``D_w``\`\` with gauge field U and parameters `dpar` of the field `si` and stores it in `so`.
If `dpar.csw` is different from zero, the clover term should be stored in `dws.csw` via the Csw! function and is automatically included in the operator.
"""
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
SF_bndfix!(si,lp)
if abs(dpar.csw) > 1.0E-10
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
else
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
end
end
end
SF_bndfix!(so,lp)
return nothing
end
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
# The field si is assumed to be zero at t = 0,T
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
return nothing
end
function krnl_Dw!(so, U, si, m0, tm, th, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
# The field si is assumed to be zero at t = 0,T
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
return nothing
end
"""
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
Computes \`\` \\gamma_5 \`\` times the Dirac operator (with the Wilson term) with gauge field U and parameters `dpar` of the field `si` and stores it in `so`.
If `dpar.csw` is different from zero, the clover term should be stored in `dws.csw` via the Csw! function and is automatically included in the operator.
"""
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
SF_bndfix!(si,lp)
if abs(dpar.csw) > 1.0E-10
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
else
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
end
end
end
SF_bndfix!(so,lp)
return nothing
end
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
# The field si is assumed to be zero at t = 0,T
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
return nothing
end
function krnl_g5Dw!(so, U, si, m0, tm, th, ct, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
# The field si is assumed to be zero at t = 0,T
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if ((point_time((b,r),lp) != 1) && (point_time((b,r),lp) != lp.iL[end]))
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == (lp.iL[4]-1))
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
return nothing
end
"""
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,B,D})
Applies the operator \`\` \\gamma_5 D_w \`\` twice to `si` and stores the result in `so`. This is equivalent to appling the operator \`\` D_w^\\dagger D_w \`\`
The Dirac operator is the same as in the functions `Dw!` and `g5Dw!`
"""
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_OPEN,D}) where {D}
SF_bndfix!(si,lp)
if abs(dpar.csw) > 1.0E-10
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
SF_bndfix!(dws.st,lp)
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
SF_bndfix!(so,lp)
end
else
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
end
end
SF_bndfix!(dws.st,lp)
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, dpar.ct, lp)
end
end
SF_bndfix!(so,lp)
end
end
return nothing
end
## PERDIODIC
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
if abs(dpar.csw) > 1.0E-10
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
end
end
else
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, lp)
end
end
end
return nothing
end
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]+ im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
end
return nothing
end
function krnl_Dw!(so, U, si, m0, tm, th, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
end
return nothing
end
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
if abs(dpar.csw) > 1.0E-10
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
end
end
else
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, lp)
end
end
end
return nothing
end
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
end
return nothing
end
function krnl_g5Dw!(so, U, si, m0, tm, th, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
end
return nothing
end
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::SpaceParm{4,6,BC_PERIODIC,D}) where {D}
if abs(dpar.csw) > 1.0E-10
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, lp)
end
end
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, lp)
end
end
end
else
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, lp)
end
end
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, lp)
end
end
end end
return nothing
end
## SF
function Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
SF_bndfix!(si,lp)
if abs(dpar.csw) > 1.0E-10
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
else
@timeit "Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
end
end
end
SF_bndfix!(so,lp)
return nothing
end
function krnl_Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r]) + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
return nothing
end
function krnl_Dw!(so, U, si, m0, tm, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + im*tm*dmul(Gamma{5},si[b,r])
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
return nothing
end
function g5Dw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
SF_bndfix!(si,lp)
if abs(dpar.csw) > 1.0E-10
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
else
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
end
end
end
SF_bndfix!(so,lp)
return nothing
end
function krnl_g5Dwimpr!(so, U, si, Fcsw, m0, tm, th, csw, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r] + 0.5*csw*im*( Fcsw[b,1,r]*dmul(Gamma{10},si[b,r]) + Fcsw[b,2,r]*dmul(Gamma{11},si[b,r]) + Fcsw[b,3,r]*dmul(Gamma{12},si[b,r])
+Fcsw[b,4,r]*dmul(Gamma{15},si[b,r]) + Fcsw[b,5,r]*dmul(Gamma{14},si[b,r]) + Fcsw[b,6,r]*dmul(Gamma{13},si[b,r]))
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
so[b,r] = dmul(Gamma{5}, so[b,r])+ im*tm*si[b,r]
return nothing
end
function krnl_g5Dw!(so, U, si, m0, tm, th, ct, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
# The field si is assumed to be zero at t = 0
b = Int64(CUDA.threadIdx().x); r = Int64(CUDA.blockIdx().x)
if (point_time((b,r),lp) != 1)
bu1, ru1 = up((b,r), 1, lp)
bd1, rd1 = dw((b,r), 1, lp)
bu2, ru2 = up((b,r), 2, lp)
bd2, rd2 = dw((b,r), 2, lp)
bu3, ru3 = up((b,r), 3, lp)
bd3, rd3 = dw((b,r), 3, lp)
bu4, ru4 = up((b,r), 4, lp)
bd4, rd4 = dw((b,r), 4, lp)
@inbounds begin
so[b,r] = (4+m0)*si[b,r]
so[b,r] -= 0.5*(th[1]*gpmul(Pgamma{1,-1},U[b,1,r],si[bu1,ru1]) +conj(th[1])*gdagpmul(Pgamma{1,+1},U[bd1,1,rd1],si[bd1,rd1]) +
th[2]*gpmul(Pgamma{2,-1},U[b,2,r],si[bu2,ru2]) +conj(th[2])*gdagpmul(Pgamma{2,+1},U[bd2,2,rd2],si[bd2,rd2]) +
th[3]*gpmul(Pgamma{3,-1},U[b,3,r],si[bu3,ru3]) +conj(th[3])*gdagpmul(Pgamma{3,+1},U[bd3,3,rd3],si[bd3,rd3]) +
th[4]*gpmul(Pgamma{4,-1},U[b,4,r],si[bu4,ru4]) +conj(th[4])*gdagpmul(Pgamma{4,+1},U[bd4,4,rd4],si[bd4,rd4]) )
if (point_time((b,r),lp) == 2) || (point_time((b,r),lp) == lp.iL[4])
so[b,r] += (ct-1.0)*si[b,r]
end
end
end
so[b,r] = dmul(Gamma{5}, so[b,r]) + im*tm*si[b,r]
return nothing
end
function DwdagDw!(so, U, si, dpar::DiracParam, dws::DiracWorkspace, lp::Union{SpaceParm{4,6,BC_SF_ORBI,D},SpaceParm{4,6,BC_SF_AFWB,D}}) where {D}
SF_bndfix!(si,lp)
if abs(dpar.csw) > 1.0E-10
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(dws.st, U, si, dws.csw, dpar.m0, dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
SF_bndfix!(dws.st,lp)
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dwimpr!(so, U, dws.st, dws.csw, dpar.m0, -dpar.tm, dpar.th, dpar.csw, dpar.ct, lp)
end
end
SF_bndfix!(so,lp)
end
else
@timeit "DwdagDw" begin
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(dws.st, U, si, dpar.m0, dpar.tm, dpar.th, dpar.ct, lp)
end
end
SF_bndfix!(dws.st,lp)
@timeit "g5Dw" begin
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_g5Dw!(so, U, dws.st, dpar.m0, -dpar.tm, dpar.th, dpar.ct, lp)
end
end
SF_bndfix!(so,lp)
end
end
return nothing
end

View file

@ -31,7 +31,7 @@ scalar_field(::Type{T}, lp::SpaceParm) where {T} = CuArray{T, 2}(undef, lp.b
""" """
nscalar_field(::Type{T}, n::Integer, lp::SpaceParm) nscalar_field(::Type{T}, n::Integer, lp::SpaceParm)
Returns `n` scalar fields of elemental type `T` Returns `n` scalar fields of elemental type `T`.
""" """
nscalar_field(::Type{T}, n, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, n, lp.rsz) nscalar_field(::Type{T}, n, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, n, lp.rsz)
@ -46,7 +46,7 @@ scalar_field_point(::Type{T}, lp::SpaceParm{N,M,D}) where {T,N,M,D} = CuArray{T,
""" """
tensor_field(::Type{T}, lp::SpaceParm) tensor_field(::Type{T}, lp::SpaceParm)
Returns a tensor field of elemental type `T`. Returns a (symmetric) tensor field of elemental type `T`.
""" """
tensor_field(::Type{T}, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, lp.npls, lp.rsz) tensor_field(::Type{T}, lp::SpaceParm) where {T} = CuArray{T, 3}(undef, lp.bsz, lp.npls, lp.rsz)

View file

@ -1,12 +1,23 @@
"""
struct U2alg{T} <: Algebra
Elements of the `U(2)` Algebra. The type `T <: AbstractFloat` can be used to define single or double precision elements.
"""
struct U2alg{T} <: Algebra struct U2alg{T} <: Algebra
u11::T u11::T
u22::T u22::T
u12::Complex{T} u12::Complex{T}
end end
"""
antsym(a::SU2{T}) where T <: AbstractFloat
Returns the antisymmetrization of the SU2 element `a`, that is `\`\` `a - a^{\\dagger}` `\`. This method returns al element of `U2alg{T}`.
"""
function antsym(a::SU2{T}) where T <: AbstractFloat function antsym(a::SU2{T}) where T <: AbstractFloat
return U2alg{T}(2.0*imag(a.t1),-2.0*imag(a.t1),2.0*a.t2) return U2alg{T}(2.0*imag(a.t1),-2.0*imag(a.t1),2.0*a.t2)
end end

View file

@ -1,6 +1,10 @@
"""
struct U3alg{T} <: Algebra
Elements of the `U(3)` Algebra. The type `T <: AbstractFloat` can be used to define single or double precision elements.
"""
struct U3alg{T} <: Algebra struct U3alg{T} <: Algebra
u11::T u11::T
u22::T u22::T
@ -10,6 +14,11 @@ struct U3alg{T} <: Algebra
u23::Complex{T} u23::Complex{T}
end end
"""
antsym(a::SU3{T}) where T <: AbstractFloat
Returns the antisymmetrization of the SU3 element `a`, that is `\`\` `a - a^{\\dagger}` `\`. This method returns al element of `U3alg{T}`.
"""
function antsym(a::SU3{T}) where T <: AbstractFloat function antsym(a::SU3{T}) where T <: AbstractFloat
t1 = 2.0*imag(a.u11) t1 = 2.0*imag(a.u11)
t2 = 2.0*imag(a.u22) t2 = 2.0*imag(a.u22)

View file

@ -38,7 +38,7 @@ norm2(a::SU3fund{T}) where T <: AbstractFloat = (abs2(a.t1) + abs2
Returns the scalar product of two fundamental elements. The convention is for the product to the linear in the second argument, and anti-linear in the first argument. Returns the scalar product of two fundamental elements. The convention is for the product to the linear in the second argument, and anti-linear in the first argument.
""" """
dot(g1::SU3fund{T},g2::SU3fund{T}) where T <: AbstractFloat = conj(g1.t1)*g2.t1+g1.t2*conj(g2.t2)+g1.t3*conj(g2.t3) dot(g1::SU3fund{T},g2::SU3fund{T}) where T <: AbstractFloat = conj(g1.t1)*g2.t1+conj(g1.t2)*g2.t2+conj(g1.t3)*g2.t3
""" """
*(g::SU3{T},b::SU3fund{T}) *(g::SU3{T},b::SU3fund{T})

View file

@ -36,7 +36,7 @@ norm2(a::SU2{T}) where T <: AbstractFloat = abs2(a.t1) + abs2(a.t2)
""" """
tr(g::T) where T <: Group tr(g::T) where T <: Group
Returns the trace of the groups element `g`. Returns the trace of the group element `g`.
""" """
tr(g::SU2{T}) where T <: AbstractFloat = complex(2*real(g.t1), 0.0) tr(g::SU2{T}) where T <: AbstractFloat = complex(2*real(g.t1), 0.0)

View file

@ -40,25 +40,27 @@ include("YM/YM.jl")
using .YM using .YM
export ztwist export ztwist
export YMworkspace, GaugeParm, force0_wilson!, field, field_pln, randomize!, zero!, norm2 export YMworkspace, GaugeParm, force0_wilson!, field, field_pln, randomize!, zero!, norm2
export force_gauge, MD!
export gauge_action, hamiltonian, plaquette, HMC!, OMF4! export gauge_action, hamiltonian, plaquette, HMC!, OMF4!
export Eoft_clover, Eoft_plaq, Qtop export Eoft_clover, Eoft_plaq, Qtop
export FlowIntr, wfl_euler, zfl_euler, wfl_rk2, zfl_rk2, wfl_rk3, zfl_rk3 export FlowIntr, wfl_euler, zfl_euler, wfl_rk2, zfl_rk2, wfl_rk3, zfl_rk3
export flw, flw_adapt export flw, flw_adapt
export sfcoupling, bndfield, setbndfield export sfcoupling, bndfield, setbndfield
export import_lex64, import_cern64, import_bsfqcd, save_cnfg, read_cnfg export import_lex64, import_cern64, import_bsfqcd, save_cnfg, read_cnfg, read_gp
include("Spinors/Spinors.jl") include("Spinors/Spinors.jl")
using .Spinors using .Spinors
export Spinor, Pgamma export Spinor, Pgamma, Gamma
export imm, mimm export imm, mimm
export pmul, gpmul, gdagpmul, dmul export pmul, gpmul, gdagpmul, dmul
include("Dirac/Dirac.jl") include("Dirac/Dirac.jl")
using .Dirac using .Dirac
export DiracWorkspace, DiracParam export DiracWorkspace, DiracParam
export Dw!, g5Dw!, DwdagDw!, SF_bndfix!, Csw!, pfrandomize! export Dw!, g5Dw!, DwdagDw!, SF_bndfix!, Csw!, pfrandomize!, mtwmdpar
export read_prop, save_prop, read_dpar export read_prop, save_prop, read_dpar
export Nablanabla!, flw, backflow
include("Solvers/Solvers.jl") include("Solvers/Solvers.jl")
using .Solvers using .Solvers

View file

@ -24,6 +24,11 @@ const r1omf2 = 0.1931833275037836
const r2omf2 = 0.5 const r2omf2 = 0.5
const r3omf2 = 1 - 2*r1omf2 const r3omf2 = 1 - 2*r1omf2
"""
struct IntrScheme{N, T}
Integrator for the molecular dynamics.
"""
struct IntrScheme{N, T} struct IntrScheme{N, T}
r::NTuple{N, T} r::NTuple{N, T}
eps::T eps::T
@ -31,8 +36,23 @@ struct IntrScheme{N, T}
end end
"""
omf2(::Type{T}, eps, ns)
Second order Omelyan integrator with `eps` stepsize and `ns` steps.
"""
omf2(::Type{T}, eps, ns) where T = IntrScheme{3,T}((r1omf2,r2omf2,r3omf2), eps, ns) omf2(::Type{T}, eps, ns) where T = IntrScheme{3,T}((r1omf2,r2omf2,r3omf2), eps, ns)
"""
omf4(::Type{T}, eps, ns)
Fourth order Omelyan integrator with `eps` stepsize and `ns` steps.
"""
omf4(::Type{T}, eps, ns) where T = IntrScheme{6,T}((r1omf4,r2omf4,r3omf4,r4omf4,r5omf4,r6omf4), eps, ns) omf4(::Type{T}, eps, ns) where T = IntrScheme{6,T}((r1omf4,r2omf4,r3omf4,r4omf4,r5omf4,r6omf4), eps, ns)
"""
leapfrog(::Type{T}, eps, ns)
Leapfrog integrator with `eps` stepsize and `ns` steps.
"""
leapfrog(::Type{T}, eps, ns) where T = IntrScheme{2,T}((0.5,1.0), eps, ns) leapfrog(::Type{T}, eps, ns) where T = IntrScheme{2,T}((0.5,1.0), eps, ns)

View file

@ -9,11 +9,6 @@
### created: Tue Nov 30 11:10:57 2021 ### created: Tue Nov 30 11:10:57 2021
### ###
"""
function CG!
Solves the linear equation `Ax = si`
"""
function krnl_dot!(sum,fone,ftwo) function krnl_dot!(sum,fone,ftwo)
b=Int64(CUDA.threadIdx().x) b=Int64(CUDA.threadIdx().x)
r=Int64(CUDA.blockIdx().x) r=Int64(CUDA.blockIdx().x)
@ -23,7 +18,7 @@ function krnl_dot!(sum,fone,ftwo)
return nothing return nothing
end end
function field_dot(fone::AbstractArray,ftwo::AbstractArray,sumf,lp) where {T} function field_dot(fone::AbstractArray,ftwo::AbstractArray,sumf,lp)
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_dot!(sumf,fone,ftwo) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_dot!(sumf,fone,ftwo)
@ -32,6 +27,12 @@ function field_dot(fone::AbstractArray,ftwo::AbstractArray,sumf,lp) where {T}
return sum(sumf) return sum(sumf)
end end
"""
function CG!(si, U, A, dpar::DiracParam, lp::SpaceParm, dws::DiracWorkspace{T}, maxiter::Int64 = 10, tol=1.0)
Solves the linear equation `Ax = si`
"""
function CG!(si, U, A, dpar::DiracParam, lp::SpaceParm, dws::DiracWorkspace{T}, maxiter::Int64 = 10, tol=1.0) where {T} function CG!(si, U, A, dpar::DiracParam, lp::SpaceParm, dws::DiracWorkspace{T}, maxiter::Int64 = 10, tol=1.0) where {T}
dws.sr .= si dws.sr .= si

View file

@ -5,7 +5,7 @@
function propagator!(pro,U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm, maxiter::Int64, tol::Float64, y::NTuple{4,Int64}, c::Int64, s::Int64) function propagator!(pro,U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm, maxiter::Int64, tol::Float64, y::NTuple{4,Int64}, c::Int64, s::Int64)
Saves the fermionic progapator in pro for a source at point `y` with color `c` and spin `s`. If the last three arguments are replaced by `time::Int64`, the source is replaced Saves the fermionic progapator in pro for a source at point `y` with color `c` and spin `s`. If the last three arguments are replaced by `time::Int64`, the source is replaced
by a random source in spin and color at t = `time`. by a random source in spin and color at t = `time`. Returns the number of iterations.
""" """
function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm, maxiter::Int64, tol::Float64, y::NTuple{4,Int64}, c::Int64, s::Int64) where {T} function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm, maxiter::Int64, tol::Float64, y::NTuple{4,Int64}, c::Int64, s::Int64) where {T}
@ -17,6 +17,8 @@ function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Space
return nothing return nothing
end end
@timeit "Propagator computation" begin
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp)))) fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
CUDA.@allowscalar dws.sp[point_index(CartesianIndex{lp.ndim}(y),lp)...] = Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4)) CUDA.@allowscalar dws.sp[point_index(CartesianIndex{lp.ndim}(y),lp)...] = Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4))
@ -25,10 +27,12 @@ function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Space
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp)
end end
g5Dw!(pro,U,dws.sp,dpar,dws,lp) g5Dw!(pro,U,dws.sp,mtwmdpar(dpar),dws,lp)
CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol) niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
return nothing end
return niter
end end
function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm, maxiter::Int64, tol::Float64, time::Int64) where {T} function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm, maxiter::Int64, tol::Float64, time::Int64) where {T}
@ -40,28 +44,29 @@ function propagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Space
return nothing return nothing
end end
@timeit "Propagator computation" begin
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
pfrandomize!(dws.sp,lp,time) pfrandomize!(dws.sp,lp,time)
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp)
end end
g5Dw!(pro,U,dws.sp,dpar,dws,lp) g5Dw!(pro,U,dws.sp,mtwmdpar(dpar),dws,lp)
CUDA.@sync begin niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp)
end end
CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol) return niter
return nothing
end end
""" """
function bndpropagator!(pro,U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) function bndpropagator!(pro,U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64)
Saves the propagator in from the t=0 boundary to the bulk for the SF boundary conditions for a source with color 'c' and spin 's'. The factor c_t is included while the factor 1/sqrt(V) is not. Saves the propagator from the t=0 boundary to the bulk for the SF boundary conditions for a source with color 'c' and spin 's' in 'pro'. The factor c_t is included while the factor 1/sqrt(V) is not.
For the propagator from T to the bulk, use the function Tbndpropagator(U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) For the propagator from T to the bulk, use the function Tbndpropagator(U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64). Returns the number of iterations.
""" """
function bndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) where {T,D} function bndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) where {T,D}
@ -85,6 +90,8 @@ function bndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Sp
return nothing return nothing
end end
@timeit "Propagator computation" begin
SF_bndfix!(pro,lp)
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp)))) fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
CUDA.@sync begin CUDA.@sync begin
@ -95,18 +102,20 @@ function bndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::Sp
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp)
end end
g5Dw!(pro,U,dpar.ct*dws.sp,dpar,dws,lp) g5Dw!(pro,U,dpar.ct*dws.sp,mtwmdpar(dpar),dws,lp)
CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol) niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
return pro end
return niter
end end
""" """
function Tbndpropagator(U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) function Tbndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64)
Returns the propagator from the t=T boundary to the bulk for the SF boundary conditions for a source with color 'c' and spin 's'. The factor c_t is included while the factor 1/sqrt(V) is not. Returns the propagator from the t=T boundary to the bulk for the SF boundary conditions for a source with color 'c' and spin 's'. The factor c_t is included while the factor 1/sqrt(V) is not.
For the propagator from t=0 to the bulk, use the function bndpropagator(U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) For the propagator from t=0 to the bulk, use the function bndpropagator(U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64). Returns the number of iterations.
""" """
function Tbndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) where {T,D} function Tbndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::SpaceParm{4,6,1,D}, maxiter::Int64, tol::Float64, c::Int64, s::Int64) where {T,D}
@ -129,6 +138,7 @@ function Tbndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::S
return nothing return nothing
end end
@timeit "Propagator computation" begin
fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp)))) fill!(dws.sp,zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
CUDA.@sync begin CUDA.@sync begin
@ -139,10 +149,12 @@ function Tbndpropagator!(pro, U, dpar::DiracParam{T}, dws::DiracWorkspace, lp::S
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(dws.sp)
end end
g5Dw!(pro,U,dpar.ct*dws.sp,dpar,dws,lp)
CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol) g5Dw!(pro,U,dpar.ct*dws.sp,mtwmdpar(dpar),dws,lp)
return pro
niter = CG!(pro,U,DwdagDw!,dpar,lp,dws,maxiter,tol)
end
return niter
end end

View file

@ -26,19 +26,19 @@ This structure contains information about the lattice being simulated. The param
- `N`: The number of dimensions - `N`: The number of dimensions
- `M`: The number of planes (i.e. \`\` N(N-1)/2 \`\`) - `M`: The number of planes (i.e. \`\` N(N-1)/2 \`\`)
- `B`: The boundary conditions in Euclidean time. Acceptable values are - `B`: The boundary conditions in Euclidean time. Acceptable values are
- `BC_PERIODIC`: Periodic boundary conditions - `BC_PERIODIC`: Periodic boundary conditions.
- `BC_SF_AFWB`: Schrödinger Funtional Aoki-Frezzoptti-Weisz Choice B. - `BC_SF_AFWB`: Schrödinger Functional Aoki-Frezzotti-Weisz Choice B.
- `BC_SF_ORBI`: Schrödinger Funtional orbifold constructions. - `BC_SF_ORBI`: Schrödinger Functional orbifold constructions.
- `BC_OPEN`: Open boundary conditions. - `BC_OPEN`: Open boundary conditions.
The structure conatins the following components: The structure contains the following components:
- `iL`: Tuple containing the lattice length in each dimension. - `iL`: Tuple containing the lattice length in each dimension.
- `plidx`: The directions of each plane - `plidx`: The directions of each plane.
- `blk`: The block size in each each dimension - `blk`: The block size in each each dimension.
- `rbk`: The number of blocks in each dimension - `rbk`: The number of blocks in each dimension.
- `bsz`: The number of points in each block - `bsz`: The number of points in each block.
- `rsz`: The number of blocks in the lattice - `rsz`: The number of blocks in the lattice.
- `ntw`: The twist tensor in each plane - `ntw`: The twist tensor in each plane.
""" """
struct SpaceParm{N,M,B,D} struct SpaceParm{N,M,B,D}
ndim::Int64 ndim::Int64

View file

@ -14,6 +14,7 @@ module Spinors
using ..Groups using ..Groups
import ..Groups.imm, ..Groups.mimm, ..Groups.norm, ..Groups.norm2, ..Groups.dot import ..Groups.imm, ..Groups.mimm, ..Groups.norm, ..Groups.norm2, ..Groups.dot
struct Spinor{NS,G} struct Spinor{NS,G}
s::NTuple{NS,G} s::NTuple{NS,G}
end end
@ -169,7 +170,7 @@ end
""" """
gpmul(pgamma{N,S}, g::G, a::Spinor) G <: Group gpmul(Pgamma{N,S}, g::G, a::Spinor) G <: Group
Returns ``g(1+s\\gamma_N)a`` Returns ``g(1+s\\gamma_N)a``
""" """
@ -226,7 +227,7 @@ end
end end
""" """
gdagpmul(pgamma{N,S}, g::G, a::Spinor) G <: Group gdagpmul(Pgamma{N,S}, g::G, a::Spinor) G <: Group
Returns ``g^+ (1+s\\gamma_N)a`` Returns ``g^+ (1+s\\gamma_N)a``
""" """
@ -284,33 +285,31 @@ end
# dummy structs for dispatch: # dummy structs for dispatch:
# Basis of \\Gamma_n # Basis of \\gamma_n
struct Gamma{N} struct Gamma{N}
end end
""" """
dmul(n::Int64, a::Spinor) dmul(Gamma{n}, a::Spinor)
Returns ``\\Gamma_n a`` Returns ``\\gamma_n a``. Indexing for Dirac basis ``\\gamma_n``:
indexing for Dirac basis ``\\Gamma_n``: 1 ``\\gamma_1``;
2 ``\\gamma_2``;
1 gamma1 3 ``\\gamma_3``;
2 gamma2 4 ``\\gamma_0``;
3 gamma3 5 ``\\gamma_5``;
4 gamma0 6 ``\\gamma_1 \\gamma_5``;
5 gamma5 7 ``\\gamma_2 \\gamma_5``;
6 gamma1 gamma5 8 ``\\gamma_3 \\gamma_5``;
7 gamma2 gamma5 9 ``\\gamma_0 \\gamma_5``;
8 gamma3 gamma5 10 ``\\sigma_{01}``;
9 gamma0 gamma5 11 ``\\sigma_{02}``;
10 sigma01 12 ``\\sigma_{03}``;
11 sigma02 13 ``\\sigma_{21}``;
12 sigma03 14 ``\\sigma_{32}``;
13 sigma21 15 ``\\sigma_{31}``;
14 sigma32 16 identity;
15 sigma31
16 identity
""" """
@inline dmul(::Type{Gamma{1}}, a::Spinor{NS,G}) where {NS,G} = Spinor{NS,G}((mimm(a.s[4]), mimm(a.s[3]), imm(a.s[2]), imm(a.s[1]))) @inline dmul(::Type{Gamma{1}}, a::Spinor{NS,G}) where {NS,G} = Spinor{NS,G}((mimm(a.s[4]), mimm(a.s[3]), imm(a.s[2]), imm(a.s[1])))

View file

@ -20,6 +20,19 @@ using ..MD
import Base.show import Base.show
"""
struct GaugeParm{T,G,N}
Structure containing the parameters of a pure gauge simulation. These are:
- beta: Type `T`. The bare coupling of the simulation.
- c0: Type `T`. LatticeGPU supports the simulation of gauge actions made of 1x1 Wilson Loops and 2x1 Wilson loops. The parameter c0 defines the coefficient on the simulation of the 1x1 loops. Some common choices are:
- c0=1: Wilson plaquette action.
- c0=5/3: Tree-level improved Lüscher-Weisz action.
- c0=3.648: Iwasaki gauge action.
- cG: Tuple (`T`, `T`). Boundary improvement parameters.
- ng: `Int64`. Rank of the gauge group.
- Ubnd: Boundary field for SF boundary conditions.
"""
struct GaugeParm{T,G,N} struct GaugeParm{T,G,N}
beta::T beta::T
c0::T c0::T
@ -63,6 +76,21 @@ function Base.show(io::IO, gp::GaugeParm{T, G, N}) where {T,G,N}
return nothing return nothing
end end
"""
struct YMworkspace{T}
Structure containing memory workspace that is reused by different routines in order to avoid allocating/deallocating time.
The parameter `T` represents the precision of the simulation (i.e. single/double). The structure contains the following components
- GRP: Group being simulated.
- ALG: Corresponding Algebra.
- PRC: Precision (i.e. `T`).
- frc1: Algebra field with natural indexing.
- frc2: Algebra field with natural indexing.
- mom: Algebra field with natural indexing.
- U1: Group field with natural indexing.
- cm: Complex field with lexicographic indexing.
- rm: Real field with lexicographic indexing.
"""
struct YMworkspace{T} struct YMworkspace{T}
GRP GRP
ALG ALG
@ -110,7 +138,11 @@ function Base.show(io::IO, ymws::YMworkspace)
return nothing return nothing
end end
"""
function ztwist(gp::GaugeParm{T,G}, lp::SpaceParm{N,M,B,D}[, ipl])
Returns the twist factor. If a plane index is passed, returns the twist factor as a Complex{T}. If this is not provided, returns a tuple, containing the factor of each plane.
"""
function ztwist(gp::GaugeParm{T,G}, lp::SpaceParm{N,M,B,D}) where {T,G,N,M,B,D} function ztwist(gp::GaugeParm{T,G}, lp::SpaceParm{N,M,B,D}) where {T,G,N,M,B,D}
function plnf(ipl) function plnf(ipl)
@ -133,10 +165,10 @@ include("YMfields.jl")
export randomize!, zero!, norm2 export randomize!, zero!, norm2
include("YMact.jl") include("YMact.jl")
export krnl_plaq!, force0_wilson! export krnl_plaq!, force_gauge, force_wilson
include("YMhmc.jl") include("YMhmc.jl")
export gauge_action, hamiltonian, plaquette, HMC!, OMF4! export gauge_action, hamiltonian, plaquette, HMC!, MD!
include("YMflow.jl") include("YMflow.jl")
export FlowIntr, flw, flw_adapt export FlowIntr, flw, flw_adapt
@ -147,6 +179,6 @@ include("YMsf.jl")
export sfcoupling, bndfield, setbndfield export sfcoupling, bndfield, setbndfield
include("YMio.jl") include("YMio.jl")
export import_lex64, import_cern64, import_bsfqcd, save_cnfg, read_cnfg export import_lex64, import_cern64, import_bsfqcd, save_cnfg, read_cnfg, read_gp
end end

View file

@ -9,7 +9,11 @@
### created: Mon Jul 12 18:31:19 2021 ### created: Mon Jul 12 18:31:19 2021
### ###
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,B,D}) where {T,NB,N,M,B,D}
##
## OPEN
##
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,NB,N,M,D}
b = Int64(CUDA.threadIdx().x) b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x) r = Int64(CUDA.blockIdx().x)
@ -21,7 +25,404 @@ function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, zt
@inbounds begin @inbounds begin
for id1 in N:-1:1 for id1 in N:-1:1
bu1, ru1 = up((b, r), id1, lp) bu1, ru1 = up((b, r), id1, lp)
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1==N) TOBC = (id1==N)
for id2 = 1:id1-1
bu2, ru2 = up((b, r), id2, lp)
ipl = ipl + 1
TWP = (I[id1]==1) && (I[id2]==1)
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
# H2 staple
(b1, r1) = up((b,r), id1, lp)
(b2, r2) = up((b1,r1), id1, lp)
gb = U[b2,id2,r2]
(b2, r2) = up((b1,r1), id2, lp)
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
# H3 staple
(b1, r1) = up((b,r), id2, lp)
(b2, r2) = up((b1,r1), id2, lp)
(b3, r3) = up((b1,r1), id1, lp)
gc = U[b3,id2,r3]
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
# END staples
ga = U[bu1,id2,ru1]
g2 = U[b,id2,r]\U[b,id1,r]
if ( (it == lp.iL[end]) || (it == 1) ) && !TOBC
S += 0.5*cG*(c0*tr(g2*ga/U[bu2,id1,ru2]) + c1*tr(g2*ga/h3) + c1*tr(g2*h2/U[bu2,id1,ru2]))
elseif (it == lp.iL[end]-1) && TOBC
S += c0*tr(g2*ga/U[bu2,id1,ru2]) + c1*tr(g2*ga/h3)
elseif (it == lp.iL[end]) && TOBC
nothing
else
if TWP
S += (ztw[ipl]*c0)*tr(g2*ga/U[bu2,id1,ru2])
else
S += c0*tr(g2*ga/U[bu2,id1,ru2])
end
if TWH2
S += (ztw[ipl]*c1)*tr(g2*h2/U[bu2,id1,ru2])
else
S += c1*tr(g2*h2/U[bu2,id1,ru2])
end
if TWH3
S += (ztw[ipl]*c1)*tr(g2*ga/h3)
else
S += c1*tr(g2*ga/h3)
end
end
end
end
plx[I] = S
end
return nothing
end
function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
S = zero(eltype(plx))
ipl = 0
for id1 in N:-1:1
bu1, ru1 = up((b, r), id1, lp)
TOBC = (id1==N)
for id2 = 1:id1-1
bu2, ru2 = up((b, r), id2, lp)
ipl = ipl + 1
TWP = (I[id1]==1) && (I[id2]==1)
gt1 = U[bu1,id2,ru1]
if ( (it == lp.iL[end]) || (it == 1)) && !TOBC
S += 0.5*cG*(tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2])))
elseif (it == lp.iL[end]) && TOBC
nothing
else
if TWP
S += ztw[ipl]*tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
else
S += tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
end
end
end
end
plx[I] = S
end
return nothing
end
function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
@inbounds begin
id1, id2 = lp.plidx[ipl]
bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp)
TWP = (I[id1]==1)&&(I[id2]==1)
TOBC = (id1 == N)
gt1 = U[bu1,id2,ru1]
g1 = gt1/U[bu2,id1,ru2]
g2 = U[b,id2,r]\U[b,id1,r]
if !TOBC && ( (it == 1) || (it == lp.iL[end]) )
X = 0.5*cG*projalg(U[b,id1,r]*g1/U[b,id2,r])
frc1[b ,id1, r ] -= X
frc1[b ,id2, r ] += X
frc2[bu1,id2,ru1] -= 0.5*cG*projalg(g1*g2)
frc2[bu2,id1,ru2] += 0.5*cG*projalg(g2*g1)
elseif TOBC && (it == lp.iL[end])
nothing
else
if TWP
X = projalg(ztw,U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(ztw,g1*g2)
frc2[bu2,id1,ru2] += projalg(ztw,g2*g1)
else
X = projalg(U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(g1*g2)
frc2[bu2,id1,ru2] += projalg(g2*g1)
end
frc1[b ,id1, r ] -= X
frc1[b ,id2, r ] += X
end
end
return nothing
end
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
@inbounds begin
id1, id2 = lp.plidx[ipl]
bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp)
TOBC = (id1 == N)
TWP = (I[id1]==1) && (I[id2]==1)
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
TWH4 = TWP || ( (I[id1]==2) && (I[id2]==1) )
# H1 staple
(b1, r1) = dw((b,r), id2, lp)
(b2, r2) = up((b1,r1), id1, lp)
gc = U[b2,id2,r2]
h1 = (U[b1,id2,r1]\U[b1,id1,r1])*gc
# H2 staple
(b1, r1) = up((b,r), id1, lp)
(b2, r2) = up((b1,r1), id1, lp)
gb = U[b2,id2,r2]
(b2, r2) = up((b1,r1), id2, lp)
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
# H3 staple
(b1, r1) = up((b,r), id2, lp)
(b2, r2) = up((b1,r1), id2, lp)
(b3, r3) = up((b1,r1), id1, lp)
gc = U[b3,id2,r3]
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
# H4 staple
(b1, r1) = dw((b,r), id1, lp)
(b2, r2) = up((b1,r1), id2, lp)
h4 = (U[b1,id1,r1]\U[b1,id2,r1])*U[b2,id1,r2]
# END staples
ga = U[bu1,id2,ru1]
g1 = ga/U[bu2,id1,ru2]
g2 = U[b,id2,r]\U[b,id1,r]
if !TOBC && ( (it == 1) || (it == lp.iL[end]) )
X = 0.5*cG*(c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2])) + c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3)) )
frc1[b,id1,r] -= X + 0.5*cG*c1*projalg(U[b,id1,r]*g1/h4)
frc1[b,id2,r] += X + 0.5*cG*c1*projalg(h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= 0.5*cG*c0*projalg(g1*g2)
frc2[bu2,id1,ru2] += 0.5*cG*c0*projalg(g2*g1)
frc2[bu1,id2,ru1] -= 0.5*cG*c1*projalg((g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += 0.5*cG*c1*projalg((U[b,id2,r]\h1)*g1)
frc2[bu2,id1,ru2] += 0.5*cG*c1*projalg(g2*h2/U[bu2,id1,ru2])
frc2[bu1,id2,ru1] -= 0.5*cG*c1*projalg((ga/h3)*g2)
frc2[bu1,id2,ru1] -= 0.5*cG*c1*projalg((g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += 0.5*cG*c1*projalg(h4\U[b,id1,r]*g1)
elseif TOBC && (it == lp.iL[end])
nothing
elseif TOBC && (it == 1)
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2])) + c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc1[b,id1,r] -= X
frc1[b,id2,r] += X + c1*projalg(h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
elseif TOBC && (it == (lp.iL[end]-1) )
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc1[b,id1,r] -= X + c1*projalg(U[b,id1,r]*g1/h4)
frc1[b,id2,r] += X + c1*projalg(h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
else
if TWP
X = projalg(c0*ztw,U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(c0*ztw,g1*g2)
frc2[bu2,id1,ru2] += projalg(c0*ztw,g2*g1)
else
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
end
if TWH1
frc1[b,id2,r] += projalg(ztw*c1,h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += projalg(ztw*c1,(U[b,id2,r]\h1)*g1)
else
frc1[b,id2,r] += c1*projalg(h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
end
if TWH2
X += projalg(ztw*c1,U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
frc2[bu2,id1,ru2] += projalg(ztw*c1,g2*h2/U[bu2,id1,ru2])
else
X += c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
end
if TWH3
X += projalg(ztw*c1,U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(ga/h3)*g2)
else
X += c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
end
if TWH4
frc1[b,id1,r] -= projalg(ztw*c1,U[b,id1,r]*g1/h4)
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += projalg(ztw*c1,h4\U[b,id1,r]*g1)
else
frc1[b,id1,r] -= c1*projalg(U[b,id1,r]*g1/h4)
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
end
frc1[b,id1,r] -= X
frc1[b,id2,r] += X
end
end
return nothing
end
##
## SF
##
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_SF_ORBI,D}) where {T,NB,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
ipl = 0
S = zero(eltype(plx))
@inbounds begin
for id1 in N:-1:1
bu1, ru1 = up((b, r), id1, lp)
SFBC = (id1==N)
for id2 = 1:id1-1
bu2, ru2 = up((b, r), id2, lp)
ipl = ipl + 1
TWP = (I[id1]==1) && (I[id2]==1)
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
# H2 staple
(b1, r1) = up((b,r), id1, lp)
(b2, r2) = up((b1,r1), id1, lp)
if SFBC && (it == lp.iL[end]-1)
gb = Ubnd[id2]
else
gb = U[b2,id2,r2]
end
(b2, r2) = up((b1,r1), id2, lp)
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
# H3 staple
(b1, r1) = up((b,r), id2, lp)
(b2, r2) = up((b1,r1), id2, lp)
(b3, r3) = up((b1,r1), id1, lp)
if SFBC && (it == lp.iL[end])
gc = Ubnd[id2]
else
gc = U[b3,id2,r3]
end
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
# END staples
if SFBC && (it == lp.iL[end])
ga = Ubnd[id2]
else
ga = U[bu1,id2,ru1]
end
g2 = U[b,id2,r]\U[b,id1,r]
if (it == lp.iL[end]) && SFBC
S += cG*(c0*tr(g2*ga/U[bu2,id1,ru2])) + (c1)*tr(g2*ga/h3) + (c1/2)*tr((g2*ga/U[bu2,id1,ru2])*g2*ga/U[bu2,id1,ru2])
elseif (it == 1) && SFBC
S += cG*(c0*tr(g2*ga/U[bu2,id1,ru2])) + (c1)*tr(g2*ga/h3) + c1*tr(g2*h2/U[bu2,id1,ru2]) + (c1/2)*tr((g2*ga/U[bu2,id1,ru2])*g2*ga/U[bu2,id1,ru2])
else
if TWP
S += (ztw[ipl]*c0)*tr(g2*ga/U[bu2,id1,ru2])
else
S += c0*tr(g2*ga/U[bu2,id1,ru2])
end
if TWH2
S += (ztw[ipl]*c1)*tr(g2*h2/U[bu2,id1,ru2])
else
S += c1*tr(g2*h2/U[bu2,id1,ru2])
end
if TWH3
S += (ztw[ipl]*c1)*tr(g2*ga/h3)
else
S += c1*tr(g2*ga/h3)
end
end
end
end
plx[I] = S
end
return nothing
end
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_SF_AFWB,D}) where {T,NB,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
ipl = 0
S = zero(eltype(plx))
@inbounds begin
for id1 in N:-1:1
bu1, ru1 = up((b, r), id1, lp)
SFBC = (id1==N)
for id2 = 1:id1-1 for id2 = 1:id1-1
bu2, ru2 = up((b, r), id2, lp) bu2, ru2 = up((b, r), id2, lp)
@ -95,7 +496,7 @@ function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, zt
return nothing return nothing
end end
function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D} function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::Union{SpaceParm{N,M,BC_SF_ORBI,D},SpaceParm{N,M,BC_SF_AFWB,D}}) where {T,N,M,D}
@inbounds begin @inbounds begin
@ -103,8 +504,7 @@ function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,B
r = Int64(CUDA.blockIdx().x) r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp) I = point_coord((b,r), lp)
it = I[N] it = I[N]
IBND = ( ( (B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && IBND = ( (it == 1) || (it == lp.iL[end]))
( (it == 1) || (it == lp.iL[end])) )
S = zero(eltype(plx)) S = zero(eltype(plx))
ipl = 0 ipl = 0
@ -141,7 +541,7 @@ function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,B
return nothing return nothing
end end
function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D} function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::Union{SpaceParm{N,M,BC_SF_ORBI,D},SpaceParm{N,M,BC_SF_AFWB,D}}) where {T,N,M,D}
b = Int64(CUDA.threadIdx().x) b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x) r = Int64(CUDA.blockIdx().x)
@ -154,7 +554,7 @@ function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw,
bu2, ru2 = up((b, r), id2, lp) bu2, ru2 = up((b, r), id2, lp)
TWP = (I[id1]==1)&&(I[id2]==1) TWP = (I[id1]==1)&&(I[id2]==1)
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == N) SFBC = (id1 == N)
if SFBC && (it == lp.iL[end]) if SFBC && (it == lp.iL[end])
gt1 = Ubnd[id2] gt1 = Ubnd[id2]
@ -195,7 +595,7 @@ function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw,
return nothing return nothing
end end
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D} function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_SF_ORBI,D}) where {T,N,M,D}
b = Int64(CUDA.threadIdx().x) b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x) r = Int64(CUDA.blockIdx().x)
@ -207,7 +607,146 @@ function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG,
bu1, ru1 = up((b, r), id1, lp) bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp) bu2, ru2 = up((b, r), id2, lp)
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == N) SFBC = (id1 == N)
TWP = (I[id1]==1) && (I[id2]==1)
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
TWH4 = TWP || ( (I[id1]==2) && (I[id2]==1) )
# H1 staple
(b1, r1) = dw((b,r), id2, lp)
(b2, r2) = up((b1,r1), id1, lp)
if SFBC && (it == lp.iL[end])
gc = Ubnd[id2]
else
gc = U[b2,id2,r2]
end
h1 = (U[b1,id2,r1]\U[b1,id1,r1])*gc
# H2 staple
(b1, r1) = up((b,r), id1, lp)
(b2, r2) = up((b1,r1), id1, lp)
if SFBC && (it == lp.iL[end]-1)
gb = Ubnd[id2]
else
gb = U[b2,id2,r2]
end
(b2, r2) = up((b1,r1), id2, lp)
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
# H3 staple
(b1, r1) = up((b,r), id2, lp)
(b2, r2) = up((b1,r1), id2, lp)
(b3, r3) = up((b1,r1), id1, lp)
if SFBC && (it == lp.iL[end])
gc = Ubnd[id2]
else
gc = U[b3,id2,r3]
end
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
# H4 staple
(b1, r1) = dw((b,r), id1, lp)
(b2, r2) = up((b1,r1), id2, lp)
h4 = (U[b1,id1,r1]\U[b1,id2,r1])*U[b2,id1,r2]
# END staples
if SFBC && (it == lp.iL[end])
ga = Ubnd[id2]
else
ga = U[bu1,id2,ru1]
end
g1 = ga/U[bu2,id1,ru2]
g2 = U[b,id2,r]\U[b,id1,r]
if SFBC && (it == 1)
X = (cG*c0)*projalg(U[b,id1,r]*g1/U[b,id2,r]) + c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2])) +
(c1)*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3)) + c1*(projalg(U[b,id1,r]*g1*g2*g1/U[b,id2,r]))
frc1[b,id1,r] -= X
frc2[bu1,id2,ru1] -= (cG*c0)*projalg(g1*g2) + c1*projalg((ga/h3)*g2) +
c1*projalg((g1/U[b,id2,r])*h1) + c1*projalg(g1*g2*g1*g2)
frc2[bu2,id1,ru2] += (cG*c0)*projalg(g2*g1) + c1*projalg((U[b,id2,r]\h1)*g1) +
c1*projalg(g2*h2/U[bu2,id1,ru2]) + c1*projalg(g2*g1*g2*g1)
elseif SFBC && (it == lp.iL[end])
X = cG*c0*projalg(U[b,id1,r]*g1/U[b,id2,r]) +
c1 * (projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))) + c1*(projalg(U[b,id1,r]*g1*g2*g1/U[b,id2,r]))
frc1[b,id1,r] -= X + c1*projalg(U[b,id1,r]*g1/h4)
frc1[b,id2,r] += X + c1*projalg(h1*g1/U[b,id2,r])
frc2[bu2,id1,ru2] += (cG*c0)*projalg(g2*g1) + c1*projalg((U[b,id2,r]\h1)*g1) +
c1 * projalg(h4\U[b,id1,r]*g1) + c1* projalg(g2*g1*g2*g1)
else
if TWP
X = projalg(c0*ztw,U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(c0*ztw,g1*g2)
frc2[bu2,id1,ru2] += projalg(c0*ztw,g2*g1)
else
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
end
if TWH1
frc1[b,id2,r] += projalg(ztw*c1,h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += projalg(ztw*c1,(U[b,id2,r]\h1)*g1)
else
frc1[b,id2,r] += c1*projalg(h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
end
if TWH2
X += projalg(ztw*c1,U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
frc2[bu2,id1,ru2] += projalg(ztw*c1,g2*h2/U[bu2,id1,ru2])
else
X += c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
end
if TWH3
X += projalg(ztw*c1,U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(ga/h3)*g2)
else
X += c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
end
if TWH4
frc1[b,id1,r] -= projalg(ztw*c1,U[b,id1,r]*g1/h4)
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += projalg(ztw*c1,h4\U[b,id1,r]*g1)
else
frc1[b,id1,r] -= c1*projalg(U[b,id1,r]*g1/h4)
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
end
frc1[b,id1,r] -= X
frc1[b,id2,r] += X
end
end
return nothing
end
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_SF_AFWB,D}) where {T,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
@inbounds begin
id1, id2 = lp.plidx[ipl]
bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp)
SFBC = (id1 == N)
TWP = (I[id1]==1) && (I[id2]==1) TWP = (I[id1]==1) && (I[id2]==1)
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) ) TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) ) TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
@ -334,10 +873,257 @@ function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG,
return nothing return nothing
end end
"""
function force_wilson(ymws::YMworkspace, U, lp::SpaceParm)
Computes the force deriving from the Wilson plaquette action, without
##
## PERIODIC
##
function krnl_impr!(plx, U::AbstractArray{T}, c0, c1, Ubnd::NTuple{NB,T}, cG, ztw, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,NB,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
ipl = 0
S = zero(eltype(plx))
@inbounds begin
for id1 in N:-1:1
bu1, ru1 = up((b, r), id1, lp)
for id2 = 1:id1-1
bu2, ru2 = up((b, r), id2, lp)
ipl = ipl + 1
TWP = (I[id1]==1) && (I[id2]==1)
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
# H2 staple
(b1, r1) = up((b,r), id1, lp)
(b2, r2) = up((b1,r1), id1, lp)
gb = U[b2,id2,r2]
(b2, r2) = up((b1,r1), id2, lp)
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
# H3 staple
(b1, r1) = up((b,r), id2, lp)
(b2, r2) = up((b1,r1), id2, lp)
(b3, r3) = up((b1,r1), id1, lp)
gc = U[b3,id2,r3]
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
# END staples
ga = U[bu1,id2,ru1]
g2 = U[b,id2,r]\U[b,id1,r]
if TWP
S += (ztw[ipl]*c0)*tr(g2*ga/U[bu2,id1,ru2])
else
S += c0*tr(g2*ga/U[bu2,id1,ru2])
end
if TWH2
S += (ztw[ipl]*c1)*tr(g2*h2/U[bu2,id1,ru2])
else
S += c1*tr(g2*h2/U[bu2,id1,ru2])
end
if TWH3
S += (ztw[ipl]*c1)*tr(g2*ga/h3)
else
S += c1*tr(g2*ga/h3)
end
end
end
plx[I] = S
end
return nothing
end
function krnl_plaq!(plx, U::AbstractArray{T}, Ubnd, cG, ztw, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,N,M,D}
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
S = zero(eltype(plx))
ipl = 0
for id1 in N:-1:1
bu1, ru1 = up((b, r), id1, lp)
for id2 = 1:id1-1
bu2, ru2 = up((b, r), id2, lp)
ipl = ipl + 1
TWP = (I[id1]==1) && (I[id2]==1)
gt1 = U[bu1,id2,ru1]
if TWP
S += ztw[ipl]*tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
else
S += tr(U[b,id1,r]*gt1 / (U[b,id2,r]*U[bu2,id1,ru2]))
end
end
end
plx[I] = S
end
return nothing
end
function krnl_force_wilson_pln!(frc1, frc2, U::AbstractArray{T}, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
@inbounds begin
id1, id2 = lp.plidx[ipl]
bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp)
TWP = (I[id1]==1)&&(I[id2]==1)
gt1 = U[bu1,id2,ru1]
g1 = gt1/U[bu2,id1,ru2]
g2 = U[b,id2,r]\U[b,id1,r]
if TWP
X = projalg(ztw,U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(ztw,g1*g2)
frc2[bu2,id1,ru2] += projalg(ztw,g2*g1)
else
X = projalg(U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(g1*g2)
frc2[bu2,id1,ru2] += projalg(g2*g1)
end
frc1[b ,id1, r ] -= X
frc1[b ,id2, r ] += X
end
return nothing
end
function krnl_force_impr_pln!(frc1, frc2, U::AbstractArray{T}, c0, c1, Ubnd, cG, ztw, ipl, lp::SpaceParm{N,M,BC_PERIODIC,D}) where {T,N,M,D}
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
I = point_coord((b,r), lp)
it = I[N]
@inbounds begin
id1, id2 = lp.plidx[ipl]
bu1, ru1 = up((b, r), id1, lp)
bu2, ru2 = up((b, r), id2, lp)
TWP = (I[id1]==1) && (I[id2]==1)
TWH1 = TWP || ( (I[id1]==1) && (I[id2]==2) )
TWH2 = TWP || ( (I[id1]==lp.iL[id1]) && (I[id2]==1) )
TWH3 = TWP || ( (I[id1]==1) && (I[id2]==lp.iL[id2]) )
TWH4 = TWP || ( (I[id1]==2) && (I[id2]==1) )
# H1 staple
(b1, r1) = dw((b,r), id2, lp)
(b2, r2) = up((b1,r1), id1, lp)
gc = U[b2,id2,r2]
h1 = (U[b1,id2,r1]\U[b1,id1,r1])*gc
# H2 staple
(b1, r1) = up((b,r), id1, lp)
(b2, r2) = up((b1,r1), id1, lp)
gb = U[b2,id2,r2]
(b2, r2) = up((b1,r1), id2, lp)
h2 = (U[b1,id1,r1]*gb)/U[b2,id1,r2]
# H3 staple
(b1, r1) = up((b,r), id2, lp)
(b2, r2) = up((b1,r1), id2, lp)
(b3, r3) = up((b1,r1), id1, lp)
gc = U[b3,id2,r3]
h3 = (U[b1,id2,r1]*U[b2,id1,r2])/gc
# H4 staple
(b1, r1) = dw((b,r), id1, lp)
(b2, r2) = up((b1,r1), id2, lp)
h4 = (U[b1,id1,r1]\U[b1,id2,r1])*U[b2,id1,r2]
# END staples
ga = U[bu1,id2,ru1]
g1 = ga/U[bu2,id1,ru2]
g2 = U[b,id2,r]\U[b,id1,r]
if TWP
X = projalg(c0*ztw,U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(c0*ztw,g1*g2)
frc2[bu2,id1,ru2] += projalg(c0*ztw,g2*g1)
else
X = c0*projalg(U[b,id1,r]*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c0*projalg(g1*g2)
frc2[bu2,id1,ru2] += c0*projalg(g2*g1)
end
if TWH1
frc1[b,id2,r] += projalg(ztw*c1,h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += projalg(ztw*c1,(U[b,id2,r]\h1)*g1)
else
frc1[b,id2,r] += c1*projalg(h1*g1/U[b,id2,r])
frc2[bu1,id2,ru1] -= c1*projalg((g1/U[b,id2,r])*h1)
frc2[bu2,id1,ru2] += c1*projalg((U[b,id2,r]\h1)*g1)
end
if TWH2
X += projalg(ztw*c1,U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
frc2[bu2,id1,ru2] += projalg(ztw*c1,g2*h2/U[bu2,id1,ru2])
else
X += c1*projalg(U[b,id1,r]*h2/(U[b,id2,r]*U[bu2,id1,ru2]))
frc2[bu2,id1,ru2] += c1*projalg(g2*h2/U[bu2,id1,ru2])
end
if TWH3
X += projalg(ztw*c1,U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(ga/h3)*g2)
else
X += c1*projalg(U[b,id1,r]*ga/(U[b,id2,r]*h3))
frc2[bu1,id2,ru1] -= c1*projalg((ga/h3)*g2)
end
if TWH4
frc1[b,id1,r] -= projalg(ztw*c1,U[b,id1,r]*g1/h4)
frc2[bu1,id2,ru1] -= projalg(ztw*c1,(g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += projalg(ztw*c1,h4\U[b,id1,r]*g1)
else
frc1[b,id1,r] -= c1*projalg(U[b,id1,r]*g1/h4)
frc2[bu1,id2,ru1] -= c1*projalg((g1/h4)*U[b,id1,r])
frc2[bu2,id1,ru2] += c1*projalg(h4\U[b,id1,r]*g1)
end
frc1[b,id1,r] -= X
frc1[b,id2,r] += X
end
return nothing
end
"""
function force_gauge(ymws::YMworkspace, U, gp::GaugeParm, lp::SpaceParm)
Computes the force deriving from an improved action with parameter `c0`, without
the prefactor 1/g0^2, and assign it to the workspace force `ymws.frc1` the prefactor 1/g0^2, and assign it to the workspace force `ymws.frc1`
""" """
function force_gauge(ymws::YMworkspace, U, c0, cG, gp::GaugeParm, lp::SpaceParm) function force_gauge(ymws::YMworkspace, U, c0, cG, gp::GaugeParm, lp::SpaceParm)
@ -354,8 +1140,15 @@ function force_gauge(ymws::YMworkspace, U, c0, cG, gp::GaugeParm, lp::SpaceParm)
end end
return nothing return nothing
end end
force_gauge(ymws::YMworkspace, U, c0, gp, lp) = force_gauge(ymws, U, c0, gp.cG[1], gp, lp) force_gauge(ymws::YMworkspace, U, c0, gp, lp) = force_gauge(ymws, U, c0, gp.cG[1], gp, lp)
force_gauge(ymws::YMworkspace, U, gp, lp) = force_gauge(ymws, U, gp.c0, gp.cG[1], gp, lp)
"""
function force_wilson(ymws::YMworkspace, U, gp::GaugeParm, lp::SpaceParm)
Computes the force deriving from the Wilson plaquette action, without
the prefactor 1/g0^2, and assign it to the workspace force `ymws.frc1`
"""
force_wilson(ymws::YMworkspace, U, gp::GaugeParm, lp::SpaceParm) = force_gauge(ymws, U, 1, gp, lp) force_wilson(ymws::YMworkspace, U, gp::GaugeParm, lp::SpaceParm) = force_gauge(ymws, U, 1, gp, lp)
force_wilson(ymws::YMworkspace, U, cG, gp::GaugeParm, lp::SpaceParm) = force_gauge(ymws, U, 1, gp.cG[1], gp, lp) force_wilson(ymws::YMworkspace, U, cG, gp::GaugeParm, lp::SpaceParm) = force_gauge(ymws, U, 1, gp.cG[1], gp, lp)
@ -381,4 +1174,3 @@ function force_pln!(frc1, ftmp, U, Ubnd, cG, ztw, lp::SpaceParm, c0=1)
return nothing return nothing
end end

View file

@ -9,6 +9,12 @@
### created: Thu Jul 15 15:16:47 2021 ### created: Thu Jul 15 15:16:47 2021
### ###
"""
function randomize!(f, lp::SpaceParm, ymws::YMworkspace; curng=CUDA.default_rng())
Given an algebra field with natural indexing, this routine sets the components to random Gaussian distributed values. If SF boundary conditions are used, the force at the boundaries is set to zero.
"""
function randomize!(f, lp::SpaceParm, ymws::YMworkspace; curng=CUDA.default_rng()) function randomize!(f, lp::SpaceParm, ymws::YMworkspace; curng=CUDA.default_rng())
if ymws.ALG == SU2alg if ymws.ALG == SU2alg
@ -49,14 +55,28 @@ function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,BC_PERIODI
return nothing return nothing
end end
function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,B,D}) where {T,N,M,B,D} function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,BC_OPEN,D}) where {T,N,M,D}
@inbounds begin
b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x)
for id in 1:lp.ndim
frc[b,id,r] = SU3alg(m[b,id,1,r], m[b,id,2,r], m[b,id,3,r],
m[b,id,4,r], m[b,id,5,r], m[b,id,6,r],
m[b,id,7,r], m[b,id,8,r])
end
end
return nothing
end
function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::Union{SpaceParm{N,M,BC_SF_ORBI,D},SpaceParm{N,M,BC_SF_AFWB,D}}) where {T,N,M,D}
@inbounds begin @inbounds begin
b = Int64(CUDA.threadIdx().x) b = Int64(CUDA.threadIdx().x)
r = Int64(CUDA.blockIdx().x) r = Int64(CUDA.blockIdx().x)
it = point_time((b,r), lp) it = point_time((b,r), lp)
if ((B==BC_SF_AFWB)||(B==BC_SF_ORBI))
if it == 1 if it == 1
for id in 1:lp.ndim-1 for id in 1:lp.ndim-1
frc[b,id,r] = zero(T) frc[b,id,r] = zero(T)
@ -72,7 +92,6 @@ function krnl_assign_SU3!(frc::AbstractArray{T}, m, lp::SpaceParm{N,M,B,D}) wher
end end
end end
end end
end
return nothing return nothing
end end

View file

@ -10,6 +10,11 @@
### ###
"""
struct FlowIntr{N,T}
Structure containing info about a particular flow integrator
"""
struct FlowIntr{N,T} struct FlowIntr{N,T}
r::T r::T
e0::NTuple{N,T} e0::NTuple{N,T}
@ -26,11 +31,46 @@ struct FlowIntr{N,T}
end end
# pre-defined integrators # pre-defined integrators
"""
wfl_euler(::Type{T}, eps::T, tol::T)
Euler scheme integrator for the Wilson Flow. The fixed step size is given by `eps` and the tolerance for the adaptive integrators by `tol`.
"""
wfl_euler(::Type{T}, eps::T, tol::T) where T = FlowIntr{0,T}(one(T),(),(),false,one(T),eps,tol,one(T)/200,one(T)/10,9/10) wfl_euler(::Type{T}, eps::T, tol::T) where T = FlowIntr{0,T}(one(T),(),(),false,one(T),eps,tol,one(T)/200,one(T)/10,9/10)
"""
zfl_euler(::Type{T}, eps::T, tol::T)
Euler scheme integrator for the Zeuthen flow. The fixed step size is given by `eps` and the tolerance for the adaptive integrators by `tol`.
"""
zfl_euler(::Type{T}, eps::T, tol::T) where T = FlowIntr{0,T}(one(T),(),(),true, (one(T)*5)/3,eps,tol,one(T)/200,one(T)/10,9/10) zfl_euler(::Type{T}, eps::T, tol::T) where T = FlowIntr{0,T}(one(T),(),(),true, (one(T)*5)/3,eps,tol,one(T)/200,one(T)/10,9/10)
"""
wfl_rk2(::Type{T}, eps::T, tol::T)
Second order Runge-Kutta integrator for the Wilson flow. The fixed step size is given by `eps` and the tolerance for the adaptive integrators by `tol`.
"""
wfl_rk2(::Type{T}, eps::T, tol::T) where T = FlowIntr{1,T}(one(T)/2,(-one(T)/2,),(one(T),),false,one(T),eps,tol,one(T)/200,one(T)/10,9/10) wfl_rk2(::Type{T}, eps::T, tol::T) where T = FlowIntr{1,T}(one(T)/2,(-one(T)/2,),(one(T),),false,one(T),eps,tol,one(T)/200,one(T)/10,9/10)
"""
zfl_rk2(::Type{T}, eps::T, tol::T)
Second order Runge-Kutta integrator for the Zeuthen flow. The fixed step size is given by `eps` and the tolerance for the adaptive integrators by `tol`.
"""
zfl_rk2(::Type{T}, eps::T, tol::T) where T = FlowIntr{1,T}(one(T)/2,(-one(T)/2,),(one(T),),true, (one(T)*5)/3,eps,tol,one(T)/200,one(T)/10,9/10) zfl_rk2(::Type{T}, eps::T, tol::T) where T = FlowIntr{1,T}(one(T)/2,(-one(T)/2,),(one(T),),true, (one(T)*5)/3,eps,tol,one(T)/200,one(T)/10,9/10)
"""
wfl_rk3(::Type{T}, eps::T, tol::T)
Third order Runge-Kutta integrator for the Wilson flow. The fixed step size is given by `eps` and the tolerance for the adaptive integrators by `tol`.
"""
wfl_rk3(::Type{T}, eps::T, tol::T) where T = FlowIntr{2,T}(one(T)/4,(-17/36,-one(T)),(8/9,3/4),false,one(T),eps,tol,one(T)/200,one(T)/10,9/10) wfl_rk3(::Type{T}, eps::T, tol::T) where T = FlowIntr{2,T}(one(T)/4,(-17/36,-one(T)),(8/9,3/4),false,one(T),eps,tol,one(T)/200,one(T)/10,9/10)
"""
Zfl_rk3(::Type{T}, eps::T, tol::T)
Third order Runge-Kutta integrator for the Zeuthen flow. The fixed step size is given by `eps` and the tolerance for the adaptive integrators by `tol`.
"""
zfl_rk3(::Type{T}, eps::T, tol::T) where T = FlowIntr{2,T}(one(T)/4,(-17/36,-one(T)),(8/9,3/4),true, (one(T)*5)/3,eps,tol,one(T)/200,one(T)/10,9/10) zfl_rk3(::Type{T}, eps::T, tol::T) where T = FlowIntr{2,T}(one(T)/4,(-17/36,-one(T)),(8/9,3/4),true, (one(T)*5)/3,eps,tol,one(T)/200,one(T)/10,9/10)
function Base.show(io::IO, int::FlowIntr{N,T}) where {N,T} function Base.show(io::IO, int::FlowIntr{N,T}) where {N,T}
@ -95,6 +135,7 @@ function krnl_add_zth!(frc, frc2::AbstractArray{TA}, U::AbstractArray{TG}, lp::S
it = point_time((b, r), lp) it = point_time((b, r), lp)
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) )
OBC = (B == BC_OPEN)
@inbounds for id in 1:N @inbounds for id in 1:N
bu, ru = up((b,r), id, lp) bu, ru = up((b,r), id, lp)
@ -112,16 +153,29 @@ function krnl_add_zth!(frc, frc2::AbstractArray{TA}, U::AbstractArray{TG}, lp::S
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) + frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
projalg(U[b,id,r]*X/U[b,id,r])) projalg(U[b,id,r]*X/U[b,id,r]))
end end
end
if OBC
if (it > 1) && (it < lp.iL[end])
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
projalg(U[b,id,r]*X/U[b,id,r]))
elseif ((it == lp.iL[end]) || (it == 1)) && (id < N)
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
projalg(U[b,id,r]*X/U[b,id,r]))
end
else else
frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) + frc2[b,id,r] = (5/6)*frc[b,id,r] + (1/6)*(projalg(Ud\Y*Ud) +
projalg(U[b,id,r]*X/U[b,id,r])) projalg(U[b,id,r]*X/U[b,id,r]))
end end
end end
end end
return nothing return nothing
end end
"""
function flw(U, int::FlowIntr{NI,T}, ns::Int64, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
Integrates the flow equations with the integration scheme defined by `int` performing `ns` steps with fixed step size. The configuration `U` is overwritten.
"""
function flw(U, int::FlowIntr{NI,T}, ns::Int64, eps, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T} function flw(U, int::FlowIntr{NI,T}, ns::Int64, eps, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T}
@timeit "Integrating flow equations" begin @timeit "Integrating flow equations" begin
for i in 1:ns for i in 1:ns
@ -152,21 +206,28 @@ flw(U, int::FlowIntr{NI,T}, ns::Int64, gp::GaugeParm, lp::SpaceParm, ymws::YMwor
# Adaptive step size integrators # Adaptive step size integrators
## ##
"""
function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
Integrates the flow equations with the integration scheme defined by `int` using the adaptive step size integrator up to `tend` with the tolerance defined in `int`. The configuration `U` is overwritten.
"""
function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T} function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T}
eps = int.eps_ini eps = epsini
dt = tend dt = tend
nstp = 0 nstp = 0
eps_all = Vector{T}(undef,0)
while true while true
ns = convert(Int64, floor(dt/eps)) ns = convert(Int64, floor(dt/eps))
if ns > 10 if ns > 10
flw(U, int, 9, eps, gp, lp, ymws) flw(U, int, 9, eps, gp, lp, ymws)
ymws.U1 .= U ymws.U1 .= U
flw(U, int, 2, eps/2, gp, lp, ymws) flw(U, int, 1, eps, gp, lp, ymws)
flw(ymws.U1, int, 1, eps, gp, lp, ymws) flw(ymws.U1, int, 2, eps/2, gp, lp, ymws)
dt = dt - 10*eps dt = dt - 10*eps
nstp = nstp + 10 nstp = nstp + 10
push!(eps_all,ntuple(i->eps,10)...)
# adjust step size # adjust step size
ymws.U1 .= ymws.U1 ./ U ymws.U1 .= ymws.U1 ./ U
@ -177,6 +238,9 @@ function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp
flw(U, int, ns, eps, gp, lp, ymws) flw(U, int, ns, eps, gp, lp, ymws)
dt = dt - ns*eps dt = dt - ns*eps
push!(eps_all,ntuple(i->eps,ns)...)
push!(eps_all,dt)
flw(U, int, 1, dt, gp, lp, ymws) flw(U, int, 1, dt, gp, lp, ymws)
dt = zero(tend) dt = zero(tend)
@ -188,7 +252,7 @@ function flw_adapt(U, int::FlowIntr{NI,T}, tend::T, epsini::T, gp::GaugeParm, lp
end end
end end
return nstp, eps return nstp, eps_all
end end
flw_adapt(U, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T} = flw_adapt(U, int, tend, int.eps_ini, gp, lp, ymws) flw_adapt(U, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) where {NI,T} = flw_adapt(U, int, tend, int.eps_ini, gp, lp, ymws)
@ -201,7 +265,7 @@ flw_adapt(U, int::FlowIntr{NI,T}, tend::T, gp::GaugeParm, lp::SpaceParm, ymws::Y
""" """
function Eoft_plaq([Eslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) function Eoft_plaq([Eslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
Measure the action density `E(t)` using the plaquette discretization. If the argument `Eslc` Measure the action density `E(t)` using the plaquette discretization. If the argument `Eslc` is given
the contribution for each Euclidean time slice and plane are returned. the contribution for each Euclidean time slice and plane are returned.
""" """
function Eoft_plaq(Eslc, U, gp::GaugeParm{T,G,NN}, lp::SpaceParm{N,M,B,D}, ymws::YMworkspace) where {T,G,NN,N,M,B,D} function Eoft_plaq(Eslc, U, gp::GaugeParm{T,G,NN}, lp::SpaceParm{N,M,B,D}, ymws::YMworkspace) where {T,G,NN,N,M,B,D}
@ -210,6 +274,7 @@ function Eoft_plaq(Eslc, U, gp::GaugeParm{T,G,NN}, lp::SpaceParm{N,M,B,D}, ymws:
ztw = ztwist(gp, lp) ztw = ztwist(gp, lp)
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) )
OBC = (B == BC_OPEN)
tp = ntuple(i->i, N-1) tp = ntuple(i->i, N-1)
V3 = prod(lp.iL[1:end-1]) V3 = prod(lp.iL[1:end-1])
@ -230,6 +295,10 @@ function Eoft_plaq(Eslc, U, gp::GaugeParm{T,G,NN}, lp::SpaceParm{N,M,B,D}, ymws:
if !SFBC if !SFBC
Eslc[1,ipl] = Etmp[1] + Etmp[end] Eslc[1,ipl] = Etmp[1] + Etmp[end]
end end
if OBC ## Check normalization of timelike boundary plaquettes
Eslc[end,ipl] = Etmp[end-1]
Eslc[1,ipl] = Etmp[1]
end
else else
for it in 1:lp.iL[end] for it in 1:lp.iL[end]
Eslc[it,ipl] = 2*Etmp[it] Eslc[it,ipl] = 2*Etmp[it]
@ -254,7 +323,7 @@ function krnl_plaq_pln!(plx, U::AbstractArray{T}, Ubnd, ztw, ipl, lp::SpaceParm{
I = point_coord((b,r), lp) I = point_coord((b,r), lp)
id1, id2 = lp.plidx[ipl] id1, id2 = lp.plidx[ipl]
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI)) && (id1 == lp.iL[end]) SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI)) && (id1 == N)
TWP = ((I[id1]==1)&&(I[id2]==1)) TWP = ((I[id1]==1)&&(I[id2]==1))
bu1, ru1 = up((b, r), id1, lp) bu1, ru1 = up((b, r), id1, lp)
@ -272,15 +341,13 @@ function krnl_plaq_pln!(plx, U::AbstractArray{T}, Ubnd, ztw, ipl, lp::SpaceParm{
plx[I] = tr(U[b,id1,r]*gt / (U[b,id2,r]*U[bu2,id1,ru2])) plx[I] = tr(U[b,id1,r]*gt / (U[b,id2,r]*U[bu2,id1,ru2]))
end end
end end
return nothing return nothing
end end
""" """
Qtop([Qslc,] U, lp, ymws) Qtop([Qslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
Measure the topological charge `Q` of the configuration `U`. If the argument `Qslc` is present Measure the topological charge `Q` of the configuration `U` using the clover definition of the field strength tensor. If the argument `Qslc` is present the contributions for each Euclidean time slice are returned. Only works in 4D.
the contribution for each Euclidean time slice are returned.
""" """
function Qtop(Qslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace) where {M,B,D} function Qtop(Qslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace) where {M,B,D}
@ -296,21 +363,18 @@ function Qtop(Qslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace)
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp)
end end
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 2,4, ztw[2], ztw[4], lp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 2,4, ztw[2], ztw[4], lp)
end end
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, +, ymws.frc1, ymws.frc2, lp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, +, ymws.frc1, ymws.frc2, lp)
end end
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 3,6, ztw[3], ztw[6], lp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_field_tensor!(ymws.frc1, ymws.frc2, U, gp.Ubnd, 3,6, ztw[3], ztw[6], lp)
end end
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnl_add_qd!(ymws.rm, -, ymws.frc1, ymws.frc2, lp)
end end
Qslc .= reshape(Array(CUDA.reduce(+, ymws.rm; dims=tp)),lp.iL[end])./(32*pi^2) Qslc .= reshape(Array(CUDA.reduce(+, ymws.rm; dims=tp)),lp.iL[end])./(32*pi^2)
end end
@ -322,7 +386,7 @@ Qtop(U, gp::GaugeParm, lp::SpaceParm{4,M,D}, ymws::YMworkspace{T}) where {T,M,D}
""" """
function Eoft_clover([Eslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace) function Eoft_clover([Eslc,] U, gp::GaugeParm, lp::SpaceParm, ymws::YMworkspace)
Measure the action density `E(t)` using the clover discretization. If the argument `Eslc` Measure the action density `E(t)` using the clover discretization. If the argument `Eslc` is given
the contribution for each Euclidean time slice and plane are returned. the contribution for each Euclidean time slice and plane are returned.
""" """
function Eoft_clover(Eslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace{T}) where {T,M,B,D} function Eoft_clover(Eslc, U, gp::GaugeParm, lp::SpaceParm{4,M,B,D}, ymws::YMworkspace{T}) where {T,M,B,D}
@ -420,6 +484,7 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
#First plane #First plane
id1, id2 = lp.plidx[ipl1] id1, id2 = lp.plidx[ipl1]
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4) SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
OBC = ((B == BC_OPEN) && (id1 == 4))
TWP = ((I[id1]==1)&&(I[id2]==1)) TWP = ((I[id1]==1)&&(I[id2]==1))
bu1, ru1 = up((b, r), id1, lp) bu1, ru1 = up((b, r), id1, lp)
@ -439,6 +504,11 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
frc1[bu1,2,ru1] = zero(TA) frc1[bu1,2,ru1] = zero(TA)
frc1[bd,3,rd] = zero(TA) frc1[bd,3,rd] = zero(TA)
frc1[bu2,4,ru2] = projalg(l2*l1) frc1[bu2,4,ru2] = projalg(l2*l1)
elseif OBC && (it == lp.iL[end])
frc1[b,1,r] = projalg(U[b,id1,r]*l1/U[b,id2,r])
frc1[bu1,2,ru1] = zero(TA)
frc1[bd,3,rd] = zero(TA)
frc1[bu2,4,ru2] = projalg(l2*l1)
else else
if TWP if TWP
frc1[b,1,r] = projalg(ztw1, U[b,id1,r]*l1/U[b,id2,r]) frc1[b,1,r] = projalg(ztw1, U[b,id1,r]*l1/U[b,id2,r])
@ -456,6 +526,7 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
# Second plane # Second plane
id1, id2 = lp.plidx[ipl2] id1, id2 = lp.plidx[ipl2]
SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4) SFBC = ((B == BC_SF_AFWB) || (B == BC_SF_ORBI) ) && (id1 == 4)
OBC = ((B == BC_OPEN) && (id1 == 4))
TWP = ((I[id1]==1)&&(I[id2]==1)) TWP = ((I[id1]==1)&&(I[id2]==1))
bu1, ru1 = up((b, r), id1, lp) bu1, ru1 = up((b, r), id1, lp)
@ -475,6 +546,11 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
frc2[bu1,2,ru1] = zero(TA) frc2[bu1,2,ru1] = zero(TA)
frc2[bd,3,rd] = zero(TA) frc2[bd,3,rd] = zero(TA)
frc2[bu2,4,ru2] = projalg(l2*l1) frc2[bu2,4,ru2] = projalg(l2*l1)
elseif OBC && (it == lp.iL[end])
frc1[b,1,r] = projalg(U[b,id1,r]*l1/U[b,id2,r])
frc1[bu1,2,ru1] = zero(TA)
frc1[bd,3,rd] = zero(TA)
frc1[bu2,4,ru2] = projalg(l2*l1)
else else
if TWP if TWP
frc2[b,1,r] = projalg(ztw2, U[b,id1,r]*l1/U[b,id2,r]) frc2[b,1,r] = projalg(ztw2, U[b,id1,r]*l1/U[b,id2,r])
@ -489,7 +565,5 @@ function krnl_field_tensor!(frc1::AbstractArray{TA}, frc2, U::AbstractArray{T},
end end
end end
end end
return nothing return nothing
end end

View file

@ -13,7 +13,7 @@
function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace) function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace)
Returns the value of the gauge plaquette action for the configuration U. The parameters `\beta` and `c0` are taken from the `gp` structure. Returns the value of the gauge action for the configuration U. The parameters ``\\beta`` and `c0` are taken from the `gp` structure.
""" """
function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}) where T <: AbstractFloat function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}) where T <: AbstractFloat
@ -37,6 +37,11 @@ function gauge_action(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}) whe
return S return S
end end
"""
function plaquette(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace)
Computes the average plaquette for the configuration `U`.
"""
function plaquette(U, lp::SpaceParm{N,M,B,D}, gp::GaugeParm, ymws::YMworkspace) where {N,M,B,D} function plaquette(U, lp::SpaceParm{N,M,B,D}, gp::GaugeParm, ymws::YMworkspace) where {N,M,B,D}
ztw = ztwist(gp, lp) ztw = ztwist(gp, lp)
@ -49,6 +54,11 @@ function plaquette(U, lp::SpaceParm{N,M,B,D}, gp::GaugeParm, ymws::YMworkspace)
return CUDA.mapreduce(real, +, ymws.cm)/(prod(lp.iL)*lp.npls) return CUDA.mapreduce(real, +, ymws.cm)/(prod(lp.iL)*lp.npls)
end end
"""
function hamiltonian(mom, U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace)
Returns the Energy ``H = \\frac{p^2}{2}+S[U]``, where the momenta field is given by `mom` and the configuration by `U`.
"""
function hamiltonian(mom, U, lp, gp, ymws) function hamiltonian(mom, U, lp, gp, ymws)
@timeit "Computing Hamiltonian" begin @timeit "Computing Hamiltonian" begin
K = CUDA.mapreduce(norm2, +, mom)/2 K = CUDA.mapreduce(norm2, +, mom)/2
@ -58,6 +68,12 @@ function hamiltonian(mom, U, lp, gp, ymws)
return K+V return K+V
end end
"""
HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace; noacc=false, rng=Random.default_rng(), curng=CUDA.default_rng())
Performs a HMC step (molecular dynamics integration and accept/reject step). The configuration `U` is updated and function returns the energy violation and if the configuration was accepted in a tuple.
"""
function HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}; noacc=false, rng=Random.default_rng(), curng=CUDA.default_rng()) where T function HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}; noacc=false, rng=Random.default_rng(), curng=CUDA.default_rng()) where T
@timeit "HMC trayectory" begin @timeit "HMC trayectory" begin
@ -92,6 +108,11 @@ function HMC!(U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspac
end end
HMC!(U, eps, ns, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}; noacc=false, rng=Random.default_rng(), curng=CUDA.default_rng()) where T = HMC!(U, omf4(T, eps, ns), lp, gp, ymws; noacc=noacc, rng, curng) HMC!(U, eps, ns, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}; noacc=false, rng=Random.default_rng(), curng=CUDA.default_rng()) where T = HMC!(U, omf4(T, eps, ns), lp, gp, ymws; noacc=noacc, rng, curng)
"""
function MD!(mom, U, int::IntrScheme, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace)
Performs the integration of a molecular dynamics trajectory starting from the momentum field `mom` and the configuration `U` according to the integrator described by `int`.
"""
function MD!(mom, U, int::IntrScheme{NI, T}, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}) where {NI, T <: AbstractFloat} function MD!(mom, U, int::IntrScheme{NI, T}, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace{T}) where {NI, T <: AbstractFloat}
@timeit "MD evolution" begin @timeit "MD evolution" begin

View file

@ -75,7 +75,7 @@ function read_cnfg(fname::String)
end end
if ibc == BC_SF_AFWB || ibc == BC_SF_ORBI if ibc == BC_SF_AFWB || ibc == BC_SF_ORBI
BDIO_read(fb, V) BDIO_read(fb, vec(V))
Ubnd = ntuple(i->assign(i, V, 1), 3) Ubnd = ntuple(i->assign(i, V, 1), 3)
BDIO_close!(fb) BDIO_close!(fb)
@ -297,3 +297,50 @@ function import_cern64(fname, ibc, lp::SpaceParm; log=true)
return CuArray(Ucpu) return CuArray(Ucpu)
end end
"""
read_gp(fname::String)
Reads Gauge parameters from file `fname` using the native (BDIO) format. Returns GaugeParm and SpaceParm.
"""
function read_gp(fname::String)
UID_HDR = 14
fb = BDIO_open(fname, "r")
while BDIO_get_uinfo(fb) != UID_HDR
BDIO_seek!(fb)
end
ihdr = Vector{Int32}(undef, 2)
BDIO_read(fb, ihdr)
if (ihdr[1] != convert(Int32, 1653996111)) && (ihdr[2] != convert(Int32, 2))
error("Wrong file format [header]")
end
run = BDIO.BDIO_read_str(fb)
while BDIO_get_uinfo(fb) != 1
BDIO_seek!(fb)
end
ifoo = Vector{Int32}(undef, 4)
BDIO_read(fb, ifoo)
ndim = convert(Int64, ifoo[1])
npls = convert(Int64, round(ndim*(ndim-1)/2))
ibc = convert(Int64, ifoo[2])
nf = ifoo[4]
ifoo = Vector{Int32}(undef, ndim+convert(Int32, npls))
BDIO_read(fb, ifoo)
iL = ntuple(i -> convert(Int64, ifoo[i]),ndim)
ntw = ntuple(i -> convert(Int64, ifoo[i+ndim]), npls)
dfoo = Vector{Float64}(undef, 4)
BDIO_read(fb, dfoo)
lp = SpaceParm{ndim}(iL, (4,4,4,4), ibc, ntw)
gp = GaugeParm{Float64}(SU3{Float64}, dfoo[1], dfoo[2])
return gp, lp
end

View file

@ -10,9 +10,9 @@
### ###
""" """
sfcoupling(U, lp::SpaceParm{N,M,B,D}, gp::GaugeParm, ymws::YMworkspace) where {N,M,B,D} sfcoupling(U, lp::SpaceParm, gp::GaugeParm, ymws::YMworkspace)
Measures the Schrodinger Functional coupling `ds/d\eta` and `d^2S/d\eta d\nu`. Measures the Schrodinger Functional coupling ``{\\rm d}S/{\\rm d}\\eta`` and ``{\\rm d}^2S/{\\rm d}\\eta d\nu``.
""" """
function sfcoupling(U, lp::SpaceParm{N,M,B,D}, gp::GaugeParm, ymws::YMworkspace) where {N,M,B,D} function sfcoupling(U, lp::SpaceParm{N,M,B,D}, gp::GaugeParm, ymws::YMworkspace) where {N,M,B,D}
@ -89,7 +89,11 @@ end
return exp(X) return exp(X)
end end
"""
function setbndfield(U, phi, lp::SpaceParm)
Sets abelian boundary fields with phases `phi[1]` and `phi[2]` to the configuration `U` at time salice ``x_0=0``.
"""
function setbndfield(U, phi, lp::SpaceParm{N,M,B,D}) where {N,M,B,D} function setbndfield(U, phi, lp::SpaceParm{N,M,B,D}) where {N,M,B,D}
CUDA.@sync begin CUDA.@sync begin

View file

@ -0,0 +1,42 @@
using CUDA
using Pkg
Pkg.activate("/home/fperez/Git/LGPU_fork_ferflow")
using LatticeGPU
lp = SpaceParm{4}((4,4,4,4),(2,2,2,2),0,(0,0,0,0,0,0));
pso = scalar_field(Spinor{4,SU3fund{Float64}},lp);
psi = scalar_field(Spinor{4,SU3fund{Float64}},lp);
psi2 = scalar_field(Spinor{4,SU3fund{Float64}},lp);
ymws = YMworkspace(SU3,Float64,lp);
dws = DiracWorkspace(SU3fund,Float64,lp);
int = wfl_rk3(Float64, 0.01, 1.0)
gp = GaugeParm{Float64}(SU3{Float64},6.0,1.0,(1.0,0.0),(0.0,0.0),lp.iL)
dpar = DiracParam{Float64}(SU3fund,1.3,0.9,(1.0,1.0,1.0,1.0),0.0)
randomize!(ymws.mom, lp, ymws)
U = exp.(ymws.mom);
pfrandomize!(psi,lp)
for L in 4:19
pso .= psi
V = Array(U)
a,b = flw_adapt(U, psi, int, L*int.eps, gp,dpar, lp, ymws,dws)
# for i in 1:a
# flw(U, psi, int, 1 ,b[i], gp, dpar, lp, ymws, dws)
# end
pfrandomize!(psi2,lp)
foo = sum(dot.(psi,psi2))# field_dot(psi,psi2,sumf,lp)
copyto!(U,V);
backflow(psi2,U,L*int.eps,7,gp,dpar,lp, ymws,dws)
println("Error:",(sum(dot.(pso,psi2))-foo)/foo)
psi .= pso
end

View file

@ -0,0 +1,119 @@
using LatticeGPU, CUDA, TimerOutputs
#Test for the relation K(t,y;0,n)^+ Dw(n|m)^{-1} e^(ipm) = D(p)^{-1} exp(4t sin^2(p/2)) e^{ipn} with a given momenta (if p=0 its randomized), spin and color
#Kernel en 1207.2096
@timeit "Plw backflow test" begin
function Dwpw_test(;p=0,s=1,c=1)
lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0))
gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0)
dpar = DiracParam{Float64}(SU3fund,1.3,0.0,(1.0,1.0,1.0,1.0),0.0)
dws = DiracWorkspace(SU3fund,Float64,lp);
ymws = YMworkspace(SU3,Float64,lp);
p==0 ? p = Int.(round.(lp.iL.*rand(4),RoundUp)) : nothing
U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64}))
rm = 2* pi* p./(lp.iL)
rmom=(rm[1],rm[2],rm[3],rm[4])
int = wfl_rk3(Float64, 0.01, 1.0)
Nsteps = 15
@timeit "Generate plane wave" begin
pwave = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
prop = scalar_field(Spinor{4,SU3fund{Float64}},lp)
prop_th = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
#Generate plane wave
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar pwave[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4))
end end end end
end
@timeit "Generate analitical solution" begin
#Th solution
if s == 1
vals = (dpar.m0 + 4.0 - sum(cos.(rmom)),0.0,im*sin(rmom[4])+sin(rmom[3]),im*sin(rmom[2])+sin(rmom[1]))
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
elseif s == 2
vals = (0.0,dpar.m0 + 4.0 - sum(cos.(rmom)),sin(rmom[1]) - im *sin(rmom[2]),-sin(rmom[3])+im*sin(rmom[4]))
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
elseif s == 3
vals = (-sin(rmom[3])+im*sin(rmom[4]),-sin(rmom[1])-im*sin(rmom[2]),dpar.m0 + 4.0 - sum(cos.(rmom)),0.0)
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
else
vals = (-sin(rmom[1])+im*sin(rmom[2]),sin(rmom[3])+im*sin(rmom[4]),0.0,dpar.m0 + 4.0 - sum(cos.(rmom)))
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
end
end
prop_th .= exp(-4*Nsteps*int.eps*sum(sin.(rmom./2).^2))*prop_th
#compute Sum{x} D^-1(x|y)P(y)
@timeit "Solving propagator and flowing" begin
function krnlg5!(src)
b=Int64(CUDA.threadIdx().x)
r=Int64(CUDA.blockIdx().x)
src[b,r] = dmul(Gamma{5},src[b,r])
return nothing
end
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(pwave)
end
g5Dw!(prop,U,pwave,dpar,dws,lp)
CG!(prop,U,DwdagDw!,dpar,lp,dws,10000,1.0e-14)
for _ in 1:Nsteps
backflow(U,prop,1,int.eps,gp,dpar,lp, ymws,dws)
end
end
dif = sum(norm2.(prop - prop_th))
return dif
end
begin
dif = 0.0
for i in 1:3 for j in 1:4
dif += Dwpw_test(c=i,s=j)
end end
if dif < 1.0e-5
print("Backflow_tl test passed with average error ", dif/12,"!\n")
else
error("Backflow_tl test failed with difference: ",dif,"\n")
end
end
end

119
test/dirac/test_flow_tl.jl Normal file
View file

@ -0,0 +1,119 @@
using LatticeGPU, CUDA, TimerOutputs
#Test for the relation K(t,y;0,n) Dw(n|m)^{-1} e^(ipm) = D(p)^{-1} exp(-4t sin^2(p/2)) e^{ipn} with a given momenta (if p=0 its randomized), spin and color
#Kernel en 1207.2096
@timeit "Plw flow test" begin
function Dwpw_test(;p=0,s=1,c=1)
lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0))
gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0)
dpar = DiracParam{Float64}(SU3fund,1.3,0.0,(1.0,1.0,1.0,1.0),0.0)
dws = DiracWorkspace(SU3fund,Float64,lp);
ymws = YMworkspace(SU3,Float64,lp);
p==0 ? p = Int.(round.(lp.iL.*rand(4),RoundUp)) : nothing
U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64}))
rm = 2* pi* p./(lp.iL)
rmom=(rm[1],rm[2],rm[3],rm[4])
int = wfl_rk3(Float64, 0.01, 1.0)
Nsteps = 15
@timeit "Generate plane wave" begin
pwave = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
prop = scalar_field(Spinor{4,SU3fund{Float64}},lp)
prop_th = fill!(scalar_field(Spinor{4,SU3fund{Float64}},lp),zero(eltype(scalar_field(Spinor{4,SU3fund{Float64}},lp))))
#Generate plane wave
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar pwave[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*Spinor{4,SU3fund{Float64}}(ntuple(i -> (i==s)*SU3fund{Float64}(ntuple(j -> (j==c)*1.0,3)...),4))
end end end end
end
@timeit "Generate analitical solution" begin
#Th solution
if s == 1
vals = (dpar.m0 + 4.0 - sum(cos.(rmom)),0.0,im*sin(rmom[4])+sin(rmom[3]),im*sin(rmom[2])+sin(rmom[1]))
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
elseif s == 2
vals = (0.0,dpar.m0 + 4.0 - sum(cos.(rmom)),sin(rmom[1]) - im *sin(rmom[2]),-sin(rmom[3])+im*sin(rmom[4]))
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
elseif s == 3
vals = (-sin(rmom[3])+im*sin(rmom[4]),-sin(rmom[1])-im*sin(rmom[2]),dpar.m0 + 4.0 - sum(cos.(rmom)),0.0)
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
else
vals = (-sin(rmom[1])+im*sin(rmom[2]),sin(rmom[3])+im*sin(rmom[4]),0.0,dpar.m0 + 4.0 - sum(cos.(rmom)))
for x in 1:lp.iL[1] for y in 1:lp.iL[2] for z in 1:lp.iL[3] for t in 1:lp.iL[4]
CUDA.@allowscalar prop_th[point_index(CartesianIndex{lp.ndim}((x,y,z,t)),lp)...] = exp(im * (x*rmom[1] + y*rmom[2] + z*rmom[3] + t*rmom[4]))*
( Spinor{4,SU3fund{Float64}}(ntuple(i -> SU3fund{Float64}(ntuple(j -> (j==c)*vals[i],3)...),4)) )/(sum((sin.(rmom)) .^2) + (dpar.m0+ 4.0 - sum(cos.(rmom)))^2)
end end end end
end
end
prop_th .= exp(-4*Nsteps*int.eps*sum(sin.(rmom./2).^2))*prop_th
#compute Sum{x} D^-1(x|y)P(y)
@timeit "Solving propagator and flowing" begin
function krnlg5!(src)
b=Int64(CUDA.threadIdx().x)
r=Int64(CUDA.blockIdx().x)
src[b,r] = dmul(Gamma{5},src[b,r])
return nothing
end
CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(pwave)
end
g5Dw!(prop,U,pwave,dpar,dws,lp)
CG!(prop,U,DwdagDw!,dpar,lp,dws,10000,1.0e-14)
flw(U, prop, int, Nsteps ,int.eps, gp, dpar, lp, ymws, dws)
end
dif = sum(norm2.(prop - prop_th))
return dif
end
begin
dif = 0.0
for i in 1:3 for j in 1:4
dif += Dwpw_test(c=i,s=j)
end end
if dif < 1.0e-4
print("Flow_tl test passed with average error ", dif/12,"!\n")
else
error("Flow_tl test failed with difference: ",dif,"\n")
end
end
end

View file

@ -2,26 +2,24 @@ using LatticeGPU
using CUDA using CUDA
using TimerOutputs using TimerOutputs
@timeit "fA_fP test" begin @timeit "fA_fP test" begin
function fP_test(;theta = (0.5,0.7,1.0,0.0), m = 1.3, size = (8,8,8,16),prec = 1.0e-16) function fP_test(;theta = (0.5,0.7,1.0,0.0), m = 1.3, size = (8,8,8,16),prec = 1.0e-16)
@timeit "fP inversion (x12)" begin @timeit "fP inversion (x12)" begin
lp = SpaceParm{4}(size,(4,4,4,4),1,(0,0,0,0,0,0)); lp = SpaceParm{4}(size,(4,4,4,4),1,(0,0,0,0,0,0));
exptheta = exp.(im.*theta./lp.iL); exptheta = exp.(im.*theta./lp.iL);
dpar = DiracParam{Float64}(SU3fund,m,0.0,exptheta,0.0,1.0);
dws = DiracWorkspace(SU3fund,Float64,lp);
dpar = DiracParam{Float64}(SU3fund,m,0.0,exptheta,1.0); U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64}));
dws = DiracWorkspace(SU3fund,Float64,lp); psi = scalar_field(Spinor{4,SU3fund{Float64}},lp);
U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64})); res = zeros(lp.iL[4])
psi = scalar_field(Spinor{4,SU3fund{Float64}},lp);
res = zeros(lp.iL[4]) for s in 1:4 for c in 1:3
for s in 1:4 for c in 1:3
bndpropagator!(psi,U,dpar,dws,lp,1000,prec,c,s); bndpropagator!(psi,U,dpar,dws,lp,1000,prec,c,s);
for t in 1:lp.iL[4] for t in 1:lp.iL[4]
@ -33,11 +31,11 @@ for s in 1:4 for c in 1:3
end end
end end end end
end end
@timeit "fP analitical solution" begin @timeit "fP analitical solution" begin
#THEORETICAL SOLUTION: hep-lat/9606016 eq (2.33) #THEORETICAL SOLUTION: hep-lat/9606016 eq (2.33)
@ -53,19 +51,19 @@ end
res_th[i] = (2*3*sinh(omega)/(Rpp^2)) * ( (Mpp + sinh(omega))*exp(-2*omega*(i-1)) - (Mpp - sinh(omega))*exp(-2*omega*(2*lp.iL[4]- (i - 1))) ) res_th[i] = (2*3*sinh(omega)/(Rpp^2)) * ( (Mpp + sinh(omega))*exp(-2*omega*(i-1)) - (Mpp - sinh(omega))*exp(-2*omega*(2*lp.iL[4]- (i - 1))) )
end end
end end
return sum(abs.(res-res_th)) return sum(abs.(res-res_th))
end end
function fA_test(;theta = (0.5,0.7,1.0,0.0), m = 1.3, size = (8,8,8,16),prec = 1.0e-16) function fA_test(;theta = (0.5,0.7,1.0,0.0), m = 1.3, size = (8,8,8,16),prec = 1.0e-16)
@timeit "fA inversion (x12)" begin @timeit "fA inversion (x12)" begin
lp = SpaceParm{4}(size,(4,4,4,4),1,(0,0,0,0,0,0)); lp = SpaceParm{4}(size,(4,4,4,4),1,(0,0,0,0,0,0));
exptheta = exp.(im.*theta./lp.iL); exptheta = exp.(im.*theta./lp.iL);
dpar = DiracParam{Float64}(SU3fund,m,0.0,exptheta,1.0); dpar = DiracParam{Float64}(SU3fund,m,0.0,exptheta,0.0,1.0);
dws = DiracWorkspace(SU3fund,Float64,lp); dws = DiracWorkspace(SU3fund,Float64,lp);
U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64})); U = fill!(vector_field(SU3{Float64},lp),one(SU3{Float64}));
@ -87,10 +85,10 @@ function fA_test(;theta = (0.5,0.7,1.0,0.0), m = 1.3, size = (8,8,8,16),prec = 1
end end end end
end end
#THEORETICAL SOLUTION: hep-lat/9606016 eq (2.32) #THEORETICAL SOLUTION: hep-lat/9606016 eq (2.32)
@timeit "fA analitical solution" begin @timeit "fA analitical solution" begin
res_th = zeros(lp.iL[4]) res_th = zeros(lp.iL[4])
pp3 = ntuple(i -> theta[i]/lp.iL[i],3) pp3 = ntuple(i -> theta[i]/lp.iL[i],3)
@ -104,22 +102,20 @@ end
- Mpp*((Mpp + sinh(omega))*exp(-2*omega*(i-1)) + (Mpp - sinh(omega))*exp(-2*omega*(2*lp.iL[4]- (i - 1))))) - Mpp*((Mpp + sinh(omega))*exp(-2*omega*(i-1)) + (Mpp - sinh(omega))*exp(-2*omega*(2*lp.iL[4]- (i - 1)))))
end end
end end
return sum(abs.(res-res_th)) return sum(abs.(res-res_th))
end
end
difA = fA_test(); difA = fA_test();
difP = fP_test(); difP = fP_test();
if difA > 1.0e-15 if difA > 1.0e-15
error("fA test failed with error ", difA) error("fA test failed with error ", difA)
elseif difP > 1.0e-15 elseif difP > 1.0e-15
error("fP test failed with error ", difP) error("fP test failed with error ", difP)
else else
print("fA & fP tests passed with errors: ", difA," and ",difP,"!\n") print("fA & fP tests passed with errors: ", difA," and ",difP,"!\n")
end end
end end

View file

@ -7,7 +7,7 @@ using LatticeGPU, CUDA, TimerOutputs
function Dwpw_test(;p=0,s=1,c=1) function Dwpw_test(;p=0,s=1,c=1)
lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0)) lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0))
gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0) gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0)
dpar = DiracParam{Float64}(SU3fund,1.3,0.0,(1.0,1.0,1.0,1.0),0.0) dpar = DiracParam{Float64}(SU3fund,1.3,0.0,(1.0,1.0,1.0,1.0),0.0,0.0)
dws = DiracWorkspace(SU3fund,Float64,lp); dws = DiracWorkspace(SU3fund,Float64,lp);
p==0 ? p = Int.(round.(lp.iL.*rand(4),RoundUp)) : nothing p==0 ? p = Int.(round.(lp.iL.*rand(4),RoundUp)) : nothing
@ -96,15 +96,15 @@ end
begin begin
dif = 0.0 diff = 0.0
for i in 1:3 for j in 1:4 for i in 1:3 for j in 1:4
dif += Dwpw_test(c=i,s=j) global diff += Dwpw_test(c=i,s=j)
end end end end
if dif < 1.0e-15 if diff < 1.0e-15
print("Dwpl test passed with average error ", dif/12,"!\n") print("Dwpl test passed with average error ", diff/12,"!\n")
else else
error("Dwpl test failed with difference: ",dif,"\n") error("Dwpl test failed with difference: ",diff,"\n")
end end

View file

@ -9,7 +9,7 @@ using CUDA, LatticeGPU, TimerOutputs
lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0)) lp = SpaceParm{4}((16,16,16,16), (4,4,4,4), 0, (0,0,0,0,0,0))
gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0) gp = GaugeParm{Float64}(SU3{Float64}, 6.0, 1.0)
ymws = YMworkspace(SU3, Float64, lp) ymws = YMworkspace(SU3, Float64, lp)
dpar = DiracParam{Float64}(SU3fund,2.3,0.0,(1.0,1.0,1.0,1.0),0.0) dpar = DiracParam{Float64}(SU3fund,2.3,0.0,(1.0,1.0,1.0,1.0),0.0,0.0)
dws = DiracWorkspace(SU3fund,Float64,lp); dws = DiracWorkspace(SU3fund,Float64,lp);
randomize!(ymws.mom, lp, ymws) randomize!(ymws.mom, lp, ymws)
@ -32,7 +32,8 @@ end
CUDA.@sync begin CUDA.@sync begin
CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(rpsi) CUDA.@cuda threads=lp.bsz blocks=lp.rsz krnlg5!(rpsi)
end end
g5Dw!(prop,U,rpsi,dpar,dws,lp)
g5Dw!(prop,U,rpsi,mtwmdpar(dpar),dws,lp)
CG!(prop,U,DwdagDw!,dpar,lp,dws,10000,1.0e-14) CG!(prop,U,DwdagDw!,dpar,lp,dws,10000,1.0e-14)
Dw!(dws.sp,U,prop,dpar,dws,lp) Dw!(dws.sp,U,prop,dpar,dws,lp)

View file

@ -1,3 +1,6 @@
#include("SAD/test_sad.jl") include("SAD/test_sad.jl")
include("flow/test_adapt.jl") include("flow/test_adapt.jl")
include("dirac/test_fp_fa.jl")
include("dirac/test_solver_plw.jl")
include("dirac/test_solver_rand.jl")