. (4.3)

The outer product of vectors $\{a^{(n)}\} \in \mathbb{R}^{I_n}$, $n = 1, 2, \dots, N$, yields a rank-one tensor $\mathcal{X} = a^{(1)} \circ a^{(2)} \circ \dots \circ a^{(N)}$ with entries $x_{i_1, i_2, \dots, i_N} = a^{(1)}_{i_1} a^{(2)}_{i_2} \dots a^{(N)}_{i_N}$, where $\circ$ represents the outer product operation. The superscript in parentheses indexes an element in a sequence, e.g., $a^{(n)}$ represents the $n$th vector in a sequence of vectors.

To describe multi-way models, the usual matrix products, such as the Kronecker product and the Khatri-Rao product, are not sufficient. A frequently used operation is the mode-$n$ product, denoted by $\times_n$. The mode-$n$ product of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \dots \times I_N}$ with a matrix $A \in \mathbb{R}^{J_n \times I_n}$ amounts to the product of all mode-$n$ fibers with $A$ and yields a tensor of size $I_1 \times \dots \times I_{n-1} \times J_n \times I_{n+1} \times \dots \times I_N$, whose entries are given by
\[
(\mathcal{X} \times_n A)_{i_1, \dots, i_{n-1}, j_n, i_{n+1}, \dots, i_N} = \sum_{i_n = 1}^{I_n} x_{i_1, \dots, i_{n-1}, i_n, i_{n+1}, \dots, i_N}\, a_{j_n, i_n}. \tag{4.4}
\]
The mode-$n$ product of a tensor and a vector is a special case of the mode-$n$ product of a tensor and a matrix of size $1 \times I_n$. Note that the order of the result is $N - 1$, one less than the order of the original tensor. It is often useful to calculate the product of a tensor with a sequence of vectors. Let $\mathcal{X}$ denote a tensor of size $I_1 \times I_2 \times \dots \times I_N$, and let $\{a^{(n)}\}$, $n = 1, 2, \dots, N$, be a sequence of vectors, each of length $I_n$. Then the product of $\mathcal{X}$ with the sequence of vectors in all modes yields a scalar, i.e.,
\[
y = \mathcal{X} \times_1 a^{(1)} \times_2 a^{(2)} \times_3 \dots \times_N a^{(N)} = \sum_{i_1 = 1}^{I_1} \sum_{i_2 = 1}^{I_2} \dots \sum_{i_N = 1}^{I_N} x_{i_1, i_2, \dots, i_N}\, a^{(1)}_{i_1} a^{(2)}_{i_2} \dots a^{(N)}_{i_N}. \tag{4.5}
\]
We refer the readers to [57, 61] for further details and discussion of various tensor operations.

4.3 Problem Formulation

The problem of interest here is underdetermined JBSS for multiple datasets, e.g., $K$ datasets.
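As a concrete illustration, the mode-$n$ product in (4.4) and the all-modes vector product in (4.5) can be sketched in NumPy (a minimal sketch; the helper name `mode_n_product` is ours, not the chapter's):

```python
import numpy as np

def mode_n_product(X, A, n):
    # Mode-n product X x_n A (Eq. 4.4): contract mode n of X with the
    # columns of A, then move the new axis (size A.shape[0]) back to slot n.
    return np.moveaxis(np.tensordot(X, A, axes=(n, 1)), -1, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
A = rng.standard_normal((2, 4))          # J_n x I_n with n = 1 (0-based)
Y = mode_n_product(X, A, 1)              # size 3 x 2 x 5

# Multiplying by a vector in every mode yields a scalar (Eq. 4.5).
a1, a2, a3 = rng.standard_normal(3), rng.standard_normal(4), rng.standard_normal(5)
y = np.einsum('ijk,i,j,k->', X, a1, a2, a3)
```

Multiplying by a vector in a single mode (a $1 \times I_n$ matrix) drops that mode, so applying all $N$ vector products in sequence leaves a zeroth-order tensor, i.e., the scalar $y$.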
The $M$ observations of each dataset contain linear mixtures of the corresponding $N$ sources. We can model the mixing process as follows:
\[
X^{(k)} = A^{(k)} S^{(k)} + E^{(k)}, \quad k = 1, 2, \dots, K. \tag{4.6}
\]
$X^{(k)} = [x^{(k)}_1, x^{(k)}_2, \dots, x^{(k)}_M]^T$ denotes the $M$-dimensional real-valued observations, and $x^{(k)}_m$ is the $m$th channel of the observations in dataset $k$. $S^{(k)} = [s^{(k)}_1, s^{(k)}_2, \dots, s^{(k)}_N]^T$ denotes the underlying $N$-dimensional real-valued sources, and $s^{(k)}_n$ is the $n$th source for dataset $k$. $A^{(k)} = [a^{(k)}_1, a^{(k)}_2, \dots, a^{(k)}_N] \in \mathbb{R}^{M \times N}$ with $M < N$ (i.e., the underdetermined case) denotes the unknown mixing matrix, whose $n$th column $a^{(k)}_n$ corresponds to the source $s^{(k)}_n$ for dataset $k$. $E^{(k)}$ denotes the possible additive noise, which is generally assumed to be zero mean, temporally white and uncorrelated with the source signals.

Similar to several existing JBSS methods, e.g., MCCA [72] and JDAIG-SOS [69], we make the following assumptions regarding the sources:

(1) The sources are uncorrelated within each dataset:
\[
E\{s^{(k)}_i(t)\, (s^{(k)}_j(t+\tau))^T\} = 0 \quad \forall \tau,\ 1 \le i \ne j \le N,\ k = 1, 2, \dots, K, \tag{I}
\]
where $s^{(k)}_i(t)$ is the $i$th source in dataset $k$ and $s^{(k)}_j(t+\tau)$ represents the $j$th source in dataset $k$ with time delay $\tau$.

(2) The corresponding sources from two different datasets have non-zero correlations, and sources with different indices across datasets are uncorrelated:
\[
D(\tau) = E\{S^{(k_1)}(t)\, (S^{(k_2)}(t+\tau))^T\} = \mathrm{Diag}(\rho_1(\tau), \rho_2(\tau), \dots, \rho_N(\tau)), \tag{II}
\]
where $\mathrm{Diag}(\cdot)$ represents a diagonal matrix and $\rho_n(\tau) = E\{s^{(k_1)}_n(t)\, (s^{(k_2)}_n(t+\tau))^T\}$ denotes the covariance between $s^{(k_1)}_n(t)$ and $s^{(k_2)}_n(t+\tau)$. This assumption means that the corresponding sources in multiple datasets are second-order correlated with each other. In addition, the sources within $[s^{(1)}_i, s^{(2)}_i, \dots, s^{(K)}_i]$ are uncorrelated with the sources within $[s^{(1)}_j, s^{(2)}_j, \dots, s^{(K)}_j]$ for $1 \le i \ne j \le N$.

The tasks of estimating the mixing matrices $\{A^{(k)}\}$ and retrieving the underlying sources are not equivalent in the underdetermined case. Therefore, most UBSS methods consist of two stages: estimate the mixing matrices first and then retrieve the underlying sources. The major problem under consideration is to estimate $\{A^{(k)}\}$ jointly, up to permutation and scaling. In this chapter, this problem is addressed via a specially designed joint tensor decomposition. In addition, retrieving the underlying sources when the mixing matrices are estimated or known is a classic inverse problem [62]. In order to further demonstrate the performance of the proposed mixing matrix estimation method, we also implement an approach for source recovery based on the estimated $A^{(k)}$.

4.4 Canonical Polyadic Decomposition of Tensor

A polyadic decomposition aims to express a higher-order tensor as a linear combination of rank-one tensors [61, 120]. A third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ can be written in the form
\[
\mathcal{X} = \sum_{n=1}^{N} a_n \circ b_n \circ c_n, \tag{4.7}
\]
where $N$ is a positive integer and $a_n \in \mathbb{R}^I$, $b_n \in \mathbb{R}^J$, $c_n \in \mathbb{R}^K$. Equivalently, it can be written element-wise as
\[
x_{i,j,k} = \sum_{n=1}^{N} a_{i,n}\, b_{j,n}\, c_{k,n}, \tag{4.8}
\]
where $i = 1, 2, \dots, I$, $j = 1, 2, \dots, J$ and $k = 1, 2, \dots, K$. The rank of a tensor is the smallest number of rank-one tensors that yield the tensor as in (4.7). If $\mathrm{rank}(\mathcal{X}) = N$, (4.7) is the CPD of $\mathcal{X}$, which is also known as the Canonical Decomposition (CANDECOMP) or Parallel Factor Analysis (PARAFAC) [2]. The canonical polyadic approximation means that
\[
\mathcal{X} \approx [[A, B, C]] \equiv \sum_{n=1}^{N} a_n \circ b_n \circ c_n, \tag{4.9}
\]
where $N = \mathrm{rank}(\mathcal{X})$.

The factor matrices collect the vectors of the rank-one terms and can be written as
\[
A = [a_1, a_2, \dots, a_N] \in \mathbb{R}^{I \times N}, \quad
B = [b_1, b_2, \dots, b_N] \in \mathbb{R}^{J \times N}, \quad
C = [c_1, c_2, \dots, c_N] \in \mathbb{R}^{K \times N}. \tag{4.10}
\]
To a large extent, the power of the CPD stems from its uniqueness property. Uniqueness of the CPD means that the decomposition is the only possible combination of rank-one tensors summing to the objective tensor, up to the indeterminacies of column permutation and scaling. The permutation indeterminacy refers to the fact that the rank-one terms can be permuted arbitrarily. The scaling indeterminacy means that the individual columns of the factor matrices can be rescaled as long as their product remains the same, i.e.,
\[
\mathcal{X} = \sum_{n=1}^{N} (\alpha^1_n a_n) \circ (\alpha^2_n b_n) \circ (\alpha^3_n c_n) \quad \text{if } \alpha^1_n \alpha^2_n \alpha^3_n = 1. \tag{4.11}
\]
Uniqueness conditions are based on the rank of tensors. The most famous result on uniqueness of the CPD was reported by J. Kruskal [64]. Kruskal's theorem states that the CPD of a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ is deterministically unique if $N = \mathrm{rank}(\mathcal{X})$ satisfies
\[
N \le \frac{k_A + k_B + k_C - 2}{2}, \tag{4.12}
\]
where $k_{(\cdot)}$ denotes the k-rank of a given matrix $(\cdot)$, i.e., the largest integer such that any $k_{(\cdot)}$ columns of the matrix are linearly independent. Checking deterministic conditions can be cumbersome. De Lathauwer et al. studied different methods to determine the rank of a tensor and concluded that the decomposition of a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ is generically unique (i.e., with probability one) [30] provided that $N$ satisfies
\[
N \le K \quad \text{and} \quad N(N-1) \le IJ(I-1)(J-1)/2. \tag{4.13}
\]
Domanov et al. further complemented the existing bounds for generic uniqueness of the CPD [36] and concluded that the CPD of a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ of rank $N$ is generically unique if
\[
2 \le I \le J \le K \le N, \quad N \le \frac{I + J + 2K - 2 - \sqrt{(I-J)^2 + 4K}}{2}, \tag{4.14}
\]
or
\[
3 \le I \le J \le N \le K, \quad N \le (I-1)(J-1). \tag{4.15}
\]
There are two main approaches to compute the CPD of a tensor, namely linear algebra based [35] and optimization based methods [2, 99]. Both types of methods have their own strengths and weaknesses. For a thorough study of uniqueness conditions and computation, we refer to [30, 34, 61] and the references therein.

4.5 Algorithm for Estimating the Mixing Matrices in UJBSS

How to estimate the mixing matrix remains a challenging problem, even in the underdetermined case with a single dataset. In this chapter, we propose a novel and effective algorithm to jointly estimate the mixing matrices from multiple datasets, which can be regarded as an extension of methods based on second-order statistics of the signals, e.g., simultaneous diagonalization of second-order autocovariance matrices and CPD of a specialized tensor [31, 46, 124]. For ease of presentation, we take the case of 3 datasets as an example, e.g., $X^{(1)}$, $X^{(2)}$ and $X^{(3)}$; the method can be easily generalized to more than 3 datasets. The problem is reformulated as a joint canonical polyadic decomposition of a sequence of third-order tensors which share common factor matrices.

Figure 4.1: Illustration of how to generate tensors by incorporating the dependence information between each pair of datasets.
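The CPD building blocks above, the construction (4.7) and Kruskal's condition (4.12), can be sketched numerically (a minimal sketch; the helpers `cpd_reconstruct` and `k_rank` are our own naming):

```python
import numpy as np
from itertools import combinations

def cpd_reconstruct(A, B, C):
    # X = sum_n a_n o b_n o c_n  (Eq. 4.7)
    return np.einsum('in,jn,kn->ijk', A, B, C)

def k_rank(M):
    # Largest k such that EVERY subset of k columns is linearly independent.
    for k in range(M.shape[1], 0, -1):
        if all(np.linalg.matrix_rank(M[:, list(c)]) == k
               for c in combinations(range(M.shape[1]), k)):
            return k
    return 0

rng = np.random.default_rng(1)
I, J, K, N = 4, 5, 6, 5
A = rng.standard_normal((I, N))
B = rng.standard_normal((J, N))
C = rng.standard_normal((K, N))
X = cpd_reconstruct(A, B, C)

# Kruskal's condition (4.12): N <= (k_A + k_B + k_C - 2) / 2
kruskal_ok = N <= (k_rank(A) + k_rank(B) + k_rank(C) - 2) / 2
```

For generic (random) factors the k-rank equals $\min(\text{rows}, N)$, so here $k_A = 4$, $k_B = k_C = 5$ and the bound $5 \le (4+5+5-2)/2 = 6$ holds; the brute-force `k_rank` is exponential in $N$ and only meant for small illustrations.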
It should be mentioned that the proposed method is limited to real-valued problems and cannot be directly generalized to complex-valued cases.

4.5.1 Tensor Construction

The cross-covariance of the observations with time delay $\tau$, e.g., between the observations in dataset $k_1$, $X^{(k_1)}(t)$, and the observations in dataset $k_2$ delayed by $\tau$, $X^{(k_2)}(t+\tau)$, can be formulated as
\[
E\{X^{(k_1)}(t)\, X^{(k_2)}(t+\tau)^T\} = A^{(k_1)}\, E\{S^{(k_1)}(t)\, S^{(k_2)}(t+\tau)^T\}\, (A^{(k_2)})^T, \tag{4.16}
\]
where $k_1$ and $k_2$ index the datasets and range from 1 to 3. Considering the correlations within and between each pair of datasets, the covariance matrices between $X^{(1)}$ and $X^{(2)}$ with time delays $\tau_1, \dots, \tau_L$ satisfy
\[
\begin{aligned}
P^{(1)} &= E\{X^{(1)}(t)\, X^{(2)}(t+\tau_1)^T\} = A^{(1)} U^{(\tau_1)} (A^{(2)})^T,\\
&\ \ \vdots\\
P^{(L)} &= E\{X^{(1)}(t)\, X^{(2)}(t+\tau_L)^T\} = A^{(1)} U^{(\tau_L)} (A^{(2)})^T,
\end{aligned} \tag{4.17}
\]
in which $\tau_l$ denotes the time delay and the matrix $U^{(\tau_l)} = E\{S^{(1)}(t)\, S^{(2)}(t+\tau_l)^T\}$ is diagonal, for $l = 1, 2, \dots, L$.

We stack the sequence of covariance matrices $P^{(1)}, P^{(2)}, \dots, P^{(L)}$, denoted $\{P^{(l)}\}$, in a tensor $\mathcal{P} \in \mathbb{R}^{M \times M \times L}$ as follows: $(\mathcal{P})_{i,j,l} = (P^{(l)})_{i,j}$, $i, j = 1, 2, \dots, M$, $l = 1, 2, \dots, L$. We define the matrix $U$ of size $L \times N$ with elements $U_{l,n} = (U^{(\tau_l)})_{n,n}$, for $l = 1, \dots, L$, $n = 1, \dots, N$. Then we can represent $\mathcal{P}$ as (see Fig. 4.1):
\[
\mathcal{P} = \sum_{n=1}^{N} a^{(1)}_n \circ a^{(2)}_n \circ u_n, \tag{4.18}
\]
in which $a^{(1)}_n$ and $a^{(2)}_n$ are the $n$th columns of the mixing matrices $A^{(1)}$ and $A^{(2)}$ respectively, and $u_n$ is the $n$th column of the matrix $U$.

Similarly, the covariance matrices between the other two pairs of observations with time delay $\tau_l$, denoted $Q^{(l)}$ and $R^{(l)}$, satisfy
\[
\begin{aligned}
Q^{(l)} &= E\{X^{(1)}(t)\, X^{(3)}(t+\tau_l)^T\} = A^{(1)} V^{(\tau_l)} (A^{(3)})^T,\\
R^{(l)} &= E\{X^{(2)}(t)\, X^{(3)}(t+\tau_l)^T\} = A^{(2)} W^{(\tau_l)} (A^{(3)})^T,
\end{aligned} \tag{4.19}
\]
where $V^{(\tau_l)} = E\{S^{(1)}(t)\, S^{(3)}(t+\tau_l)^T\}$ and $W^{(\tau_l)} = E\{S^{(2)}(t)\, S^{(3)}(t+\tau_l)^T\}$ for $l = 1, 2, \dots, L$.
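In practice the expectations in (4.17) are replaced by sample averages over the observed mixtures. A hedged sketch of building the tensor $\mathcal{P}$ from data (the helper `lagged_cov_tensor` and the AR(1) toy sources are our own illustration, not the chapter's code):

```python
import numpy as np
from scipy.signal import lfilter

def lagged_cov_tensor(X1, X2, delays):
    # Stack sample cross-covariances E{X1(t) X2(t+tau)^T} over the delays
    # tau into an M x M x L tensor, as in (4.17)-(4.18).
    M, T = X1.shape
    P = np.empty((M, M, len(delays)))
    for l, tau in enumerate(delays):
        P[:, :, l] = X1[:, :T - tau] @ X2[:, tau:].T / (T - tau)
    return P

rng = np.random.default_rng(2)
M, N, T = 4, 5, 50_000
# Temporally correlated zero-mean sources (AR(1)), so lagged covariances are nonzero.
S1 = lfilter([1.0], [1.0, -0.95], rng.standard_normal((N, T)), axis=1)
S2 = 0.9 * S1 + 0.1 * rng.standard_normal((N, T))   # correlated counterparts
A1 = rng.standard_normal((M, N))
A2 = rng.standard_normal((M, N))
P = lagged_cov_tensor(A1 @ S1, A2 @ S2, delays=range(0, 40, 2))
```

With assumption (II), each frontal slice of `P` is approximately $A^{(1)} U^{(\tau_l)} (A^{(2)})^T$ with diagonal $U^{(\tau_l)}$, up to finite-sample error in the off-diagonal source covariances.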
Stack these two sequences of covariance matrices $\{Q^{(l)}\}$ and $\{R^{(l)}\}$ in tensors $\mathcal{Q} \in \mathbb{R}^{M \times M \times L}$ and $\mathcal{R} \in \mathbb{R}^{M \times M \times L}$ as follows: $(\mathcal{Q})_{i,j,l} = (Q^{(l)})_{i,j}$, $(\mathcal{R})_{i,j,l} = (R^{(l)})_{i,j}$, $i, j = 1, 2, \dots, M$, $l = 1, 2, \dots, L$. To simplify the notation, we further define the matrices $V \in \mathbb{R}^{L \times N}$ and $W \in \mathbb{R}^{L \times N}$ with elements $V_{l,n} = (V^{(\tau_l)})_{n,n}$ and $W_{l,n} = (W^{(\tau_l)})_{n,n}$, for $l = 1, \dots, L$, $n = 1, \dots, N$. Then these two tensors can be represented as (see Fig. 4.1):
\[
\mathcal{Q} = \sum_{n=1}^{N} a^{(1)}_n \circ a^{(3)}_n \circ v_n, \qquad
\mathcal{R} = \sum_{n=1}^{N} a^{(2)}_n \circ a^{(3)}_n \circ w_n, \tag{4.20}
\]
in which $a^{(k)}_n$ is the $n$th column of the mixing matrix $A^{(k)}$ for $k = 1, 2, 3$, and $v_n$ and $w_n$ are the $n$th columns of the matrices $V$ and $W$ respectively.

It should be mentioned that the choices of $\tau_1, \tau_2, \dots, \tau_L$ may affect the estimation precision of the mixing matrices. It is desirable to select $\tau_1, \tau_2, \dots, \tau_L$ such that $U$, $V$ and $W$ are well conditioned. If the time delay $\tau$ is too large, the correlation between corresponding sources at that delay will be close to 0, the covariance matrix of the sources in two datasets (e.g., $U^{(\tau)}$) will be close to a null matrix and thus ill conditioned, and assumption (II) may no longer hold. Here, we heuristically choose the time delays as $\tau_l \in [0, 200]$ data samples.

Fig. 4.1 illustrates how to generate these tensors by incorporating the dependence information between each pair of datasets. It is worth noting that each pair of tensors shares a common factor matrix; e.g., $\mathcal{P}$ and $\mathcal{Q}$ are coupled in the mode corresponding to $A^{(1)}$.

4.5.2 Joint Tensor Polyadic Decomposition

Considering the common latent structure, the problem of estimating the mixing matrices $A^{(k)}$ can now be reformulated as a joint CPD of a collection of tensors, e.g., $\mathcal{P}$, $\mathcal{Q}$ and $\mathcal{R}$ for the case of three datasets. There are two main approaches to jointly decompose a sequence of tensors, i.e., linear algebra based [101] and optimization based methods [3, 100, 109]. Sørensen et al. took into account the coupling between multiple tensors and developed a linear algebra based algorithm [101]. This method provides an explicit solution for exact tensor decomposition. In practice, however, data are noisy and consequently the estimation may be inaccurate. In addition, the linear algebra based method requires the common factor matrices to have full column rank, whereas the common factors in our problem are rank deficient [101]. In this chapter, we generalize the idea of coupled matrix and tensor factorization (CMTF) and jointly decompose a sequence of tensors via a gradient-based optimization method [3, 100, 109].

The uniqueness condition of the joint CPD is important in practice. Simply put, the solution of the joint CPD is generically unique if all the individual CPDs are unique. In this chapter, we can obtain the unique solution of each mixing matrix generically, provided the number of sources satisfies condition (4.14) or (4.15). This uniqueness condition of the joint CPD might be further relaxed, but that topic deserves a stand-alone theoretical paper and is out of the scope of the current work.

The aim is to find the factor matrices $\{A^{(k)}\} \in \mathbb{R}^{M \times N}$ and the cross-dataset source covariance matrices $U, V, W \in \mathbb{R}^{L \times N}$ that minimize the following objective function, a sum of squared Frobenius norms of the differences between the given tensors and their canonical polyadic approximations:
\[
f(A^{(1)}, A^{(2)}, A^{(3)}, U, V, W)
= \underbrace{\tfrac{1}{2}\big\|\mathcal{P} - [[A^{(1)}, A^{(2)}, U]]\big\|^2}_{f^{(1)}(A^{(1)}, A^{(2)}, U)}
+ \underbrace{\tfrac{1}{2}\big\|\mathcal{Q} - [[A^{(1)}, A^{(3)}, V]]\big\|^2}_{f^{(2)}(A^{(1)}, A^{(3)}, V)}
+ \underbrace{\tfrac{1}{2}\big\|\mathcal{R} - [[A^{(2)}, A^{(3)}, W]]\big\|^2}_{f^{(3)}(A^{(2)}, A^{(3)}, W)}, \tag{4.21}
\]
where $[[\cdot]]$ denotes the canonical polyadic approximation of a given tensor. This objective simultaneously takes the coupling information between the different tensors into account. We propose to solve the problem via a gradient-based optimization method. Proposition 1 gives the partial derivatives of the objective function $f$ with respect to each column of the desired matrices, i.e., $\{a^{(k)}_n\}$, $u_n$, $v_n$ and $w_n$ for $n = 1, 2, \dots, N$.

The equations in Proposition 1 are proved in the Appendix. The gradient of $f$ can then be assembled by stacking the partial derivatives with respect to each column of the factor matrices:
\[
\nabla f = \left[ \frac{\partial f}{\partial a^{(1)}_1}; \frac{\partial f}{\partial a^{(1)}_2}; \dots; \frac{\partial f}{\partial a^{(1)}_N}; \dots; \frac{\partial f}{\partial w_1}; \dots; \frac{\partial f}{\partial w_N} \right]^T. \tag{4.22}
\]
Once we obtain this gradient, we can estimate the factor matrices, including the mixing matrices and the covariance matrices, using any first-order optimization method. In this chapter, we employ the nonlinear conjugate gradient (NCG) algorithm implemented in [38] to solve the unconstrained optimization problem and estimate the mixing matrices of multiple datasets simultaneously. Compared with second-order optimization methods, such as Newton-based methods, NCG generally requires less computation and memory [109].

Proposition 1. The partial derivatives of the objective function $f$ with respect to each column of the desired matrices, i.e., $\{a^{(k)}_n\}$, $u_n$, $v_n$ and $w_n$, are given by
\[
\begin{aligned}
\frac{\partial f}{\partial a^{(1)}_n} &= -\mathcal{P} \times_2 a^{(2)}_n \times_3 u_n - \mathcal{Q} \times_2 a^{(3)}_n \times_3 v_n
+ \sum_{c=1}^{N}\big[(a^{(2)}_n)^T a^{(2)}_c\, u_n^T u_c + (a^{(3)}_n)^T a^{(3)}_c\, v_n^T v_c\big]\, a^{(1)}_c \\
\frac{\partial f}{\partial a^{(2)}_n} &= -\mathcal{P} \times_1 a^{(1)}_n \times_3 u_n - \mathcal{R} \times_2 a^{(3)}_n \times_3 w_n
+ \sum_{c=1}^{N}\big[(a^{(1)}_n)^T a^{(1)}_c\, u_n^T u_c + (a^{(3)}_n)^T a^{(3)}_c\, w_n^T w_c\big]\, a^{(2)}_c \\
\frac{\partial f}{\partial a^{(3)}_n} &= -\mathcal{Q} \times_1 a^{(1)}_n \times_3 v_n - \mathcal{R} \times_1 a^{(2)}_n \times_3 w_n
+ \sum_{c=1}^{N}\big[(a^{(1)}_n)^T a^{(1)}_c\, v_n^T v_c + (a^{(2)}_n)^T a^{(2)}_c\, w_n^T w_c\big]\, a^{(3)}_c \\
\frac{\partial f}{\partial u_n} &= -\mathcal{P} \times_1 a^{(1)}_n \times_2 a^{(2)}_n
+ \sum_{c=1}^{N}\big[(a^{(1)}_n)^T a^{(1)}_c\, (a^{(2)}_n)^T a^{(2)}_c\big]\, u_c \\
\frac{\partial f}{\partial v_n} &= -\mathcal{Q} \times_1 a^{(1)}_n \times_2 a^{(3)}_n
+ \sum_{c=1}^{N}\big[(a^{(1)}_n)^T a^{(1)}_c\, (a^{(3)}_n)^T a^{(3)}_c\big]\, v_c \\
\frac{\partial f}{\partial w_n} &= -\mathcal{R} \times_1 a^{(2)}_n \times_2 a^{(3)}_n
+ \sum_{c=1}^{N}\big[(a^{(2)}_n)^T a^{(2)}_c\, (a^{(3)}_n)^T a^{(3)}_c\big]\, w_c.
\end{aligned}
\]

4.6 Source Extraction Based on the Estimated Mixing Matrices

Unlike the (over)determined case, estimating the mixing matrix is not equivalent to recovering the underlying sources in UBSS. A complete UBSS approach always consists of both mixing matrix estimation and source extraction, even though our main focus in this chapter is the estimation of the mixing matrices. Extracting the sources when the mixing matrix is estimated is a classic inverse problem. Many techniques have already been proposed in the literature, including array processing techniques [107] and methods exploiting the sparsity of the sources in some domain, e.g., the TF domain [116]. In order to demonstrate the performance of the proposed mixing matrix estimation method, we adopt a recently developed subspace representation method [59] to recover the latent sources based on the estimated mixing matrices. For simplicity, the proposed method for extracting sources is derived without considering the background noise.
However, it was shown to be robust to background noise [59].

For any underdetermined non-homogeneous linear equation, the complete solution can be represented as the sum of a particular solution and a general solution of the corresponding homogeneous equation. For the case in this chapter, $A^{(k)} S^{(k)} = X^{(k)}$, the general solution for the sources $S^{(k)}$ can be written as
\[
S^{(k)} = S^{(k)}_p + S^{(k)}_h, \tag{4.23}
\]
where $S^{(k)}_p$ denotes a particular solution and $S^{(k)}_h$ denotes a general solution of the corresponding homogeneous equation $A^{(k)} S^{(k)} = 0$. One particular solution of the above non-homogeneous equation is
\[
S^{(k)}_p = (A^{(k)})^{\dagger} X^{(k)}, \tag{4.24}
\]
where $(A^{(k)})^{\dagger}$ denotes the pseudo-inverse of the mixing matrix $A^{(k)}$. In addition, the general solution of the homogeneous equation $A^{(k)} S^{(k)} = 0$ can be expressed as
\[
S^{(k)}_h = V Z^{(k)}, \tag{4.25}
\]
where $V$ is an $N \times (N-M)$ matrix whose columns form a basis of the nullspace of $A^{(k)}$, and $Z^{(k)}$ is an arbitrary matrix of size $(N-M) \times T$, with $T$ the total number of samples in each channel [102]. The basis matrix $V$ can be obtained from the mixing matrix $A^{(k)}$, and the problem of estimating the $N$-dimensional sources then boils down to estimating the $(N-M)$-dimensional latent variable $Z^{(k)}$.

In order to be applicable to a wide class of signals, such as audio and biological signals (EEG, EMG), the Generalized Gaussian Distribution (GGD) [58] is utilized to model the source distributions. Mathematically, it is expressed as
\[
p_y(y; \sigma, \beta) = \frac{v(\beta)}{\sigma} \exp\left\{ -c(\beta) \left| \frac{y - \mu}{\sigma} \right|^{2/(1+\beta)} \right\}, \tag{4.26}
\]
where
\[
c(\beta) = \left( \frac{\Gamma\!\big(\tfrac{3}{2}(1+\beta)\big)}{\Gamma\!\big(\tfrac{1}{2}(1+\beta)\big)} \right)^{1/(1+\beta)}, \qquad
v(\beta) = \frac{\Gamma\!\big(\tfrac{3}{2}(1+\beta)\big)^{1/2}}{(1+\beta)\, \Gamma\!\big(\tfrac{1}{2}(1+\beta)\big)^{3/2}}, \tag{4.27}
\]
in which $\Gamma(\cdot)$ is the Gamma function, $\sigma$ is the standard deviation and $\mu$ is the mean of the continuous random variable $y$. In this chapter, the mean of each source is assumed to be 0.
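A minimal sketch of the GGD density (4.26)-(4.27), checking that it is properly normalized and that $\beta = 0$ recovers the Gaussian case (the helper name `ggd_pdf` is ours):

```python
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

def ggd_pdf(y, sigma, beta, mu=0.0):
    # Generalized Gaussian density, Eqs. (4.26)-(4.27).
    # beta = 0 gives a Gaussian; beta = 1 gives a Laplacian (heavier tails).
    c = (gamma(1.5 * (1 + beta)) / gamma(0.5 * (1 + beta))) ** (1.0 / (1 + beta))
    v = gamma(1.5 * (1 + beta)) ** 0.5 / ((1 + beta) * gamma(0.5 * (1 + beta)) ** 1.5)
    return v / sigma * np.exp(-c * np.abs((y - mu) / sigma) ** (2.0 / (1 + beta)))

# The density integrates to one for any admissible shape parameter.
area, _ = quad(ggd_pdf, -np.inf, np.inf, args=(1.3, 0.7))
```

At $\beta = 0$, $c(0) = \Gamma(3/2)/\Gamma(1/2) = 1/2$ and $v(0) = 1/\sqrt{2\pi}$, so (4.26) reduces to $\mathcal{N}(\mu, \sigma^2)$, consistent with $\sigma$ being the standard deviation.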
We define the parameter set $\theta = \{\beta, \sigma\}$ for simplicity, where the components of $\beta = [\beta_1, \dots, \beta_N]$ and $\sigma = [\sigma_1, \dots, \sigma_N]$ correspond to the individual source channels. The GGD parameters $\theta$ can be estimated by maximizing the likelihood of the observed mixtures $X^{(k)}$ with the Expectation-Maximization (EM) algorithm. Then $Z^{(k)}$ can be obtained by sampling from $p(Z^{(k)} | X^{(k)}, \theta)$ as
\[
\hat{Z}^{(k)} = \frac{1}{G} \sum_{g=1}^{G} Z^{(k)}_g, \tag{4.28}
\]
where $\{Z^{(k)}_1, \dots, Z^{(k)}_G\}$ are $G$ samples drawn from $p(Z^{(k)} | X^{(k)}, \theta)$ using the Markov Chain Monte Carlo (MCMC) method. We then recover the underlying sources as
\[
\hat{S}^{(k)} = (A^{(k)})^{\dagger} X^{(k)} + V \hat{Z}^{(k)}. \tag{4.29}
\]
The major steps of the proposed UJBSS-m algorithm are summarized in Algorithm 6. The number of time delays is 20 by default. The step size of the time delays, i.e., $\tau_{l+1} - \tau_l$, is suggested to be 2 samples (corresponding to 0.25 ms) for audio signals and 5 samples (corresponding to 5 ms) for physiological signals.

Algorithm 6 The UJBSS-m algorithm based on joint tensor decomposition
Input: $M$-dimensional observations $\{X^{(k)}\}$ and the number of sources $N$ in each dataset, for $k = 1, 2, \dots, K$.
Output: the estimated mixing matrices $\{A^{(k)}\}$ and the recovered $N$-dimensional sources $\{S^{(k)}\}$, for $k = 1, 2, \dots, K$.
STEP 1: For each pair of datasets, e.g., $X^{(k_1)}$ and $X^{(k_2)}$ ($k_1 \ne k_2$), calculate the cross-covariance matrices as in (4.16) and stack them to construct a third-order tensor as in Section 4.5.1. Considering all combinations of datasets, we obtain $\binom{K}{2}$ tensors, where each pair of tensors shares a common factor matrix, as shown in Fig. 4.1;
STEP 2: Calculate the joint polyadic decomposition of the tensors constructed in STEP 1 via the optimization based method and estimate the mixing matrices $\{A^{(k)}\}$ as in Section 4.5.2;
STEP 3: Estimate the parameters of the Generalized Gaussian distribution with the EM algorithm.
Initialize: set the parameter set $\theta$ to random values.
E-step: calculate the expected value of the log-likelihood with respect to the conditional distribution of $Z^{(k)}$ given the observations $X^{(k)}$ under the current estimate of $\theta$, i.e., $E_{p(Z^{(k)}|X^{(k)}, \theta^*)}\big(\log p(Z^{(k)}|X^{(k)}, \theta)\big)$, where $\theta^*$ denotes the parameter value obtained in the initialization or the previous M-step.
M-step: update the parameter set $\theta$ to maximize the above expected value:
\[
\theta = \arg\max_{\theta} E_{p(Z^{(k)}|X^{(k)}, \theta^*)}\big(\log p(Z^{(k)}|X^{(k)}, \theta)\big)
\approx \arg\max_{\theta} \frac{1}{G} \sum_{g=1}^{G} \log p(Z^{(k)}_g | X^{(k)}, \theta),
\]
where $\{Z^{(k)}_1, \dots, Z^{(k)}_G\}$ are $G$ samples drawn from $p(Z^{(k)}|X^{(k)}, \theta)$ with the MCMC method.
Iterate: alternate the E-step and M-step until convergence.
STEP 4: Recover the sources $S^{(k)}$ based on the minimum mean-square error criterion as in Equation (4.29).

4.7 Numerical Study for the Multiple Dataset Case

To demonstrate the joint separation performance for multiple datasets, simulations are performed on both audio and biological signals, applying the proposed UJBSS-m and several commonly used BSS methods. Two performance indices are used to evaluate the separation performance. One is the estimation error of the mixing matrices, defined as
\[
\mathrm{Error} = 10 \log_{10} \left\{ \mathrm{mean}\left( \frac{\|A - \hat{A}\|}{\|A\|} \right) \right\}, \tag{4.30}
\]
where $\hat{A}$ denotes the optimally ordered estimate of $A$.
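Because the CPD only identifies $A$ up to column permutation and scaling, (4.30) is evaluated on an optimally ordered estimate. A hedged sketch of one way to realize this (Hungarian matching on column correlations; the helper `error_db` and the unit-column normalization are our own choices, consistent with the normalized mixing columns used in Simulation 1):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def error_db(A, A_hat):
    # Eq. (4.30) for a single trial: relative Frobenius error in dB after
    # optimally permuting (and sign-matching) the columns of the estimate.
    An = A / np.linalg.norm(A, axis=0)
    Bn = A_hat / np.linalg.norm(A_hat, axis=0)
    C = An.T @ Bn                                   # column correlations
    row, col = linear_sum_assignment(-np.abs(C))    # best column pairing
    B = Bn[:, col] * np.sign(C[row, col])           # undo permutation and sign
    return 10 * np.log10(np.linalg.norm(An - B) / np.linalg.norm(An))

rng = np.random.default_rng(8)
A = rng.uniform(-1, 1, (4, 5))
perm = rng.permutation(5)
A_hat = A[:, perm] * rng.choice([-1.0, 1.0], 5) + 0.01 * rng.standard_normal((4, 5))
err = error_db(A, A_hat)   # a lightly perturbed, permuted estimate scores well
```

Averaging `error_db` over repeated trials gives the `mean(·)` in (4.30).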
The other is the Pearson correlation coefficient (PCC) between the estimated sources and the original ones, defined as
\[
\mathrm{PCC}(s^{(k)}_n, \hat{s}^{(k)}_n) = \frac{\mathrm{cov}(s^{(k)}_n, \hat{s}^{(k)}_n)}{\sigma_{s^{(k)}_n}\, \sigma_{\hat{s}^{(k)}_n}}, \tag{4.31}
\]
where $\hat{s}^{(k)}_n$ denotes the estimate of the source $s^{(k)}_n$ in the $k$th dataset, $\mathrm{cov}(\cdot, \cdot)$ the covariance between two variables and $\sigma$ the standard deviation. In order to ensure the dependence between the sources of each pair of datasets, the sources are synthesized as follows:
\[
\begin{aligned}
S^{(1)} &= [s^{(1)}_1, s^{(1)}_2, \dots, s^{(1)}_N]^T;\\
S^{(2)} &= S^{(1)} .\!* \,\mathrm{unifrnd}(0, 1, S^{(1)});\\
S^{(3)} &= S^{(1)} .\!* \,\mathrm{unifrnd}(0, 1, S^{(1)}),
\end{aligned} \tag{4.32}
\]
where $\mathrm{unifrnd}(0, 1, S^{(1)})$ generates a matrix of the same size as $S^{(1)}$ whose elements are drawn independently from the continuous uniform distribution on the interval $(0, 1)$, and $.\!*$ denotes element-wise multiplication. The average correlation between the source $s^{(1)}_n$ and the corresponding source $s^{(k)}_n$ ($k = 2, 3$) is about 0.85; the average correlation between the source $s^{(2)}_n$ and the corresponding source $s^{(3)}_n$ is about 0.7. Both can be regarded as highly correlated.

4.7.1 Simulation 1: Audio Signals

The sources used in this simulation include 8 audio signals, such as two pieces of sound from Cable News Network (CNN) news and a piece of sound of an anonymous singer, all of which are publicly available¹. The sampling rate is 8000 Hz. The mixing matrices are generated randomly with elements following the uniform distribution $U[-1, 1]$. For simplicity, each column of the mixing matrices is normalized to a unit vector. Three datasets are generated following (4.32). In our first setting, 5 sources are mixed into 4 observations in each dataset and the corresponding sources in the different datasets are highly correlated. Under different signal-to-noise ratios (SNRs), we compare the proposed UJBSS-m method with a commonly used single-set UBSS method, SOBIUM [31], applied to each dataset separately.
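The construction (4.32) indeed yields the stated correlation levels: for a zero-mean source $s$ and independent $u \sim U(0,1)$, $\mathrm{corr}(s, s u) = \sqrt{3}/2 \approx 0.87$ and $\mathrm{corr}(s u_1, s u_2) = 3/4$. A quick numerical sketch of one source channel:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200_000
s1 = rng.standard_normal(T)               # one zero-mean source channel
s2 = s1 * rng.uniform(0.0, 1.0, T)        # Eq. (4.32): S2 = S1 .* unifrnd(0,1,.)
s3 = s1 * rng.uniform(0.0, 1.0, T)

def pcc(a, b):
    # Pearson correlation coefficient, Eq. (4.31)
    return np.corrcoef(a, b)[0, 1]

r12 = pcc(s1, s2)   # ~ sqrt(3)/2 ≈ 0.866
r23 = pcc(s2, s3)   # ~ 3/4
```

The small discrepancy between these idealized values and the 0.85 / 0.7 figures quoted in the text is expected, since the real audio sources are not exactly white zero-mean Gaussian.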
We also test the performance of our recent work on UJBSS for two datasets, UJBSS-2 [124], when two datasets are available, e.g., $X^{(1)}$ and $X^{(2)}$. We repeat the simulation 1000 times and the performance is shown in Fig. 4.2. Results are given for SNR levels in the range of -5 dB to 40 dB. Benefiting from the dependence information between different datasets, the proposed UJBSS-m provides more accurate estimates of the mixing matrices, whereas SOBIUM neglects the possible inter-dataset information. Compared with UJBSS-2, the proposed UJBSS-m takes more dependence information into account, among three datasets rather than between two, and yields better performance. We also note that the Error measure of both the single-set UBSS and the UJBSS methods decreases as the SNR increases. The proposed UJBSS-m consistently provides the best results over the whole SNR range, suggesting the performance stability of the proposed algorithm.

¹http://research.ics.aalto.fi/ica/cocktail/sounds.html

We also examine the performance of the proposed method as the level of under-determinacy decreases, i.e., the number of observations increases from 4 to 7 while the number of sources is fixed at 8. As noted in Fig. 4.3, the estimation performance improves when more observations are available. Besides degraded estimation precision, a higher level of under-determinacy also incurs higher computational complexity. The performance improves as the SNR increases from -5 dB to 20 dB; beyond 20 dB the change in estimation error is not obvious, although there are some fluctuations. This shows that the estimation performance depends on several factors, such as the noise level (SNR), the level of under-determinacy (i.e., the number of sources for a given number of sensors) and the correlation between each pair of datasets.

We recover the latent sources from each dataset based on the estimated mixing matrices.
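This recovery follows the subspace representation of Section 4.6. A minimal sketch (random stand-in data, assumed shapes) showing that the particular solution (4.24) plus any nullspace component (4.25) reproduces the observations exactly:

```python
import numpy as np
from scipy.linalg import null_space, pinv

rng = np.random.default_rng(4)
M, N, T = 3, 4, 100
A = rng.standard_normal((M, N))        # estimated underdetermined mixing matrix
X = rng.standard_normal((M, T))        # observations

S_p = pinv(A) @ X                      # particular solution (Eq. 4.24)
V = null_space(A)                      # N x (N - M) nullspace basis (Eq. 4.25)
Z = rng.standard_normal((N - M, T))    # latent coordinates, arbitrary here
S = S_p + V @ Z                        # a complete solution (Eq. 4.23)

# Every such S satisfies A @ S == X; choosing Z (via the GGD prior and
# MCMC sampling in the chapter) selects one solution among them.
```

Only the $(N-M)$-dimensional `Z` remains to be inferred, which is what STEP 3 of Algorithm 6 estimates.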
In an illustrative example, we linearly mix 4 audio sources into 3 observations in each dataset. Fig. 4.4 shows the separation results for the first dataset in the time domain. The top four subfigures of Fig. 4.4 show the original sources, the middle three subfigures the mixed observations, and the bottom four subfigures the sources recovered via the proposed UJBSS-m method. In addition, we compare the proposed method with three other single-set UBSS methods, namely SOBIUM, UBSS based on subspace representation (UBSS-SR for short) [59] and UBSS based on sparse coding (UBSS-SC for short) [119], as well as the JBSS method MCCA [72] and the two-dataset method UJBSS-2 [124], in terms of the PCC between the original sources and the recovered ones. Both UBSS-SR and UBSS-SC are based on single source detection, which assumes that each TF point is occupied by a single source or that the corresponding single source possesses dominant energy; the performance of estimating the mixing matrix deteriorates when this assumption is not satisfied.

Figure 4.2: Simulation 1: performance comparisons on audio signals when using the proposed UJBSS-m method and other UBSS methods, including the single-set UBSS method SOBIUM [31] and the UJBSS method for two datasets, i.e., UJBSS-2 [124]. Here the number of sources $N = 5$ and the number of observations $M = 4$. The number of time delays $L = 20$ and the step size of the time delays (i.e., $\tau_l - \tau_{l-1}$) is 2 data samples, corresponding to 0.25 ms. Similar results are observed for $A^{(2)}$ and $A^{(3)}$.

Furthermore, given that the time-frequency analysis methods [116, 124] are memory-intensive and time-consuming, we estimate the mixing matrices via UJBSS-2 and SOBIUM respectively, and then extract the sources using the same method as in UBSS-SR [59]. The performance results of these six methods are reported in Table 4.1.

Figure 4.3: Simulation 1: estimation error of $A^{(1)}$ when employing the proposed UJBSS method. Here the number of sources $N = 8$ and the number of observations $M$ varies from 4 to 7. The number of time delays $L = 20$ and the step size of the time delays (i.e., $\tau_l - \tau_{l-1}$) is 2 data samples, corresponding to 0.25 ms.

Despite adopting the same technique for extracting sources, the performance of the proposed method is significantly better than that of the single-set SOBIUM and UBSS-SR methods. This observation confirms the importance of estimating the mixing matrices accurately. In addition, the proposed method also outperforms the recently proposed UBSS method UBSS-SC. The main reason is that such UBSS methods require the sources to be sparse to some extent, an assumption that may not be satisfied in practice. MCCA, which has been successfully used in many fields [27], assumes that the number of sources equals the number of observations in each dataset, so it cannot be used to separate sources in the underdetermined case directly.
We add one observation in each dataset so that87AmplitudeTime Index0 1 2 3 4 5x 104\u2212505Original s(1)10 1 2 3 4 5x 104\u22126\u22124\u221220246Original s(1)20 1 2 3 4 5x 104\u221210\u22125051015Original s(1)30 1 2 3 4 5x 104\u22126\u22124\u22122024Original s(1)40 1 2 3 4 5x 104\u221210\u221250510Observation x(1)10 1 2 3 4 5x 104\u2212505Observation x(1)20 1 2 3 4 5x 104\u221210\u22125051015Observation x(1)30 1 2 3 4 5x 104\u2212505Estimated s(1)10 1 2 3 4 5x 104\u22124\u221220246Estimated s(1)20 1 2 3 4 5x 104\u221210\u22125051015Estimated s(1)30 1 2 3 4 5x 104\u22126\u22124\u22122024Estimated s(1)4Figure 4.4: Simulation 1: an illustrative example from the proposed UJBSS-m method. First row: The original 4 sources; Second row: 3 channelsof the mixed observations; Third row: the recovered 4 sources from thefirst dataset.MCCA can be applied. Therefore it is not really a fair setting and comparison tothe proposed method. However, we note that the performance of MCCA is notas good as that of the proposed method, even with an additional observation sig-nal. The following reasons could contribute to the worse performance of MCCA:it is mainly due to the fact that the correlation coefficients between sources in twodatasets are quite close [20, 72]; the performance of MCCA may suffer from erroraccumulation of the deflation-based separation methods [69].4.7.2 Simulation 2: Physiological SignalsIn this experiment, we employ four physiological signals as sources, includingECG, EEG, EOG and EMG from a publicly available database [45]. The samplingrate is 1000 Hz. The sources corresponding to the other two datasets are generatedfollowing (4.32). We get similar results as that in the simulation 1. As can be seenfrom Fig. 
4.5, the estimation performance improves as the SNR of the observations increases. Over the whole SNR range, the proposed UJBSS-m method estimates the mixing matrices with higher accuracy than the single-set UBSS method SOBIUM and the two-dataset method UJBSS-2.

Table 4.1: PCC performance results in Simulation 1.

Method            s1       s2       s3       s4
Dataset 1
  UJBSS-m         0.993    1.000    0.881    0.992
  UJBSS-2 [124]   0.989    1.000    0.859    0.990
  SOBIUM [31]     0.980    1.000    0.512    0.967
  UBSS-SC [119]   0.951    0.965   -0.066    0.934
  UBSS-SR [59]    0.901    0.999    0.226    0.967
  MCCA* [72]      0.571    0.909    0.675    0.732
Dataset 2
  UJBSS-m         0.910    0.998    0.740    0.944
  UJBSS-2 [124]   0.897    0.998    0.714    0.892
  SOBIUM [31]     0.877    0.998    0.686    0.962
  UBSS-SC [119]   0.795    0.940   -0.401    0.607
  UBSS-SR [59]    0.885    0.981    0.730    0.954
  MCCA* [72]     -0.574    0.911   -0.678   -0.733
Dataset 3
  UJBSS-m         0.899    1.000    0.783    0.870
  UJBSS-2 [124]   0.720    1.000    0.515    0.869
  SOBIUM [31]     0.756   -0.951    0.078    0.880
  UBSS-SC [119]   0.381   -0.434   -0.614   -0.630
  UBSS-SR [59]   -0.557    0.987    0.620   -0.561
  MCCA* [72]      0.577   -0.911   -0.679    0.732
* We add one additional observation in each dataset when we evaluate MCCA.

We also investigate the effect of the time delays, as shown in Fig. 4.6. At a high SNR level, e.g., SNR = 20 dB, the average Error of the proposed UJBSS-m is -11.57 dB when the step size of the time delays is 5 data samples, corresponding to 5 ms. However, the average Error for a step size of 3 data samples is -6.67 dB, significantly larger than that for 5 data samples. The main reason is that for a small step size the change in the covariance matrices is not obvious, so the covariance matrices at these delays cannot provide enough information to estimate the common factors, i.e., the mixing matrices. If the time delay is too large, e.g., more than 500 data samples (corresponding to 500 ms), the covariance between two datasets will be close to 0. Here, we select 5 data samples as the step size of the time delays.
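This trade-off can be inspected directly from a lagged-covariance estimate: for a slowly varying source, nearby lags yield nearly the same covariance as lag 0, while very large lags yield almost none. A toy numpy sketch, with an AR(1) process standing in for a real source (all names and parameter values here are illustrative, not the thesis data or code):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50_000
# AR(1) process standing in for a slowly varying source: its
# autocovariance decays smoothly as the lag grows.
s = np.empty(T)
s[0] = rng.standard_normal()
for t in range(1, T):
    s[t] = 0.99 * s[t - 1] + 0.1 * rng.standard_normal()

def lagged_cov(x, y, tau):
    """Sample estimate of E{x(t) y(t + tau)} for 1-D signals."""
    n = len(x) - tau
    return float(np.mean(x[:n] * y[tau:]))

c0 = lagged_cov(s, s, 0)         # lag 0: the signal power
c_small = lagged_cov(s, s, 3)    # small step: barely different from lag 0
c_large = lagged_cov(s, s, 500)  # very large delay: covariance nearly gone
```

A small step thus contributes covariance matrices that are almost redundant, while an overly large delay contributes matrices that carry almost no signal, matching the behavior reported above.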
In practice, the time delays should be selected empirically based on the characteristics of the sources; e.g., we suggest time delays smaller than 100 ms for physiological signals. In addition, we evaluate the role of the number of time delays and find that it has less impact on the performance. In this chapter, we set the number of time delays to 20.

We further report the performance in terms of the PCC between the original sources and the estimated ones. As shown in Table 4.2, the proposed method yields promising results when used to separate the latent, underdetermined mixtures. Compared to the classical JBSS method MCCA, the proposed UJBSS approach needs fewer observations in each dataset while yielding better performance.

4.8 A Case Study: Solving a Single-Set UBSS Problem Based on UJBSS-m

In this section, we show that the proposed UJBSS-m method can be employed to solve a single-set UBSS problem, the noise enhanced signal processing problem, with superior performance. As in Simulation 1, we employ 5 real audio signals as the sources. These 5 audio signals are mixed into 4 observations with a mixing matrix A^(1) whose elements follow the uniform distribution U[-1, 1]. We generate three datasets as

    X^(1) = A^(1) S^(1)
    X^(2) = awgn(X^(1), 20 dB)
    X^(3) = awgn(X^(1), 20 dB),    (4.33)

Figure 4.5: Simulation 2: performance comparisons on physiological signals between the proposed UJBSS-m method and two other methods (i.e., the single-set UBSS method SOBIUM [31] and the UJBSS method for two datasets, UJBSS-2 [124]). Here the number of sources N = 4 and the number of observations M = 3. The number of time delays L = 20 and the step size of time delays (i.e., τ_l − τ_{l−1}) is 2 data samples.
Similar results are observed for A^(2) and A^(3).

where awgn(X^(1), 20 dB) represents adding white Gaussian noise to the signals X^(1) (i.e., the real observations) with an SNR of 20 dB. Noise, traditionally regarded as an unwanted signal, can play a very important constructive role in estimation problems, which is known as noise enhanced signal processing. X^(2) and X^(3) are random-noise-added signals based on X^(1).

The problem of interest here is to estimate the mixing matrix A^(1). Traditionally, we can estimate A^(1) from the dataset X^(1) based on the single-set UBSS method SOBIUM. Here we can also apply the proposed algorithm of Section 4.5, and then recover the sources via the method of Section 4.6 based on the estimated mixing matrix A^(1).

Figure 4.6: Simulation 2: performance of the proposed UJBSS-m method when the step size of time delays (i.e., τ_l − τ_{l−1}) varies from 1 to 9. Here the number of sources N = 4 and the number of observations M = 3. The number of time delays L = 20. Similar results are observed for A^(2) and A^(3).

We repeat the experiment 1000 times and calculate the sum of the absolute PCC (SAPCC) between the recovered sources and the original ones:

    SAPCC = Σ_{n=1}^{5} abs(PCC(s^(1)_n, ŝ^(1)_n)),    (4.34)

where abs(·) represents the absolute value function. Fig. 4.7 shows the distribution of the performance over the 1000 repetitions of the experiment.
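The SAPCC score of (4.34) is a direct sum of absolute Pearson correlation coefficients over matched source pairs. A minimal numpy sketch (the sources here are synthetic stand-ins; `pcc` and `sapcc` are hypothetical helper names, not thesis code):

```python
import numpy as np

def pcc(x, y):
    """Pearson correlation coefficient between two 1-D signals."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def sapcc(S, S_hat):
    """Sum of absolute PCCs over matched source pairs, as in (4.34)."""
    return sum(abs(pcc(s, s_hat)) for s, s_hat in zip(S, S_hat))

rng = np.random.default_rng(1)
S = rng.standard_normal((5, 1000))                # 5 original sources
S_hat = -S + 0.05 * rng.standard_normal(S.shape)  # sign-flipped, noisy recovery
score = sapcc(S, S_hat)                           # near the maximum of 5
```

The absolute value makes the score blind to the sign ambiguity inherent to BSS, which is why the sign-flipped recovery above still scores near the maximum of 5.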
The average SAPCC for UBSS is 4.53, while that for the proposed UJBSS-m is 4.76, even with the same source extraction technique. A one-way Analysis of Variance (ANOVA) is performed on the results of these 1000 repetitions. The obtained p value is 1.5677e-11, which means that the results of the proposed UJBSS-m method and the single-set UBSS method are significantly different. The proposed UJBSS-m algorithm demonstrates more robust and better performance. This example illustrates that the estimation accuracy can be improved by adding suitable noise to the input signals.

Table 4.2: PCC performance results in Simulation 2.

Method            ECG      EOG      EEG      EMG
Dataset 1
  UJBSS-m         0.985    0.994    0.910    0.831
  UJBSS-2 [124]   0.975    0.986    0.872    0.781
  SOBIUM [31]    -0.815   -0.985    0.800   -0.382
  UBSS-SC [119]   0.090    0.949   -0.707   -0.679
  UBSS-SR [59]    0.504   -0.906    0.115    0.503
  MCCA* [72]      0.613   -0.754    0.695    0.674
Dataset 2
  UJBSS-m         0.956    0.821    1.000    0.832
  UJBSS-2 [124]   0.883    0.705    0.942    0.308
  SOBIUM [31]     0.851   -0.467    0.999    0.726
  UBSS-SC [119]  -0.420    0.544    0.776    0.726
  UBSS-SR [59]   -0.894    0.288   -0.967   -0.203
  MCCA* [72]      0.634   -0.754    0.678   -0.676
Dataset 3
  UJBSS-m         0.779    0.738    0.998    0.997
  UJBSS-2 [124]   0.777    0.738    0.998    0.997
  SOBIUM [31]     0.443    0.697   -0.993   -0.983
  UBSS-SC [119]   0.394    0.638    0.722    0.848
  UBSS-SR [59]    0.582    0.609    0.426    0.871
  MCCA* [72]     -0.624   -0.756    0.689    0.676
* We add one additional observation in each dataset when we evaluate MCCA.

Figure 4.7: The sum of the absolute correlation coefficients between the recovered sources and the original ones. The blue asterisks represent the averages and the red lines stand for the medians. The edges of the box are the lower and upper quartiles.

4.9 Conclusions and Discussion

This chapter is the third work in my PhD study. Benefiting from the dependence information between two datasets, the UJBSS method proposed in Chapter 3 gained promising performance.
It is natural to generalize the idea from two datasets to multiple datasets. However, UJBSS-2, designed for two datasets, cannot be directly utilized to solve the problem for multiple datasets. In this chapter, we generalize the UJBSS method for two datasets (i.e., UJBSS-2) to the case of multiple datasets. The basic idea is similar to that of UJBSS-2: first estimate the mixing matrices jointly, and then restore the source signals. We exploit the cross correlation of the observations between each pair of datasets and present a novel underdetermined joint blind source separation method, namely UJBSS-m, to jointly estimate the mixing matrices from multiple datasets when the number of observations is smaller than that of the sources. The mixing matrices are accurately estimated through joint canonical polyadic decomposition of a sequence of specialized tensors in which sets of covariance matrices are stacked. The sources are then recovered based on the estimated mixing matrices. Numerical results on multiple datasets demonstrate the superior performance of the proposed method compared to commonly used JBSS and single-set UBSS methods.

As an example application of noise enhanced signal processing, we also show that the proposed UJBSS-m method can be utilized to solve the single-set UBSS problem when suitable noise is added to the observations. In addition, the proposed UJBSS-m method does not rely upon sparsity of the signals, and it can therefore be applied to a wide class of signals as long as: 1) the sources within each dataset are uncorrelated, and 2) the sources across different datasets are correlated only at corresponding indices.

Chapter 5

Removing Muscle Artifacts from EEG Data via Underdetermined Blind Source Separation

EEG recordings are often contaminated by artifacts from EMG. These artifacts reduce the quality of the EEG signals and disturb further analysis of EEG, such as brain connectivity modeling.
If a sufficient number of EEG recordings is available, there exists a considerable range of BSS methods which can suppress or remove the distorting effect of such artifacts. However, for many practical applications, such as ambulatory health-care monitoring, the number of sensors used to collect EEG is limited. As a result, conventional BSS methods, such as CCA and ICA, do not work in such cases. Considering the increasing need for biomedical signal processing in ambulatory environments, this chapter proposes a novel underdetermined BSS method exploring the cross correlation and autocorrelation of the underlying sources. We evaluate the performance of the proposed method through numerical simulations in which EEG recordings are contaminated with muscle artifacts. The results demonstrate that the proposed method can effectively and efficiently remove muscle artifacts while successfully preserving the EEG activity. It is thus a promising tool for real-world biomedical signal processing applications.

5.1 Motivation and Objectives

EEG is extensively used in brain science research, such as neuroscience and cognitive science [106]. However, it is susceptible to various physiological factors other than neural activities. ECG from cardiac activities, EOG from ocular movements, and EMG from muscular activities are the most common artifacts. These undesired artifacts interfere with the signal of interest and disturb subsequent analysis of EEG signals. Compared to other types of artifacts, it is generally more challenging to remove artifacts from the contracting head muscles (i.e., EMG signals) [79].
The main reasons for this difficulty are fourfold: 1) EMG signals usually have higher amplitude than the smaller EEG signals; 2) EMG signals have a wide spectral distribution, which in particular overlaps with the beta activity of EEG in 15-30 Hz; 3) EMG signals have a broad anatomical distribution and can be detected across the entire scalp; 4) EMG signals exhibit less repetition and are consequently harder to stereotype [18, 21, 106].

Artifact removal is clearly an important issue and a prerequisite step for subsequent analysis. In order to remove EMG artifacts, a number of approaches have been proposed, such as filtering, regression and EMD. An alternative strategy is based on BSS, which is more commonly used and has been demonstrated to be effective for removing artifacts from EEG. As one of the most popular BSS methods, ICA has been extensively explored for this purpose [21, 28], aiming to separate multichannel EEG into statistically independent components (ICs, i.e., underlying sources). The ICs determined to be artifacts can then be discarded, and the remaining ICs can be used to reconstruct the artifact-free EEG. However, due to the crosstalk between brain and muscle activity, ICs containing EEG signals are still contaminated by EMG [65, 83]. Therefore, ICA by itself may not perform effectively in removing EMG from EEG.

Second order blind identification makes use of temporal correlation and has been shown to be an effective alternative to ICA in removing EMG artifacts. However, this method is designed for stationary signals, and it may suffer when the underlying source is nonstationary, such as in the case of transient muscular activities [23]. More recently, CCA has been explored as a more reliable method to remove EMG from scalp EEG. It aims to find mutually uncorrelated sources which are maximally autocorrelated. Compared with EEG, EMG has relatively low autocorrelation.
Taking advantage of this distinguishing feature, sources with high autocorrelation should correspond to EEG, while sources with relatively low autocorrelation are regarded as EMG artifacts. The underlying EMG components are then discarded to reconstruct the EEG signals. As recently suggested in [21, 29, 43], CCA can achieve superior performance over ICA.

It is worth noting that the above-mentioned BSS algorithms generally assume that the number of sources is equal to or less than that of the observations. However, for many practical applications, such as ambulatory health-care monitoring, it is desirable to collect the mixed signals using fewer sensors. In these cases, the above assumption does not hold and UBSS is required. Again, considering the increasing need for biomedical signal processing in ambulatory environments, this chapter proposes a novel UBSS method that investigates second-order statistics of the observations. As with existing UBSS methods, the proposed method consists of two steps: the mixing matrix is estimated first, followed by the separation of the underlying sources based on the estimated mixing matrix.

More specifically, inspired by stochastic resonance [78], we add tiny random noise to the EEG recordings and construct multiple datasets across which the underlying sources are highly correlated. We further explore the cross correlation and autocorrelation of the underlying sources, rather than solely the partial cross-correlation as in our previous papers [111, 124]. The mixing matrix is estimated accurately via joint polyadic tensor decompositions of a set of tensors in which spatial covariance matrices corresponding to different time delays are stacked. Furthermore, the underlying sources, including EEG and EMG, are inferred from the EEG observations based on the estimated mixing matrix. Sources related to muscle activity are identified and removed during EEG reconstruction.
We evaluate the performance of the proposed method through numerical simulations in which EEG recordings are contaminated with muscle artifacts. The results demonstrate that the proposed method can effectively and efficiently remove muscle artifacts while preserving the EEG successfully.

5.2 Problem Formulation

The problem of interest here is to recover the underlying N sources from a limited number of observations. In this study, the EEG observation signals are denoted by a matrix X(t) = [x_1(t); x_2(t); ...; x_M(t)] ∈ R^{M×T}, where M represents the number of EEG observations and T is the number of data samples. It is assumed that the underlying N sources S, including the signal of interest (i.e., EEG) and undesired artifacts (i.e., EMG), are linearly mixed into the M observations X. The mixing process is modeled as

    X = AS + E,    (5.1)

where A ∈ R^{M×N} with M < N (i.e., the underdetermined case) denotes the unknown mixing matrix, and E is the possible additive noise, generally assumed to be zero mean, temporally white, and uncorrelated with the source signals.

5.3 Proposed Method

It is suggested in [111] that the UJBSS method can be used to solve the single-set UBSS problem. However, that approach only utilizes part of the cross-correlation [111] and neglects the autocorrelation and the other parts of the cross-correlation. In this chapter, we fully exploit the cross-correlation and autocorrelation of the sources and propose a novel UBSS algorithm. We add tiny noise to the observations X (also denoted X^(1)) and construct two further datasets as

    X^(1) = X = [x^(1)_1, x^(1)_2, ..., x^(1)_M]^T
    X^(2) = awgn(X^(1), 20 dB)
    X^(3) = awgn(X^(1), 20 dB),    (5.2)

in which awgn(X^(1), 20 dB) adds white Gaussian noise to the measured EEG observation signals X^(1), with the signal-to-noise ratio equal to 20 dB. (5.2) ensures dependence between each pair of datasets.
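The construction in (5.2) needs nothing beyond additive white Gaussian noise scaled to a 20 dB SNR. A minimal numpy stand-in for MATLAB's awgn (a sketch only; the signal power is measured from the data itself, and the data here is synthetic):

```python
import numpy as np

def awgn(X, snr_db, rng):
    """Add white Gaussian noise at the given SNR in dB (awgn-style)."""
    sig_power = np.mean(X ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    return X + rng.normal(0.0, np.sqrt(noise_power), X.shape)

rng = np.random.default_rng(0)
X1 = rng.standard_normal((4, 10_000))  # stand-in for the M-channel EEG
X2 = awgn(X1, 20, rng)                 # second dataset, as in (5.2)
X3 = awgn(X1, 20, rng)                 # third dataset

# At 20 dB the added noise carries ~1/100 of the signal power,
# so X2 and X3 stay strongly correlated with X1.
sig_power_est = float(np.mean(X1 ** 2))
resid_power = float(np.mean((X2 - X1) ** 2))
```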
Similar to CCA, it is reasonable to assume that:

(1) The sources in each dataset are uncorrelated:

    E{s^(k)_i(t) (s^(k)_j(t+τ))^T} = 0,  ∀τ, 1 ≤ i ≠ j ≤ N, k = 1, 2, 3,    (I)

where s^(k)_i(t) is the i-th source in dataset k and s^(k)_j(t+τ) represents the j-th source with time delay τ in dataset k.

(2) The corresponding sources from two different datasets have non-zero correlations, and sources with different indices across datasets are uncorrelated:

    D(τ) = E{S^(k1)(t) (S^(k2)(t+τ))^T} = Diag(ρ_1(τ), ρ_2(τ), ..., ρ_N(τ)),    (II)

where Diag(·) represents a diagonal matrix and ρ_n(τ) = E{s^(k1)_n(t) (s^(k2)_n(t+τ))^T} denotes the covariance between s^(k1)_n(t) and s^(k2)_n(t+τ). This assumption suggests that the corresponding sources in the multiple datasets are second-order correlated with each other.

In this chapter, we fully exploit the second-order auto covariance and cross covariance of the EEG signals and propose a novel and effective algorithm to estimate the mixing matrix. The problem is reformulated as a joint canonical polyadic decomposition of a sequence of third-order tensors which share the common factor matrix A^(1). The auto covariance of the EEG signals can be formulated as

    E{X^(1)(t) X^(1)(t+τ)^T} = A^(1) E{S^(1)(t) S^(1)(t+τ)^T} (A^(1))^T,    (5.3)

where τ represents the time delay. The covariance matrices corresponding to different time delays, τ_1 to τ_L, satisfy

    B^(1) = E{X^(1)(t) X^(1)(t+τ_1)^T} = A^(1) C(τ_1) (A^(1))^T
    ...
    B^(L) = E{X^(1)(t) X^(1)(t+τ_L)^T} = A^(1) C(τ_L) (A^(1))^T,    (5.4)

in which C(τ_l) = E{S^(1)(t) S^(1)(t+τ_l)^T} is diagonal, l = 1, ..., L. We stack the auto covariance matrices {B^(l)} into a tensor B ∈ R^{M×M×L} as follows:

    (B)_{i,j,l} = (B^(l))_{i,j},    (5.5)

in which i = 1, 2, ..., M, j = 1, 2, ..., M, l = 1, 2, ..., L.
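Equations (5.4)-(5.5) amount to estimating time-lagged covariance matrices (sample averages in place of expectations) and stacking them along a third mode. A hedged numpy sketch with synthetic stand-in data:

```python
import numpy as np

def lagged_cov(X, Y, tau):
    """Sample estimate of E{X(t) Y(t + tau)^T} for M x T data matrices."""
    T = X.shape[1]
    return X[:, : T - tau] @ Y[:, tau:].T / (T - tau)

rng = np.random.default_rng(0)
M, T, L = 4, 20_000, 10
X = rng.standard_normal((M, T))  # stand-in observations
delays = range(1, L + 1)         # tau_1 .. tau_L, step size 1 sample

# Stack the auto covariance matrices B^(l) into an M x M x L tensor (5.5).
B = np.stack([lagged_cov(X, X, tau) for tau in delays], axis=2)
```

For white-noise input, all nonzero-lag covariances are near zero; for real EEG they would carry the temporal structure the decomposition relies on.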
We define the matrix C of size L×N with elements C_{l,n} = (C(τ_l))_{n,n}, for l = 1, ..., L, n = 1, ..., N. Then we have

    B = Σ_{n=1}^{N} a^(1)_n ∘ a^(1)_n ∘ c_n,    (5.6)

in which ∘ denotes the outer product operation, a^(1)_n is the nth column of the mixing matrix A^(1), and c_n is the nth column of the matrix C.

The cross covariance between the EEG signals X^(1) and the noise-added signals (i.e., X^(2) and X^(3)) with time delay τ can be formulated as

    E{X^(1)(t) X^(k)(t+τ)^T} = A^(1) E{S^(1)(t) S^(k)(t+τ)^T} (A^(k))^T
    E{X^(1)(t+τ) X^(k)(t)^T} = A^(1) E{S^(1)(t+τ) S^(k)(t)^T} (A^(k))^T,    (5.7)

in which k = 2 or 3. Considering the correlations within and between each pair of datasets, the cross covariance matrices corresponding to time delay τ_l satisfy

    F^(l) = E{X^(1)(t) X^(2)(t+τ_l)^T} = A^(1) G(τ_l) (A^(2))^T
    H^(l) = E{X^(1)(t+τ_l) X^(2)(t)^T} = A^(1) I(τ_l) (A^(2))^T
    J^(l) = E{X^(1)(t) X^(3)(t+τ_l)^T} = A^(1) K(τ_l) (A^(3))^T
    P^(l) = E{X^(1)(t+τ_l) X^(3)(t)^T} = A^(1) Q(τ_l) (A^(3))^T,    (5.8)

in which the cross covariances between the sources across each pair of datasets, G(τ_l), I(τ_l), K(τ_l), Q(τ_l), are diagonal. Similar to (5.5), these sets of cross covariance matrices are stacked into tensors F, H, J, P, which can be represented as

    F = Σ_{n=1}^{N} a^(1)_n ∘ a^(2)_n ∘ g_n
    H = Σ_{n=1}^{N} a^(1)_n ∘ a^(2)_n ∘ i_n
    J = Σ_{n=1}^{N} a^(1)_n ∘ a^(3)_n ∘ k_n
    P = Σ_{n=1}^{N} a^(1)_n ∘ a^(3)_n ∘ q_n.    (5.9)

Considering the common latent structure in which each pair of tensors shares the factor matrix A^(1), the mixing matrix can be estimated via joint CPD of the collection of tensors B, F, H, J, P. In this chapter, we generalize the idea of coupled matrix and tensor factorization (CMTF) and jointly decompose these tensors via a gradient-based optimization method [3, 100, 109].
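Each tensor in (5.6) and (5.9) is a sum of N rank-one terms, two of whose factors are columns of the mixing matrices. The construction can be sketched with np.einsum, using random synthetic factors (illustrative only, not estimates from data):

```python
import numpy as np

def cp_tensor(A, B, C):
    """Rank-N CP tensor: sum over n of a_n (outer) b_n (outer) c_n."""
    return np.einsum('in,jn,ln->ijl', A, B, C)

rng = np.random.default_rng(0)
M, N, L = 4, 8, 20                # underdetermined: N > M
A1 = rng.standard_normal((M, N))  # A^(1), the factor shared by all tensors
A2 = rng.standard_normal((M, N))  # A^(2)
C = rng.standard_normal((L, N))   # auto-covariance profiles, rows C_l
G = rng.standard_normal((L, N))   # cross-covariance profiles

B_tensor = cp_tensor(A1, A1, C)   # auto covariance tensor, as in (5.6)
F_tensor = cp_tensor(A1, A2, G)   # first cross covariance tensor in (5.9)

# Each frontal slice of B reproduces the structure of (5.4): A diag(c) A^T.
slice0 = A1 @ np.diag(C[0]) @ A1.T
```

The shared first factor A1 across all five tensors is exactly the coupling that the joint CPD exploits.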
The objective function can be expressed as

    f(A^(1), A^(2), A^(3), C, G, I, K, Q)
      = (1/2) ‖B − [[A^(1), A^(1), C]]‖^2 + (1/2) ‖F − [[A^(1), A^(2), G]]‖^2
      + (1/2) ‖H − [[A^(1), A^(2), I]]‖^2 + (1/2) ‖J − [[A^(1), A^(3), K]]‖^2
      + (1/2) ‖P − [[A^(1), A^(3), Q]]‖^2,    (5.10)

where [[·]] denotes the canonical polyadic approximation of a given tensor. This formulation simultaneously takes the coupling information between the different tensors into account. We propose to solve this problem via a gradient-based optimization method. The partial derivative of the objective function f with respect to each column of A^(1) is

    ∂f/∂a^(1)_n = −2 (B ×_3 c_n) a^(1)_n + 2 Σ_{d=1}^{N} (c_n^T c_d) ((a^(1)_n)^T a^(1)_d) a^(1)_d
      − F ×_2 a^(2)_n ×_3 g_n − H ×_2 a^(2)_n ×_3 i_n − J ×_2 a^(3)_n ×_3 k_n − P ×_2 a^(3)_n ×_3 q_n
      + Σ_{d=1}^{N} [ (a^(2)_n)^T a^(2)_d (g_n)^T g_d + (a^(2)_n)^T a^(2)_d (i_n)^T i_d
      + (a^(3)_n)^T a^(3)_d (k_n)^T k_d + (a^(3)_n)^T a^(3)_d (q_n)^T q_d ] a^(1)_d.    (5.11)

Similarly, we can calculate the partial derivatives of f with respect to the other factor matrices and thus obtain the full gradient. The mixing matrix A^(1) can then be computed with any first-order optimization method. In this chapter, considering its efficiency and low memory requirements, we employ the nonlinear conjugate gradient (NCG) algorithm implemented in [38] to solve this unconstrained optimization problem and estimate the mixing matrix A^(1).

Once the mixing matrix is estimated, extracting the sources is a classic inverse problem. Here, we adopt a recently developed subspace representation method [59] to recover the latent sources based on the estimated mixing matrix; the details of this method can be found in Chapter 4. Next, the recovered sources are sorted in terms of their autocorrelations.
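This sorting step needs only a lag-one autocorrelation per recovered source: broadband EMG-like activity scores near zero, while smooth EEG-like rhythms score near one. A toy numpy sketch (the two signals are synthetic stand-ins, not recovered sources):

```python
import numpy as np

def lag1_autocorr(s):
    """Normalized lag-one autocorrelation of a 1-D signal."""
    s = s - s.mean()
    return float(s[:-1] @ s[1:] / (s @ s))

rng = np.random.default_rng(0)
t = np.arange(10_000) / 250.0           # 40 s at 250 Hz
eeg_like = np.sin(2 * np.pi * 10 * t)   # smooth 10 Hz rhythm
emg_like = rng.standard_normal(t.size)  # broadband noise
sources = np.vstack([eeg_like, emg_like])

order = np.argsort([lag1_autocorr(s) for s in sources])
emg_index = int(order[0])  # lowest autocorrelation -> flagged as EMG
```

In the proposed method, the row of S^(1) at this index would then be zeroed before reconstructing the cleaned EEG.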
Due to the relatively low autocorrelation of EMG signals, the muscle artifacts are isolated and set to 0 during reconstruction. Subsequently, the cleaned signals X_eeg can be obtained. The major steps of the proposed method are summarized in Algorithm 7.

Algorithm 7 The proposed method for removing muscle artifacts from EEG signals
Input: M-dimensional observations X
Output: The artifact-free EEG data X_eeg
1: Create two further datasets by adding Gaussian white noise to the EEG observations X;
2: Calculate the auto covariance and cross covariance as in (5.4) and (5.8) for different time delays, and construct a sequence of third-order tensors;
3: Calculate the joint CPD of the tensors constructed in step 2 and estimate the mixing matrix A (also denoted A^(1));
4: Recover the underlying sources S^(1), including EEG and EMG artifacts;
5: Sort the recovered sources S^(1) in terms of their autocorrelations and identify the EMG artifacts among them;
6: Set the rows of S^(1) corresponding to muscle artifacts to zero, obtaining S^(1)_new;
7: Reconstruct the artifact-free EEG signals via X_eeg = A^(1) S^(1)_new.

5.4 Data Generation and Performance Indices

In order to evaluate the performance of the proposed method, obtaining the ground truth, i.e., the pure EEG and EMG signals, is necessary. In previous studies [29], to obtain ground-truth EEG signals, experienced neurophysiologists inspected many EEG recordings and selected the clean EEG signals from them. However, frequent difficulties arise in acquiring artifact-free EEG signals in reality, and it is even more difficult to ensure that the selected signals are completely free of muscle activity. In this section, we generate synthetic EEG and EMG signals, and examine the performance of the proposed method when the ground truth is available.

The simulated EEG sources are generated according to the phase-resetting theory proposed by Makinen et al. [75].
As in [117] and [21], we generate each EEG source by adding 4 sinusoids with frequencies randomly chosen from the range of 4 Hz to 30 Hz. To illustrate the performance of the proposed method, N EEG sources, S_EEG, are produced independently. Here, we set N to 4. Analogous to the work of Delorme et al. [33], an EMG source, S_EMG, is simulated using random noise band-pass filtered between 20 and 60 Hz. The sampling rate is 250 Hz and each channel is 40 seconds long. In addition, the 4-channel EEG observations are modeled as

    X = AS = A [S_EEG; S_EMG],    (5.12)

where S includes the 4 EEG sources and 1 EMG source, and A is a randomly generated mixing matrix with elements following the uniform distribution U[-1, 1]. For simplicity, each column of the mixing matrix is normalized to a unit vector.

To fairly compare the proposed method with existing EMG artifact removal methods, 1000 independent simulations are run and three performance indices are employed. The first is the mean relative estimation error of the mixing matrix A, defined as

    Error = 10 log10 { mean( ‖A − Â‖ / ‖A‖ ) },    (5.13)

where Â denotes the optimally ordered estimate of A. The second measure is the Mean of Absolute Correlation (MAC) between the estimated sources and the original ones, defined as

    MAC = mean( (1/N) Σ_{n=1}^{N} | cov(s_n, ŝ_n) / (σ_{s_n} σ_{ŝ_n}) | ),    (5.14)

where ŝ_n represents the estimate of the source s_n, cov(·,·) represents the covariance between two variables, and σ denotes the standard deviation. The Relative Root Mean Squared Error (RRMSE) is the third measure, used to evaluate the effect of muscle artifact removal, and is defined as

    RRMSE = RMS(X_EEG − X̂_EEG) / RMS(X_EEG),    (5.15)

where RMS(·) denotes the root mean squared (RMS) value of a matrix/vector. For instance, the RMS value of the EEG observations X is expressed as

    RMS(X) = sqrt( (1/(M·T)) Σ_{m=1}^{M} Σ_{t=1}^{T} X_{m,t}^2 ),    (5.16)

where M is the number of EEG observations (4 in this chapter) and T represents the number of data samples.

5.5 Numerical Study for the Synthetic EEG Data

The original sources in our study include 4 EEG sources (represented by S_EEG) and 1 EMG source (S_EMG), which are linearly mixed into 4 observations X following (5.1). The other two datasets, X^(2) and X^(3), are generated following (5.2). In our previous paper [111], we discussed the effect of the step size and the number of time delays. Considering the difference in sampling rate, here we select 1 data sample, corresponding to 4 ms, as the step size of the time delays. Compared to the step size, the number of time delays has less impact on the performance. To further enhance time efficiency, we set the number of time delays to 10.

Fig. 5.1 shows the estimation error as a function of the SNR. We compare the proposed method with a commonly used single-set UBSS method (SOBIUM) and the previously developed underdetermined joint BSS method (UJBSS-m) [31, 111]. SOBIUM exploits the autocorrelation of the sources and reformulates the estimation of the mixing matrix as the decomposition of a higher-order tensor. UJBSS-m models the cross correlation between each pair of datasets and is considered a strong alternative to SOBIUM for solving the single-set underdetermined BSS problem [111]. Compared with SOBIUM and UJBSS-m, which only utilize the autocorrelation or part of the cross correlation, the proposed method fully exploits the second-order statistics of the observations.
Benefiting from this, the proposed method consistently yields the best results over the entire SNR range, which also suggests the stability of the proposed method.

We estimate the mixing matrix A via the single-set UBSS method SOBIUM, the underdetermined joint BSS method UJBSS-m, and our proposed method, and further recover the latent sources using the subspace representation method [59]. The source with the lowest autocorrelation is then identified as the EMG source. Next, the EMG source is set to 0 and the EEG sources are used to reconstruct the artifact-free EEG signals X_EEG. As an illustrative example, Fig. 5.2 shows the original X_EEG and the corresponding reconstruction results. Our proposed method is able to remove the EMG artifact almost perfectly, and the reconstructed EEG signals are highly correlated with the original artifact-free EEG signals. In addition, we also compare the proposed method with a recently developed EMG artifact removal method, EEMD-CCA [21]. EEMD-CCA is a single-channel technique for muscle activity removal and is therefore suitable for removing artifacts from a limited number of observations. It has been shown that this EEMD-CCA technique outperforms the multichannel CCA-based technique for removing muscle artifacts from EEG signals [18, 21]. In this chapter, we apply the EEMD-CCA method to each channel of the EEG observations X and decompose each channel into multiple IMFs, which are the input of the CCA method. Given its relatively low autocorrelation, the last canonical variate (CV, i.e., the output of the CCA) is set to 0 during EEG reconstruction. We also test the effect of the number of canonical variates selected as EMG artifacts and discarded during EEG reconstruction. The best performance is obtained when the number of CVs resembling EMG is set to 1.
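The MAC and RRMSE indices of (5.14)-(5.15), used in these comparisons, reduce to a few lines of numpy. A sketch with hypothetical ground-truth and reconstructed signals (not the actual experimental data):

```python
import numpy as np

def mac(S, S_hat):
    """Mean of Absolute Correlation over matched sources, as in (5.14)."""
    cors = [np.corrcoef(s, sh)[0, 1] for s, sh in zip(S, S_hat)]
    return float(np.mean(np.abs(cors)))

def rms(X):
    """Root mean squared value of a matrix, as in (5.16)."""
    return float(np.sqrt(np.mean(X ** 2)))

def rrmse(X, X_hat):
    """Relative Root Mean Squared Error, as in (5.15)."""
    return rms(X - X_hat) / rms(X)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10_000))            # hypothetical clean EEG
X_hat = X + 0.1 * rng.standard_normal(X.shape)  # imperfect reconstruction
err = rrmse(X, X_hat)  # about 0.1 for this noise level
m = mac(X, X_hat)      # close to 1
```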
We further compare the proposed method withthree other artifact removal methods, including SOBIUM, UJBSS-m and EEMD-CCA, in terms of MAC, RRMSE and time efficiency. All three of the compared108BSS methods utilize the same technology to recover the sources when the mix-ing matrix is estimated separately. Despite this, the performance of the proposedmethod is significantly better than that of SOBIUM and UJBSS-m. This also sug-gests the importance of estimating mixing matrices accurately. In addition, we testthe computational cost of all four of these respective artifact removal methods. Toremove the muscle artifact from 10000-datapoint 4-channel EEG observations, theaverage computational time for the proposed method is 52.810s while that of SO-BIUM, UJBSS-m and EEMD-CCA are 11.195s, 26.907s and 52.910s respectively.The implementation is completed in MATLAB on a computer with Intel Core i7-4770 3.40 GHz CPU and 8.00G RAM. All the MATLAB codes used in this chapterare available upon request from the authors via email liangzou@ece.ubc.ca.Table 5.1: Performance comparison between the proposed method and theother three methods (SOBIUM, UJBSS-m, EEMD-CCA)MAC RRMSE Average time cost (second)SOBIUM [31] 0.863 0.147 11.195UJBSS-m [111] 0.930 0.099 26.907EEMD-CCA [21] NA 0.168 52.910The proposed method 0.940 0.085 52.8105.6 Conclusions and DiscussionIn this chapter, we propose an effective and novel method to remove muscle ar-tifacts from EEG signals. Compared with SOBIUM and UJBSS-m, which onlyutilize autocorrelation or a portion of cross-correlation, our proposed method fullyexploits the second-order statistics of observations. The mixing matrices are ac-curately estimated through joint CPD of a set of specialized tensors in whichcovariance matrices corresponding to different time delays are stacked. 
Subsequently, the sources are recovered based on these estimated mixing matrices. Compared with EEMD-CCA, whose performance relies on the artifact level of the contaminated EEG signals (the ratio between the power of the pure EEG and the EMG), the proposed method is based on the statistical properties of the underlying sources and is therefore more robust. We evaluate the performance of the proposed method through numerical simulations in which EEG recordings are contaminated with muscle artifacts. Our results demonstrate that the proposed method can effectively and efficiently remove muscle artifacts while successfully preserving the EEG activity. Therefore, it is a promising tool for real-world biomedical signal processing applications.

Figure 5.1: Estimation error of A as the signal-to-noise ratio varies. Here, the number of time delays L equals 10, and the step size of time delays (i.e., τ_l − τ_{l−1}) is 1 data sample, corresponding to 4 ms.

Figure 5.2: An illustrative example of the reconstructed EEG signals based on the proposed EMG removal method. (Channel-wise correlations between the original and reconstructed EEG: 0.99427, 0.99940, 0.99904, 0.99953.)

This is the last piece of my PhD work. It is a new and interesting application of UJBSS.
To the best of our knowledge, we are the first to apply an underdetermined BSS method to remove EMG artifacts from a limited number of EEG observations. Further, we note that it is also applicable to removing other kinds of artifacts, such as ECG and EOG artifacts.

Chapter 6

Conclusion and Future Work

6.1 Conclusion

In this dissertation, we have developed a set of novel underdetermined blind source separation approaches for recovering underlying sources when the number of sources is greater than the number of observations. The proposed algorithms aim to address several challenges in real applications, including a limited number of observations, self/cross dependence information and source inference. The proposed methods were evaluated on synthetic data and/or real physiological signals. It should be noted that, since the underlying ground truth is unavailable in real physiological studies, performance evaluation in such cases largely depends on visual inspection by experts in the field. The main contributions and findings of this dissertation are summarized as follows.

In Chapter 2, a novel UBSS framework, termed NAMEMD-MCCA, is proposed to extract the heart beat signal from multi-channel NF-based sensor signals. Considering the various potential artifacts residing in the measured signals, recovering the underlying heart beat signal is an underdetermined problem. We investigate state-of-the-art EMD-BSS based methods for accurately extracting RHBR information from the nano-sensor data and further propose NAMEMD-MCCA for improved RHBR monitoring. By considering inter-channel information, NAMEMD processes the input NF-based signals in a high-dimensional space and can effectively overcome the problems of uniqueness and mode mixing [77]. As an extension of CCA, MCCA is able to jointly extract sources by maximizing the correlations of the extracted sources across datasets.
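The correlation-maximization principle behind CCA (and, pair of datasets by pair of datasets, MCCA) can be illustrated in a few lines. The sketch below is a textbook two-dataset CCA via the singular values of the whitened cross-covariance, on made-up data with a known shared source; it is not the NAMEMD-MCCA pipeline itself.

```python
import numpy as np

def cca_first_corr(X, Y):
    """First canonical correlation between two zero-mean datasets
    (rows = variables, columns = samples): the largest singular value
    of the whitened cross-covariance matrix."""
    def inv_sqrt_cov(Z):
        # Inverse square root of the sample covariance via its eigen/SVD form.
        U, s, _ = np.linalg.svd(Z @ Z.T / Z.shape[1])
        return (U / np.sqrt(s)) @ U.T
    K = inv_sqrt_cov(X) @ (X @ Y.T / X.shape[1]) @ inv_sqrt_cov(Y)
    return np.linalg.svd(K, compute_uv=False)[0]

rng = np.random.default_rng(1)
shared = rng.standard_normal(2000)                      # latent source common to both datasets
X = np.vstack([shared, rng.standard_normal(2000)])
Y = np.vstack([0.8 * shared + 0.6 * rng.standard_normal(2000),
               rng.standard_normal(2000)])
print(cca_first_corr(X, Y))   # close to the designed correlation of 0.8
```

Maximizing this quantity over projection directions is exactly what "jointly extract sources by maximizing the correlations across datasets" refers to.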
The combination of these two methods (NAMEMD-MCCA) benefits from the use of cross-channel information and increased robustness to artifacts. We first apply the proposed methods to synthetic data, where the underlying truth is known, to illustrate their performance. We then apply the proposed method to real nano-sensor data collected while the subject performs 11 tasks, and show that the proposed NAMEMD-MCCA method achieves superior performance.

Another challenging question that we have posed in the underdetermined BSS field is the underdetermined joint BSS problem, which aims to jointly estimate the mixing matrices and/or extract the underlying sources from multiple datasets when the number of sources is greater than the number of observations in each dataset. Traditional joint BSS methods are designed for the determined case, which assumes that the number of sources is equal to or less than the number of observations. As mentioned, this assumption may not hold in certain practical applications due to concerns such as cost or time [60]. However, to the best of our knowledge, only very limited work in the current literature addresses JBSS methods specifically designed for the underdetermined case. In order to address this concern, in Chapter 3 we propose an underdetermined JBSS method for two datasets, termed UJBSS-2. Exploiting the dependence information between the two datasets, we use the second-order statistics of the observations. The problem of jointly estimating the mixing matrices is tackled via CPD of a specialized tensor in which a set of spatial covariance matrices are stacked. Numerical results demonstrate the competitive performance of UJBSS-2 when compared to a commonly used JBSS method, MCCA, and the single-set UBSS method, UBSS-FAS.

In Chapter 4, we generalize the idea of UJBSS-2 from two datasets to multiple datasets [124]. In this work, we propose a novel and effective method to jointly estimate the mixing matrices for multiple datasets.
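Both UJBSS-2 and its multi-dataset extension hinge on canonical polyadic decomposition of such covariance tensors. A plain alternating-least-squares CPD conveys the idea; the sketch below is a generic textbook routine on an exactly low-rank synthetic tensor (the thesis itself relies on Tensorlab-style coupled solvers, and all shapes here are arbitrary).

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product of two matrices with matching column counts."""
    return np.einsum('ir,jr->ijr', P, Q).reshape(P.shape[0] * Q.shape[0], -1)

def cpd_als(T, R, n_iter=500, seed=0):
    """Rank-R CPD of a third-order tensor by plain alternating least squares."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    # Mode-n unfoldings, ordered to match X(n) = F_n * (Khatri-Rao of the others)^T.
    X1 = T.transpose(0, 2, 1).reshape(I, -1)
    X2 = T.transpose(1, 2, 0).reshape(J, -1)
    X3 = T.transpose(2, 1, 0).reshape(K, -1)
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(C, B)).T
        B = X2 @ np.linalg.pinv(khatri_rao(C, A)).T
        C = X3 @ np.linalg.pinv(khatri_rao(B, A)).T
    return A, B, C

# Demo: recover an exactly rank-2 synthetic tensor.
rng = np.random.default_rng(3)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 3, 5))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cpd_als(T, R=2)
err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
print(err)
```

In the JBSS setting, the factor recovered from one of the modes plays the role of the mixing matrix, which is what makes CPD uniqueness results so valuable here.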
Moving on from our work in Chapter 3, here the dependence information is modeled in a set of third-order tensors rather than in a single tensor. Considering the latent common structure of these constructed tensors, we jointly estimate the mixing matrices via joint canonical polyadic decomposition of these specialized tensors. In order to accurately infer the source signals, we recover them using a novel subspace-representation-based method. The proposed UJBSS-m method does not rely upon the sparsity of the signals, and it can therefore be applied to a wide class of signals. In addition, we show that UJBSS-m can be utilized to solve the single-set UBSS problem when suitable noise is added to the observations. Numerical results on both audio and physiological signals demonstrate the superior performance of the proposed method.

In Chapter 5, we propose a novel underdetermined blind source separation method for removing muscle artifacts from EEG signals. EEG recordings are often contaminated by various artifacts, among which artifacts from EMG are particularly difficult to eliminate. Such EMG artifacts reduce the quality of EEG signals and disturb further analysis of the EEG, as in brain connectivity modeling. If a sufficiently large number of EEG recordings is available, we can remove, or to some extent suppress, the distorting effect of such artifacts via a considerable range of BSS methods, such as ICA and CCA. However, for many practical applications, such as ambulatory health-care monitoring, a small number of sensors for collecting EEG is preferred, and conventional BSS methods such as CCA and ICA will fail. Considering the recent increasing need for biomedical signal processing in the ambulatory environment, we exploit the cross-correlation and autocorrelation of the underlying sources and propose a novel underdetermined BSS method. We conduct a performance comparison through numerical simulations in which 4 EEG recordings are contaminated with 1 muscle artifact.
It is demonstrated that the proposed method can effectively and efficiently remove the muscle artifact while successfully preserving the EEG activity.

6.2 Future Work

6.2.1 Multiple Datasets Generation

In Chapter 4 and Chapter 5, we demonstrate that underdetermined joint BSS methods can be utilized to solve single-set underdetermined BSS problems. In order to ensure a relatively high correlation between sources, we construct multiple datasets by adding a certain amount of weak Gaussian white noise (e.g., at SNR = 20 dB) to the observations. Then we apply the proposed underdetermined joint BSS method to these noise-assisted datasets. To make a fair comparison, we repeat the simulation 1000 times in both studies. It is demonstrated that the proposed methods achieve better performance on average when these noise-assisted datasets are used. For instance, the performance of the proposed method in Chapter 4 is better than that of SOBIUM with high confidence (80 percent probability). However, there is currently no strict theoretical analysis of how to add noise so as to obtain better performance with higher probability. Thus, one possible research direction is to investigate a more rigorous method for adaptively adding assisting noise to the given observations.

6.2.2 Estimate the Number of Source Signals in Determined and Underdetermined BSS

The classical BSS problem includes two aspects of research: estimation of the number of sources and source separation. The 'blind' aspect of BSS refers to the fact that there is generally no prior information available on the number of sources or on the mixing model. However, for conceptual and computational simplicity, most BSS algorithms require that the number of sources be specified in advance. ICA makes the assumption that the number of sources is not greater than the number of observations. MCCA assumes that the number of sources equals the number of observations [56].
Further, the matrix-diagonalization-based technique for underdetermined BSS has an upper limit on the number of sources: a unique solution can be obtained only when the number of sources satisfies certain conditions [31]. However, these assumptions may not hold in reality. One classic example of the source separation problem is the cocktail party problem, explained previously, where a number of people are talking simultaneously. Without prior knowledge of the number of sources, we do not know which BSS method is most suitable for solving this problem given the recorded signals. In addition, the results can vary if the number of sources is set differently. In different scenarios, we may need to choose different types of BSS methods.

An accurate estimate of the number of sources is thus of high importance, and the number is generally estimated before further source separation. As a result, several approaches have been proposed for estimating the number of sources. Wax and Kailath introduced the eigenvalue-based estimation method, building on the observation that, in determined cases, the number of dominant eigenvalues of the correlation matrix equals the number of sources [41]. This method was improved by introducing the Akaike Information Criterion (AIC), the Minimum Description Length (MDL) and other measures into the estimation [40, 73, 94]. However, it is challenging to estimate the true number of sources when the observations are noisy. To address this concern, source-number estimation methods that are robust to noisy observations are highly desired.

In addition, to the best of our knowledge, only a few researchers have discussed ways to estimate the number of sources in UBSS, and they generally make use of the sparsity of the source signals. For instance, SSPs in the time or frequency domain are detected, and similarity-based clustering methods are then used to estimate the number of sources.
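The eigenvalue-based idea attributed to Wax and Kailath above can be sketched for the determined case as follows. The fixed relative threshold is an illustrative stand-in for the AIC/MDL criteria, and all sizes and signal levels here are made up.

```python
import numpy as np

def count_sources(X, rel_threshold=1e-3):
    """Count the eigenvalues of the sample covariance that dominate the noise
    floor; `rel_threshold` (a fraction of the largest eigenvalue) is an
    illustrative stand-in for the AIC/MDL model-order criteria."""
    eigvals = np.linalg.eigvalsh(X @ X.T / X.shape[1])
    return int(np.sum(eigvals > rel_threshold * eigvals.max()))

rng = np.random.default_rng(2)
S = rng.standard_normal((3, 5000))                 # 3 uncorrelated sources
Q, _ = np.linalg.qr(rng.standard_normal((6, 3)))
A = Q * np.array([3.0, 2.0, 1.0])                  # 6 x 3 mixing matrix with known singular values
X = A @ S + 0.01 * rng.standard_normal((6, 5000))  # determined mixture with weak sensor noise
print(count_sources(X))
```

As the noise level grows, the gap between signal and noise eigenvalues closes, which is precisely why robust source-number estimation remains an open problem.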
However, signals may not be as sparse as assumed in the existing methods; it is therefore necessary to relax the sparsity constraint. Furthermore, the existing methods for estimating the number of sources in the UBSS case are limited to instantaneous mixing models. We plan to develop advanced methods to estimate the number of sources when the sources are mixed in a convolutive model. Lastly, the number of sources may be related to certain physiological processes, such as the depth of sleep. It would thus be of interest in some applications to monitor the dynamic change of the number of sources and the corresponding mixing structure.

6.2.3 Online Underdetermined BSS

For many practical applications, such as ambulatory health-care monitoring, it is desirable to collect mixed signals using fewer sensors. In order to recover sources or remove unwanted noise, underdetermined BSS is preferred in these situations. In general, underdetermined BSS is more difficult to implement because fewer observations are available. Underdetermined BSS methods generally consist of two separate steps, mixing matrix estimation and underlying source inference, which are very time-consuming. To better serve such practical applications, real-time underdetermined BSS methods with light computational complexity are preferred.

In addition, most existing BSS algorithms assume that the sources are physically stationary, i.e., that the mixing filters are fixed. However, this assumption does not always hold in real applications. For instance, in the cocktail party problem, it is highly possible that both the sources and the sensors are not stationary in the room, and therefore the mixing model may be time-varying. In these situations, it is necessary to model the mixing matrix as time-varying and to develop UBSS methods accordingly.

Bibliography

[1] F. Abrard and Y. Deville. A time–frequency blind signal separation method applicable to underdetermined mixtures of dependent sources.
Signal Processing, 85(7):1389–1403, 2005. → pages 7

[2] E. Acar, D. M. Dunlavy, and T. G. Kolda. A scalable optimization approach for fitting canonical tensor decompositions. Journal of Chemometrics, 25(2):67–86, 2011. → pages 67, 71, 73

[3] E. Acar, T. G. Kolda, and D. M. Dunlavy. All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422, 2011. → pages 77, 103

[4] T. Adali, M. Anderson, and G.-S. Fu. Diversity in independent component and vector analyses: Identifiability, algorithms, and applications in medical imaging. IEEE Signal Processing Magazine, 31(3):18–33, 2014. → pages 64

[5] T. Adali, Y. Levin-Schwartz, and V. D. Calhoun. Multimodal data fusion using source separation: Two effective models based on ICA and IVA and their properties. Proceedings of the IEEE, 103(9):1478–1493, 2015. → pages 50, 65

[6] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006. → pages 9

[7] A. Aissa-El-Bey, N. Linh-Trung, K. Abed-Meraim, A. Belouchrani, and Y. Grenier. Underdetermined blind separation of nondisjoint sources in the time-frequency domain. IEEE Transactions on Signal Processing, 55(3):897–907, 2007. → pages 7

[8] M. Anderson, T. Adali, and X.-L. Li. Joint blind source separation with multivariate Gaussian model: Algorithms and performance analysis. IEEE Transactions on Signal Processing, 60(4):1672–1683, 2012. → pages 10

[9] S. Arberet, R. Gribonval, and F. Bimbot. A robust method to count and locate audio sources in a stereophonic linear anechoic mixture. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, volume 3, pages III-745. IEEE, 2007. → pages 7

[10] B. Arons. A review of the cocktail party effect. Journal of the American Voice I/O Society, 12(7):35–50, 1992. → pages 2

[11] M. Bin Altaf, T. Gautama, T. Tanaka, and D. P. Mandic. Rotation invariant complex empirical mode decomposition. In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, volume 3, pages III-1009. IEEE, 2007. → pages 25

[12] B. Bouachache and P. Flandrin. Wigner–Ville analysis of time-varying signals. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '82, volume 7, pages 1329–1332, May 1982. → pages 54

[13] L. Brechet, M.-F. Lucas, C. Doncarli, and D. Farina. Compression of biomedical signals with mother wavelet optimization and best-basis wavelet packet selection. Biomedical Engineering, IEEE Transactions on, 54(12):2186–2192, 2007. → pages 21

[14] V. D. Calhoun, J. Liu, and T. Adalı. A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. Neuroimage, 45(1):S163–S172, 2009. → pages 10, 11

[15] A. T. Cemgil, C. Févotte, and S. J. Godsill. Variational and stochastic inference for Bayesian source separation. Digital Signal Processing, 17(5):891–913, 2007. → pages 9

[16] J.-C. Chao. On the design of robust criteria and algorithms for blind source separation. PhD thesis, Southern Methodist University, 2007. → pages 5

[17] X. Chen, C. He, Z. J. Wang, and M. J. McKeown. An IC-PLS framework for group corticomuscular coupling analysis. Biomedical Engineering, IEEE Transactions on, 60(7):2022–2033, 2013. → pages 50

[18] X. Chen, C. He, and H. Peng. Removal of muscle artifacts from single-channel EEG based on ensemble empirical mode decomposition and multiset canonical correlation analysis. Journal of Applied Mathematics, 2014, 2014. → pages 97, 108

[19] X. Chen, A. Liu, M. J. McKeown, H. Poizner, and Z. J. Wang. An EEMD-IVA framework for concurrent multidimensional EEG and unidimensional kinematic data analysis. IEEE Transactions on Biomedical Engineering, 61(7):2187–2198, 2014. → pages 11, 22

[20] X. Chen, Z. J. Wang, and M. J. McKeown. A three-step multimodal analysis framework for modeling corticomuscular activity with application to Parkinson's disease. Biomedical and Health Informatics, IEEE Journal of, 18(4):1232–1241, 2014. → pages 38, 47, 60, 88

[21] X. Chen, A. Liu, J. Chiang, Z. J. Wang, M. J. McKeown, and R. K. Ward. Removing muscle artifacts from EEG data: Multichannel or single-channel techniques? IEEE Sensors Journal, 16(7):1986–1997, 2016. → pages 97, 98, 105, 108, 109

[22] X. Chen, Z. J. Wang, and M. McKeown. Joint blind source separation for neurophysiological data analysis: Multiset and multimodal methods. IEEE Signal Processing Magazine, 33(3):86–107, 2016. → pages 64

[23] S. Choi, A. Cichocki, and A. Beloucharni. Second order nonstationary source separation. The Journal of VLSI Signal Processing, 32(1):93–104, 2002. → pages 98

[24] A. Cichocki and S.-i. Amari. Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, volume 1. John Wiley & Sons, 2002. → pages 5

[25] P. Comon and C. Jutten. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 2010. → pages 5

[26] M. Congedo, R. Phlypo, and J. Chatel-Goldman. Orthogonal and non-orthogonal joint blind source separation in the least-squares sense. In Signal Processing Conference (EUSIPCO), 2012 Proceedings of the 20th European, pages 1885–1889. IEEE, 2012. → pages 66

[27] N. M. Correa, T. Adali, Y.-O. Li, and V. D. Calhoun. Canonical correlation analysis for data fusion and group inferences. Signal Processing Magazine, IEEE, 27(4):39–50, 2010. → pages 59, 87

[28] M. Crespo-Garcia, M. Atienza, and J. L. Cantero. Muscle artifact removal from human sleep EEG by using independent component analysis. Annals of Biomedical Engineering, 36(3):467–475, 2008. → pages 97

[29] W. De Clercq, A. Vergult, B. Vanrumste, W. Van Paesschen, and S. Van Huffel. Canonical correlation analysis applied to remove muscle artifacts from the electroencephalogram. IEEE Transactions on Biomedical Engineering, 53(12):2583–2587, 2006. → pages 21, 98, 104

[30] L. De Lathauwer. A link between the canonical decomposition in multilinear algebra and simultaneous matrix diagonalization. SIAM Journal on Matrix Analysis and Applications, 28(3):642–666, 2006. → pages 54, 73

[31] L. De Lathauwer and J. Castaing. Blind identification of underdetermined mixtures by simultaneous matrix diagonalization. Signal Processing, IEEE Transactions on, 56(3):1096–1105, 2008. → pages xiii, 7, 8, 51, 54, 58, 65, 66, 74, 84, 86, 89, 91, 93, 107, 109, 116

[32] L. De Lathauwer, J. Castaing, and J.-F. Cardoso. Fourth-order cumulant-based blind identification of underdetermined mixtures. IEEE Transactions on Signal Processing, 55(6):2965–2973, 2007. → pages 8

[33] A. Delorme, T. Sejnowski, and S. Makeig. Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. Neuroimage, 34(4):1443–1449, 2007. → pages 105

[34] I. Domanov and L. De Lathauwer. Canonical polyadic decomposition of third-order tensors: relaxed uniqueness conditions and algebraic algorithm. Linear Algebra and its Applications, 513:342–375, 2017. → pages 73

[35] I. Domanov and L. De Lathauwer. Canonical polyadic decomposition of third-order tensors: reduction to generalized eigenvalue decomposition. SIAM Journal on Matrix Analysis and Applications, 35(2):636–660, 2014. → pages 73

[36] I. Domanov and L. De Lathauwer. Generic uniqueness conditions for the canonical polyadic decomposition and INDSCAL. SIAM Journal on Matrix Analysis and Applications, 36(4):1567–1589, 2015. → pages 73

[37] T. Dong, Y. Lei, and J. Yang. An algorithm for underdetermined mixing matrix estimation. Neurocomputing, 104:26–34, 2013. → pages 7

[38] D. M. Dunlavy, T. G. Kolda, and E. Acar. Poblano v1.0: A MATLAB toolbox for gradient-based optimization. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Tech. Rep. SAND2010-1422, 2010. → pages 78, 104

[39] J. Escudero, R. Hornero, D. Abásolo, and A. Fernández. Quantitative evaluation of artifact removal in real magnetoencephalogram signals with blind source separation. Annals of Biomedical Engineering, 39(8):2274–2286, 2011. → pages 22

[40] E. Fishler and H. V. Poor. Estimation of the number of sources in unbalanced arrays via information theoretic criteria. IEEE Transactions on Signal Processing, 53(9):3543–3553, 2005. → pages 116

[41] E. Fishler, M. Grosmann, and H. Messer. Detection of signals by information theoretic criteria: General asymptotic performance analysis. IEEE Transactions on Signal Processing, 50(5):1027–1036, 2002. → pages 116

[42] P. Flandrin, G. Rilling, and P. Goncalves. Empirical mode decomposition as a filter bank. Signal Processing Letters, IEEE, 11(2):112–114, 2004. → pages 40

[43] J. Gao, C. Zheng, and P. Wang. Online removal of muscle artifact from electroencephalogram signals based on canonical correlation analysis. Clinical EEG and Neuroscience, 41(1):53–59, 2010. → pages 98

[44] S. Ge, Q. Yang, R. Wang, P. Lin, J. Gao, Y. Leng, Y. Yang, and H. Wang. A brain-computer interface based on a few-channel EEG-fNIRS bimodal system. IEEE Access, 5:208–218, 2017. → pages 64

[45] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000. → pages 60, 88

[46] X.-F. Gong, X.-L. Wang, and Q.-H. Lin. Generalized non-orthogonal joint diagonalization with LU decomposition and successive rotations. IEEE Transactions on Signal Processing, 63(5), 2015. → pages 66, 74

[47] R. Gribonval and S. Lesage. A survey of sparse component analysis for blind source separation: principles, perspectives, and new challenges. In ESANN'06 proceedings - 14th European Symposium on Artificial Neural Networks, pages 323–330. d-side publi., 2006. → pages 4

[48] A. R. Groves, C. F. Beckmann, S. M. Smith, and M. W. Woolrich. Linked independent component analysis for multimodal data fusion. Neuroimage, 54(3):2198–2217, 2011. → pages 65

[49] H. Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321–377, 1936. → pages 10, 23, 27

[50] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 454, pages 903–995. The Royal Society, 1998. → pages 21

[51] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4):411–430, 2000. → pages 6

[52] Y. Ichimaru and G. Moody. Development of the polysomnographic database on CD-ROM. Psychiatry and Clinical Neurosciences, 53(2):175–177, 1999. → pages 32

[53] M. T. Jensen, J. L. Marott, P. Lange, J. Vestbo, P. Schnohr, O. W. Nielsen, J. S. Jensen, and G. B. Jensen. Resting heart rate is a predictor of mortality in COPD. European Respiratory Journal, 42(2):341–349, 2013. → pages 20

[54] S. Junnila, H. Kailanto, J. Merilahti, A.-M. Vainio, A. Vehkaoja, M. Zakrzewski, and J. Hyttinen. Wireless, multipurpose in-home health monitoring platform: Two case trials. Information Technology in Biomedicine, IEEE Transactions on, 14(2):447–455, 2010. → pages 20

[55] W. Karlen, S. Raman, J. M. Ansermino, and G. A. Dumont. Multiparameter respiratory rate estimation from the photoplethysmogram. Biomedical Engineering, IEEE Transactions on, 60(7):1946–1953, 2013. → pages 32

[56] J. R. Kettenring. Canonical analysis of several sets of variables. Biometrika, pages 433–451, 1971. → pages 10, 27, 50, 64, 116

[57] H. A. Kiers. Towards a standardized notation and terminology in multiway analysis. Journal of Chemometrics, 14(3):105–122, 2000. → pages 67, 69

[58] S. Kim and C. D. Yoo. Underdetermined blind source separation based on generalized Gaussian distribution. In 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, pages 103–108. IEEE, 2006. → pages 80

[59] S. Kim and C. D. Yoo. Underdetermined blind source separation based on subspace representation. IEEE Transactions on Signal Processing, 57(7):2604–2614, 2009. → pages 9, 65, 79, 80, 85, 86, 89, 93, 104, 108

[60] M. Kleinsteuber and H. Shen. Blind source separation with compressively sensed linear mixtures. Signal Processing Letters, IEEE, 19(2):107–110, 2012. → pages 51, 65, 113

[61] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009. → pages 53, 54, 67, 69, 71, 73

[62] Z. Koldovsky, P. Tichavsky, A. H. Phan, and A. Cichocki. A two-stage MMSE beamformer for underdetermined signal separation. IEEE Signal Processing Letters, 20(12):1227–1230, 2013. ISSN 1070-9908. doi:10.1109/LSP.2013.2285932. → pages 51, 65, 71

[63] E. Kristal-Boneh, H. Silber, G. Harari, and P. Froom. The association of resting heart rate with cardiovascular, cancer and all-cause mortality. Eight year follow-up of 3527 male Israeli employees (the CORDIS study). European Heart Journal, 21(2):116–124, 2000. → pages 20

[64] J. B. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18(2):95–138, 1977. → pages 72

[65] D. Labate, F. La Foresta, G. Morabito, I. Palamara, and F. C. Morabito. Entropic measures of EEG complexity in Alzheimer's disease through a multivariate multiscale approach. IEEE Sensors Journal, 13(9):3284–3292, 2013. → pages 98

[66] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon. Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches. Biomedical Engineering, IEEE Transactions on, 59(6):1499–1506, 2012. → pages 21

[67] J.-H. Lee, T.-W. Lee, F. A. Jolesz, and S.-S. Yoo. Independent vector analysis (IVA): multivariate approach for fMRI group study. Neuroimage, 40(1):86–109, 2008. → pages 9, 10

[68] X.-L. Li, M. Anderson, and T. Adalı. Second and higher-order correlation analysis of multiple multidimensional variables by joint diagonalization. In International Conference on Latent Variable Analysis and Signal Separation, pages 197–204. Springer, 2010. → pages 10

[69] X.-L. Li, T. Adalı, and M. Anderson. Joint blind source separation by generalized joint diagonalization of cumulant matrices. Signal Processing, 91(10):2314–2322, 2011. → pages 50, 60, 64, 65, 70, 88

[70] Y. Li, A. Cichocki, and S.-i. Amari. Analysis of sparse representation and blind source separation. Neural Computation, 16(6):1193–1234, 2004. → pages 9

[71] Y. Li, S.-I. Amari, A. Cichocki, D. W. Ho, and S. Xie. Underdetermined blind source separation based on sparse representation. IEEE Transactions on Signal Processing, 54(2):423–437, 2006. → pages 7, 9

[72] Y.-O. Li, T. Adali, W. Wang, and V. D. Calhoun. Joint blind source separation by multiset canonical correlation analysis. Signal Processing, IEEE Transactions on, 57(10):3918–3929, 2009. → pages 10, 23, 27, 47, 50, 59, 60, 61, 64, 70, 85, 88, 89, 93

[73] A. P. Liavas and P. A. Regalia. On the behavior of information theoretic criteria for model order selection. IEEE Transactions on Signal Processing, 49(8):1689–1695, 2001. → pages 116

[74] J. Lin and A. Zhang. Fault feature separation using wavelet-ICA filter. NDT & E International, 38(6):421–427, 2005. → pages 22

[75] V. Mäkinen, H. Tiitinen, and P. May. Auditory event-related responses are generated independently of ongoing brain activity. Neuroimage, 24(4):961–968, 2005. → pages 105

[76] D. Mandic et al. Filter bank property of multivariate empirical mode decomposition. Signal Processing, IEEE Transactions on, 59(5):2421–2426, 2011. → pages 22

[77] D. P. Mandic, N. U. Rehman, Z. Wu, and N. E. Huang. Empirical mode decomposition-based time-frequency analysis of multivariate signals: the power of adaptive data analysis. Signal Processing Magazine, IEEE, 30(6):74–86, 2013. → pages 21, 23, 113

[78] M. D. McDonnell and D. Abbott. What is stochastic resonance? Definitions, misconceptions, debates, and its relevance to biology. PLoS Comput Biol, 5(5):e1000348, 2009. → pages 98

[79] B. W. McMenamin, A. J. Shackman, L. L. Greischar, and R. J. Davidson. Electromyogenic artifacts and electroencephalographic inferences revisited. Neuroimage, 54(1):4–9, 2011. → pages 97

[80] L. Mesin, A. Holobar, and R. Merletti. Blind source separation: Application to biomedical signals. 2011. → pages 5

[81] B. Mijovic, M. De Vos, I. Gligorijevic, J. Taelman, and S. Van Huffel. Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis. Biomedical Engineering, IEEE Transactions on, 57(9):2188–2196, 2010. → pages 21, 22, 27, 28

[82] G. B. Moody and R. G. Mark. The MIT-BIH arrhythmia database on CD-ROM and software for use with it. In Computers in Cardiology 1990, Proceedings, pages 185–188. IEEE, 1990. → pages 32

[83] H. Nam, T.-G. Yim, S. K. Han, J.-B. Oh, and S. K. Lee. Independent component analysis of ictal EEG in medial temporal lobe epilepsy. Epilepsia, 43(2):160–164, 2002. → pages 98

[84] D. Nion, K. N. Mokios, N. D. Sidiropoulos, and A. Potamianos. Batch and adaptive PARAFAC-based blind separation of convolutive speech mixtures. IEEE Transactions on Audio, Speech, and Language Processing, 18(6):1193–1207, 2010. → pages 8

[85] P. Palatini, E. Casiglia, P. Pauletto, J. Staessen, N. Kaciroti, and S. Julius. Relationship of tachycardia with high blood pressure and metabolic abnormalities: a study with mixture analysis in three populations. Hypertension, 30(5):1267–1273, 1997. → pages 20

[86] P. Palatini, E. Casiglia, S. Julius, and A. C. Pessina. High heart rate: a risk factor for cardiovascular death in elderly men. Archives of Internal Medicine, 159(6):585–592, 1999. → pages 20

[87] C. Park, D. Looney, A. Ahrabian, D. Mandic, et al. Classification of motor imagery BCI using multivariate empirical mode decomposition. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 21(1):10–22, 2013. → pages 22

[88] M. Rajih, P. Comon, and R. A. Harshman. Enhanced line search: A novel method to accelerate PARAFAC. SIAM Journal on Matrix Analysis and Applications, 30(3):1128–1147, 2008. → pages 54

[89] N. Rehman and D. P. Mandic. Multivariate empirical mode decomposition. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, page rspa20090502. The Royal Society, 2009. → pages 25

[90] N. U. Rehman and D. P. Mandic. Filter bank property of multivariate empirical mode decomposition. Signal Processing, IEEE Transactions on, 59(5):2421–2426, 2011. → pages 23, 25, 26, 40, 46

[91] V. G. Reju, S. N. Koh, and Y. Soon. An algorithm for mixing matrix estimation in instantaneous blind source separation. Signal Processing, 89(9):1762–1773, 2009. → pages 8, 65

[92] G. Rilling, P. Flandrin, P. Gonçalves, and J. M. Lilly. Bivariate empirical mode decomposition. Signal Processing Letters, IEEE, 14(12):936–939, 2007. → pages 25

[93] D. Safieddine, A. Kachenoura, L. Albera, G. Birot, A. Karfoul, A. Pasnicu, A. Biraben, F. Wendling, L. Senhadji, and I. Merlet. Removal of muscle artifact from EEG data: comparison between stochastic (ICA and CCA) and deterministic (EMD and wavelet-based) approaches. EURASIP Journal on Advances in Signal Processing, 2012(1):1–15, 2012. → pages 21

[94] H. Sawada, R. Mukai, S. Araki, and S. Makino. Estimating the number of sources using independent component analysis. Acoustical Science and Technology, 26(5):450–452, 2005. → pages 116

[95] H. Sawada, S. Araki, and S. Makino. Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment. IEEE Transactions on Audio, Speech, and Language Processing, 19(3):516–527, 2011. → pages 13

[96] T. Shany, S. J. Redmond, M. R. Narayanan, and N. H. Lovell. Sensors-based wearable systems for monitoring of human movement and falls. Sensors Journal, IEEE, 12(3):658–670, 2012. → pages 20

[97] H. Snoussi and J. Idier. Bayesian blind separation of generalized hyperbolic processes in noisy and underdeterminate mixtures. IEEE Transactions on Signal Processing, 54(9):3257–3269, 2006. → pages 9

[98] S. Soltanian, A. Servati, R. Rahmanian, F. Ko, and P. Servati. Highly piezoresistive compliant nanofibrous sensors for tactile and epidermal electronic applications. Journal of Materials Research, 30(01):121–129, 2015. → pages 20, 34

[99] L. Sorber, M. Van Barel, and L. De Lathauwer. Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(Lr,Lr,1) terms, and a new generalization. SIAM Journal on Optimization, 23(2):695–720, 2013. → pages 73

[100] L. Sorber, M. Van Barel, and L. De Lathauwer. Structured data fusion. IEEE Journal of Selected Topics in Signal Processing, 9(4):586–600, 2015. → pages 77, 103

[101] M. Sørensen, I. Domanov, and L. De Lathauwer. Coupled canonical polyadic decompositions and (coupled) decompositions in multilinear rank-(Lr,n,Lr,n,1) terms - Part II: Algorithms. SIAM Journal on Matrix Analysis and Applications, 36(3):1015–1045, 2015. → pages 77

[102] G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 2003. ISBN 9780961408893. → pages 80

[103] K. T. Sweeney, S. F. McLoone, and T. E. Ward. The use of ensemble empirical mode decomposition with canonical correlation analysis as a novel artifact removal technique. Biomedical Engineering, IEEE Transactions on, 60(1):97–105, 2013. → pages 22, 28

[104] P. Tichavsky and Z. Koldovsky. Weight adjusted tensor method for blind separation of underdetermined mixtures of nonstationary sources. IEEE Transactions on Signal Processing, 59(3):1037–1047, 2011. → pages 65

[105] N. ur Rehman, C. Park, N. E. Huang, and D. P. Mandic. EMD via MEMD: multivariate noise-aided computation of standard EMD. Advances in Adaptive Data Analysis, 5(02), 2013. → pages 29

[106] J. A. Urigüen and B. Garcia-Zapirain. EEG artifact removal - state-of-the-art and guidelines. Journal of Neural Engineering, 12(3):031001, 2015. → pages 97

[107] H. L. Van Trees. Detection, Estimation, and Modulation Theory, Optimum Array Processing. John Wiley & Sons, 2004. → pages 79

[108] R. R. Vázquez, H. Velez-Perez, R. Ranta, V. L. Dorr, D. Maquin, and L. Maillard. Blind source separation, wavelet denoising and discriminant analysis for EEG artefacts and noise cancelling. Biomedical Signal Processing and Control, 7(4):389–400, 2012. → pages 21

[109] N. Vervliet, O. Debals, and L. De Lathauwer. Tensorlab 3.0 - numerical optimization strategies for large-scale constrained and coupled matrix/tensor factorization. In 2016 Conference Record of the 50th Asilomar Conference on Signals, Systems and Computers. IEEE, 2016. → pages 77, 78, 103

[110] E. Vincent, R. Gribonval, and C. Févotte. Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 14(4):1462–1469, 2006. → pages 5

[111] L. Zou, X. Chen, X. Ji, and Z. J. Wang. Underdetermined joint blind source separation of multiple datasets. IEEE Access, 5:7474–7487, 2017. → pages 8, 99, 100, 107, 109

[112] E. Wigner. On the quantum correction for thermodynamic equilibrium. Physical Review, 40(5):749–759, 1932. → pages 54

[113] Z. Wu and N. E. Huang. A study of the characteristics of white noise using the empirical mode decomposition method. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 460(2046):1597–1611, 2004. → pages 24

[114] Z. Wu and N. E. Huang. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Advances in Adaptive Data Analysis, 1(01):1–41, 2009. → pages 22

[115] G. Wunder, H. Boche, T. Strohmer, and P. Jung. Sparse signal processing concepts for efficient 5G system design. IEEE Access, 3:195–208, 2015. → pages 65

[116] S. Xie, L. Yang, J.-M. Yang, G. Zhou, and Y. Xiang. Time-frequency approach to underdetermined blind source separation. Neural Networks and Learning Systems, IEEE Transactions on, 23(2):306–316, 2012. → pages 51, 54, 55, 56, 59, 60, 61, 65, 66, 79, 86

[117] N. Yeung, R. Bogacz, C. B. Holroyd, S. Nieuwenhuis, and J. D. Cohen. Theta phase resetting and the error-related negativity. Psychophysiology, 44(1):39–49, 2007. → pages 105

[118] T. Yilmaz, R.
Foster, and Y. Hao. Detecting vital signs with wearablewireless sensors. Sensors, 10(12):10837\u201310862, 2010. \u2192 pages 20[119] L. Zhen, D. Peng, Z. Yi, Y. Xiang, and P. Chen. Underdetermined blindsource separation using sparse coding. IEEE Transactions on NeuralNetworks and Learning Systems, 2016. \u2192 pages 9, 65, 85, 89, 93[120] G. Zhou and A. Cichocki. Canonical polyadic decomposition based on asingle mode blind source separation. Signal Processing Letters, IEEE, 19(8):523\u2013526, 2012. \u2192 pages 53, 71[121] G. Zhou, A. Cichocki, Y. Zhang, and D. P. Mandic. Group componentanalysis for multiblock data: Common and individual feature extraction.IEEE Transactions on Neural Networks and Learning Systems, PP(99):1\u201314, 2015. ISSN 2162-237X. doi:10.1109\/TNNLS.2015.2487364. \u2192pages 50[122] G. Zhou, Q. Zhao, Y. Zhang, T. Adali, S. Xie, and A. Cichocki. Linkedcomponent analysis from matrices to high-order tensors: Applications tobiomedical data. Proceedings of the IEEE, 104(2):310\u2013331, 2016. ISSN0018-9219. doi:10.1109\/JPROC.2015.2474704. \u2192 pages 50, 51, 64[123] L. Zou, X. Chen, A. Servati, P. Servati, and M. J. McKeown. A heart beatrate detection framework using multiple nanofiber sensor signals. In Signaland Information Processing (ChinaSIP), 2014 IEEE China Summit &International Conference on, pages 242\u2013246. IEEE, 2014. \u2192 pages 28, 30[124] L. Zou, X. Chen, and Z. J. Wang. Underdetermined joint blind sourceseparation for two datasets based on tensor decomposition. IEEE SignalProcessing Letters, 23(5):673\u2013677, 2016. \u2192 pages xiii, 8, 63, 65, 74, 84,85, 86, 89, 91, 93, 99, 114[125] L. Zou, Z. J. Wang, X. Chen, and X. Ji. Underdetermined joint blind sourceseparation based on tensor decomposition. In Electrical and ComputerEngineering (CCECE), 2016 IEEE Canadian Conference on, pages 1\u20134.IEEE, 2016. 
\u2192 pages 66131Appendix ADerivationsThe Appendix is the proof of Proposition 1 in Chapter 4, asThe partial derivative of the objective function f with respect to each column ofthe desired matrices , i.e., {a(k)n }, un, vr and wn, are given by\u2202 f\u2202a(1)n=\u2212P\u00d72 a(2)n \u00d73 un\u2212Q\u00d72 a(3)n \u00d73 vn+N\u2211c=1[(a(2)n )T a(2)c (un)T uc+(a(3)n )T a(3)c (vn)T vc]a(1)c\u2202 f\u2202a(2)n=\u2212P\u00d71 a(1)n \u00d73 un\u2212R\u00d72 a(3)n \u00d73 wn+N\u2211c=1[(a(1)n )T a(1)c (un)T uc+(a(3)n )T a(3)c (wn)T wc]a(2)c\u2202 f\u2202a(3)n=\u2212Q\u00d71 a(1)n \u00d73 vn\u2212R\u00d71 a(2)n \u00d73 wn+N\u2211c=1[(a(1)n )T a(1)c (vn)T vc+(a(2)n )T a(2)c (wn)T wc]a(3)c\u2202 f\u2202un=\u2212P\u00d71 a(1)n \u00d72 a(2)n +N\u2211c=1[(a(1)n )T a(1)c (a(2)n )T a(2)c ]uc\u2202 f\u2202vn=\u2212Q\u00d71 a(1)n \u00d72 a(3)n +N\u2211c=1[(a(1)n )T a(1)c (a(3)n )T a(3)c ]vc\u2202 f\u2202wn=\u2212R\u00d71 a(2)n \u00d72 a(3)n +N\u2211c=1[(a(2)n )T a(2)c (a(2)n )T a(3)c ]wc.132A.1 Proof for Proposition 1Proo f . The three components of the objective function in (4.21), i.e., f (1)(A(1),A(2),U),f (2)(A(1),A(3),V ) and f (3)(A(2),A(3),W ), share similar structure, which is the dif-ference between one tensor and the corresponding estimated results. Therefore,we take f (1)(A(1),A(2),U) and its partial derivative with respect to a(1)n for furtheranalysis. It can be rewritten asf (1)(A(1),A(2),U)=\u2016P\u2212 [[A(1),A(2),U ]]\u20162=\u2016P\u20162\ufe38 \ufe37\ufe37 \ufe38f (1)1\u22122\ufe38 \ufe37\ufe37 \ufe38f (1)2+\u2016[[A(1),A(2),U ]]\u20162\ufe38 \ufe37\ufe37 \ufe38f (1)3.(A.1)The first summand f (1)1 does not involve any variable and therefore\u2202 f (1)1\u2202a(1)n= 0, (A.2)where 0 is the zero vector with the same length as a(1)n . The second summand f(1)2is the inner product of the tensorP with its its polyadic decomposition, and it canbe computed asf (1)2 =

$$f^{(1)}_2 = \langle \mathcal{P},\, [[A^{(1)},A^{(2)},U]] \rangle = \sum_{n=1}^{N} \sum_{i_1=1}^{M} \sum_{i_2=1}^{M} \sum_{i_3=1}^{L} p_{i_1,i_2,i_3} \, a^{(1)}_{i_1,n} a^{(2)}_{i_2,n} u_{i_3,n} = \sum_{n=1}^{N} \mathcal{P} \times_1 a^{(1)}_n \times_2 a^{(2)}_n \times_3 u_n = \sum_{n=1}^{N} \left( \mathcal{P} \times_2 a^{(2)}_n \times_3 u_n \right)^T a^{(1)}_n. \tag{A.3}$$

The partial derivative of $f^{(1)}_2$ with respect to each column of $A^{(1)}$ is

$$\frac{\partial f^{(1)}_2}{\partial a^{(1)}_n} = \mathcal{P} \times_2 a^{(2)}_n \times_3 u_n. \tag{A.4}$$

The third summand is the square of the Frobenius norm of $\mathcal{P}$'s polyadic decomposition, and it can be computed as

$$f^{(1)}_3 = \|[[A^{(1)},A^{(2)},U]]\|^2 = \sum_{b=1}^{N} \sum_{c=1}^{N} \underbrace{(a^{(1)}_b)^T a^{(1)}_c \, (a^{(2)}_b)^T a^{(2)}_c \, (u_b)^T u_c}_{F(b,c)} = F(n,n) + \sum_{\substack{b=1 \\ b \neq n}}^{N} \sum_{\substack{c=1 \\ c \neq n}}^{N} F(b,c) + 2 \sum_{\substack{c=1 \\ c \neq n}}^{N} F(n,c), \tag{A.5}$$

where $b$ and $c$ denote the column indices of the factor matrices. The first summand of $f^{(1)}_3$ is

$$F(n,n) = (a^{(1)}_n)^T a^{(1)}_n \, (a^{(2)}_n)^T a^{(2)}_n \, (u_n)^T u_n, \tag{A.6}$$

and its partial derivative with respect to the $n$th column of the factor matrix $A^{(1)}$ is

$$\frac{\partial F(n,n)}{\partial a^{(1)}_n} = 2 \left( (a^{(2)}_n)^T a^{(2)}_n \, u_n^T u_n \right) a^{(1)}_n. \tag{A.7}$$

The second summand of $f^{(1)}_3$ does not involve the variable $a^{(1)}_n$, and therefore its partial derivative with respect to $a^{(1)}_n$ is the zero vector with the same length as $a^{(1)}_n$.
The third summand of $f^{(1)}_3$ is

$$2 \sum_{\substack{c=1 \\ c \neq n}}^{N} F(n,c) = 2 \sum_{\substack{c=1 \\ c \neq n}}^{N} (a^{(1)}_n)^T a^{(1)}_c \, (a^{(2)}_n)^T a^{(2)}_c \, (u_n)^T u_c, \tag{A.8}$$

and its partial derivative with respect to $a^{(1)}_n$ can be computed as $2 \sum_{c=1, c \neq n}^{N} \left[ (a^{(2)}_n)^T a^{(2)}_c \, (u_n)^T u_c \right] a^{(1)}_c$. Therefore,

$$\frac{\partial f^{(1)}_3}{\partial a^{(1)}_n} = 2 \left( (a^{(2)}_n)^T a^{(2)}_n \, u_n^T u_n \right) a^{(1)}_n + 2 \sum_{\substack{c=1 \\ c \neq n}}^{N} \left[ (a^{(2)}_n)^T a^{(2)}_c \, (u_n)^T u_c \right] a^{(1)}_c = 2 \sum_{c=1}^{N} \left[ (a^{(2)}_n)^T a^{(2)}_c \, (u_n)^T u_c \right] a^{(1)}_c. \tag{A.9}$$

Combining the above results, i.e., equations (A.2), (A.4) and (A.9), the partial derivative of $f^{(1)}(A^{(1)},A^{(2)},U)$ with respect to $a^{(1)}_n$ can be computed as

$$\frac{\partial f^{(1)}(A^{(1)},A^{(2)},U)}{\partial a^{(1)}_n} = \frac{\partial f^{(1)}_1}{\partial a^{(1)}_n} - 2 \frac{\partial f^{(1)}_2}{\partial a^{(1)}_n} + \frac{\partial f^{(1)}_3}{\partial a^{(1)}_n} = -2 \mathcal{P} \times_2 a^{(2)}_n \times_3 u_n + 2 \sum_{c=1}^{N} \left[ (a^{(2)}_n)^T a^{(2)}_c \, (u_n)^T u_c \right] a^{(1)}_c. \tag{A.10}$$

Similarly, we can calculate the partial derivative of $f^{(2)}(A^{(1)},A^{(3)},V)$ with respect to $a^{(1)}_n$ as

$$\frac{\partial f^{(2)}(A^{(1)},A^{(3)},V)}{\partial a^{(1)}_n} = -2 \mathcal{Q} \times_2 a^{(3)}_n \times_3 v_n + 2 \sum_{c=1}^{N} \left[ (a^{(3)}_n)^T a^{(3)}_c \, (v_n)^T v_c \right] a^{(1)}_c. \tag{A.11}$$

$f^{(3)}(A^{(2)},A^{(3)},W)$ does not involve the variable $a^{(1)}_n$ and therefore

$$\frac{\partial f^{(3)}(A^{(2)},A^{(3)},W)}{\partial a^{(1)}_n} = \mathbf{0}. \tag{A.12}$$

Consequently, the partial derivative of the objective function with respect to $a^{(1)}_n$ is

$$\frac{\partial f(A^{(1)},A^{(2)},A^{(3)},U,V,W)}{\partial a^{(1)}_n} = \frac{1}{2} \frac{\partial f^{(1)}}{\partial a^{(1)}_n} + \frac{1}{2} \frac{\partial f^{(2)}}{\partial a^{(1)}_n} + \frac{1}{2} \frac{\partial f^{(3)}}{\partial a^{(1)}_n} = -\mathcal{P} \times_2 a^{(2)}_n \times_3 u_n - \mathcal{Q} \times_2 a^{(3)}_n \times_3 v_n + \sum_{c=1}^{N} \left[ (a^{(2)}_n)^T a^{(2)}_c \, (u_n)^T u_c + (a^{(3)}_n)^T a^{(3)}_c \, (v_n)^T v_c \right] a^{(1)}_c. \tag{A.13}$$

This completes the proof of the first equation in Proposition 1.
The proof of the other equations is similar to that of (A.13) and is thus omitted here. $\blacksquare$
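A closed-form gradient such as (A.13) can be sanity-checked against central finite differences of the objective. The sketch below (illustrative only, not from the thesis) keeps just the two components of (4.21) that involve $A^{(1)}$, since by (A.12) the third contributes nothing, and assumes $\mathcal{P}, \mathcal{Q} \in \mathbb{R}^{M \times M \times L}$ as in (A.3):

```python
import numpy as np

rng = np.random.default_rng(1)
M, L, N = 4, 5, 3  # illustrative dimensions

A1 = rng.standard_normal((M, N))
A2 = rng.standard_normal((M, N))
A3 = rng.standard_normal((M, N))
U = rng.standard_normal((L, N))
V = rng.standard_normal((L, N))

def cpd(A, B, C):
    # [[A, B, C]]: sum over n of the rank-one terms a_n o b_n o c_n
    return np.einsum('in,jn,kn->ijk', A, B, C)

P = rng.standard_normal((M, M, L))  # stands in for the tensor P
Q = rng.standard_normal((M, M, L))  # stands in for the tensor Q

def f(A1_):
    # the two halves of the objective that involve A^(1)
    return 0.5 * (np.sum((P - cpd(A1_, A2, U)) ** 2)
                  + np.sum((Q - cpd(A1_, A3, V)) ** 2))

n = 0  # check the gradient for the first column of A^(1)
# closed-form gradient from (A.13)
grad = (-np.einsum('ijk,j,k->i', P, A2[:, n], U[:, n])
        - np.einsum('ijk,j,k->i', Q, A3[:, n], V[:, n])
        + A1 @ ((A2[:, n] @ A2) * (U[:, n] @ U)
                + (A3[:, n] @ A3) * (V[:, n] @ V)))

# central finite differences on f, one entry of a^(1)_n at a time
eps = 1e-6
fd = np.empty(M)
for i in range(M):
    E = np.zeros_like(A1)
    E[i, n] = eps
    fd[i] = (f(A1 + E) - f(A1 - E)) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-4))  # → True
```

Because $f$ is quadratic in the entries of $A^{(1)}$, the central difference is exact up to floating-point rounding, so the agreement is tight.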