• The opportune moment of fusion: Existing MvC methods adopt one of three fusion strategies for multi-view data during clustering: fusion of the data, fusion of the projected features, or fusion of the results. Most current MvC research focuses on the second strategy. However, there is no theoretical foundation for deciding which strategy is best. Theoretical and methodological research is needed to uncover their essence.

• Incomplete MvC: Although some attempts have been made to handle incomplete multi-view data, as mentioned in the section for each category, incomplete MvC remains a challenging problem. Data loss occurs frequently in real-life applications, yet research on incomplete MvC has not been extensive. More research effort is expected to go into incomplete MvC.

• Multi-task multi-view clustering: This direction is a new trend in MvC research. It brings several challenges, e.g., how to explore the relationships between different tasks and different views, and how to transfer knowledge among them.

The first challenge of multi-view clustering is how to discriminate between different views in the clustering algorithm 6. Another challenge is how to maximize the clustering quality within each view while also taking the clustering consistency across different views into consideration.

Besides, incomplete multi-view data, where some data objects may be missing their observation on one view (i.e., missing objects) or may have only part of their features available on that view (i.e., missing features), also brings challenges to MvC.
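The two kinds of incompleteness described above can be told apart mechanically. The following is a minimal sketch, assuming a hypothetical two-view data set in which `None` marks anything unobserved; the view names, object names, values, and helper functions are illustrative and are not taken from any MvC library.

```python
# Hypothetical two-view data set; object names, view names and values are
# illustrative only. None marks whatever was not observed.
views = {
    "text":  {"o1": [0.2, 0.7], "o2": None,       "o3": [0.9, None]},
    "image": {"o1": [1.0, 0.1], "o2": [0.4, 0.4], "o3": [0.3, 0.8]},
}


def missing_objects(view):
    """Objects that have no observation at all on this view."""
    return {obj for obj, feats in view.items() if feats is None}


def missing_features(view):
    """Map each partially observed object to the indices of its missing features."""
    return {
        obj: [i for i, v in enumerate(feats) if v is None]
        for obj, feats in view.items()
        if feats is not None and None in feats
    }


for name, view in views.items():
    print(name, missing_objects(view), missing_features(view))
```

In this toy data, object `o2` is a missing object on the `text` view, while `o3` is observed on that view but lacks its second feature.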

1.1. Multi-view Clustering Algorithms

co-EM 3, co-testing 4, and robust co-training 5 belong to the co-training style algorithms. Sparse multi-view SVMs 6, multi-view TSVMs 7, multi-view Laplacian SVMs 8 and multi-view Laplacian TSVMs 9 are representative co-regularization style algorithms. Margin-consistency style algorithms have recently been proposed to make use of the latent consistency of classification results from multiple views 10–13. Besides these multi-view learning strategies, more specific multi-view learning algorithms have been put forward for particular machine learning tasks. These algorithms can be summarized as multi-view transfer learning 15–17, multi-view dimensionality reduction 18–20, multi-view clustering 21–28, multi-view discriminant analysis 29,30, multi-view semi-supervised learning 8,9, and multi-task multi-view learning 31–35.

Equation 1

1.1. Rough Set Theory

Ask a computer scientist about rough sets, and the first two terms that come up are usually the lower and the upper approximation. Beyond these common terms, rough set theory (RST) deals with uncertainty, vagueness and discernibility. Pawlak introduced rough set theory in 1982 668, built on the fundamental concept of finding the lower and upper approximations of a set, and the concept has evolved over time. Rough sets are a set-theoretic approach that also uses the concept of membership functions, which is why they are sometimes confused with fuzzy sets. While both fuzzy sets and rough sets make use of membership functions, rough sets differ in that a lower and an upper approximation to the rough set are determined. The lower approximation consists of all elements that belong with full certainty to the corresponding set, while the upper approximation consists of elements that may possibly belong to the set. Rough sets are frequently used in machine learning as classifiers, where they are used to find the smallest number of features needed to discern between classes. Rough sets are also used for extracting knowledge from incomplete data (Computational Intelligence, second edition, p. 452). To make RST precise, we first define an information system and then use it to give more detail.

Information system:

An information system is an ordered pair $\mathcal{A} = (U, A)$, where $U$ is the universe of discourse and $A$ is a non-empty set of attributes. The universe of discourse is a set of objects (or patterns, examples), while the attributes define the characteristics of a single object. Each attribute $a \in A$ is a function $a: U \to V_a$, where $V_a$ is the range of values for attribute $a$.
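As an illustration of this definition, an information system can be stored as a small table. The sketch below assumes a hypothetical four-object universe with three made-up attributes; the names `TABLE`, `attribute` and `value_range` are introduced here for illustration only.

```python
# Hypothetical toy information system (U, A); every name and value is illustrative.
TABLE = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "round"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}

U = set(TABLE)                      # universe of discourse
A = {"color", "size", "shape"}      # non-empty set of attributes


def attribute(a):
    """Return attribute a as a function U -> V_a."""
    return lambda x: TABLE[x][a]


def value_range(a):
    """V_a: the set of values that attribute a takes over U."""
    return {TABLE[x][a] for x in U}


color = attribute("color")
print(color("x3"))                   # blue
print(sorted(value_range("size")))   # ['large', 'small']
```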

The lower approximation of a set is the region whose objects certainly belong to the set, and the upper approximation is the region whose objects possibly belong to it. In some cases the available information is not enough to decide whether an object belongs to the set or not; such objects lie in the upper approximation but not in the lower one, and they form the boundary region, which carries the uncertainty. In other cases two objects may have the same values for all the attributes considered. If so, they are indiscernible.

The indiscernibility relation is defined as:

\[ \mathrm{IND}(B) = \{ (x, y) \in U^2 \mid \forall a \in B,\ a(x) = a(y) \} \tag{2} \]

where $B \subseteq A$. The set of equivalence classes of the relation $\mathrm{IND}(B)$ is denoted by $U/\mathrm{IND}(B)$. That is, $U/\mathrm{IND}(B)$ contains one class for each set of objects that satisfy $\mathrm{IND}(B)$ over all attributes in $B$. Objects are therefore grouped together, such that the objects within the same group cannot be discerned from one another.
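The grouping into equivalence classes, and the lower and upper approximations built from it, can be sketched in a few lines. The example assumes a hypothetical toy table and a hypothetical target set `X`; the function names are illustrative. The lower approximation collects the classes fully contained in `X`, and the upper approximation collects the classes that intersect `X`.

```python
from collections import defaultdict

# Hypothetical toy information system; every name and value is illustrative.
TABLE = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "round"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}


def partition(table, B):
    """U/IND(B): group objects whose values agree on every attribute in B."""
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[a] for a in B)].add(x)
    return list(classes.values())


def approximations(table, B, X):
    """Return the lower and upper approximations of X with respect to B."""
    lower, upper = set(), set()
    for E in partition(table, B):
        if E <= X:      # class lies entirely inside X: certain members
            lower |= E
        if E & X:       # class overlaps X: possible members
            upper |= E
    return lower, upper


B = ["color", "size"]
X = {"x1", "x3"}        # hypothetical target set
lower, upper = approximations(TABLE, B, X)
boundary = upper - lower
print(partition(TABLE, B))
print(lower, upper, boundary)
```

Here `x1` and `x2` are indiscernible under `B`, so `x1` cannot be placed in `X` with certainty, and both objects end up in the boundary region.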

A discernibility matrix is a two-dimensional matrix where the equivalence classes form the indices, and each element is the set of attributes that can be used to discern between the corresponding classes. Formally, for a set of attributes $B \subseteq A$ in $\mathcal{A} = (U, A)$, the discernibility matrix $M_D(B)$ is defined as

\[ M_D(B) = \{ m_D(i, j) \}_{n \times n} \tag{3} \]

for $1 \le i, j \le n$, and $n = |U/\mathrm{IND}(B)|$, with

\[ m_D(i, j) = \{ a \in B \mid a(E_i) \ne a(E_j) \} \tag{4} \]

for $i, j = 1, \ldots, n$; $a(E_i)$ denotes the value that attribute $a$ takes on the objects of equivalence class $E_i$.
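A discernibility matrix of this kind can be computed directly from the value tuples of the equivalence classes. The following is a minimal sketch on a hypothetical toy table; the function name and the choice to order classes by their sorted value tuples are assumptions made for the example.

```python
# Hypothetical toy information system; every name and value is illustrative.
TABLE = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "round"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}


def discernibility_matrix(table, B):
    """Return the class value tuples and the matrix of discerning attribute sets."""
    # One value tuple per equivalence class E_i of U/IND(B), in sorted order.
    classes = sorted({tuple(row[a] for a in B) for row in table.values()})
    # Entry (i, j) holds the attributes on which E_i and E_j take different values.
    matrix = [
        [{a for a, vi, vj in zip(B, Ei, Ej) if vi != vj} for Ej in classes]
        for Ei in classes
    ]
    return classes, matrix


B = ["color", "size", "shape"]
classes, matrix = discernibility_matrix(TABLE, B)
for Ei, row in zip(classes, matrix):
    print(Ei, row)
```

The matrix is symmetric and its diagonal entries are empty, since a class cannot be discerned from itself.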

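Using the discernibility matrix, discernibility functions can be defined to compute the minimal number of attributes necessary to discern equivalence classes from one another. The discernibility function $f(B)$, with $B \subseteq A$, is defined as

\[ f(B) = \bigwedge_{i, j \in \{1, \ldots, n\}} \bigvee \overline{m}_D(E_i, E_j) \tag{5} \]

where

\[ \overline{m}_D(E_i, E_j) = \{ \overline{a} \mid a \in m_D(i, j) \} \tag{7} \]

and $\overline{a}$ is the Boolean variable associated with $a$, $\bigvee \overline{m}_D(E_i, E_j)$ is the disjunction over the set of Boolean variables, and $\bigwedge$ denotes conjunction.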
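Read as a Boolean formula, the discernibility function is a conjunction of clauses, one per matrix entry, and a set of attributes satisfies it when it shares at least one attribute with every non-empty entry. The sketch below finds the smallest such sets by brute force. The clause list is a hypothetical set of matrix entries, and exhaustive search over subsets is an assumption chosen for brevity, practical only for a handful of attributes.

```python
from itertools import combinations

# Hypothetical non-empty entries m_D(i, j) of a discernibility matrix, i < j.
CLAUSES = [{"size", "shape"}, {"color", "size", "shape"}, {"color"}]
B = ["color", "size", "shape"]


def satisfies(subset, clauses):
    """True if the subset contains at least one attribute from every clause."""
    return all(clause & subset for clause in clauses)


def smallest_satisfying_sets(clauses, B):
    """All attribute subsets of minimum size that satisfy every clause."""
    clauses = [c for c in clauses if c]     # empty entries impose no constraint
    for k in range(len(B) + 1):             # try sizes 0, 1, 2, ... in turn
        found = [set(S) for S in combinations(B, k) if satisfies(set(S), clauses)]
        if found:
            return found
    return []


print(smallest_satisfying_sets(CLAUSES, B))
```

For these clauses no single attribute suffices, because `color` is forced by the last clause and one of `size` or `shape` is needed for the first.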

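The discernibility function $f(B)$ finds the minimal set of attributes required to discern any equivalence class from all others. Alternatively, the relative discernibility function $f(E, B)$ finds the minimal set of attributes required to discern a given class, $E$, from the other classes, using the set of attributes, $B$. That is,

\[ f(E, B) = \bigwedge_{j = 1}^{n} \bigvee \overline{m}_D(E, E_j) \tag{8} \]

where $\overline{m}_D(E, E_j)$ is the set of Boolean variables associated with the attributes that discern $E$ from $E_j$.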
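The relative discernibility function can be illustrated in the same brute-force way, keeping only the clauses that involve the chosen class. The sketch assumes a hypothetical toy table and identifies the class $E$ through one of its member objects; the function name is illustrative.

```python
from itertools import combinations

# Hypothetical toy information system; every name and value is illustrative.
TABLE = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "round"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}


def smallest_sets_for_class(table, B, x):
    """Smallest attribute subsets that discern the class of object x from all other classes."""
    own = tuple(table[x][a] for a in B)
    others = {tuple(row[a] for a in B) for row in table.values()} - {own}
    # One clause per other class: the attributes on which it differs from E.
    clauses = [{a for a, v, w in zip(B, own, other) if v != w} for other in others]
    for k in range(len(B) + 1):
        found = [set(S) for S in combinations(B, k)
                 if all(clause & set(S) for clause in clauses)]
        if found:
            return found
    return []


B = ["color", "size", "shape"]
for x in ("x1", "x3", "x4"):
    print(x, smallest_sets_for_class(TABLE, B, x))
```

In this toy table, `color` alone separates the class of `x1` from the others, whereas the class of `x3` needs `color` together with either `size` or `shape`.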