
• The opportune moment of fusion: Existing MvC methods adopt one of three
fusion strategies for multi-view data during clustering, i.e., fusion of the
data, fusion of the projected features, or fusion of the results. Most current
MvC research focuses on the second strategy. However, there is no theoretical
foundation for deciding which strategy is best. Theoretical and methodological
research is needed to uncover what distinguishes them.

• Incomplete MvC: Although some attempts have been made to handle incomplete
multi-view data, as we mentioned in the section on each category, incomplete
MvC is still a challenging problem. In real-life applications, data loss
occurs frequently, yet research on incomplete MvC has not been extensive. More
effort is expected to go into this area.

• Multi-task multi-view clustering: This direction is a new trend in MvC
research. It brings several challenges, e.g., how to explore the relationships
between different tasks and different views, and how to transfer knowledge
among them.

The first challenge of multi-view clustering is how to discriminate between
different views in the clustering algorithm [6].

A further challenge is how to maximize the clustering quality within each view
while also taking the clustering consistency across different views into
consideration.

Besides, incomplete multi-view data, where some data objects are missing their
observation on one view entirely (i.e., missing objects) or have only part of
their features available on that view (i.e., missing features), also bring
challenges to MvC.
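The two kinds of incompleteness described above can be told apart mechanically. The following is a minimal sketch under stated assumptions: the two views, the object names and the use of `None` as a missing-value marker are all hypothetical choices made for illustration, not a format taken from any MvC method.

```python
# Hypothetical two-view data set. None marks something that was not observed.
views = {
    "image": {"o1": [0.2, 0.7], "o2": None,       "o3": [0.9, None]},
    "text":  {"o1": [1.0, 0.0], "o2": [0.3, 0.4], "o3": [0.5, 0.5]},
}

def missing_kind(observation):
    """Classify one object's observation on one view."""
    if observation is None:
        return "missing object"    # no observation at all on this view
    if any(value is None for value in observation):
        return "missing feature"   # observed, but only partially
    return "complete"

# One entry per (view, object) pair.
report = {
    (view, obj): missing_kind(observation)
    for view, data in views.items()
    for obj, observation in data.items()
}
```

Here `o2` is a missing object on the image view, while `o3` is present on that view but has a missing feature.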

1.1.  Multi-view Clustering

Co-EM [3], co-testing [4], and robust co-training [5] belong to the
co-training style algorithms. Sparse multi-view SVMs [6], multi-view
TSVMs [7], multi-view Laplacian SVMs [8] and multi-view Laplacian TSVMs [9] are
representative co-regularization style algorithms. Margin-consistency style
algorithms were recently proposed to make use of the latent consistency of
classification results from multiple views [10–13]. Besides these multi-view
learning strategies, specific multi-view learning algorithms have been put
forward for particular machine learning tasks. These algorithms can be grouped
into multi-view transfer learning [15–17], multi-view dimensionality
reduction [18–20], multi-view clustering [21–28], multi-view discriminant
analysis [29,30], multi-view semi-supervised learning [8,9] and multi-task
multi-view learning [31–35].

Equation 1

Rough Set Theory

Ask a computer scientist about rough sets and the first two terms they mention
are usually the lower and the upper approximation. Beyond these common terms,
rough set theory (RST) deals with uncertainty, vagueness and discernibility.
Pawlak introduced rough sets in 1982 [668], building the theory on the
fundamental concept of finding the lower and upper approximation of a set, and
the concept has evolved over time. Because rough sets also use the concept of
membership functions, they are sometimes confused with fuzzy sets. While both
fuzzy sets and rough sets make use of membership functions, rough sets differ
in that a lower and an upper approximation to the rough set are determined. The
lower approximation consists of all elements that belong with full certainty to
the corresponding set, while the upper approximation consists of elements that
may possibly belong to the set. Rough sets are frequently used in machine
learning as classifiers, where they are used to find the smallest set of
features needed to discern between classes. Rough sets are also used for
extracting knowledge from incomplete data (Computational Intelligence, second
edition, p. 452). To make RST precise, we first define an information system
and then use it to give more detail.

Information system:

Let $\mathcal{A} = (U, A)$ be an ordered pair, where $U$ is the universe of
discourse and $A$ is a non-empty set of attributes. The universe of discourse
is a set of objects (or patterns, examples), while the attributes define the
characteristics of a single object. Each attribute $a \in A$ is a function
$a : U \to V_a$, where $V_a$ is the range of values for attribute $a$.
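As an illustration of this definition, an information system can be written down as a small table. This is a minimal sketch: the objects `x1` to `x4`, the attributes `color`, `size` and `shape`, and the helper names `attribute` and `value_range` are hypothetical and chosen only for the example.

```python
# Hypothetical toy information system (U, A); every name and value is illustrative.
table = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "square"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}
U = list(table)                   # universe of discourse
A = ["color", "size", "shape"]    # non-empty set of attributes

def attribute(a):
    """Return attribute a as a function U -> V_a."""
    return lambda x: table[x][a]

def value_range(a):
    """V_a: the set of values that attribute a takes on U."""
    return {attribute(a)(x) for x in U}
```

For instance, `attribute("size")("x4")` looks up the size of object `x4`, and `value_range("color")` collects the values the color attribute takes over the whole universe.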

The lower approximation of a set is the region whose objects certainly belong
to the set, and the upper approximation is the region whose objects possibly
belong to it. In some cases the available information is not enough to decide
whether an object belongs to the set. Such objects lie in the upper but not the
lower approximation and form the boundary region, which carries all the
uncertainty. It may also happen that two objects have the same values for all
the attributes considered. If so, they are indiscernible.
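The three regions can be computed directly once objects with identical attribute values are grouped together. The following is a minimal sketch on a hypothetical toy table; the function names `partition` and `approximations` and the target set `{"x1", "x3"}` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy information system; every name and value is illustrative.
table = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "square"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}

def partition(table, B):
    """Group objects that have the same values on every attribute in B."""
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[a] for a in B)].add(x)
    return list(classes.values())

def approximations(table, B, X):
    """Return (lower, upper, boundary) of the object set X with respect to B."""
    lower, upper = set(), set()
    for E in partition(table, B):
        if E <= X:        # the whole group lies inside X: certain members
            lower |= E
        if E & X:         # the group overlaps X: possible members
            upper |= E
    return lower, upper, upper - lower

lower, upper, boundary = approximations(table, ["color", "size"], {"x1", "x3"})
```

In this toy table `x1` and `x2` are indiscernible, so a target set that contains `x1` but not `x2` cannot be described exactly: both objects end up in the boundary region, while `x3` is in the lower approximation.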

The indiscernibility relation is defined as:

$$\mathrm{IND}(B) = \{(x, y) \in U^2 : a(x) = a(y) \ \ \forall a \in B\}$$

where $B \subseteq A$. The set of equivalence classes of the relation
$\mathrm{IND}(B)$ is denoted $U/\mathrm{IND}(B)$. That is, $U/\mathrm{IND}(B)$
contains one class for each set of objects that satisfy $\mathrm{IND}(B)$ over
all attributes in $B$. Objects are therefore grouped together so that objects
in the same group cannot be discerned from one another.
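A minimal sketch of the indiscernibility relation and its equivalence classes, built literally as a set of object pairs. The toy table and the function names `ind` and `quotient` are hypothetical and serve only as an illustration.

```python
from itertools import product

# Hypothetical toy information system; every name and value is illustrative.
table = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "square"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}

def ind(table, B):
    """IND(B): all pairs (x, y) that agree on every attribute in B."""
    return {
        (x, y)
        for x, y in product(table, repeat=2)
        if all(table[x][a] == table[y][a] for a in B)
    }

def quotient(table, B):
    """U/IND(B): the equivalence classes induced by IND(B)."""
    relation = ind(table, B)
    return {frozenset(y for y in table if (x, y) in relation) for x in table}
```

Using more attributes can only split classes further: with `["color"]` alone the table falls into two classes, and adding `"size"` separates `x3` from `x4`.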

A discernibility matrix is a two-dimensional matrix where the equivalence
classes form the indices, and each element is the set of attributes that can be
used to discern between the corresponding classes. Formally, for a set of
attributes $B \subseteq A$ in $\mathcal{A} = (U, A)$, the discernibility matrix
$M_D(B)$ is defined as

$$M_D(B) = \{m_D(i, j)\}_{n \times n}$$

for $1 \le i, j \le n$, and $n = |U/\mathrm{IND}(B)|$, with

$$m_D(i, j) = \{a \in B : a(E_i) \ne a(E_j)\}$$

for $i, j = 1, \dots, n$; $a(E_i)$ denotes the value that attribute $a$ takes on
the objects of equivalence class $E_i$.
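A minimal sketch of how such a matrix can be built. The toy table, the function names, and the decision to order the equivalence classes by their sorted value tuples are all assumptions made for the example.

```python
# Hypothetical toy information system; every name and value is illustrative.
table = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "square"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}

def equivalence_classes(table, B):
    """U/IND(B) as a sorted list of (value tuple, member objects) pairs."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in B), []).append(x)
    return sorted(classes.items())

def discernibility_matrix(table, B):
    """Entry [i][j] is the set of attributes whose values differ between E_i and E_j."""
    classes = equivalence_classes(table, B)
    n = len(classes)
    return [
        [
            {a for k, a in enumerate(B) if classes[i][0][k] != classes[j][0][k]}
            for j in range(n)
        ]
        for i in range(n)
    ]

B = ["color", "size", "shape"]
M = discernibility_matrix(table, B)
```

With this ordering the classes are `{x4}`, `{x3}` and `{x1, x2}`. The matrix is symmetric and its diagonal entries are empty, since a class cannot be discerned from itself.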

Using the discernibility matrix, discernibility functions can be defined to
compute the minimal number of attributes necessary to discern equivalence
classes from one another. The discernibility function $f(B)$, with
$B \subseteq A$, is defined as

$$f(B) = \bigwedge_{1 \le i < j \le n} \bigvee \bar{m}_D(i, j)$$

where

$$\bar{m}_D(i, j) = \{\bar{a} : a \in m_D(i, j)\} \qquad (23.4)$$

and $\bar{a}$ is the Boolean variable associated with $a$, $\bigvee \bar{S}$ is
the disjunction over the set of Boolean variables $\bar{S}$, and $\bigwedge$
denotes conjunction.
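One way to read the discernibility function is as a conjunction of clauses, one per pair of distinct equivalence classes, where each clause is satisfied by selecting at least one attribute that tells the two classes apart. The following is a minimal brute-force sketch on a hypothetical toy table; the function names are assumptions, and exhaustive search over subsets is a simplification that only suits small attribute sets.

```python
from itertools import combinations

# Hypothetical toy information system; every name and value is illustrative.
table = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "square"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}

def clauses(table, B):
    """One clause per pair of distinct classes: the attributes that discern them."""
    signatures = sorted({tuple(row[a] for a in B) for row in table.values()})
    return [
        {a for k, a in enumerate(B) if s[k] != t[k]}
        for s, t in combinations(signatures, 2)
    ]

def f(table, B, chosen):
    """Evaluate f(B) with the Boolean variable of a set to True iff a is in chosen."""
    return all(clause & set(chosen) for clause in clauses(table, B))

def minimal_sets(table, B):
    """Attribute subsets that satisfy f(B) and have no satisfying proper subset."""
    found = []
    for r in range(len(B) + 1):
        for subset in combinations(B, r):
            if f(table, B, subset) and not any(set(m) <= set(subset) for m in found):
                found.append(subset)
    return found

B = ["color", "size", "shape"]
result = minimal_sets(table, B)
```

In this table `shape` carries the same information as `color`, so either one can be paired with `size` to discern all classes, and the search returns both pairs.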

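The discernibility function $f(B)$ finds the minimal set of attributes required
to discern any equivalence class from all others. Alternatively, the relative
discernibility function $f(E, B)$ finds the minimal set of attributes required
to discern a given class, $E$, from the other classes, using the set of
attributes $B$. That is, for $E = E_i$,

$$f(E_i, B) = \bigwedge_{j \ne i} \bigvee \bar{m}_D(i, j)$$

where $\bar{m}_D(i, j)$ is the set of Boolean variables associated with the
attributes that discern $E_i$ from $E_j$.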
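As a sketch of the relative discernibility function $f(E, B)$ described above, the search can be restricted to the clauses that involve one chosen class. The toy table and the function names are hypothetical, and identifying the class $E$ through one of its member objects is a convenience assumed for the example.

```python
from itertools import combinations

# Hypothetical toy information system; every name and value is illustrative.
table = {
    "x1": {"color": "red",  "size": "small", "shape": "round"},
    "x2": {"color": "red",  "size": "small", "shape": "round"},
    "x3": {"color": "blue", "size": "small", "shape": "square"},
    "x4": {"color": "blue", "size": "large", "shape": "square"},
}

def relative_clauses(table, B, x):
    """One clause per other class: attributes that discern the class of x from it."""
    e = tuple(table[x][a] for a in B)
    others = sorted({tuple(row[a] for a in B) for row in table.values()} - {e})
    return [{a for k, a in enumerate(B) if e[k] != s[k]} for s in others]

def relative_minimal_sets(table, B, x):
    """Minimal attribute subsets that discern the class of x from every other class."""
    cl = relative_clauses(table, B, x)
    found = []
    for r in range(len(B) + 1):
        for subset in combinations(B, r):
            satisfied = all(clause & set(subset) for clause in cl)
            if satisfied and not any(set(m) <= set(subset) for m in found):
                found.append(subset)
    return found

B = ["color", "size", "shape"]
```

Discerning a single class usually needs fewer attributes than discerning all classes at once: the class of `x1` is singled out by `color` alone (or `shape` alone), and the class of `x4` by `size` alone.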




