Workshop in Vienna

Category: DM, Life







Visualization of My Research Interests

Category: DM


  1. Mendeley
  2. Python


  1. Export the papers in Mendeley to a bib file.
  2. Clean the text with Python nltk.
  3. Count phrase frequencies with Python (code on GitHub).
  4. Clean the counted phrases with Python nltk:
    • remove phrases with low frequency;
    • remove meaningless phrases.
  5. Import the output file into WordArt.
  6. Generate the visualization.
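
The counting and filtering steps above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it uses the standard library instead of nltk, the stop-word list is a small hypothetical stand-in, and the function names (`extract_titles`, `count_phrases`, `filter_phrases`) are made up for illustration. It is not the script from the GitHub repository mentioned above.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the original workflow relies on nltk instead.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "with", "to", "via", "by"}

# Matches single-line "title = {...}" fields; "booktitle" lines are not matched.
TITLE_RE = re.compile(r'^\s*title\s*=\s*[{"]+(.+?)[}"]+\s*,?\s*$',
                      re.IGNORECASE | re.MULTILINE)


def extract_titles(bib_text):
    """Return the title field of every entry in a BibTeX string."""
    return [m.group(1) for m in TITLE_RE.finditer(bib_text)]


def count_phrases(titles, n=2):
    """Count n-word phrases that contain no stop word."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(tok in STOP_WORDS for tok in gram):
                counts[" ".join(gram)] += 1
    return counts


def filter_phrases(counts, min_count=2):
    """Drop phrases that occur fewer than min_count times."""
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


SAMPLE_BIB = """
@inproceedings{elhamifar2009,
  title = {Sparse subspace clustering},
  booktitle = {CVPR},
}
@article{vidal2014,
  title = {Low rank subspace clustering},
  journal = {Pattern Recognition Letters},
}
@article{parsons2004,
  title = {Subspace clustering for high dimensional data: a review},
}
"""

phrases = filter_phrases(count_phrases(extract_titles(SAMPLE_BIB)))
print(phrases)  # {'subspace clustering': 3}
```

The resulting phrase and count pairs are what would be written to a file for the word cloud tool. Removing "meaningless" phrases still needs a manual or dictionary-based pass that this sketch does not attempt.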

Lecture Note of Scientometrics

Category: DM

Recently, Prof. Liu JianGuo from SUFE gave a lecture on scientometrics at our center. The content is mainly about real applications and raises some very interesting problems. The lecture has four parts:

  • Ranking of Research Institutions Based on Citation Relations
  • Research of Deep Learning Based Quantitative Trading Strategies
  • Quantitative Trading Strategies Based on the Degree of Attention of Stocks
  • Which Kinds of Disclosed Information Can Help You Get a Loan From P2P


Category: DM




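  • Taxi supply and demand forecasting for a given region
  • Traffic light control (traffic lights – traffic cameras)

The first problem can be abstracted as a regression problem on time-series data, which is close to traditional machine learning. The second problem rarely appears in the directions I am familiar with, because it involves decision making. The cameras provide traffic flow data, which serves as the basis for deciding when to switch the lights or how often to switch them, and each decision in turn affects the traffic flow at the next time step. This process can be modeled as reinforcement learning.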
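
To make the reinforcement learning view of traffic light control concrete, here is a toy sketch using tabular Q-learning. Everything in it is an assumption made for illustration: a single intersection with two directions, queue lengths standing in for what the camera observes, random arrivals, and a reward equal to the negative total queue length. It is not a model from the talk.

```python
import random

ACTIONS = (0, 1)  # 0: green for direction 0, 1: green for direction 1


def step(state, action, rng, arrival_prob=(0.3, 0.3), cap=5):
    """Advance the toy intersection by one time step.

    state is a pair of queue lengths; the direction with the green light
    releases one car, then new cars may arrive in both directions.
    """
    queues = list(state)
    if queues[action] > 0:
        queues[action] -= 1
    for d in range(2):
        if rng.random() < arrival_prob[d]:
            queues[d] = min(cap, queues[d] + 1)
    next_state = tuple(queues)
    reward = -sum(next_state)  # fewer waiting cars is better
    return next_state, reward


def greedy_action(q, state):
    """Pick the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def train(episodes=200, horizon=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy_action(q, state)
            next_state, reward = step(state, action, rng)
            best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q


q_table = train()
print(greedy_action(q_table, (3, 0)))
```

The point of the sketch is the feedback loop: the chosen action changes the queues, and the changed queues are the next observation. A plain regression model has no such loop.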


  • coordination in the transportation network
  • continual adaptation to a changing environment
  • event detection
  • data quality (sparsity, noise)




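Although our research is on traditional data mining and machine learning, mainly with shallow models, the fundamental problems involved are also the ones DL cares about. Compared with students who started with DL from the beginning, we are more familiar with the underlying theory, such as compressed sensing, linear algebra, and convex and non-convex optimization, as well as some less mainstream "tricks" such as anchor learning, information propagation, and margin maximization. These ideas also exist in DL and are still waiting to be explored. If we keep a positive attitude towards learning and stay open to all of it, I believe the long-term accumulation will pay off.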


Structural SVMs with its Application in Recommender System [Seminar Note]

Category: DM
Paper Sharing: Predicting Diverse Subsets Using Structural SVMs [ICML’08]

Diversity in retrieval tasks reduces redundancy and surfaces more information.
For example, a set of documents with high topic diversity ensures that fewer users abandon the query because no results are relevant to them.
In short, high diversity covers the needs of more users, though accuracy may suffer.

Candidate set: x = {x_i}, i = 1 … n
Topic set: T = {T_i}, i = 1 … n; T_i contains the subtopics covered by x_i, and different topic sets may overlap.

The topic set T is unknown, so the learning problem is to find a function that predicts a subset y of x in the absence of T.
Is T a latent variable?
– In general, the subtopics are unknown. We instead assume that the candidate set contains discriminating features which separate subtopics from each other, and these are primarily based on word frequencies.
The goal is to select K documents from x that maximize subtopic coverage.

Keypoint: Diversity -> Covering more subtopics -> Covering more words

Method Overview

D1, D2, and D3 are three documents; V1, V2, …, V5 are words.
Words are weighted by importance (more distinct words = more information).

After D1, which covers V3, V4, and V5, is selected in the first iteration, the second iteration only needs to consider V1 and V2.
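
The iterative selection described here can be sketched as a greedy weighted-coverage routine. This is a minimal sketch under stated assumptions: only D1 covering V3, V4, and V5 comes from the notes, the word sets of D2 and D3 are hypothetical, and the word weights are fixed to 1.0 here, whereas in the paper they are learned by the structural SVM.

```python
def greedy_select(docs, weights, k):
    """Greedily pick k documents that maximize the weight of newly covered words.

    docs maps a document name to its set of words;
    weights maps a word to its importance (default 1.0 if missing).
    """
    covered = set()
    selected = []
    remaining = dict(docs)

    def gain(name):
        # Only words that are not covered yet contribute to the gain.
        return sum(weights.get(w, 1.0) for w in remaining[name] - covered)

    for _ in range(min(k, len(docs))):
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        covered |= remaining.pop(best)
    return selected, covered


docs = {
    "D1": {"V3", "V4", "V5"},   # from the notes
    "D2": {"V2", "V4"},         # hypothetical
    "D3": {"V1", "V2"},         # hypothetical
}
weights = {w: 1.0 for w in ("V1", "V2", "V3", "V4", "V5")}

selected, covered = greedy_select(docs, weights, k=2)
print(selected)  # ['D1', 'D3']
```

After D1 is chosen, V4 no longer counts for D2, so D3 (which adds both V1 and V2) wins the second round. This is the "only focus on V1 and V2" step.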

– Feature space based on word frequency
– Optimizes for subtopic loss (Structural-SVM)

The model is fairly sophisticated. The feature space is based on word frequency and is further divided into “bags of words” (subtopics).

In my view, the reason is that each document has too many words, so dividing a document into subtopics is reasonable, and it reduces the overlap between the subtopics of different documents.
In each iteration, we learn the most representative subtopic and then choose the related document, until we have K documents.

Remark: Structural-SVM repeatedly finds the next most violated constraint until the set of constraints is a good approximation.
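
For reference, the remark corresponds to the standard margin-rescaling form of the structural SVM, sketched below in LaTeX. The symbols w (weight vector), Ψ (joint feature map), Δ (subtopic loss), ξ_i (slack variables), C, and N (number of training queries) are notation assumed here for illustration and follow the common formulation; they are not defined in these notes.

```latex
% Training objective with one slack variable per training query
\min_{\mathbf{w},\, \xi \ge 0} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^2 + \frac{C}{N} \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad
\mathbf{w}^\top \Psi(x^{(i)}, y^{(i)}) \;\ge\;
\mathbf{w}^\top \Psi(x^{(i)}, y) + \Delta(T^{(i)}, y) - \xi_i
\quad \forall i,\; \forall y \ne y^{(i)}

% Most violated constraint for query i under the current w
\hat{y} = \arg\max_{y} \; \Delta(T^{(i)}, y) + \mathbf{w}^\top \Psi(x^{(i)}, y)
```

There is one constraint per candidate subset y, so they cannot all be enumerated. The solver therefore keeps a working set and adds only the constraint for the current \hat{y} in each round.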

This paper is very interesting.
My doubts are:

  1. Can word frequency reflect the true relevance of a document to a certain topic?
  2. How are the subtopics found?

Further Reading
Learning to Recommend Accurate and Diverse Items [WWW’17]

An Intro to Subspace Clustering [Seminar Note]

Category: DM

Subspace Clustering

Often only a subset of the features is relevant to forming a cluster, especially in high-dimensional data. From this point of view, a number of clustering methods have been proposed to find clusters in different subspaces of a dataset.

Two Perspectives

  • There exist subspaces in the data, so we search for the most representative feature subsets.
  • As there are clusters in different subspaces, the data are denser within each subspace cluster. An instance within a cluster can be represented by other instances within the same cluster. From this perspective, we seek to learn a representation of the data, which yields X = XC.

The first perspective motivates many data mining algorithms (see the survey by Parsons et al. [1]). But due to their complexity, those algorithms cannot handle large-scale datasets. Recently, the majority of subspace clustering research has taken the second perspective. Depending on the assumption made, sparsity or the low-rank property, these methods can generally be divided into sparse subspace clustering [2] and low-rank subspace clustering [3].

Note that learning a self-representation of the form X = XC is a very simple model. To improve on it, a family of algorithms uses dictionary learning in subspace clustering to learn a clean dictionary and an informative code, which yields X = DC. That is a big topic; see the surveys by Zhang et al. [4] and Bao et al. [5].
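
As a reference for the two assumptions just mentioned, the noise-free objectives are usually written as follows. This is a sketch of the common textbook form; the published methods add noise or outlier terms and differ in details. Here C is the coefficient matrix in X = XC, and W is an affinity matrix, a symbol introduced here for illustration.

```latex
% Sparse subspace clustering: each point is a sparse combination of the others
\min_{C} \; \lVert C \rVert_1
\quad \text{s.t.} \quad X = XC, \;\; \operatorname{diag}(C) = 0

% Low-rank variant: replace the l1 norm with the nuclear norm
\min_{C} \; \lVert C \rVert_*
\quad \text{s.t.} \quad X = XC

% Affinity matrix built from C and passed to spectral clustering
W = \lvert C \rvert + \lvert C \rvert^\top
```

The constraint diag(C) = 0 rules out the trivial solution C = I, in which every point represents itself.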

Paper Sharing: Deep Adaptive Clustering [6]

In image clustering, existing methods often ignore the interplay between feature learning and clustering.

DAC is based on a deep network, so we only describe the flowchart. First, a trained ConvNet is used to generate features, which guarantees a basic capacity for separation. Based on the learned features, traditional similarity learning is applied to find similar pairs and dissimilar pairs, analogous to must-link and cannot-link constraints in network mining. After obtaining those constraints, DAC goes back to train the ConvNet. That is one iteration.

The key ideas of DAC are:

  • It adopts a classification framework for clustering.
  • The learned features tend to be one-hot vectors by introducing a constraint into DAC. Thus clustering can be performed by locating the largest response of the learned features.
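
The pair selection and the final cluster assignment can be sketched as follows. This is a minimal sketch under stated assumptions: the feature vectors are made-up outputs of a hypothetical network, cosine similarity is used as the similarity measure, and the two thresholds are fixed illustrative values, whereas DAC adapts them during training. The ConvNet training step itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_pairs(features, upper=0.95, lower=0.5):
    """Split confident pairs into similar and dissimilar; skip ambiguous ones."""
    similar, dissimilar = [], []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(features[i], features[j])
            if s >= upper:
                similar.append((i, j))
            elif s <= lower:
                dissimilar.append((i, j))
    return similar, dissimilar


def assign_clusters(features):
    """Cluster index = position of the largest response in each feature vector."""
    return [max(range(len(f)), key=f.__getitem__) for f in features]


# Hypothetical label features for four images and three clusters.
features = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.40, 0.50, 0.10],
]

similar, dissimilar = select_pairs(features)
print(similar)                    # [(0, 1)]
print(dissimilar)                 # [(0, 2), (1, 2)]
print(assign_clusters(features))  # [0, 0, 1, 1]
```

The pairs involving the fourth image fall between the two thresholds and are left out of training for this round. In DAC the thresholds move towards each other over iterations, so more pairs are used as the features become closer to one-hot.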

The performance of DAC depends strongly on the initialization of the ConvNet, which is learned by another method.

Other work also applies “supervised” models to the clustering task, for example [7].


[1] Parsons L, Haque E, Liu H. Subspace clustering for high dimensional data: a review[J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 90-105.

[2] Elhamifar E, Vidal R. Sparse subspace clustering[C]//Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009: 2790-2797.

[3] Vidal R, Favaro P. Low rank subspace clustering (LRSC)[J]. Pattern Recognition Letters, 2014, 43: 47-61.

[4] Zhang Z, Xu Y, Yang J, et al. A survey of sparse representation: algorithms and applications[J]. IEEE Access, 2015, 3: 490-530.

[5] Bao C, Ji H, Quan Y, et al. Dictionary learning for sparse coding: Algorithms and convergence analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(7): 1356-1369.

[6] Chang J, Wang L, Meng G, et al. Deep Adaptive Image Clustering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 5879-5887.

[7] Liu H, Han J, Nie F, et al. Balanced Clustering with Least Square Regression[C]//AAAI. 2017: 2231-2237.

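Notes on Feature Selection

Category: DM

I plan to do a piece of work on attributed networks. Unlike purely topological networks, the nodes in an attributed network carry "features", so in addition to the adjacency matrix of the network there is also a matrix formed by the node attributes. Studying attributed networks has practical value for understanding how networks are generated, for example how friendships form in social networks, and it also helps with derived problems such as community detection and link prediction.

While reading up on this, I found that part of the research on attributed networks focuses on feature selection. Because the feature dimension of the nodes in an attributed network is usually low and the features are not sparse, feature selection is feasible. Consider social networks: an edge does not necessarily form because all the features of two nodes are similar; it may form because of a few particular aspects. So feature selection is also reasonable in theory.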