Bimonthly    Since 1986
ISSN 1004-9037
Indexed in:
SCIE, Ei, INSPEC, JST, AJ, MR, CA, DBLP, etc.
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
  • Table of Contents
      05 May 2012, Volume 27 Issue 3   
    Special Issue on Social Network Mining
    Preface
    Qiang Yang, Jie Tang, Lei Zhang, Bin Cao
    Journal of Data Acquisition and Processing, 2012, 27 (3): 451-454. 
    Recently, there has been a dramatic increase in research on data mining in social networks and social media. The ubiquitous nature of Web-enabled devices, including desktops, laptops, tablets, and mobile phones, enables users to participate and interact with each other in various Web communities. Examples of such communities include forums, newsgroups, blogs, microblogs, bookmarking services, photo sharing platforms, and location-based services. The rapidly evolving social Web provides a platform for communication, information sharing, and collaboration. A vast amount of heterogeneous data (e.g., text, photos, video, and links) has been generated by the users of various social communities, which offers an excellent opportunity for studying novel theories and technologies for social Web search and mining.
    Community Detection in Dynamic Social Networks Based on Multiobjective Immune Algorithm
    Mao-Guo Gong (公茂果), Senior Member, CCF, Member, ACM, IEEE, Ling-Jun Zhang (张岭军), Jing-Jing Ma (马晶晶), and Li-Cheng Jiao (焦李成), Senior Member, CCF, IEEE
    Journal of Data Acquisition and Processing, 2012, 27 (3): 455-467. 
    Community structure is one of the most important properties of social networks, and community detection has received an enormous amount of attention in recent years. In dynamic networks, communities may evolve over time, which makes detection more challenging than in static networks. Community detection in dynamic networks can naturally be formulated as a problem with two conflicting objectives and consequently be solved by multiobjective optimization algorithms. In this paper, a novel multiobjective immune algorithm is proposed to solve the community detection problem in dynamic networks. It employs the framework of the nondominated neighbor immune algorithm to simultaneously optimize modularity and normalized mutual information, which quantitatively measure the quality of the community partitions and the temporal cost, respectively. Problem-specific knowledge is incorporated in the genetic operators and local search to improve the effectiveness and efficiency of our method. Experimental studies based on four synthetic datasets and two real-world social networks demonstrate that our algorithm not only finds community structure and captures community evolution more accurately but also is more stable than the state-of-the-art algorithms.
    Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks
    Zhi-Hao Wu (武志昊), You-Fang Lin (林友芳), Steve Gregory, Huai-Yu Wan (万怀宇), Student Member, CCF, and Sheng-Feng Tian (田盛丰)
    Journal of Data Acquisition and Processing, 2012, 27 (3): 468-479. 
    In this paper, we propose a balanced multi-label propagation algorithm (BMLPA) for overlapping community detection in social networks. As well as its fast speed, another important advantage of our method is good stability, which other multi-label propagation algorithms, such as COPRA, lack. In BMLPA, we propose a new update strategy, which requires that community identifiers of one vertex should have balanced belonging coefficients. The advantage of this strategy is that it allows vertices to belong to any number of communities without a global limit on the largest number of community memberships, which is needed for COPRA. Also, we propose a fast method to generate "rough cores", which can be used to initialize labels for multi-label propagation algorithms, and are able to improve the quality and stability of results. Experimental results on synthetic and real social networks show that BMLPA is very efficient and effective for uncovering overlapping communities.
    Discovering Typed Communities in Mobile Social Networks
    Huai-Yu Wan (万怀宇), Student Member, CCF, You-Fang Lin (林友芳), Zhi-Hao Wu (武志昊), and Hou-Kuan Huang (黄厚宽), Senior Member, CCF
    Journal of Data Acquisition and Processing, 2012, 27 (3): 480-491. 
    Mobile social networks, which consist of mobile users who communicate with each other using cell phones, are reflections of people's interactions in social lives. Discovering typed communities (e.g., family communities or corporate communities) in mobile social networks is a very promising problem. For example, it can help mobile operators to determine the target users for precision marketing. In this paper we propose discovering typed communities in mobile social networks by utilizing the labels of relationships between users. We use the user logs stored by mobile operators, including communication and user movement records, to collectively label all the relationships in a network, by employing an undirected probabilistic graphical model, i.e., conditional random fields. Then we use two methods to discover typed communities based on the results of relationship labeling: one is simply retaining or cutting relationships according to their labels, and the other is using sophisticated weighted community detection algorithms. The experimental results show that our proposed framework performs well in terms of the accuracy of typed community detection in mobile social networks.
    Mining Trust Relationships from Online Social Networks
    Yu Zhang (张宇), Member, CCF, ACM, and Tong Yu (于彤)
    Journal of Data Acquisition and Processing, 2012, 27 (3): 492-505. 
    With the growing popularity of online social networks, trust plays an increasingly important role in connecting people to each other. We rely on our personal trust to accept recommendations, to make purchase decisions, and to select transaction partners in the online community. Therefore, how to obtain trust relationships by mining online social networks has become an important research topic. Existing trust mining methods have several shortcomings. First, trust is category-dependent. However, most methods overlook the category attribute of trust relationships, which leads to low accuracy in trust calculation. Second, since the data in online social networks cannot be understood and processed by machines directly, traditional mining methods require much human effort and are not easily applied to other applications. To solve these problems, we propose a semantic-based trust reasoning mechanism to mine trust relationships from online social networks automatically. We emphasize the category attribute of pairwise relationships and utilize Semantic Web technologies to build a domain ontology for data communication and knowledge sharing. We exploit role-based and behavior-based reasoning functions to infer implicit trust relationships and category-specific trust relationships. We make use of path expressions to extend reasoning rules so that the mining process can be done directly without much human effort. We perform experiments on real-life data extracted from Epinions. The experimental results verify the effectiveness and broad applicability of our proposed method.
    Spam Short Messages Detection via Mining Social Networks
    Jian-Yun Liu (刘建芸), Yu-Hang Zhao (赵宇航), Member, CCF, Zhao-Xiang Zhang (张兆翔), Member, CCF, ACM, IEEE, Yun-Hong Wang (王蕴红), Member, CCF, ACM, IEEE, Xue-Mei Yuan (袁雪梅), Lei Hu (胡磊), Member, CCF, and Zhe
    Journal of Data Acquisition and Processing, 2012, 27 (3): 506-514. 
    Short message service (SMS) is becoming an indispensable means of social communication, and the problem of mobile spam is becoming increasingly serious. We propose a novel approach to spam message detection. Instead of conventional methods that focus on keywords or flow rate filtering, our system is based on mining a more robust structure: the social network constructed from SMS. Several features, including static features, dynamic features, and graph features, are proposed for describing the activities of nodes in the network in various ways. Experimental results on a real dataset support the validity of our approach.
    Summarizing Large-Scale Database Schema Using Community Detection
    Xue Wang (王雪), Xuan Zhou (周烜), and Shan Wang (王珊), Senior Member, CCF, Member, ACM
    Journal of Data Acquisition and Processing, 2012, 27 (3): 515-526. 
    Schema summarization on large-scale databases is a challenge. In a typical large database schema, a great proportion of the tables are closely connected through a few high-degree tables. It is thus difficult to separate these tables into clusters that represent different topics. Moreover, as a schema can be very big, the schema summary needs to be structured into multiple levels to further improve usability. In this paper, we introduce a new schema summarization approach utilizing the techniques of community detection in social networks. Our approach contains three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely. The generated abstract schema layers are very helpful for users exploring the database.
    Personalized Tag Recommendation Using Social Influence
    Jun Hu (胡军), Bing Wang (王兵), Yu Liu (刘禹), Member, CCF, and De-Yi Li (李德毅), Fellow, CCF
    Journal of Data Acquisition and Processing, 2012, 27 (3): 527-540. 
    Tag recommendation encourages users to add more tags, helping to bridge the semantic gap between human concepts and the features of media objects, which provides a feasible solution for content-based multimedia information retrieval. In this paper, we study personalized tag recommendation on Flickr, a popular online photo sharing site. Social relationship information of users is collected to generate an online social network. From the perspective of network topology, we propose node topological potential to characterize a user's social influence. With this metric, we distinguish different social relations between users and identify those who really have influence on the target users. Tag recommendations are based on tagging history and the latent personalized preference learned from those who have the most influence in the user's social network. We evaluate our method on large-scale real-world data. The experimental results demonstrate that our method can outperform the non-personalized global co-occurrence method and two other state-of-the-art personalized approaches using social networks. We also analyze the further use of our approach for the cold-start problem of tag recommendation.
    Effective and Efficient Multi-Facet Web Image Annotation
    Jia Chen (陈佳), Yi-He Zhu (朱一和), Hao-Fen Wang (王昊奋), Wei Jin (晋薇), and Yong Yu (俞勇)
    Journal of Data Acquisition and Processing, 2012, 27 (3): 541-553. 
    The vast number of images available on the Web calls for an effective and efficient search service to help users find relevant images. The prevalent way is to provide a keyword interface for users to submit queries. However, the number of images without any tags or annotations is beyond the reach of manual efforts. To overcome this, automatic image annotation techniques have emerged, which generally select a suitable set of tags for a given image without user intervention. However, there are three main challenges with respect to Web-scale image annotation: scalability, noise-resistance, and diversity. Scalability has a twofold meaning: first, an automatic image annotation system should be scalable with respect to billions of images on the Web; second, it should be able to automatically identify several relevant tags among a huge tag set for a given image within seconds or even faster. Noise-resistance means that the system should be robust against typos and ambiguous terms used in tags. Diversity means that image content may include both scenes and objects, which are further described by multiple different image features constituting different facets in annotation. In this paper, we propose a unified framework to tackle the above three challenges for automatic Web image annotation. It mainly involves two components: tag candidate retrieval and multi-facet annotation. In the former, content-based indexing and a concept-based codebook are leveraged to solve the scalability and noise-resistance issues. In the latter, a joint feature map is designed to describe different facets of tags in annotations and the relations between these facets. A tag graph is adopted to represent tags in the entire annotation, and a structured learning technique is employed to construct a learning model on top of the tag graph based on the generated joint feature map. Millions of images from Flickr are used in our evaluation. Experimental results show that we achieve a 33% performance improvement over single-facet approaches in terms of three metrics: precision, recall, and F1 score.
    Improving Web Document Clustering through Employing User-Related Tag Expansion Techniques
    Peng Li (李鹏), Bin Wang (王斌), Senior Member, CCF, Member, ACM, IEEE, and Wei Jin (晋薇)
    Journal of Data Acquisition and Processing, 2012, 27 (3): 554-566. 
    As high-quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and have achieved promising results. However, most web pages have few tags (fewer than 10). This sparsity seriously limits the usage of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem, which incorporates additional useful tags into the original tag document by utilizing user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. To tackle this problem, we have designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion, especially for topic-specific documents; 3) the proposed tag-based clustering methods significantly outperform the word-based methods, which indicates that tags could be a better resource for the clustering task.
    Exponential Fuzzy C-Means for Collaborative Filtering
    Kiatichai Treerattanapitak and Chuleerat Jaruskulchai
    Journal of Data Acquisition and Processing, 2012, 27 (3): 567-576. 
    Collaborative filtering (CF) is one of the most popular techniques behind the success of recommendation systems. It predicts the interests of users by collecting information from past users who have the same opinions. The most popular approaches in CF research are matrix factorization methods such as SVD. However, many well-known recommendation systems do not use these methods but still rely on neighborhood models because of their simplicity and explainability. Some concerns limit neighborhood models from achieving higher prediction accuracy. To address these concerns, we propose a new exponential fuzzy clustering (XFCM) algorithm by reformulating the clustering's objective function with an exponential equation in order to improve the method for membership assignment. The proposed method assigns data to the clusters by aggressively excluding irrelevant data, which is better than other fuzzy C-means (FCM) variants. The experiments show that XFCM-based CF improves on the item-based method by 6.9% and on SVD by 3.0% in terms of mean absolute error on the 100K and 1M MovieLens datasets.
    An Effective Framework for Fast Expert Mining in Collaboration Networks: A Group-Oriented and Cost-Based Method
    Farnoush Farhadi, Maryam Sorkhi, Sattar Hashemi, and Ali Hamzeh
    Journal of Data Acquisition and Processing, 2012, 27 (3): 577-590. 
    The growth of social networks in modern information systems has enabled the collaboration of experts at a previously unseen scale. Given a task and a graph of experts where each expert possesses some skills, we aim to find an effective team of experts who are able to accomplish the task. Team selection should consider how effectively the team members collaborate to perform the task as well as how efficient the team assignment is, given that each expert has the minimum required level of skill. Here, we generalize the problem from multiple perspectives. First, a method is provided to determine the skill level of each expert based on his/her skill and collaboration among neighbors. Second, the graph is aggregated into a set of skilled expert groups that are strongly correlated based on their skills as well as the best connection among them. By considering the groups, the search space is significantly reduced, which also prevents the growth of redundant communication costs and team cardinality while assigning team members. Third, the existing RarestFirst algorithm is extended to a more generalized version, and finally the cost definition is customized to improve the efficiency of the selected team. Experiments on the DBLP co-authorship graph show that our proposed framework performs well in practice in terms of both efficiency and effectiveness.
    Personalized Semantic Based Blog Retrieval
    Godfrey Winster Sathianesan and Swamynathan Sankaranarayanan
    Journal of Data Acquisition and Processing, 2012, 27 (3): 591-598. 
    Blog retrieval is a complex task because of informal language usage. For various reasons, blogs deviate substantially from the language used in traditional corpora. Spelling errors, grammatical irregularity, overuse of abbreviations, and symbolic characters such as emoticons are a few of the reasons for the irregularity of blog corpora. To make the retrieval of blogs easier, the novel idea of a personalized semantic based blog retrieval (PSBBR) system is discussed in this paper. The blogs are tagged with a relationship to one another with reference to an ontology. The meanings of the blog content and key terms are tagged as XML tags. The query term accesses the XML tags to retrieve the entire blog content. The system is evaluated with a large number of blogs extracted from various blog sources. A relevance score is calculated for every blog associated with keywords, and content-based importance (CBI) gives the content similarity to the query word. The experimental results show that the system performs well for the blog retrieval process.
    Community-Aware Resource Profiling for Personalized Search in Folksonomy
    Hao-Ran Xie (谢浩然), Qing Li (李青), Senior Member, CCF, IEEE, and Yi Cai (蔡毅), Member, CCF
    Journal of Data Acquisition and Processing, 2012, 27 (3): 599-610. 
    In recent years, there has been a fast proliferation of collaborative tagging (a.k.a. folksonomy) systems in Web 2.0 communities. With the increasingly large amount of data, how to assist users in searching for resources of interest by utilizing these semantic tags has become a crucial problem. Collaborative tagging systems provide an environment for users to annotate resources, and most users give annotations according to their perspectives or feelings. However, users may have different perspectives or feelings on resources, e.g., some of them may share similar perspectives yet conflict with others. Thus, modeling the profile of a resource based on tags given by all users who have annotated the resource is neither suitable nor reasonable. To tackle this problem, we propose in this paper a community-aware approach to constructing resource profiles via social filtering. In order to discover user communities, three different strategies are devised and discussed. Moreover, we present a personalized search approach that combines a switching fusion method and a revised needs-relevance function to optimize personalized resource ranking based on user preferences and the user-issued query. We conduct experiments on a collected real-life dataset, comparing the performance of our proposed approach with baseline methods. The experimental results verify our observations and the effectiveness of the proposed method.
    Performance Characterization of Game Recommendation Algorithms on Online Social Network Sites
    Philip Leroux, Student Member, IEEE, Bart Dhoedt, Member, IEEE, Piet Demeester, Fellow, IEEE, and Filip De Turck, Senior Member, IEEE
    Journal of Data Acquisition and Processing, 2012, 27 (3): 611-623. 
    For years, online social networks have been evolving from profile and communication websites into online portals where people interact with each other, share and consume multimedia-enriched data, and play different types of games. Due to the immense popularity of these online games and their huge revenue potential, the number of these games increases every day, resulting in a current offering of thousands of online social games. In this paper, the applicability of neighborhood-based collaborative filtering (CF) algorithms for the recommendation of online social games is evaluated. This evaluation is based on a large dataset of an online social gaming platform containing game ratings (explicit data) and online gaming behavior (implicit data) of millions of active users. Several similarity metrics were implemented and evaluated on the explicit data, the implicit data, and a combination thereof. It is shown that the neighborhood-based CF algorithms greatly outperform the content-based algorithm that is currently often used on online social gaming websites. The results also show that a combined approach, i.e., taking into account both implicit and explicit data at the same time, yields good overall results on all evaluation metrics for all scenarios, while performing only slightly worse than the explicit-only or implicit-only approaches where those are strongest. The best performing algorithms have been implemented in a live setup of the online game platform.
    Topology-Based Recommendation of Users in Micro-Blogging Communities
    Marcelo G. Armentano, Daniela Godoy, and Analia Amandi
    Journal of Data Acquisition and Processing, 2012, 27 (3): 624-634. 
    Nowadays, more and more users share real-time news and information in micro-blogging communities such as Twitter, Tumblr, or Plurk. On these sites, information is shared via a followers/followees social network structure in which a follower receives all the micro-blogs from the users he/she follows, named followees. With the increasing number of registered users on sites of this kind, finding relevant and reliable sources of information becomes essential. The reduced number of characters present in micro-posts, along with the informal language commonly used on these sites, makes it difficult to apply standard content-based approaches to the problem of user recommendation. To address this problem, we propose an algorithm for recommending relevant users that explores the topology of the network, considering different factors that allow us to identify users who can be considered good information sources. An experimental evaluation conducted with a group of users is reported, demonstrating the potential of the approach.
    Exploiting Consumer Reviews for Product Feature Ranking
    Su-Ke Li (李素科), Zhi Guan (关志), Li-Yong Tang (唐礼勇), and Zhong Chen (陈钟), Member, CCF, IEEE
    Journal of Data Acquisition and Processing, 2012, 27 (3): 635-649. 
    Web 2.0 technology leads Web users to publish a large number of consumer reviews about products and services on various websites. Major product features extracted from consumer reviews may let product providers find which features consumers care about most, and may also help potential consumers make purchasing decisions. In this work, we propose an approach that combines linear regression with rules to rank product features according to their importance. Empirical experiments show that our approach is effective and promising. We also demonstrate two applications of our proposed approach. The first application decomposes overall ratings of products into product feature ratings. The second application seeks to generate consumer surveys automatically.
    Phrase-Level Sentiment Polarity Classification Using Rule-Based Typed Dependencies and Additional Complex Phrases Consideration
    Luke Kien-Weng Tan (陈坚永), Jin-Cheon Na (罗镇川), Member, ACM, Yin-Leng Theng (邓燕玲), and Kuiyu Chang (张圭煜)
    Journal of Data Acquisition and Processing, 2012, 27 (3): 650-666. 
    The advent of Web 2.0 has led to an increase in user-generated content on the Web. This has provided an extensive collection of free-style texts with opinion expressions that could influence the decisions and actions of their readers. Providers of such content exert a certain level of influence on the receivers, and this is evident from blog sites affecting their readers' purchase decisions, political viewpoints, financial planning, and so on. By detecting the opinion expressed, we can identify the sentiments on the topics discussed and the influence exerted on the readers. In this paper, we introduce an automatic approach to deriving polarity pattern rules to detect sentiment polarity at the phrase level, and in addition consider the effects of the more complex relationships found between words in sentiment polarity classification. Recent sentiment analysis research has focused on the functional relations of words using typed dependency parsing, providing a refined analysis of the grammar and semantics of textual data. Heuristics are typically used to determine the typed dependency polarity patterns, which may not comprehensively identify all possible rules. We study the use of class sequential rules (CSRs) to automatically learn the typed dependency patterns, and benchmark the performance of CSR against a heuristic method. Preliminary results show that CSR leads to further improvements in classification performance, achieving F1 scores of over 80% in the test cases. In addition, we observe more complex relationships between words that could influence phrase sentiment polarity, and further discuss possible approaches to handling the effects of these complex relationships.
Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China

E-mail: info@sjcjycl.cn
 
  Copyright ©2015 JCST, All Rights Reserved