Please use this permanent URL to cite or link to this document:
https://nccur.lib.nccu.edu.tw/handle/140.119/69229
|
Title: | A resource-aware data collection platform for Twitter (資源感知之社群媒體資料搜集平台:以推特為例) |
Author: | Shiu, Shih Yung (許矢勇) |
Contributors: | Chen, Kung (陳恭); Shiu, Shih Yung (許矢勇) |
Keywords: | Twitter; Resource-aware; Social media |
Date: | 2013 |
Upload time: | 2014-08-25 15:21:49 (UTC+8) |
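Abstract: | In recent years, social media such as Twitter, Facebook, and Sina Weibo have flourished: their user bases keep growing, and they have become an important channel through which people interact with friends and obtain information in daily life. For communication scholars and sociologists, the massive data held by the social media giants is an important resource for research on related topics. Although the major social media platforms all provide some form of data-retrieval application programming interface (API), they also impose restrictions of varying severity on data collectors, which makes data collection difficult. In short, researchers must work within the limited resources these platforms provide and try to optimize the quality and quantity of the data sets they can obtain. This study therefore targets Twitter and implements a resource-aware social media data collection platform to help scholars collect tweets. First, the platform adopts an event-job concept: for each event of interest, users select different keywords for collecting data, and each keyword corresponds to a job in the system. Second, each job must hold an access token in order to collect tweets, and each token can retrieve only a limited number of tweets within a given period, so tokens are the platform's primary resource. To cope with the common surge in tweets when a notable event occurs, the platform provides a token pool mechanism that lets many jobs share token resources, and it makes use of the access options in the Twitter API so that users can optimize the number of tweets obtainable depending on when the collection takes place. For the system core, this study proposes the concept of a "Mansion Household Service", in which minions within the service group divide the work and cooperate, so that even with limited resources the system can run multiple different jobs concurrently and effectively reduce the impact of Twitter's restrictions on tweet collection. We also validate the platform's tweet collection capability empirically.
Recently, with the rapid development of social media such as Twitter, Facebook, and Weibo, people have adopted social media as a major channel for interpersonal communication and a daily source of various kinds of information. From the viewpoint of social science and humanities scholars, the digital footprints that people leave on these social media are a rich resource for the study of human behavior. However, these social media usually impose resource restrictions, such as rate limiting, on how scholars may use their APIs to retrieve data. Therefore, we design and implement a resource-aware data collection platform for Twitter to help scholars retrieve historical tweets effectively and efficiently. Our platform employs an event-job approach to help users organize the tasks and the tweets to be collected. As each job requires an access token to fetch tweets, our platform provides a pool of tokens for system jobs to share so that access tokens are maximally utilized. In addition, we leverage the tweet-id options in the Twitter API to let users optimize the number of tweets collected depending on the timing of tweet collection. In organizing the system core for tweet collection, we propose a so-called "Mansion Household System," in which four minions cooperate with each other to launch different jobs simultaneously and thus alleviate the impact of the restrictions that Twitter imposes via access tokens. To validate our design, we conducted a series of experiments, and the results are satisfactory. |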
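The token pool idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class names `Token` and `TokenPool`, the round-robin selection policy, and the limit and window values are all assumptions made for illustration, and no real Twitter API call is involved.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional
import time


@dataclass
class Token:
    """One access token with a fixed request budget per time window."""
    name: str
    limit: int                      # requests allowed per window (illustrative value)
    window: float                   # window length in seconds (illustrative value)
    used: int = 0
    window_start: Optional[float] = None

    def try_consume(self, now: float) -> bool:
        # Open a fresh window on first use or once the previous one has elapsed.
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now
            self.used = 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


class TokenPool:
    """Shares tokens among jobs: any job may use any token with budget left."""

    def __init__(self, tokens):
        self._tokens = deque(tokens)

    def acquire(self, now: Optional[float] = None) -> Optional[str]:
        """Return the name of a token with remaining budget, or None if all are exhausted."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self._tokens)):
            token = self._tokens[0]
            self._tokens.rotate(-1)  # round-robin so requests spread across tokens
            if token.try_consume(now):
                return token.name
        return None
```

In this sketch a job would call `pool.acquire()` before each request and wait or back off when it returns `None`. Because jobs draw from a shared pool rather than owning a token, a job that is collecting a sudden burst of tweets can use budget that idle jobs would otherwise leave unused.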
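Description: | Master's degree; National Chengchi University; Department of Computer Science; 100971001; 102 |
Source: | http://thesis.lib.nccu.edu.tw/record/#G0100971001 |
Type: | thesis |
Appears in collections: | [Department of Computer Science] Theses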
|
Files in this item:
File | Size | Format | Views |
100101.pdf | 2670Kb | Adobe PDF2 | 445 |
|
All items in the NCCU Institutional Repository are protected by original copyright.
|