How to write 5 paragraph essay: December 2019

Monday, December 30, 2019

Tips to Improve Your Spelling

Nothing makes your writing look unpolished like misspelled words. While we can depend on technology like spell checkers to let us know when we've made errors, there are limits to what technology can do. Read over this list of techniques and try to make them a part of your routine.

1. Make Yourself a List of Problem Words
If there are certain words that you know you misspell frequently, make yourself a spelling list. Practice writing these words ten times each, just like you did in elementary school. Use flashcards to practice a little every night and eliminate words when you feel you've conquered them.

2. Keep a Problem Word File in Your Computer
Each time you run a spell-checker and find a word that you've misspelled, copy and paste the word into your file. Later you can add it to your list (above).

3. Each Time You Practice a Word, Spell It Out Loud
Later, you will recall how the word sounded as you spelled it right. You'll be surprised how well this works!

4. Review the Rules for Prefixes and Suffixes
You'll avoid many mistakes once you understand the difference between inter and intra, for example.

5. Study Common Root Words of Words With Greek and Latin Origins
This is a trick used by many Spelling Bee participants. Understanding etymology can add a layer of logic to word spellings that will make them easier to remember.

6. Memorize Clumps of Words That Belong to Special Groups
For example, you will find that the group of words that contain ough (rhyming with tough) is finite and manageable. By observing words that do and don't belong together, you will reduce uncertainty about many similar words that don't make the list. More lists of special groups would include:
aire words like questionnaire and millionaire
mn words like hymn and column
ps words like psychology and pseudonym
ible words like edible and audible
Be sure to revisit this list frequently.

7. Read
Many words become familiar to us because we see them often. The more you read, the more words you will see, and the more you'll memorize, even though you won't realize it.

8. Use a Pencil
You can mark your books with light pencil marks to indicate words you'd like to practice. Just remember to go back and erase! If you happen to use an eReader, be sure to highlight and bookmark words you'd like to practice.

9. Practice With a Few Online Spelling Quizzes
This is a good way to find frequently-misspelled or commonly-confused words.

10. Visualize Yourself Carrying Out an Activity to Match a Problem Word
For example, if you have trouble remembering how to spell edible, conjure up an image of the word in your head, then picture yourself nibbling on the word. (Silly activities are often the most effective.)

Any effort you make to improve your reading skills will have a surprising effect. You'll find that spelling becomes much easier with practice.

Sunday, December 22, 2019

Analysis Of The Battle Of Borodino - 1014 Words

This paper will analyze the Battle of Borodino. Within this analysis, I will examine Tolstoy's treatment of Napoleon's statements, including whether Tolstoy accurately describes the battle and whether he expresses the horror of battle to which Napoleon alluded. Additionally, I will analyze whether or not the passage supports Napoleon's assessment of the French as victors and the Russians as invincible. Last, I will analyze what Tolstoy's view of warfare was. This paper will give a basic understanding of the Battle of Borodino. To begin, we must understand why Napoleon wanted this war and how the events in this conflict took place. Napoleon is a well-known figure from history, and stories of his adventures are taught in most history classes. The text says "Napoleon began the war with Russia because he could not resist going to Dresden, could not help having his head turned by the homage he received, could not help donning a Polish uniform and yielding to the stimulating influence of a June morning, and could not refrain from bursts of anger in the presence of Kurakin and then of Balashev" (Tolstoy, 1869). Napoleon was a very proud person who wanted to be seen by the public. He liked the attention that he received when entering these conflicts and being in the limelight. He also believed that he could win any conflict that he entered, and he put this thinking above the lives of the men who would be doing the fighting. "The French, with the memory of all

Saturday, December 14, 2019

Open Domain Event Extraction from Twitter Free Essays

Open Domain Event Extraction from Twitter

Alan Ritter, University of Washington, Computer Sci. & Eng., Seattle, WA, aritter@cs.washington.edu
Mausam, University of Washington, Computer Sci. & Eng., Seattle, WA, mausam@cs.washington.edu
Oren Etzioni, University of Washington, Computer Sci. & Eng., Seattle, WA, etzioni@cs.washington.edu
Sam Clark*, Decide, Inc., Seattle, WA, sclark.uw@gmail.com

* This work was conducted at the University of Washington.

ABSTRACT
Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events has focused largely on newswire text; Twitter's unique characteristics present new challenges and opportunities for open-domain event extraction. This paper describes TwiCal, the first open-domain event-extraction and categorization system for Twitter. We demonstrate that accurately extracting an open-domain calendar of significant events from Twitter is indeed feasible. In addition, we present a novel approach for discovering important event categories and classifying extracted events based on latent variable models. By leveraging large volumes of unlabeled data, our approach achieves a 14% increase in maximum F1 over a supervised baseline. A continuously updating demonstration of our system can be viewed at http://statuscalendar.com; our NLP tools are available at http://github.com/aritter/twitter_nlp.

Categories and Subject Descriptors
I.2.7 [Natural Language Processing]: Language parsing and understanding; H.2.8 [Database Management]: Database applications - data mining

General Terms
Algorithms, Experimentation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'12, August 12–16, 2012, Beijing, China. Copyright 2012 ACM 978-1-4503-1462-6/12/08 ...$10.00.

1. INTRODUCTION
Social networking sites such as Facebook and Twitter present the most up-to-date information and buzz about current events. Yet the number of tweets posted daily has recently exceeded two-hundred million, many of which are either redundant [57], or of limited interest, leading to information overload.1 Clearly, we can benefit from more structured representations of events that are synthesized from individual tweets.

1 http://blog.twitter.com/2011/06/200-million-tweets-per-day.html

Entity        Event Phrase   Date      Type
Steve Jobs    died           10/6/11   Death
iPhone        announcement   10/4/11   ProductLaunch
GOP           debate         9/7/11    PoliticalEvent
Amanda Knox   verdict        10/3/11   Trial

Table 1: Examples of events extracted by TwiCal.

Previous work in event extraction [21, 1, 54, 18, 43, 11, 7] has focused largely on news articles, as historically this genre of text has been the best source of information on current events. In the meantime, social networking sites such as Facebook and Twitter have become an important complementary source of such information. While status messages contain a wealth of useful information, they are very disorganized, motivating the need for automatic extraction, aggregation and categorization. Although there has been much interest in tracking trends or memes in social media [26, 29], little work has addressed the challenges arising from extracting structured representations of events from short or informal texts.

Extracting useful structured representations of events from this disorganized corpus of noisy text is a challenging problem. On the other hand, individual tweets are short and self-contained and are therefore not composed of complex discourse structure as is the case for texts containing narratives. In this paper we demonstrate that open-domain event extraction from Twitter is indeed feasible; for example, our highest-confidence extracted future events are 90% accurate, as demonstrated in §8.

Twitter has several characteristics which present unique challenges and opportunities for the task of open-domain event extraction.

Challenges: Twitter users frequently mention mundane events in their daily lives (such as what they ate for lunch) which are only of interest to their immediate social network. In contrast, if an event is mentioned in newswire text, it is safe to assume it is of general importance. Individual tweets are also very terse, often lacking sufficient context to categorize them into topics of interest (e.g. Sports, Politics, ProductRelease etc.). Further, because Twitter users can talk about whatever they choose, it is unclear in advance which set of event types are appropriate. Finally, tweets are written in an informal style, causing NLP tools designed for edited texts to perform extremely poorly.

Opportunities: The short and self-contained nature of tweets means they have very simple discourse and pragmatic structure, issues which still challenge state-of-the-art NLP systems. For example in newswire, complex reasoning about relations between events (e.g. before and after) is often required to accurately relate events to temporal expressions [32, 8]. The volume of Tweets is also much larger than the volume of news articles, so redundancy of information can be exploited more easily.

To address Twitter's noisy style, we follow recent work on NLP in noisy text [46, 31, 19], annotating a corpus of Tweets with events, which is then used as training data for sequence-labeling models to identify event mentions in millions of messages.

Because of the terse, sometimes mundane, but highly redundant nature of tweets, we were motivated to focus on extracting an aggregate representation of events which provides additional context for tasks such as event categorization, and also filters out mundane events by exploiting redundancy of information. We propose identifying important events as those whose mentions are strongly associated with references to a unique date as opposed to dates which are evenly distributed across the calendar.

Twitter users discuss a wide variety of topics, making it unclear in advance what set of event types are appropriate for categorization. To address the diversity of events discussed on Twitter, we introduce a novel approach to discovering important event types and categorizing aggregate events within a new domain.

Supervised or semi-supervised approaches to event categorization would require first designing annotation guidelines (including selecting an appropriate set of types to annotate), then annotating a large corpus of events found in Twitter. This approach has several drawbacks, as it is a priori unclear what set of types should be annotated; a large amount of effort would be required to manually annotate a corpus of events while simultaneously refining annotation standards.
We propose an approach to open-domain event categorization based on latent variable models that uncovers an appropriate set of types which match the data. The automatically discovered types are subsequently inspected to filter out any which are incoherent and the rest are annotated with informative labels;2 examples of types discovered using our approach are listed in figure 3. The resulting set of types are then applied to categorize hundreds of millions of extracted events without the use of any manually annotated examples. By leveraging large quantities of unlabeled data, our approach results in a 14% improvement in F1 score over a supervised baseline which uses the same set of types.

2 This annotation and filtering takes minimal effort. One of the authors spent roughly 30 minutes inspecting and annotating the automatically discovered event types.

               P      R      F1     F1 inc.
Stanford NER   0.62   0.35   0.44   -
T-seg          0.73   0.61   0.67   52%

Table 2: By training on in-domain data, we obtain a 52% improvement in F1 score over the Stanford Named Entity Recognizer at segmenting entities in Tweets [46].

2. SYSTEM OVERVIEW
TwiCal extracts a 4-tuple representation of events which includes a named entity, event phrase, calendar date, and event type (see Table 1). This representation was chosen to closely match the way important events are typically mentioned in Twitter.

An overview of the various components of our system for extracting events from Twitter is presented in Figure 1. Given a raw stream of tweets, our system extracts named entities in association with event phrases and unambiguous dates which are involved in significant events. First the tweets are POS tagged, then named entities and event phrases are extracted, temporal expressions resolved, and the extracted events are categorized into types. Finally we measure the strength of association between each named entity and date based on the number of tweets they co-occur in, in order to determine whether an event is significant.

NLP tools, such as named entity segmenters and part of speech taggers which were designed to process edited texts (e.g. news articles) perform very poorly when applied to Twitter text due to its noisy and unique style. To address these issues, we utilize a named entity tagger and part of speech tagger trained on in-domain Twitter data presented in previous work [46]. We also develop an event tagger trained on in-domain annotated data as described in §4.

[Figure 1: Processing pipeline for extracting events from Twitter. New components developed as part of this work are shaded in grey. Components shown: Tweets, POS Tag, NER, Event Tagger, Temporal Resolution, Event Classification, Significance Ranking, Calendar Entries.]

3. NAMED ENTITY SEGMENTATION
NLP tools, such as named entity segmenters and part of speech taggers which were designed to process edited texts (e.g. news articles) perform very poorly when applied to Twitter text due to its noisy and unique style.

For instance, capitalization is a key feature for named entity extraction within news, but this feature is highly unreliable in tweets; words are often capitalized simply for emphasis, and named entities are often left all lowercase. In addition, tweets contain a higher proportion of out-of-vocabulary words, due to Twitter's 140 character limit and the creative spelling of its users.

To address these issues, we utilize a named entity tagger trained on in-domain Twitter data presented in previous work [46].

3 Available at http://github.com/aritter/twitter_nlp.

Training on tweets vastly improves performance at segmenting Named Entities. For example, performance compared against the state-of-the-art news-trained Stanford Named Entity Recognizer [17] is presented in Table 2. Our system obtains a 52% increase in F1 score over the Stanford Tagger at segmenting named entities.

4. EXTRACTING EVENT MENTIONS
In order to extract event mentions from Twitter's noisy text, we first annotate a corpus of tweets, which is then used to train sequence models to extract events. While we apply an established approach to sequence-labeling tasks in noisy text [46, 31, 19], this is the first work to extract event-referring phrases in Twitter.

Event phrases can consist of many different parts of speech as illustrated in the following examples:
• Verbs: Apple to Announce iPhone 5 on October 4th?! YES!
• Nouns: iPhone 5 announcement coming Oct 4th
• Adjectives: WOOOHOO NEW IPHONE TODAY! CAN'T WAIT!
These phrases provide important context; for example, extracting the entity Steve Jobs and the event phrase died in connection with October 5th is much more informative than simply extracting Steve Jobs. In addition, event mentions are helpful in upstream tasks such as categorizing events into types, as described in §6.

In order to build a tagger for recognizing events, we annotated 1,000 tweets (19,484 tokens) with event phrases, following annotation guidelines similar to those developed for the Event tags in Timebank [43]. We treat the problem of recognizing event triggers as a sequence labeling task, using Conditional Random Fields for learning and inference [24]. Linear Chain CRFs model dependencies between the predicted labels of adjacent words, which is beneficial for extracting multi-word event phrases. We use contextual, dictionary, and orthographic features, and also include features based on our Twitter-tuned POS tagger [46], and dictionaries of event terms gathered from WordNet by Sauri et al. [50].

The precision and recall at segmenting event phrases are reported in Table 3. Our classifier, TwiCal-Event, obtains an F-score of 0.64. To demonstrate the need for in-domain training data, we compare against a baseline of training our system on the Timebank corpus.

               precision   recall   F1
TwiCal-Event   0.56        0.74     0.64
No POS         0.48        0.70     0.57
Timebank       0.24        0.11     0.15

Table 3: Precision and recall at event phrase extraction. All results are reported using 4-fold cross validation over the 1,000 manually annotated tweets (about 19K tokens). We compare against a system which doesn't make use of features generated based on our Twitter trained POS Tagger, in addition to a system trained on the Timebank corpus which uses the same set of features.

5. EXTRACTING AND RESOLVING TEMPORAL EXPRESSIONS
In addition to extracting events and related named entities, we also need to extract when they occur. In general there are many different ways users can refer to the same calendar date; for example "next Friday", "August 12th", "tomorrow" or "yesterday" could all refer to the same day, depending on when the tweet was written. To resolve temporal expressions we make use of TempEx [33], which takes as input a reference date, some text, and parts of speech (from our Twitter-trained POS tagger) and marks temporal expressions with unambiguous calendar references. Although this mostly rule-based system was designed for use on newswire text, we find its precision on Tweets (94% estimated over a sample of 268 extractions) is sufficiently high to be useful for our purposes. TempEx's high precision on Tweets can be explained by the fact that some temporal expressions are relatively unambiguous. Although there appears to be room for improving the recall of temporal extraction on Twitter by handling noisy temporal expressions (for example see Ritter et al. [46] for a list of over 50 spelling variations on the word "tomorrow"), we leave adapting temporal extraction to Twitter as potential future work.

6. CLASSIFICATION OF EVENT TYPES
To categorize the extracted events into types we propose an approach based on latent variable models which infers an appropriate set of event types to match our data, and also classifies events into types by leveraging large amounts of unlabeled data.

Supervised or semi-supervised classification of event categories is problematic for a number of reasons. First, it is a priori unclear which categories are appropriate for Twitter. Secondly, a large amount of manual effort is required to annotate tweets with event types. Third, the set of important categories (and entities) is likely to shift over time, or within a focused user demographic. Finally many important categories are relatively infrequent, so even a large annotated dataset may contain just a few examples of these categories, making classification difficult.

Sports           7.45%     Conflict           0.69%
Party            3.66%     Prize              0.68%
TV               3.04%     Legal              0.67%
Politics         2.92%     Death              0.66%
Celebrity        2.38%     Sale               0.66%
Music            1.96%     VideoGameRelease   0.65%
Movie            1.92%     Graduation         0.63%
Food             1.87%     Racing             0.61%
Concert          1.53%     Fundraiser/Drive   0.60%
Performance      1.42%     Exhibit            0.60%
Fitness          1.11%     Celebration        0.60%
Interview        1.01%     Books              0.58%
ProductRelease   0.95%     Film               0.50%
Meeting          0.88%     Opening/Closing    0.49%
Fashion          0.87%     Wedding            0.46%
Finance          0.85%     Holiday            0.45%
School           0.85%     Medical            0.42%
AlbumRelease     0.78%     Wrestling          0.41%
Religion         0.71%     OTHER              53.45%

Figure 2: Complete list of automatically discovered event types with percentage of data covered. Interpretable types representing significant events cover roughly half of the data.

For these reasons we were motivated to investigate unsupervised approaches that will automatically induce event types which match the data. We adopt an approach based on latent variable models inspired by recent work on modeling selectional preferences [47, 39, 22, 52, 48], and unsupervised information extraction [4, 55, 7].

Each event indicator phrase in our data, e, is modeled as a mixture of types.
For example the event phrase "cheered" might appear as part of either a PoliticalEvent, or a SportsEvent. Each type corresponds to a distribution over named entities n involved in specific instances of the type, in addition to a distribution over dates d on which events of the type occur. Including calendar dates in our model has the effect of encouraging (though not requiring) events which occur on the same date to be assigned the same type. This is helpful in guiding inference, because distinct references to the same event should also have the same type.

The generative story for our data is based on LinkLDA [15], and is presented as Algorithm 1. This approach has the advantage that information about an event phrase's type distribution is shared across its mentions, while ambiguity is also naturally preserved. In addition, because the approach is based on a generative probabilistic model, it is straightforward to perform many different probabilistic queries about the data. This is useful for example when categorizing aggregate events.

For inference we use collapsed Gibbs Sampling [20] where each hidden variable, $z_i$, is sampled in turn, and parameters are integrated out. Example types are displayed in Figure 3. To estimate the distribution over types for a given event, a sample of the corresponding hidden variables is taken from the Gibbs Markov chain after sufficient burn in. Prediction for new data is performed using a streaming approach to inference [56].

Label: Top 5 Event Phrases / Top 5 Entities
Sports: tailgate – scrimmage tailgating – homecoming – regular season / espn – ncaa – tigers – eagles – varsity
Concert: concert – presale – performs – concerts – tickets / taylor swift – toronto britney spears – rihanna – rock
Perform: matinee – musical priscilla – seeing wicked / shrek – les mis – lee evans – wicked – broadway
TV: new season – season finale – finished season episodes – new episode / jersey shore – true blood – glee – dvr – hbo
Movie: watch love – dialogue theme – inception – hall pass – movie / netflix – black swan – insidious – tron – scott pilgrim
Sports: inning – innings pitched – homered homer / mlb – red sox – yankees – twins – dl
Politics: presidential debate osama – presidential candidate – republican debate – debate performance / obama president obama – gop – cnn america
TV: network news broadcast – airing – primetime drama – channel stream / nbc – espn – abc – fox mtv
Product: unveils – unveiled – announces – launches wraps off / apple – google – microsoft – uk – sony
Meeting: shows trading – hall mtg – zoning – briefing / town hall – city hall club – commerce – white house
Finance: stocks – tumbled – trading report – opened higher – tumbles / reuters – new york – u.s. – china – euro
School: maths – english test exam – revise – physics / english – maths – german – bio – twitter
Album: in stores – album out debut album – drops on – hits stores / itunes – ep – uk – amazon – cd
TV: voted off – idol – scotty – idol season – dividendpaying / lady gaga – american idol – america – beyonce – glee
Religion: sermon – preaching preached – worship preach / church – jesus – pastor faith – god
Conflict: declared war – war shelling – opened fire wounded / libya – afghanistan #syria – syria – nato
Politics: senate – legislation – repeal – budget – election / senate – house – congress – obama – gop
Prize: winners – lotto results enter – winner – contest / ipad – award – facebook – good luck – winners
Legal: bail plea – murder trial – sentenced – plea – convicted / casey anthony – court – india – new delhi supreme court
Movie: film festival – screening starring – film – gosling / hollywood – nyc – la – los angeles – new york
Death: live forever – passed away – sad news – condolences – burried / michael jackson afghanistan john lennon – young – peace
Sale: add into – 50% off up shipping – save up / groupon – early bird facebook – @etsy – etsy
Drive: donate – tornado relief disaster relief – donated – raise money / japan – red cross – joplin – june – africa

Figure 3: Example event types discovered by our model. For each type t, we list the top 5 entities which have highest probability given t, and the 5 event phrases which assign highest probability to t.

Algorithm 1: Generative story for our data involving event types as hidden variables. Bayesian Inference techniques are applied to invert the generative process and infer an appropriate set of types to describe the observed events.

for each event type $t = 1 \ldots T$ do
    Generate $\beta^n_t$ according to symmetric Dirichlet distribution $\mathrm{Dir}(\eta_n)$.
    Generate $\beta^d_t$ according to symmetric Dirichlet distribution $\mathrm{Dir}(\eta_d)$.
end for
for each unique event phrase $e = 1 \ldots |E|$ do
    Generate $\theta_e$ according to Dirichlet distribution $\mathrm{Dir}(\alpha)$.
    for each entity which co-occurs with $e$, $i = 1 \ldots N_e$ do
        Generate $z^n_{e,i}$ from $\mathrm{Multinomial}(\theta_e)$.
        Generate the entity $n_{e,i}$ from $\mathrm{Multinomial}(\beta^n_{z^n_{e,i}})$.
    end for
    for each date which co-occurs with $e$, $i = 1 \ldots N_d$ do
        Generate $z^d_{e,i}$ from $\mathrm{Multinomial}(\theta_e)$.
        Generate the date $d_{e,i}$ from $\mathrm{Multinomial}(\beta^d_{z^d_{e,i}})$.
    end for
end for

                      Precision   Recall   F1
TwiCal-Classify       0.85        0.55     0.67
Supervised Baseline   0.61        0.57     0.59

Table 4: Precision and recall of event type categorization at the point of maximum F1 score.

6.1 Evaluation
To evaluate the ability of our model to classify significant events, we gathered 65 million extracted events of the form listed in Figure 1 (not including the type). We then ran Gibbs Sampling with 100 types for 1,000 iterations of burn-in, keeping the hidden variable assignments found in the last sample. One of the authors manually inspected the resulting types and assigned them labels such as Sports, Politics, MusicRelease and so on, based on their distribution over entities, and the event words which assign highest probability to that type. Out of the 100 types, we found 52 to correspond to coherent event types which referred to significant events;5 the other types were either incoherent, or covered types of events which are not of general interest. For example, there was a cluster of phrases such as applied, call, contact, job interview, etc. which correspond to users discussing events related to searching for a job. Such event types which do not correspond to significant events of general interest were simply marked as OTHER. A complete list of labels used to annotate the automatically discovered event types along with the coverage of each type is listed in figure 2. Note that this assignment of labels to types only needs to be done once and produces a labeling for an arbitrarily large number of event instances.
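The generative story in Algorithm 1 can be illustrated with a small forward-sampling sketch. This is not the authors' implementation: the function names (`sample_dirichlet`, `sample_index`, `generate_corpus`), the vocabulary sizes and the hyperparameter values are all assumptions made for illustration, the Dirichlet draws are obtained by normalizing Gamma variates from the standard library, and for simplicity each phrase gets the same number of entity and date mentions.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw one sample from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw an index from a categorical (single-trial multinomial) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_types, num_phrases, num_entities, num_dates,
                    mentions_per_phrase, alpha=0.5, eta_n=0.5, eta_d=0.5, seed=0):
    """Forward-sample (phrase, type, entity) and (phrase, type, date) triples.

    All sizes and hyperparameter values are hypothetical placeholders.
    """
    rng = random.Random(seed)
    # One distribution over entities and one over dates for every event type.
    beta_n = [sample_dirichlet(rng, eta_n, num_entities) for _ in range(num_types)]
    beta_d = [sample_dirichlet(rng, eta_d, num_dates) for _ in range(num_types)]
    entity_obs, date_obs = [], []
    for e in range(num_phrases):
        # Each event phrase is a mixture over event types.
        theta_e = sample_dirichlet(rng, alpha, num_types)
        for _ in range(mentions_per_phrase):
            z_n = sample_index(rng, theta_e)  # hidden type behind this entity
            entity_obs.append((e, z_n, sample_index(rng, beta_n[z_n])))
            z_d = sample_index(rng, theta_e)  # hidden type behind this date
            date_obs.append((e, z_d, sample_index(rng, beta_d[z_d])))
    return entity_obs, date_obs


if __name__ == "__main__":
    entities, dates = generate_corpus(num_types=3, num_phrases=4, num_entities=10,
                                      num_dates=7, mentions_per_phrase=5)
    print(entities[:3])
    print(dates[:3])
```

Inference runs this process in reverse: given only the observed phrases, entities and dates, Gibbs sampling searches for type assignments under which data like these would be likely.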
Additionally the same set of types can easily be used to classify new event instances using streaming inference techniques [56]. One interesting direction for future work is automatic labeling and coherence evaluation of automatically discovered event types analogous to recent work on topic models [38, 25].

In order to evaluate the ability of our model to classify aggregate events, we grouped together all (entity, date) pairs which occur 20 or more times in the data, then annotated the 500 with highest association (see §7) using the event types discovered by our model.

To help demonstrate the benefits of leveraging large quantities of unlabeled data for event classification, we compare against a supervised Maximum Entropy baseline which makes use of the 500 annotated events using 10-fold cross validation. For features, we treat the set of event phrases that co-occur with each (entity, date) pair as a bag-of-words, and also include the associated entity. Because many event categories are infrequent, there are often few or no training examples for a category, leading to low performance.

4 To scale up to larger datasets, we performed inference in parallel on 40 cores using an approximation to the Gibbs Sampling procedure analogous to that presented by Newman et al. [37].
5 After labeling some types were combined resulting in 37 distinct labels.

[Figure 4: Precision and recall predicting event types. Axes: Recall, Precision. Curves: Supervised Baseline, TwiCal-Classify.]

Figure 4 compares the performance of our unsupervised approach to the supervised baseline, via a precision-recall curve obtained by varying the threshold on the probability of the most likely type. In addition table 4 compares precision and recall at the point of maximum F-score. Our unsupervised approach to event categorization achieves a 14% increase in maximum F1 score over the supervised baseline. Figure 5 plots the maximum F1 score as the amount of training data used by the baseline is varied.
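The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes F1 from the precision and recall values in Table 4, then derives the relative gains from the two-decimal F1 values as printed in Tables 2 and 4. The helper names `f1_score` and `relative_increase` are illustrative and are not part of the authors' tools.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_increase(new, old):
    """Improvement of `new` over `old`, expressed as a fraction of `old`."""
    return (new - old) / old


# Precision and recall at the point of maximum F1, taken from Table 4.
twical_f1 = round(f1_score(0.85, 0.55), 2)
baseline_f1 = round(f1_score(0.61, 0.57), 2)

print(f"TwiCal-Classify F1:      {twical_f1:.2f}")    # 0.67
print(f"Supervised Baseline F1:  {baseline_f1:.2f}")  # 0.59
# Relative gains computed from the rounded F1 values, as printed in the tables.
print(f"Gain over baseline:      {relative_increase(twical_f1, baseline_f1):.0%}")  # 14%
print(f"T-seg over Stanford NER: {relative_increase(0.67, 0.44):.0%}")              # 52%
```

Note that the 14% figure follows from the rounded table entries; carrying the unrounded F1 values through the same calculation gives roughly 13%.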
It seems likely that with more data, performance will reach that of our approach, which does not make use of any annotated events; however, our approach both automatically discovers an appropriate set of event types and provides an initial classifier with minimal effort, making it useful as a first step in situations where annotated data is not immediately available.

[Figure 5: Maximum F1 score of the supervised baseline as the amount of training data is varied. Axes: # Training Examples, Max F1. Curves: Supervised Baseline, TwiCal-Classify.]

7. RANKING EVENTS
Simply using frequency to determine which events are significant is insufficient, because many tweets refer to common events in users' daily lives. As an example, users often mention what they are eating for lunch, therefore entities such as McDonalds occur relatively frequently in association with references to most calendar days. Important events can be distinguished as those which have strong association with a unique date as opposed to being spread evenly across days on the calendar. To extract significant events of general interest from Twitter, we thus need some way to measure the strength of association between an entity and a date.

In order to measure the association strength between an entity and a specific date, we utilize the $G^2$ log likelihood ratio statistic. $G^2$ has been argued to be more appropriate for text analysis tasks than $\chi^2$ [12]. Although Fisher's Exact test would produce more accurate p-values [34], given the amount of data with which we are working (sample size greater than $10^{11}$), it proves difficult to compute Fisher's Exact Test Statistic, which results in floating point overflow even when using 64-bit operations. The $G^2$ test works sufficiently well in our setting, however, as computing association between entities and dates produces less sparse contingency tables than when working with pairs of entities (or words).

The $G^2$ test is based on the likelihood ratio between a model in which the entity is conditioned on the date, and a model of independence between entities and date references. For a given entity $e$ and date $d$ this statistic can be computed as follows:

$$G^2 = \sum_{x \in \{e, \neg e\},\; y \in \{d, \neg d\}} 2 \times O_{x,y} \times \ln\left(\frac{O_{x,y}}{E_{x,y}}\right)$$

Where $O_{e,d}$ is the observed fraction of tweets containing both $e$ and $d$, $O_{e,\neg d}$ is the observed fraction of tweets containing $e$, but not $d$, and so on. Similarly $E_{e,d}$ is the expected fraction of tweets containing both $e$ and $d$ assuming a model of independence.

tweets. We then added the extracted triples to the dataset used for inferring event types described in §6, and performed 50 iterations of Gibbs sampling for predicting event types on the new data, holding the hidden variables in the original data constant. This streaming approach to inference is similar to that presented by Yao et al. [56].

We then ranked the extracted events as described in §7, and randomly sampled 50 events from the top ranked 100, 500, and 1,000. We annotated the events with 4 separate criteria:
1. Is there a significant event involving the extracted entity which will take place on the extracted date?
2. Is the most frequently extracted event phrase informative?
3. Is the event's type correctly classified?
4. Are each of (1-3) correct? That is, does the event contain a correct entity, date, event phrase, and type?
Note that if (1) is marked as incorrect for a specific event, subsequent criteria are always marked incorrect.

8.2 Baseline
To demonstrate the importance of natural language processing and information extraction techniques in extracting informative events, we compare against a simple baseline which does not make use of the Ritter et al. named entity recognizer or our event recognizer; instead, it considers all 1-4 grams in each tweet as candidate calendar entries, relying on the $G^2$ test to filter out phrases which have low association with each date.

8.3 Results
The results of the evaluation are displayed in table 5. The table shows the precision of the systems at different yield levels (number of aggregate events). These are obtained by varying the thresholds in the $G^2$ statistic. Note that the baseline is only comparable to the third column, i.e., the precision of (entity, date) pairs, since the baseline is not performing event identification and classification. Although in some cases ngrams do correspond to informative calendar entries, the precision of the ngram baseline is extremely low compared with our system.

In many cases the ngrams don't correspond to salient entities related to events; they often consist of single words which are difficult to interpret, for example "Breaking" which is part of the movie "Twilight: Breaking Dawn" released on November 18. Although the word "Breaking" has a strong association with November 18, by itself it is not very informative to present to a user.7

7 In addition, we notice that the ngram baseline tends to produce many near-duplicate calendar entries, for example: "Twilight Breaking", "Breaking Dawn", and "Twilight Breaking Dawn". While each of these entries was annotated as correct, it would be problematic to show this many entries describing the same event to a user.

Our high-confidence calendar entries are surprisingly high quality. If we limit the data to the 100 highest ranked calendar entries over a two-week date range in the future, the precision of extracted (entity, date) pairs is quite good (90%), an 80% increase over the ngram baseline. As expected precision drops as more calendar entries are displayed, but

8. EXPERIMENTS
To estimate the quality of the calendar entries generated using our approach we manually evaluated a sample of the top 100, 500 and 1,000 calendar entries occurring within a 2-week future window of November 3rd.

8.
1 Data For evaluation purposes, we gathered roughly the 100 million most recent tweets on November 3rd 2011 (collected using the Twitter Streaming API6 , and tracking a broad set of temporal keywords, including Ã¢â‚¬Å"todayÃ¢â‚¬ , Ã¢â‚¬Å"tomorrowÃ¢â‚¬ , names of weekdays, months, etc. ). We extracted named entities in addition to event phrases, and temporal expressions from the text of each of the 100M 6 https://dev. twitter. com/docs/streaming-api Mon Nov 7 Justin meet Other Motorola Pro+ kick Product Release Nook Color 2 launch Product Release Eid-ul-Azha celebrated Performance MW3 midnight release Other Tue Nov 8 Paris love Other iPhone holding Product Release Election Day vote Political Event Blue Slide Park listening Music Release Hedley album Music Release Wed Nov 9 EAS test Other The Feds cut o? Other Toca Rivera promoted Performance Alert System test Other Max Day give Other November 2011 Thu Nov 10 Fri Nov 11 Robert Pattinson iPhone show debut Performance Product Release James Murdoch Remembrance Day give evidence open Other Performance RTL-TVI France post play TV Event Other Gotti Live Veterans Day work closed Other Other Bambi Awards Skyrim perform arrives Performance Product Release Sat Nov 12 Sydney perform Other Pullman Ballroom promoted Other Fox ? ght Other Plaza party Party Red Carpet invited Party Sun Nov 13 Playstation answers Product Release Samsung Galaxy Tab launch Product Release Sony answers Product Release Chibi Chibi Burger other Jiexpo Kemayoran promoted TV Event Figure 6: Example future calendar entries extracted by our system for the week of November 7th. Data was collected up to November 5th. For each day, we list the top 5 events including the entity, event phrase, and event type. 
While there are several errors, the majority of calendar entries are informative, for example: the Muslim holiday eid-ul-azha, the release of several videogames: Modern Warfare 3 (MW3) and Skyrim, in addition to the release of the new playstation 3D display on Nov 13th, and the new iPhone 4S in Hong Kong on Nov 11th. # calendar entries 100 500 1,000 ngram baseline 0. 50 0. 6 0. 44 entity + date 0. 90 0. 66 0. 52 precision event phrase event 0. 86 0. 56 0. 42 type 0. 72 0. 54 0. 40 entity + date + event + type 0. 70 0. 42 0. 32 Table 5: Evaluation of precision at di? erent recall levels (generated by varying the threshold of the G2 statistic). We evaluate the top 100, 500 and 1,000 (entity, date) pairs. In addition we evaluate the precision of the most frequently extracted event phrase, and the predicted event type in association with these calendar entries. Also listed is the fraction of cases where all predictions (Ã¢â‚¬Å"entity + date + event + typeÃ¢â‚¬ ) are correct. We also compare against the precision of a simple ngram baseline which does not make use of our NLP tools. Note that the ngram baseline is only comparable to the entity+date precision (column 3) since it does not include event phrases or types. remains high enough to display to users (in a ranked list). In addition to being less likely to come from extraction errors, highly ranked entity/date pairs are more likely to relate to popular or important events, and are therefore of greater interest to users. In addition we present a sample of extracted future events on a calendar in ? ure 6 in order to give an example of how they might be presented to a user. We present the top 5 entities associated with each date, in addition to the most frequently extracted event phrase, and highest probability event type. 9. RELATED WORK While we are the ? rst to study open domain event extraction within Twitter, there are two key related strands of research: extracting speci? 
c types of events from Twi tter, and extracting open-domain events from news [43]. Recently there has been much interest in information extraction and event identi? cation within Twitter. Benson et al. 5] use distant supervision to train a relation extractor which identi? es artists and venues mentioned within tweets of users who list their location as New York City. Sakaki et al. [49] train a classi? er to recognize tweets reporting earthquakes in Japan; they demonstrate their system is capable of recognizing almost all earthquakes reported by the Japan Meteorological Agency. Additionally there is recent work on detecting events or tracking topics [29] in Twitter which does not extract structured representations, but has the advantage that it is not limited to a narrow domain. Petrovi? t al. investigate a streaming approach to identic fying Tweets which are the ? rst to report a breaking news story using Locally Sensitive Hash Functions [40]. Becker et al. [3], Popescu et al. [42, 41] and Lin et al. [28] inv estigate discovering clusters of related words or tweets which correspond to events in progress. In contrast to previous work on Twitter event identi? cation, our approach is independent of event type or domain and is thus more widely applicable. Additionally, our work focuses on extracting a calendar of events (including those occurring in the future), extract- . 4 Error Analysis We found 2 main causes for why entity/date pairs were uninformative for display on a calendar, which occur in roughly equal proportion: Segmentation Errors Some extracted Ã¢â‚¬Å"entitiesÃ¢â‚¬ or ngrams donÃ¢â‚¬â„¢t correspond to named entities or are generally uninformative because they are mis-segmented. Examples include Ã¢â‚¬Å"RSVPÃ¢â‚¬ , Ã¢â‚¬Å"BreakingÃ¢â‚¬ and Ã¢â‚¬Å"YikesÃ¢â‚¬ . Weak Association between Entity and Date In some cases, entities are properly segmented, but are uninformative because they are not strongly associated with a speci? 
c event on the associated date, or are involved in ma ny di? rent events which happen to occur on that day. Examples include locations such as Ã¢â‚¬Å"New YorkÃ¢â‚¬ , and frequently mentioned entities, such as Ã¢â‚¬Å"TwitterÃ¢â‚¬ . ing event-referring expressions and categorizing events into types. Also relevant is work on identifying events [23, 10, 6], and extracting timelines [30] from news articles. 8 Twitter status messages present both unique challenges and opportunities when compared with news articles. TwitterÃ¢â‚¬â„¢s noisy text presents serious challenges for NLP tools. On the other hand, it contains a higher proportion of references to present and future dates. Tweets do not require complex reasoning about relations between events in order to place them on a timeline as is typically necessary in long texts containing narratives [51]. Additionally, unlike News, Tweets often discus mundane events which are not of general interest, so it is crucial to exploit redundancy of information to assess whether an event is signi? cant. Previous work on open-domain information extraction [2, 53, 16] has mostly focused on extracting relations (as opposed to events) from web corpora and has also extracted relations based on verbs. In contrast, this work extracts events, using tools adapted to TwitterÃ¢â‚¬â„¢s noisy text, and extracts event phrases which are often adjectives or nouns, for example: Super Bowl Party on Feb 5th. Finally we note that there has recently been increasing interest in applying NLP techniques to short informal messages such as those found on Twitter. For example, recent work has explored Part of Speech tagging [19], geographical variation in language found on Twitter [13, 14], modeling informal conversations [44, 45, 9], and also applying NLP techniques to help crisis workers with the ? ood of information following natural disasters [35, 27, 36]. 1. 
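The entity/date association score from the RANKING EVENTS section can be illustrated with a short sketch. It follows the summation as written in the text, over observed and expected fractions of tweets, and assumes that the expected fraction under independence is the product of the two marginal fractions. The function name, argument names, and example counts are hypothetical and are not taken from the authors' code.

```python
import math


def g2_score(n_both, n_entity, n_date, n_total):
    """Association between an entity e and a date d from tweet counts.

    n_both   -- tweets mentioning both e and d
    n_entity -- tweets mentioning e
    n_date   -- tweets mentioning d
    n_total  -- all tweets
    """
    # Observed fractions O[x, y] for the 2x2 contingency table.
    observed = {
        ("e", "d"): n_both / n_total,
        ("e", "not_d"): (n_entity - n_both) / n_total,
        ("not_e", "d"): (n_date - n_both) / n_total,
        ("not_e", "not_d"): (n_total - n_entity - n_date + n_both) / n_total,
    }
    # Marginal fractions, used to build the expected fractions E[x, y]
    # under the assumed independence model.
    p_e = n_entity / n_total
    p_d = n_date / n_total
    marginal_x = {"e": p_e, "not_e": 1 - p_e}
    marginal_y = {"d": p_d, "not_d": 1 - p_d}

    score = 0.0
    for (x, y), obs in observed.items():
        if obs == 0:
            continue  # treat 0 * ln(0) as 0
        expected = marginal_x[x] * marginal_y[y]
        score += obs * math.log(obs / expected)
    return score


# Hypothetical counts: an entity mentioned evenly across dates versus one
# whose mentions are concentrated on a single date.
evenly_spread = g2_score(n_both=10, n_entity=20, n_date=50, n_total=100)
concentrated = g2_score(n_both=20, n_entity=20, n_date=50, n_total=100)
print(evenly_spread, concentrated)
```

An entity whose mentions are spread across dates in proportion to the overall date frequencies scores near zero, while an entity concentrated on one date scores higher, which is the behaviour the ranking relies on.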
10. CONCLUSIONS

We have presented a scalable and open-domain approach to extracting and categorizing events from status messages. We evaluated the quality of these events in a manual evaluation showing a clear improvement in performance over an ngram baseline.

We proposed a novel approach to categorizing events in an open-domain text genre with unknown types. Our approach based on latent variable models first discovers event types which match the data, which are then used to classify aggregate events without any annotated examples. Because this approach is able to leverage large quantities of unlabeled data, it outperforms a supervised baseline by 14%.

A possible avenue for future work is extraction of even richer event representations, while maintaining domain independence. For example: grouping together related entities, classifying entities in relation to their roles in the event, thereby, extracting a frame-based representation of events.

A continuously updating demonstration of our system can be viewed at http://statuscalendar.com; our NLP tools are available at http://github.com/aritter/twitter_nlp.

[Footnote 8: http://newstimeline.googlelabs.com/]

11. ACKNOWLEDGEMENTS

The authors would like to thank Luke Zettlemoyer and the anonymous reviewers for helpful feedback on a previous draft. This research was supported in part by NSF grant IIS-0803481 and ONR grant N00014-08-1-0431 and carried out at the University of Washington's Turing Center.

12. REFERENCES

[1] J. Allan, R. Papka, and V. Lavrenko. On-line new event detection and tracking. In SIGIR, 1998.
[2] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction from the web. In IJCAI, 2007.
[3] H. Becker, M. Naaman, and L. Gravano. Beyond trending topics: Real-world event identification on twitter. In ICWSM, 2011.
[4] C. Bejan, M. Titsworth, A. Hickl, and S. Harabagiu. Nonparametric bayesian models for unsupervised event coreference resolution. In NIPS, 2009.
[5] E. Benson, A. Haghighi, and R. Barzilay. Event discovery in social media feeds. In ACL, 2011.
[6] S. Bethard and J. H. Martin. Identification of event mentions and their semantic class. In EMNLP, 2006.
[7] N. Chambers and D. Jurafsky. Template-based information extraction without the templates. In Proceedings of ACL, 2011.
[8] N. Chambers, S. Wang, and D. Jurafsky. Classifying temporal relations between events. In ACL, 2007.
[9] C. Danescu-Niculescu-Mizil, M. Gamon, and S. Dumais. Mark my words! Linguistic style accommodation in social media. In Proceedings of WWW, pages 745-754, 2011.
[10] A. Das Sarma, A. Jain, and C. Yu. Dynamic relationship and event discovery. In WSDM, 2011.
[11] G. Doddington, A. Mitchell, M. Przybocki, L. Ramshaw, S. Strassel, and R. Weischedel. The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation. LREC, 2004.
[12] T. Dunning. Accurate methods for the statistics of surprise and coincidence. Comput. Linguist., 1993.
[13] J. Eisenstein, B. O'Connor, N. A. Smith, and E. P. Xing. A latent variable model for geographic lexical variation. In EMNLP, 2010.
[14] J. Eisenstein, N. A. Smith, and E. P. Xing. Discovering sociolinguistic associations with structured sparsity. In ACL-HLT, 2011.
[15] E. Erosheva, S. Fienberg, and J. Lafferty. Mixed-membership models of scientific publications. PNAS, 2004.
[16] A. Fader, S. Soderland, and O. Etzioni. Identifying relations for open information extraction. In EMNLP, 2011.
[17] J. R. Finkel, T. Grenager, and C. Manning. Incorporating non-local information into information extraction systems by gibbs sampling. In ACL, 2005.
[18] E. Gabrilovich, S. Dumais, and E. Horvitz. Newsjunkie: providing personalized newsfeeds via analysis of information novelty. In WWW, 2004.
[19] K. Gimpel, N. Schneider, B. O'Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith. Part-of-speech tagging for twitter: Annotation, features, and experiments. In ACL, 2011.
[20] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proc Natl Acad Sci U S A, 101 Suppl 1, 2004.
[21] R. Grishman and B. Sundheim. Message understanding conference - 6: A brief history. In Proceedings of the International Conference on Computational Linguistics, 1996.
[22] Z. Kozareva and E. Hovy. Learning arguments and supertypes of semantic relations using recursive patterns. In ACL, 2010.
[23] G. Kumaran and J. Allan. Text classification and named entities for new event detection. In SIGIR, 2004.
[24] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
[25] J. H. Lau, K. Grieser, D. Newman, and T. Baldwin. Automatic labelling of topic models. In ACL, 2011.
[26] J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In KDD, 2009.
[27] W. Lewis, R. Munro, and S. Vogel. Crisis mt: Developing a cookbook for mt in crisis situations. In Proceedings of the Sixth Workshop on Statistical Machine Translation, 2011.
[28] C. X. Lin, B. Zhao, Q. Mei, and J. Han. PET: a statistical model for popular events tracking in social communities. In KDD, 2010.
[29] J. Lin, R. Snow, and W. Morgan. Smoothing techniques for adaptive online language models: Topic tracking in tweet streams. In KDD, 2011.
[30] X. Ling and D. S. Weld. Temporal information extraction. In AAAI, 2010.
[31] X. Liu, S. Zhang, F. Wei, and M. Zhou. Recognizing named entities in tweets. In ACL, 2011.
[32] I. Mani, M. Verhagen, B. Wellner, C. M. Lee, and J. Pustejovsky. Machine learning of temporal relations. In ACL, 2006.
[33] I. Mani and G. Wilson. Robust temporal processing of news. In ACL, 2000.
[34] R. C. Moore. On log-likelihood-ratios and the significance of rare events. In EMNLP, 2004.
[35] R. Munro. Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol. In CoNLL, 2011.
[36] G. Neubig, Y. Matsubayashi, M. Hagiwara, and K. Murakami. Safety information mining - what can NLP do in a disaster -. In IJCNLP, 2011.
[37] D. Newman, A. U. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent dirichlet allocation. In NIPS, 2007.
[38] D. Newman, J. H. Lau, K. Grieser, and T. Baldwin. Automatic evaluation of topic coherence. In HLT-NAACL, 2010.
[39] D. Ó Séaghdha. Latent variable models of selectional preference. In ACL, ACL '10, 2010.
[40] S. Petrović, M. Osborne, and V. Lavrenko. Streaming first story detection with application to twitter. In HLT-NAACL, 2010.
[41] A.-M. Popescu and M. Pennacchiotti. Dancing with the stars, nba games, politics: An exploration of twitter users' response to events. In ICWSM, 2011.
[42] A.-M. Popescu, M. Pennacchiotti, and D. A. Paranjpe. Extracting events and event descriptions from twitter. In WWW, 2011.
[43] J. Pustejovsky, P. Hanks, R. Sauri, A. See, R. Gaizauskas, A. Setzer, D. Radev, B. Sundheim, D. Day, L. Ferro, and M. Lazo. The TIMEBANK corpus. In Proceedings of Corpus Linguistics 2003, 2003.
[44] A. Ritter, C. Cherry, and B. Dolan. Unsupervised modeling of twitter conversations. In HLT-NAACL, 2010.
[45] A. Ritter, C. Cherry, and W. B. Dolan. Data-driven response generation in social media. In EMNLP, 2011.
[46] A. Ritter, S. Clark, Mausam, and O. Etzioni. Named entity recognition in tweets: An experimental study. EMNLP, 2011.
[47] A. Ritter, Mausam, and O. Etzioni. A latent dirichlet allocation method for selectional preferences. In ACL, 2010.
[48] K. Roberts and S. M. Harabagiu. Unsupervised learning of selectional restrictions and detection of argument coercions. In EMNLP, 2011.
[49] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In WWW, 2010.
[50] R. Saurí, R. Knippen, M. Verhagen, and J. Pustejovsky. Evita: a robust event recognizer for qa systems. In HLT-EMNLP, 2005.
[51] F. Song and R. Cohen. Tense interpretation in the context of narrative. In Proceedings of the ninth National conference on Artificial intelligence - Volume 1, AAAI'91, 1991.
[52] B. Van Durme and D. Gildea. Topic models for corpus-centric knowledge generalization. In Technical Report TR-946, Department of Computer Science, University of Rochester, Rochester, 2009.
[53] D. S. Weld, R. Hoffmann, and F. Wu. Using wikipedia to bootstrap open information extraction. SIGMOD Rec., 2009.
[54] Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and on-line event detection. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '98, 1998.
[55] L. Yao, A. Haghighi, S. Riedel, and A. McCallum. Structured relation discovery using generative models. In EMNLP, 2011.
[56] L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In KDD, 2009.
[57] F. M. Zanzotto, M. Pennacchiotti, and K. Tsioutsiouliklis. Linguistic redundancy in twitter. In EMNLP, 2011.

How to cite Open Domain Event Extraction from Twitter, Papers

Friday, December 6, 2019

Financial Analysis of Proposed Transaction and Preparation

Question: Discuss the Financial Analysis of Proposed Transaction and Preparation.

Answer: The purpose of this task is to evaluate the financial viability of the transaction that the company proposes to undertake and to prepare an information memorandum for management.

Theory- Capital budgeting techniques for long-term transactions. The information memorandum is the document that defines the work and the details of the financing. The financial viability of transactions related to long-term projects can be evaluated using different capital budgeting techniques, such as Net Present Value, Internal Rate of Return and payback period (Bierman Jr & Smidt, 2012). The information memorandum is prepared so that management can get an overview of the financial viability of the project.

Application- I plan to use all of these techniques to check that the transaction the company enters into is beneficial in both the long and the short term. To give management the details of the transaction, I will prepare an information memorandum setting out the details and the logic behind it.

Task- Research of Industry Sector. The purpose of this task is to conduct research on the industry sector.

Theory- Comparison of key financial ratios. For a business to succeed, management should be aware of its competition and of the general business environment. Preparing a monthly or weekly financial report and comparing it with the industry and the competition is a routine exercise for any successful business. The financial performance of the company can be compared effectively with that of competitors by calculating key financial ratios. A few important ratios reflect the overall condition of the business, and when they are compared with those of the industry and of competitors, the financial standing of the company can be ascertained.
The important ratios are the current ratio, debt-equity ratio, profitability ratio, return on assets and return on equity (Brigham & Ehrhardt, 2013).

Application- I plan to use all of these ratios to compare the financial position of the company with that of its competitors. I plan to calculate the ratios monthly and compare them with industry data. This will help because any corrective measures needed to improve the company's performance can then be taken immediately.

Task- Preparing financial statements, projected cash flow and income tax projection. The purpose of this task is to present the complete financial transactions along with estimates.

Theory- Forecasting of financial transactions. Financial statements are reports that present the financial position and activities of an enterprise in a structured format. They are prepared in accordance with the format prescribed by statute. The projected cash flow shows the cash inflows and outflows that the company expects. It is important because liquidity is an essential requirement for any business: if the company lacks liquid cash, the functioning of the business may be hampered and the company may incur a loss. A company is required to pay taxes, so it is prudent to project the tax payable so that penalties can be avoided (Cole & Gerhard, 2014).

Application- I will prepare the financial statements in the applicable format prescribed by law. To prepare a projected cash flow, I will draw up a plan of all cash inflows and outflows, estimating the overall business activity beforehand. The tax regulations are clear to me, so while preparing the estimate I will also prepare a tax projection that will be helpful to the business.

Task- Preparation and proofreading of investment management performance reports and client fees calculation.
The purpose of this task is to manage investments efficiently and to calculate client fees.

Theory- Calculation of IRR using weighted average market value of securities. The market is extremely volatile and changes continuously, so an investor needs to stay up to date on market and regulatory changes. According to a theory proposed by the CFA Institute, a performance report should at least include the rate of return, calculated using the weighted average market value of securities (Moten Jr & Thron, 2013). Client fees should be calculated using the accrual principle.

Application- I wish to apply the concept of calculating IRR using the market value of investments when preparing the investment performance report. I will also proofread the entire report thoroughly to make sure there are no mistakes in the calculation of investment performance. I plan to apply the accrual concept when calculating client fees.

Task- Update and maintain spreadsheet for Equity Portfolios: Stock Analysis pages. The purpose of this task is to analyze the stock investments by preparing and updating spreadsheets.

Theory- Tracking of investments using Excel. Investment activities can be tracked effectively using Excel spreadsheets. Excel can also be used to calculate return, profit or loss using closing prices and standard deviations (Walkenbach, 2013). Tracking investments is important because a loss-making investment can cost the company heavily, which is why tracking investments effectively is such an essential task.

Application- I wish to track investments using Excel and calculate the return on investment. This will help me stay up to date on the money invested in the market.

Task- Budget forecast monitoring: identifying financial status by comparing and analyzing actual sales against plans and forecasts.
The purpose of this task is to evaluate the financial position of the company by comparing budgeted figures with actual figures.

Theory- Budgeting. To prepare a financial forecast, a budget is drawn up for internal management purposes. These budgets are prepared monthly so that they can be monitored every month. The differences between the actual and budgeted figures are calculated and the reasons for these differences are analyzed (Hope & Fraser, 2013).

Application- I wish to apply the same technique to determine the difference between actual and forecast figures. I will prepare the budget every month so that the reasons for any difference can be analyzed as early as possible. This process will ensure that the problems identified are solved and that corrective measures are taken immediately. This task is very important because it gives direction to the company, so I look forward to carrying it out with the utmost sincerity.

Task- Analysis of important benchmarking ratios. The purpose of this task is to choose a benchmark and compare the performance of the company with that benchmark.

Theory- Benchmarking and performance analysis. To maintain benchmarking data, the operations and income of the company are compared with those of another company that is considered the best by industry standards, so that the company can compare the results of its services (Camp, 2013). This is particularly helpful in financial companies, where the data needs to be compared with the best performer in the industry.

Application- I wish to calculate forecast ratios for the financial data on the same basis as the ratios used in benchmarking the company's performance, and to compare the standing of the company against present industry standards.

Task- Improvement in the financial status, analysis of the results and continuous monitoring of the variances.

Theory- Variance Analysis.
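As a minimal sketch of what a variance analysis between budgeted and actual figures might compute, the example below reports the difference and the percentage difference for each line item. The function name, the line items and the amounts are hypothetical and serve only as an illustration.

```python
def budget_variances(budget, actual):
    """Compare budgeted and actual figures item by item.

    Returns {item: (variance, variance_pct)} where
    variance = actual - budget and variance_pct is the variance as a
    percentage of the budgeted amount (None when the budget is zero).
    """
    report = {}
    for item, planned in budget.items():
        real = actual.get(item, 0.0)  # a missing actual is treated as zero
        variance = real - planned
        variance_pct = variance * 100 / planned if planned else None
        report[item] = (variance, variance_pct)
    return report


# Hypothetical monthly figures.
budget = {"sales": 1000.0, "marketing": 200.0}
actual = {"sales": 900.0, "marketing": 250.0}

for item, (variance, pct) in budget_variances(budget, actual).items():
    print(f"{item}: variance {variance:+.2f} ({pct:+.1f}%)")
```

Whether a positive variance is favourable depends on the item: higher actual sales are good news, while higher actual costs are not, so the sign should be read together with the type of line item.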
Monitoring data through graphical and statistical representations helps in evaluating present and future changes (Spronk et al., 2016).

Application- I have used this kind of data to track changes in financial figures. It also helps in assessing the future viability of a project, and companies can assess profitability through simple graphs. Improvements to the present policies of the project are evaluated on the basis of the changes shown in the monitoring report.

Task- Reconciliation of transactions by comparing and correcting the data.

Theory- Reconciliation Statement. Reconciling transactions by comparing and correcting the data is useful when evaluating graphical representations of financial data. The reconciled data is particularly helpful for comparing different kinds of financial data, such as stock market data (Shah, 2013).

Application- I have used this technique for reporting and comparing the statistical and financial data needed to evaluate the portfolio of the selected organization. I am also able to prepare a detailed analysis report based on the graphical representation of the data. Reconciling transactions is also useful for quickly evaluating the changes that need to be made in a project, for identifying project problems easily and for working towards a simple solution.

Task- Implementation of Forex knowledge when one price is paid in exchange for another.

Theory- Foreign Exchange Management. Foreign exchange refers to the process of converting one currency into another. It also refers to the global market in which currencies are exchanged around the clock (Jacque, 2013).
Application- I intend to apply this theory where hedging costs are higher because of increased currency volatility. I will also be able to assess the systemic importance of the big banks involved in forecasting the losses and the potential risk of forex transactions. Although currency trading is seen as a zero-sum activity, the big banks still need to manage forex properly to hedge currency risk.

Section B

Introduction

The industry selected for problem evaluation is the insurance industry. The evaluation of the insurance company is based on a report covering its core segments, namely General Insurance, Global Life and Farmers insurance. The main purpose of the research is to understand the problems and issues that the insurance company currently faces (Cummins & Weiss, 2014). The main challenges in the present business situation are the following.

Macroeconomic Trend- The major threat lies in dealing with a challenging economic situation, because insurers have to accept thinner underwriting margins. In the insurance industry, the major difficulty is meeting customer needs as expectations change with the macroeconomic trends in the industry. The Global Financial Crisis (GFC) continues to affect the growth of insurance companies. This is most evident in the U.S. and in European countries. The state of economic development is closely related to geopolitical events. Consumers affected by the GFC had to control their spending on insurance policies (Pwc.com., 2016).

Reputational Risk- Some of the other problems of the insurance industry relate to reputational risk. This arises from aggressive marketing of both life and general insurance.
In many cases, it has been seen that insurance companies do not provide what was promised at the time the policy was created (Sturm, 2013).

Availability of cost of capital - One of the main problems in insurance companies stems from the limited availability of cash management policies. In many cases, claim amounts need to be settled quickly, especially for overseas payments. In many countries, insurance companies have been unable to maintain a suitable allocation of the cost components associated with commission rates and the service cost of customer care. This is closely related to capital budgeting needs (Koijen & Yogo, 2014).

Attributes that could add value to the company and its Finance team

Adding value to the company

With the attributes that I possess, I can contribute to meeting the changing needs of customers. This will lead to an improved customer approach and help insurance companies to have a better reputation in the market. These attributes will help to build a trusted relationship with the customer. An extensive understanding of macroeconomic environment analysis will further help to assess the present economic situation of the country and to understand the spending habits of the people (Suriadi et al., 2013).

Adding value to the finance team

The alternative ideas for solving the current problems of the industry need to be addressed by allocating funds properly across financial institutions such as banks. The company also needs to forecast the funding requirements of its projects using capital budgeting techniques such as NPV, IRR and the payback period method. This will help the company to evaluate the requirements for long-term financing of the project.
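As a rough illustration of the capital budgeting techniques named above, the Python sketch below computes NPV, IRR and the payback period for a single project. The cash flows and the 10% discount rate are made-up figures, not data from the report, and the function names are my own. The IRR is found by simple bisection, which is only one of several possible root-finding approaches.

```python
from typing import Optional, Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[0] is assumed to occur today (t = 0)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: Sequence[float], low: float = -0.99,
        high: float = 10.0, tol: float = 1e-9) -> Optional[float]:
    """Rate at which NPV is zero, found by bisection on [low, high].

    Returns None when NPV does not change sign on the interval,
    so no root can be bracketed there.
    """
    f_low = npv(low, cash_flows)
    f_high = npv(high, cash_flows)
    if f_low * f_high > 0:
        return None
    for _ in range(200):
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2.0 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


def payback_period(cash_flows: Sequence[float]) -> Optional[float]:
    """Years until cumulative (undiscounted) cash flow reaches zero.

    Assumes each year's cash arrives evenly through the year.
    Returns None if the outlay is never recovered.
    """
    cumulative = cash_flows[0]
    if cumulative >= 0:
        return 0.0
    for year, cf in enumerate(cash_flows[1:], start=1):
        if cumulative + cf >= 0:
            return (year - 1) + (-cumulative) / cf
        cumulative += cf
    return None


# Hypothetical project: 1,000 paid out today, 400 received in each of 3 years.
flows = [-1000.0, 400.0, 400.0, 400.0]
print("NPV at 10%:", round(npv(0.10, flows), 2))
print("IRR:", irr(flows))
print("Payback period (years):", payback_period(flows))
```

For these illustrative figures the NPV at 10% is slightly negative while the IRR falls between 9% and 10%, so the two measures agree that the project would not clear a 10% hurdle rate, even though the outlay is recovered in two and a half years.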
The evaluation of the projects based on these techniques helps the company to know the available amount of free cash flow (Embrechts et al., 2013).

Conclusion

The findings of the report show problems in the insurance industry such as poor customer service, shifting macroeconomic trends and the requirement for cost of capital. Based on the problems identified in the selected industry, companies need to implement techniques such as NPV, payback period and IRR, chosen on the basis of the problem evaluation of the industry, to address the financial problems. The solution will ensure that the company has a sufficient amount of free cash flow and is able to evaluate the different options for paying insurance claims that were outstanding because free cash flow was unavailable during a particular financial year. Hence, implementing the above techniques will significantly improve the performance of the company on both a financial and an operational basis.

Reference List

Bierman Jr, H., & Smidt, S. (2012). The capital budgeting decision: economic analysis of investment projects. Routledge.
Brigham, E. F., & Ehrhardt, M. C. (2013). Financial management: Theory & practice. Cengage Learning.
Camp, R. C. (2013). Benchmarking: the search for industry best practices that lead to superior performance. Milwaukee, Wis.: Quality Press; Quality Resources, 1989.
Cole, D., & Gerhard, L. (2014). U.S. Patent No. 8,768,809. Washington, DC: U.S. Patent and Trademark Office.
Cummins, J. D., & Weiss, M. A. (2014). Systemic risk and the US insurance sector. Journal of Risk and Insurance, 81(3), 489-528.
Embrechts, P., Klüppelberg, C., & Mikosch, T. (2013). Modelling extremal events: for insurance and finance (Vol. 33). Springer Science & Business Media.
Hope, J., & Fraser, R. (2013). Beyond budgeting: how managers can break free from the annual performance trap. Harvard Business Press.
Jacque, L. L. (2013). Management and control of foreign exchange risk. Springer Science & Business Media.
Koijen, R. S., & Yogo, M. (2014). The cost of financial frictions for life insurers. The American Economic Review, 105(1), 445-475.
Moten Jr, J. M., & Thron, C. (2013). Improvements on Secant Method for Estimating Internal Rate of Return (IRR). International Journal of Applied Mathematics and Statistics, 42(12), 84-93.
Pwc.com. (2016). Retrieved 31 July 2016, from https://www.pwc.com/us/en/insurance/publications/assets/pwc-top-issues-the-insurance-industry-2016.pdf
Shah, P. (2013). Financial Accounting. OUP Catalogue.
Spronk, J., Steuer, R. E., & Zopounidis, C. (2016). Multicriteria decision aid/analysis in finance. In Multiple Criteria Decision Analysis (pp. 1011-1065). Springer New York.
Sturm, P. (2013). Operational and reputational risk in the European banking industry: The market reaction to operational risk events. Journal of Economic Behavior & Organization, 85, 191-206.
Suriadi, S., Wynn, M. T., Ouyang, C., ter Hofstede, A. H., & van Dijk, N. J. (2013, June). Understanding process behaviours in a large insurance company in Australia: A case study. In International Conference on Advanced Information Systems Engineering (pp. 449-464). Springer Berlin Heidelberg.
Walkenbach, J. (2013). Excel 2003 bible (Vol. 36). John Wiley & Sons.