The language-based analyses presented in the book enable students to recognize the linguistic features of the four core secondary subjects and to use them in distilling information from scientific texts, seeing "how historical meanings are constructed in recurring patterns related to the goals and purposes of history instruction" (p. 40).

The Maamtrasna Murders: Language, Life and Death in Nineteenth-Century Ireland review: Margaret Kelleher studies a man's hanging in the context of a legal world oblivious to Irish.

Creating texts: EN2-2A. Plans, composes and reviews a range of texts that are more demanding in terms of topic, audience and language (English, Stage 2). A. Communicate through speaking, listening, reading, writing, viewing and representing. EN3-2A. Composes, edits and presents well-structured and coherent texts (English, Stage 3).

Review of Text Structures and Language Features. Text structure: the pattern an author uses to organise the ideas in a text. Language features: the techniques used to add meaning and interest to the work. Common language features task: write a recount of an event in your life.

Hello everybody! Have you ever reviewed things: movies, songs or something else? If you have not, have you ever seen a movie review or a book review? You can see examples of review text in newspapers that carry movie or book reviews; they illustrate what a Review Text is. Review Text is supposedly the last English lesson of high school level. If you cannot produce an example of a review text, it could be said that you have not passed the National Exams, especially the English lesson. You don't want to be told you could not pass the exam, right? Therefore, so as not to "be considered" to have failed along the school journey, let us learn again what a review text is and how it works. Ready?

Definition of Review Text. Review text is an evaluation of a publication, such as a movie, video game, musical composition or book; a piece of hardware like a car, home appliance, or computer; or an event or performance, such as a live music concert, a play, a musical theatre show or a dance show.

Generic Structure of Review Text. Orientation: background information on the text. Evaluations: concluding statements giving a judgement, opinion, or recommendation; there can be more than one. Interpretative Recount: a summary of the work of art, including character and plot. Evaluative Summation: the final opinion, consisting of the appraisal or the punch line of the work being criticized. In other words, the Orientation places the work in its general and particular context, often by comparing it with others of its kind or through an analogy with a non-art object or event. The Interpretive Recount summarizes the plot and/or provides an account of how the reviewed rendition of the work came into being. The Evaluation provides an evaluation of the work and/or its performance or production, and is usually recursive. Actually, the generic structure of a review text does not have to be exactly the same as above; perhaps for the sake of "summarizing" the lesson, the three or four generic structures above just serve as a general description of the structure of a review text, okay? Still confused? I am also still confused..
šŸ™‚ Okay, let's just discuss some examples of review text, which will hopefully help you understand this kind of text better. But before we go to the examples, let's discuss its purpose and language features. Purpose of Review Text: a review text is used to evaluate / review / critique events or works of art for the reader or listener, such as movies, shows, books, and others. Language Features of Review Text: – Present tense. – Long and complex clauses. I only mention those language features above because they are the main features that can be used to identify a review text easily.

Example of Review Text: Film – The Amazing Spiderman 2. Review of The Amazing Spiderman 2. Introduction: I will start by saying that I am a huge fan of Spider-Man. I love the whole trilogy made by Raimi (yes, even Spider-Man 3), but I do not like The Amazing Spiderman 1. I was skeptical when I wanted to watch this movie, but I was wrong, and I think this second sequel is really great. Unlike its predecessor, this film is full of action, humor, and emotion. It is played by big-name actors, and the story is well written. The action is really spectacular and the final scene left me satisfied. Evaluation 1 / Interpretation: The story begins when Peter Parker (Andrew Garfield) struggles to maintain his relationship with Gwen (Emma Stone) after her father's death. His actions also cause the emergence of a new enemy, Electro, a villain played by Jamie Foxx. Peter also continues to investigate what happened to his father and is reunited with his old friend, Harry Osborn. The movie ends with the death of Gwen, which leaves the audience very emotional and sad. Evaluation 2: However, I have to criticize this film for Paul Giamatti, who plays Rhino. His appearance is too over the top, and his acting does not show that he is a feared villain. That could be a serious problem for the next Spiderman series, so I hope he can improve his acting. Summary: Overall, I think this is the best superhero movie since The Dark Knight Rises. The script is well written and convincing. I am sure the next installment will be an outstanding superhero movie. I recommend this movie to anyone who loves Spider-Man or other superhero movies.

Example of Review Text: Assalamu'alaikum Beijing. Review Text of the Assalamu'alaikum Beijing Novel. Movie title: Assalamu'alaikum Beijing. Genre: Romantic-Religious. Director: Guntur Soeharjanto. Writer: Asma Nadia. Cast: Revalina S. Temat, Morgan Oey, Ibnu Jamil, Laudya C. Bella, Desta, Ollyne Apple, Cynthia Ramlan, Jajang C. Noer. I really love all the novels written by Asma Nadia, so when the Assalamu'alaikum Beijing novel was filmed, I could hardly wait for the movie to reach theaters, because a film directed by Guntur Soeharjanto is certain to be of very good quality. The film carries the tagline "If you do not find love, let love find you". In accordance with the novel's title, the film discusses religion and love at length, so it is labelled as the romantic-religious genre. The film tells the love story of Asmara (Revalina S. Temat), who is broken-hearted on learning that her fiancƩ, Dewa (Ibnu Jamil), has had an affair with her friend Anita (Cynthia Ramlan) just a day before the wedding was to take place. At the same time, Asma finally receives a job in Beijing thanks to the help of Sekar (Laudya Cynthia Bella). On the way, Asma meets Zhongwen (Morgan Oey), and she begins to open her heart to him.
However, before continuing their relationship, Asma is diagnosed with APS, a syndrome that puts her life in danger and means she could die at any time.

Example of Review Text about a Film – Merry Riana: Mimpi Sejuta Dollar. Director: Hestu Saputra. Producers: Dhamoo Punjabi, Manoj Punjabi. Cast: Chelsea Islan, Dion Wiyoko, Kimberly Ryder, Ferry Salim, Niniek L Karim, Sellen Fernandez, Mike Muliyandro, Chyntia Lamusu. Studio: MD Pictures. Release date: December 24, 2014. Duration: 105 minutes. Country: Singapore, Indonesia. Orientation: Merry Riana is a successful young woman entrepreneur, writer, and motivator. Her life story is told in the movie Merry Riana: "Million Dollar Dream", which is adapted from her book of the same title. This film visualizes her struggle to survive the difficulties of life and become a successful woman. Evaluation: The violence that happened in Jakarta and other big cities in Indonesia in May 1998 forced Merry Riana to flee to Singapore. Merry Riana's father decided to send his daughter to Singapore because he was afraid of the unsafe conditions. She went alone to Singapore with support money that was only enough to buy food for five days. Fortunately, Merry Riana met her best friend, Irene, who wanted to go to university there too. With Irene's help, Merry could live in a boarding house. She was also accepted at one of the best colleges there, but that could only happen if Merry paid $40,000. The only hope was to take out a student loan, which could only be obtained if Merry had a guarantor. Then Merry met her senior, Alva, who was very calculating. He set many requirements before he finally agreed to help Merry, and he also had Merry look for a side job. Merry realized that she should become successful as soon as possible. She did various kinds of work, from distributing online business brochures to playing high-risk shares, and her economic condition moved up and down. A problem of love also arose when Alva expressed his feelings to Merry, while Merry knew very well that Irene was in love with Alva. Interpretation: The acting of Chelsea Islan (Merry Riana) in this movie is very good. She plays Merry Riana's character very well, but it would have been better if there had been no kissing scene. Summary: I think this is an inspirational movie which can motivate people to be successful at a young age. It brings a good spirit to young people in Indonesia. The scriptwriter succeeds in bringing a set of interesting conflicts which make the plot of this movie come alive.
Example of Review Text – "Love You Like a Love Song", Selena Gomez. "Love You Like a Love Song" is a single from one of Disney's shining stars, Selena Gomez. Young men and women who love this young singer/actress will like this song. Gomez isn't known for having a super-strong voice or the most original arrangements, but she deserves props for this song, which mercifully tones down the standard synth-pop noise and kicks the vocal performance up a notch. The end result sounds a bit more creative and mature than the rest of the bubblegum-pop pack. Selena's music is always great, and her voice sounds great, especially in the bridge. In the past, people seemed to believe that a pop love song has to be acoustic with guitars, and that love songs for Rap/Hip-Hop have to sound the same. This doesn't seem to bother Rihanna, Lady GaGa, and now Selena Gomez. To be honest, "Love You Like A Love Song" has been the most original love song in years. The monotone was perfectly done here, and the Autotune was nicely layered; Autotune is not just robotic, and Beyonce and Rihanna use it too. It must be the most original love song in years, and since it is about loving someone like a love song, it is going to use love-song clichƩs.
That is our explanation of Review Text. Hopefully by reading the explanation above you can gain more understanding of this material. Okay, I think that's all; thanks for your visit. If you have any questions or comments regarding this material, please leave a comment.

Expression in a literature review should be informative and evaluative. Apart from incorporating reporting verbs, you will need to use evaluative and cautious language. A key language feature of a literature review is the use of reporting verbs. These verbs describe and report on the literature under review. They report on aims (investigates, examines, looks at), results (shows, suggests, reveals) and opinions (states, believes, argues). The choice of reporting verbs indicates your perspective and attitude towards the research under review; that is, the reporting verbs chosen show whether you are neutral, negative or positive about the claims. The sentence pattern for reporting verbs is [reporting verb] + either or both of [object] / [complement]. Evaluative and cautious language: you can show your perspective on the literature under review by using evaluative language. Evaluative language can indicate whether you are positive or negative towards the claims in the literature, and whether you agree or disagree with them. Cautious language is careful not to express absolute certainty where there may be the possibility of doubt. Evaluative language can be positive (expressions like "effective," "necessary," "significant" or "crucial") or negative ("questionable," "unclear," "inconclusive"). Positive evaluation: Wright's (2022) argument about the link between parental numeracy and that of their children is conclusively borne out. Cautious evaluation: whether the statistical results support Torney and Witting's (2021) hypothesis remains unclear. Another way to express certainty or hesitancy is to use boosters and hedges. Boosters are words or phrases that express confidence or certainty, while hedges convey a qualified uncertainty in the claims made.
In linguistics, the term text refers to: (1) the original words of something written, printed, or spoken, in contrast to a summary or paraphrase; and (2) a coherent stretch of language that may be regarded as an object of critical analysis. Text linguistics refers to a form of discourse analysis—a method of studying written or spoken language—that is concerned with the description and analysis of extended texts (those beyond the level of the single sentence). A text can be any example of written or spoken language, from something as complex as a book or legal document to something as simple as the body of an email or the words on the back of a cereal box. In the humanities, different fields of study concern themselves with different forms of texts. Literary theorists, for example, focus primarily on literary texts—novels, essays, stories, and poems. Legal scholars focus on legal texts such as laws, contracts, decrees, and regulations. Cultural theorists work with a wide variety of texts, including those that may not typically be the subject of study, such as advertisements, signage, instruction manuals, and other ephemera.

Text Definition. Traditionally, a text is understood to be a piece of written or spoken material in its primary form (as opposed to a paraphrase or summary). A text is any stretch of language that can be understood in context. It may be as simple as one or two words (such as a stop sign) or as complex as a novel. Any sequence of sentences that belong together can be considered a text. Text refers to content rather than form; for example, if you were talking about the text of "Don Quixote," you would be referring to the words in the book, not the physical book itself. Information related to a text, and often printed alongside it—such as an author's name, the publisher, the date of publication, etc.—is known as paratext. The idea of what constitutes a text has evolved over time. In recent years, the dynamics of technology—especially social media—have expanded the notion of the text to include symbols such as emoticons and emojis. A sociologist studying teenage communication, for example, might refer to texts that combine traditional language and graphic symbols.

Texts and New Technologies. The concept of the text is not a stable one. It is always changing as the technologies for publishing and disseminating texts evolve. In the past, texts were usually presented as printed matter in bound volumes such as pamphlets or books. Today, however, people are more likely to encounter texts in digital space, where the materials are becoming "more fluid," according to linguists David Barton and Carmen Lee: "Texts can no longer be thought of as relatively fixed and stable. They are more fluid with the changing affordances of new media. In addition, they are becoming increasingly multimodal and interactive. Links between texts are complex online, and intertextuality is common in online texts as people draw upon and play with other texts available on the web." An example of such intertextuality can be found in any popular news story.
An article in The New York Times, for example, may contain embedded tweets from Twitter, links to outside articles, or links to primary sources such as press releases or other documents. With a text such as this, it is sometimes difficult to describe what exactly is part of the text and what is not. An embedded tweet, for instance, may be essential to understanding the text around it—and therefore part of the text itself—but it is also its own independent text. On social media sites such as Facebook and Twitter, as well as blogs and Wikipedia, it is common to find such relationships between texts. Text linguistics is a field of study in which texts are treated as communication systems. The analysis deals with stretches of language beyond the single sentence and focuses particularly on context, the information that goes along with what is said and written. Context includes such things as the social relationship between two speakers or correspondents, the place where communication occurs, and non-verbal information such as body language. Linguists use this contextual information to describe the "socio-cultural environment" in which a text exists. Sources: Barton, David, and Carmen Lee. Language Online: Investigating Digital Texts and Practices. Routledge. Carter, Ronald, and Michael McCarthy. Cambridge Grammar of English. Cambridge University Press. Ching, Marvin K. L., et al. Linguistic Perspectives on Literature. Routledge, 2015.

Due to the development of e-commerce and web technology, most online merchant sites allow customers to write comments about the products they purchase. Customer reviews express opinions about products or services, and are collectively referred to as customer feedback data. Opinion extraction about products from customer reviews is becoming an interesting area of research, and it motivates the development of automatic opinion mining applications for users. Therefore, efficient methods and techniques are needed to extract opinions from reviews. In this paper, we propose a novel idea to find opinion words or phrases for each feature from customer reviews in an efficient way. Our focus in this paper is to obtain the patterns of opinion words/phrases about the features of a product from the review text through adjectives, adverbs, verbs, and nouns. The extracted features and opinions are useful for generating a meaningful summary that can provide a significant informative resource to help users as well as merchants track the most suitable choice of product.

1. Introduction. Much of the existing research on textual information processing has been focused on mining and retrieval of factual information. Little work had been done on the process of mining opinions until only recently. Automatic extraction of customers' opinions can benefit both customers and manufacturers. Product review mining can provide effective information for classifying customer reviews as "recommended" or "not recommended" based on customers' opinions for each product feature. In these cases, customer reviews highlight opinions about product features from various merchant sites. However, many reviews are long and only a few sentences contain opinions about product features. For a popular product, the number of reviews can be in the hundreds or even thousands, which is difficult to read one by one. Therefore, automatic extraction and summarization of opinions are required for each feature. Actually, when a user expresses an opinion about a product, he/she states something about the product as a whole or about its features one by one.
Feature identification in products is the first step of an opinion mining application, and opinion word extraction is the second step, which is critical for generating a useful summary by classifying the polarity of the opinion for each feature. Therefore, we have to extract opinions for each feature of a product. In this paper, we take a written review as input and produce a summary review as output. Given a set of customer reviews of a particular product, we need to perform the following tasks: (1) identifying product features that customers commented on; (2) extracting opinion words or phrases through adjectives, adverbs, verbs, and nouns and determining their orientation; and (3) generating the summary. We use a part-of-speech tagger to identify phrases in the input text that contain adjectives, adverbs, verbs, or nouns as opinion phrases. A phrase has a positive semantic orientation when it has good associations ("awesome camera") and a negative semantic orientation when it has bad associations ("low battery"). The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 elaborates the theoretical background for opinion mining. Section 4 presents the methodology and experiments of the system, and Section 5 describes the conclusion.

2. Related Work. There are several techniques to perform opinion mining tasks. In this section, we discuss others' related work on feature extraction and opinion word extraction. Hu and Liu [1] proposed several methods to analyze customer reviews of format 3. They perform the same tasks of identifying product features on which customers have expressed their opinions and determining whether the opinions are positive or negative. However, their techniques, which are primarily based on unsupervised item set mining or association rule mining, are only suitable for reviews of formats 3 and 1 when extracting product features. Frequent item sets of nouns in reviews are likely to be product features, while the infrequent ones are less likely to be product features. This work also introduced the idea of using opinion words to find additional, often infrequent, features. Reviews of these formats usually consist of full sentences; the techniques are not suitable for the pros and cons of format 2, which are very brief. Liu et al. [2] presented how to extract product features from "Pros" and "Cons" in review format 2. They proposed a supervised pattern mining method to find language patterns that identify product features. They do not need to determine opinion orientations because review format 2 already indicates them through "Pros" and "Cons." Hu and Liu [3] proposed a number of techniques based on data mining and natural language processing methods to mine opinion/product features. That work is mainly related to text summarization and terminology identification. Their system does not mine product features, and their work does not need a training corpus to build a summary. Su et al. [4] proposed a novel mutual reinforcement approach to deal with the feature-level opinion mining problem. Their approach predicted opinions relating to different product features without the explicit appearance of product feature words in reviews. They aim to mine the hidden sentiment link between product features and opinion words and then build the association between them. An approach for mining product features and opinions based on consideration of syntactic and semantic information is presented in [5]. The methods acquire relations based on fixed positions of words; however, these approaches are not effective in many cases.
Turney [6] presented a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. Wu et al. [7] implemented the extraction of relations between product features and expressions of opinions. Relation extraction is an important subtask of opinion mining for relating more than one product feature to the different opinion words on each of them. Wong and Lam [8, 9] employ hidden Markov models and conditional random fields, respectively, as the underlying learning method for extracting product features. Pang et al. [10], Mras and Carroll [11], and Gamon [12] use data from movie reviews, customer feedback reviews, and product reviews. They use several statistical feature selection methods and directly apply machine learning techniques. These experiments show that machine learning techniques alone do not perform well on sentiment classification. They show that the presence or absence of a word seems to be more indicative of the content than the frequency of that word. Zhang and Liu [13] aimed to identify opinionated noun features. The sentences involved are objective sentences that nevertheless imply positive or negative opinions. They proposed a method to deal with the problem of finding product features that are nouns or noun phrases which are not subjective but still imply opinions.

3. Mining Opinion for Feature Level. In this paper, we only focus on mining opinions at the feature level. This task is not only technically challenging because of the need for natural language processing, but also very useful in practice. For example, businesses always want to find public or consumer opinions about their products and services on commercial web sites. Potential customers also want to know the opinions of existing users before they use a service or purchase a product. Moreover, opinion mining can also provide valuable information for placing advertisements on commercial web pages. If people express positive opinions or sentiments about a product on a page, it may be a good idea to place an ad for that product; however, if people express negative opinions about the product, it is probably not wise to place an ad for it, and a better idea may be to place an ad for a competitor's product. There are three main review formats on the Web, and different review formats may need different techniques to perform the opinion extraction. Format 1 (pros and cons): the reviewer is asked to describe pros and cons separately. Format 2 (pros, cons, and detailed review): the reviewer is asked to describe pros and cons separately and also write a detailed review. Format 3 (free format): the reviewer can write freely, that is, with no separation of pros and cons. In review formats 1 and 2, the opinion or semantic orientations (positive or negative) of the features are known because pros and cons are separated; only product features need to be identified. We concentrate on review format 3, where we need to identify and extract both product features and opinions. This task goes to the sentence level to discover details, that is, what aspects of an object people liked or disliked. The object could be a product, a service, a topic, an individual, an organization, and so forth. For instance, in a product review sentence, it identifies product features that have been commented on by the reviewer and determines whether the comments are positive or negative.
For example, in the sentence "The battery life of this camera is too short," the comment is on the "battery life" of the camera object and the opinion is negative. Many real-life applications require this level of detailed analysis because, in order to make product improvements, one needs to know what components and/or features of the product are liked and disliked by consumers. Such information is not discovered by sentiment and subjectivity classification [14]. To obtain such detailed aspects, we need to go to the sentence level. Two tasks are apparent. (1) Identifying and extracting features of the product on which the reviewers have expressed their opinions, called product features. For instance, in the sentence "the picture quality of this camera is amazing," the product feature is "picture quality." (2) Determining whether the opinions on the features are positive, negative or neutral. In the above sentence, the opinion on the feature "picture quality" is positive; in the sentence "the battery life of this camera is too short," the comment is on the "battery life" and the opinion is negative. A structured summary will also be produced from the mining results.

4. Methodology to Find Patterns for Features and Opinions Extraction. The goal of OM (opinion mining) is to extract customer feedback data, such as opinions on products, and present the information in the most effective way that serves the chosen objectives. Customers express their opinions in review sentences with single words or phrases, and we need to extract these opinion words or phrases in an efficient way. The pattern extraction approach is useful for commercial web pages on which customers can write comments about products or services. Let us use the following review sentence as an example: "The battery life is long." In this sentence, the feature is "battery life" and the opinion word is "long." Therefore, we first need to identify the feature and opinion from the sentence. Figure 1 shows the overall process for generating the results of feature-based opinion summarization. The system input is a dataset of customer reviews. We first need to perform POS tagging to parse each sentence and then identify product features and opinion words. The extracted opinion words/phrases are used to determine the opinion orientation, which is positive or negative. Finally, we summarize the opinions for each product feature based on their orientation. In this paper, we focus on feature extraction and opinion word extraction to provide opinion summarization. In the feature extraction phase, we need to perform part-of-speech tagging to identify nouns/noun phrases from the reviews that can be product features, since nouns and noun phrases are most likely to be product features. POS tagging is important as it allows us to generate general language patterns. We use the Stanford POS tagger to parse each sentence and yield the part-of-speech tag of each word (whether the word is a noun, adjective, verb, adverb, etc.) and to identify simple noun and verb groups (syntactic chunking), for instance: The_DT photo_JJ quality_NN is_VBZ amazing_JJ and_CC i_FW know_VBP i_FW m_VBP going_VBG to_TO have_VB fun_NN with_IN all_PDT the_DT. After POS tagging is done, we need to extract features that are nouns or noun phrases using the pattern knowledge (see Table 1). Then, we focus on identifying the domain product features that are talked about by customers by using a manually tagged training corpus for the domain. For opinion word extraction, we use the extracted features to find the nearest opinion words tagged as adjectives/adverbs.
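As a rough illustration of this feature-candidate step, the sketch below uses NLTK's off-the-shelf tagger and a simple noun-phrase chunk grammar in place of the Stanford tagger used by the authors; the grammar, the example sentence and the expected output are illustrative assumptions, and the frequency filtering and manually tagged domain list mentioned above would still be applied afterwards.

import nltk
# First run only: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

# Candidate product features: optional adjectives followed by one or more nouns.
GRAMMAR = "NP: {<JJ.*>*<NN.*>+}"
chunker = nltk.RegexpParser(GRAMMAR)

def candidate_features(sentence):
    # Tag each token with its part of speech, then chunk noun phrases.
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = chunker.parse(tagged)
    return [" ".join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]

print(candidate_features("The photo quality is amazing and the battery life is long."))
# Expected candidates include 'photo quality' and 'battery life'.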
To decide the opinion orientation of each sentence, we need to perform three subtasks. First, a set of opinion words (adjectives, as they are normally used to express opinions) is identified. If an adjective appears near a product feature in a sentence, then it is regarded as an opinion word. We can extract opinion words from the review using the extracted features, for instance: "The strap is horrible and gets in the way of parts of the camera you need access to." "After nearly 800 pictures I have found that this camera takes incredible pictures." "It comes with a rechargeable battery that does not seem to last all that long, especially if you use the flash a lot." In the first sentence, the feature "strap" is near the opinion word "horrible"; in the second example, the feature "picture" is close to the opinion word "incredible". We found that opinion words/phrases are mainly adjectives/adverbs that are used to qualify product features expressed as nouns/noun phrases. In this case, we can extract the nearby adjective as an opinion word if the sentence contains any features. However, for the third sentence, with the feature "battery", we cannot extract a nearby adjective that captures the opinion word "long"; the nearby adjective "rechargeable" does not bear an opinion about the feature "battery". Moreover, both adjectives and adverbs are good indicators of subjectivity and opinions. Therefore, we need to extract phrases containing adjectives, adverbs, verbs, and nouns that imply opinion. We also consider some verbs, like recommend, prefer, appreciate, dislike, and love, as opinion words. Some adverbs like not, always, really, never, overall, absolutely, highly, and well are also considered. Therefore, we extract two or three consecutive words from the POS-tagged review if their tags conform to any of the patterns. We collect all opinionated phrases of mostly two or three words, such as (adjective, noun), (adjective, noun, noun), (adverb, adjective), (adverb, adjective, noun), (verb, noun), and so forth, from the processed POS-tagged reviews. The resulting patterns are used to match and identify opinion phrases in new reviews after POS tagging. However, there may be opinion words/phrases in a sentence that are not extracted by any of the patterns. Among these extracted patterns, most adjectives or adverbs imply an opinion about the nearest nouns/noun phrases. Table 2 describes some examples of opinion phrases.

Dataset of the System. We used an annotated customer review data set of 5 products for testing. All the reviews are from commercial web sites. Each review consists of a review title and the detailed review text. The reviews are retagged manually based on our own feature list. Each camera review sentence is attached to the mentioned features and their associated opinion words. Therefore, we only focus on the review sentences that contain opinions for product features, for instance, "The pictures are absolutely amazing—the camera captures the minutest of details." This sentence will receive the tag picture [+3]. Words in brackets are those we found to be associated with the corresponding opinion orientation of the feature, whether positive or negative (see Table 3).

Experiments. We carried out the experiments using customer reviews of 5 electronic products (two digital cameras, one DVD player, one MP3 player, and one cellular phone). All of the reviews are used as the training data to mine patterns. These patterns are then used to extract product features from test reviews of these products.
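A compact sketch of the two/three-word pattern-matching step described earlier in this section is shown below; the pattern list is an assumed subset of the patterns discussed above rather than the paper's full table, and NLTK again stands in for the Stanford tagger.

import nltk

# Assumed subset of two/three-word POS patterns for opinion phrases,
# e.g. adjective+noun ("incredible pictures") or adverb+adjective ("really good").
PATTERNS = {("JJ", "NN"), ("JJ", "NN", "NN"), ("RB", "JJ"),
            ("RB", "JJ", "NN"), ("VB", "NN"), ("RB", "VB")}

def simplify(tag):
    # Collapse detailed Penn Treebank tags (NNS, JJR, RBR, VBZ, ...) to their class.
    for prefix in ("NN", "JJ", "RB", "VB"):
        if tag.startswith(prefix):
            return prefix
    return tag

def opinion_phrases(sentence):
    tagged = [(w, simplify(t)) for w, t in nltk.pos_tag(nltk.word_tokenize(sentence))]
    phrases = []
    for size in (2, 3):
        for i in range(len(tagged) - size + 1):
            window = tagged[i:i + size]
            if tuple(tag for _, tag in window) in PATTERNS:
                phrases.append(" ".join(word for word, _ in window))
    return phrases

print(opinion_phrases("This camera takes incredible pictures but the strap is horrible."))
# Expected to pick up 'incredible pictures'; orientation would be assigned in a later step.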
We now evaluate the proposed automatic technique to see how effective it is in identifying product features and opinions from customer reviews. In this paper, we verify only product features; determining the sentiment orientation of the opinions on those features is an ongoing process. The effectiveness of the proposed system has been verified with review sets on these five different electronic products. All the results generated by our system are compared with the manually tagged results. We also assess the time saved by semiautomatic tagging over manual tagging. We show the comparison with Hu and Liu's approach, and our approach scores slightly higher than their results in the comparison table.

5. Conclusion. Most opinion mining research uses a number of techniques for mining and summarizing opinions based on features in product reviews, building on data mining and natural language processing methods. Review text is unstructured, and only a portion of the sentences include opinion-oriented words. In product reviews, users write comments about features of products to describe their views according to their experience and observations. The first step of opinion mining in classifying review documents is extracting features and opinion words. Therefore, an opinion mining system needs to process only the required sentences to obtain knowledge efficiently and effectively. We proposed ideas to extract patterns of features and/or opinion phrases, and we showed the results of experiments extracting pattern knowledge based on linguistic rules. We expect to achieve good results by extracting features and opinion-oriented words from review text with the help of adjectives, adverbs, nouns, and verbs. We believe that there is rich potential for future research. For feature identification, we need to extend our work to both explicit and implicit features, because both are useful for providing more accurate results in determining the polarity of a product/feature before summarizing, rather than explicit features only.

References. [1] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04), pp. 168–177, August 2004. [2] B. Liu, M. Hu, and J. Cheng, "Opinion observer: analyzing and comparing opinions on the web," in Proceedings of the International World Wide Web Conference (IW3C2 '05), pp. 10–14, Chiba, Japan, May 2005. [3] M. Hu and B. Liu, "Mining opinion features in customer reviews," in Proceedings of the 19th National Conference on Artificial Intelligence (AAAI '04), pp. 755–760, 2004. [4] Q. Su, X. Xu, H. Guo et al., "Hidden sentiment association in Chinese web opinion mining," in Proceedings of the 17th International Conference on World Wide Web (WWW '08), pp. 959–968, April 2008. [5] G. Somprasertsri and P. Lalitrojwong, "Mining feature-opinion in online customer reviews for opinion summarization," Journal of Universal Computer Science, vol. 16, no. 6, pp. 938–955. [6] P. D. Turney, "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), 2002. [7] Y. Wu, Q. Zhang, X. Huang, and L. Wu, "Phrase dependency parsing for opinion mining," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '09), pp. 1533–1541, August 2009.
[8] T.-L. Wong and W. Lam, "Hot item mining and summarization from multiple auction Web sites," in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05), pp. 797–800, Houston, Tex, USA, November 2005. [9] T.-L. Wong and W. Lam, "Learning to extract and summarize hot item features from multiple auction web sites," Knowledge and Information Systems, vol. 14, no. 2, pp. 143–160. [10] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '02), pp. 79–86, Association for Computational Linguistics, Philadelphia, Pa, USA, July 2002. [11] R. Mras and J. Carroll, A Comparison of Machine Learning Techniques Applied to Sentiment Classification [thesis], University of Sussex, Brighton, UK. [12] M. Gamon, "Sentiment classification on customer feedback data," in Proceedings of the 20th International Conference on Computational Linguistics (COLING '04), p. 841, Association for Computational Linguistics, Morristown, NJ, USA, 2004. [13] L. Zhang and B. Liu, "Identifying noun product features that imply opinions," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT '11), pp. 575–580, June 2011. [14] B. Liu, "Sentiment Analysis and Subjectivity," a chapter in Handbook of Natural Language Processing, 2nd edition.

Copyright © 2013 Su Su Htay and Khin Thidar Lynn. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Khan, M. S.; Rizwan, A.; Faisal, M. S.; Ahmad, T.; Khan, M. S.; Atteia, G. Identification of Review Helpfulness Using Novel Textual and Language-Context Features. Mathematics 2022, 10, 3260. Academic Editors: Nebojsa Bacanin and Catalin Stoean. Received: 15 August 2022. Accepted: 5 September 2022. Published: 7 September 2022. Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. Copyright © 2022 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Identification of Review Helpfulness Using Novel Textual and Language-Context Features. Muhammad Shehrayar Khan 1, Atif Rizwan 2, Muhammad Shahzad Faisal 1, Tahir Ahmad 1, Muhammad Saleem Khan 1 and Ghada Atteia 3,*. 1 Department of Computer Science, COMSATS University Islamabad, Attock Campus, Islamabad 43600, Pakistan. 2 Department of Computer Engineering, Jeju National University, Jeju-si 63243, Korea. 3 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Box 84428, Riyadh 11671, Saudi Arabia. * Corresponding author: Ghada Atteia.

Abstract: With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorization of movie reviews can be challenging because human language is complex, leading to scenarios where connotation words exist. Connotation words have a different meaning than their literal meanings, and the context in which a word is used changes its semantics. In this research work, categorizing movie reviews with good F-Measure scores has been investigated with Word2Vec, and three different aspects of proposed features have been inspected. First, psychological features are extracted from reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted; the Automated Readability Index (ARI), the Coleman-Liau Index (CLI) and Word Count (WC) are calculated to measure the review's understandability score, and their impact on review classification performance is measured. Lastly, linguistic features are also extracted from reviews: adjectives and adverbs. The Word2Vec model is trained on a collection of 50,000 reviews related to movies. A self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions, while a pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated according to the performance measures accuracy, precision, recall and F-Measure. The results indicate that Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that using psychological, linguistic and readability features concatenated with the Word2Vec features, SVM achieved an even higher F-Measure. Keywords: neural network; Word2Vec; Natural Language Processing; sentiment classification. MSC: 68T50; 68T07.

1. Introduction. Sentiment analysis is also known as opinion mining. The Natural Language Processing (NLP) technique is used to identify the sentiment polarity of textual data.
It is one of the well-known research areas in NLP. People's attitudes and thoughts about any movie, event or issue are analyzed with sentiment analysis of reviews. Sentiment analysis of reviews classifies a review as having positive or negative polarity, which helps the user decide about a product or a movie. While large volumes of opinion data can provide an in-depth understanding of overall sentiment, they require a lot of time to process. Not only is it time-consuming and challenging to review large quantities of text, but some texts might also be long and complex, expressing reasoning for different sentiments, making it challenging to understand the overall sentiment quickly once a new kind of communication has been started between a customer and a service provider. People share their opinions about services through websites. Usually, online products have thousands of reviews, and it is very difficult for customers to read every review. Excessive and improper use of sentiment in reviews makes them unclear concerning a product, and it becomes difficult for customers to make the right decision. An Entailment as Few-Shot Learner approach has been applied to NLP tasks, including review sentiments, but with less focus on the impact of influential textual features [1]. In this scenario, sentiment-based review classification is a challenging research problem. Sentiment analysis is a hot topic due to its applications: quality improvement in products or services, recommendation systems, decision making and marketing research [2]. The major contributions of this research are as follows: • The proposed psychological features are positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. • The readability features, the Automated Readability Index (ARI), Coleman-Liau Index (CLI) and Word Count (WC), are calculated to measure the review's understandability score. • The linguistic features extracted are adjectives and adverbs. • The psychological, readability and linguistic features are concatenated with Word2Vec features to train the machine-learning models. Machine-learning methods have been used to investigate data and convert raw data into valuable data. One of the applications of computing is NLP [3,4]. Many advanced algorithms and novel approaches have improved sentiment classification performance, but more productive results can be achieved if helpful textual reviews are used for sentiment classification. New features for sentiment classification are adverbs and adjectives [5,6], which describe the author's sentiments. The clout feature defines the confidence of the review written by the author. The review-length feature indicates how much information a review carries, and the readability feature defines how much of that information can be understood or absorbed by the user; the readability feature also determines the complexity of a review. Most reviews are short in length, representing opinions about products or movies. A review given by a user has an important role in the promotion of a movie [7]. Most people generally search for information about a movie on famous websites such as IMDb, a collection of thousands of movies that stores data about a movie's crew, reviews by different users, cast and ratings. Hence, it is surely not the only way to bring people to cinemas; in this regard, reviews also play an important role. Sentiment analysis makes opinion summarization in movie reviews easier by extracting the sentiment given in the review by the reviewer [8].
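For readers who want to compute the readability features listed among the contributions above, the sketch below uses the standard published formulas for ARI and the Coleman-Liau Index; the tokenisation is a simplifying assumption, since the paper does not spell out the exact splitting rules it used.

import re

def readability_features(review):
    # Crude sentence and word splitting; the paper's exact rules are not given.
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    words = re.findall(r"[A-Za-z]+", review)
    letters = sum(len(w) for w in words)
    wc = len(words)
    # Automated Readability Index (standard formula).
    ari = 4.71 * (letters / wc) + 0.5 * (wc / len(sentences)) - 21.43
    # Coleman-Liau Index: L = letters per 100 words, S = sentences per 100 words.
    L = letters / wc * 100
    S = len(sentences) / wc * 100
    cli = 0.0588 * L - 0.296 * S - 15.8
    return {"ARI": round(ari, 2), "CLI": round(cli, 2), "WC": wc}

print(readability_features("The plot was thin. Still, the cast carried the film surprisingly well."))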
Sentiment analysis of movie reviews normally includes preprocessing [9], feature extraction with appropriate selection [10], classification and evaluation of results. Preprocessing includes converting all capitalized words into lower-case words due to case sensitivity, stop-word removal and removing special characters, so that the text is prepared for classification. Different feature-extraction methods are used to extract features from the review of a movie or product [11]. Most feature-extraction methods are related to lexicon-based and statistical approaches. In statistical feature-extraction methods, the multiple words that exist in reviews represent a feature by measuring different weighting calculations such as Inverse Document Frequency (IDF), Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) [12,13]. In the lexicon feature-extraction method, textual features are extracted from the patterns derived among the words using their part-of-speech tags [14]. The lexicon-based method extracts the semantics from the review by focusing on text ordering in sentiment analysis, short-text and keyword classification. Emotions are expressed in the short texts written on social networking sites, which have become popular; emotions used in reviews on social networking sites include anxiety, happiness and fear. Sentiment analysis of the IMDb movie review website finds the general perspective of a review in terms of the emotions exhibited by a reviewer concerning a movie. Most researchers are working on differentiating positive and negative reviews. In the proposed work, a contextualized word-embedding technique, Word2Vec, is used. It is trained on fifty thousand reviews given by IMDb movie users. The qualitative features are extracted using Word2Vec, which involves pretraining, and the quantitative features are extracted from LIWC without pretraining. Experiments on vector features with different dimensions using the Skip-Gram method are performed, and LIWC extracts the quantitative linguistic and psychological features. The psychological features include positive emotion, negative emotion, anger, sadness and clout, which measures the confidence level of the reviews. The readability features include ARI, CLI and WC, and the linguistic features include adjectives and adverbs. Both statistical and lexicon-based methods extract features to increase the model's accuracy. When the features are extracted from the reviews, different feature-selection techniques are applied; these help retain helpful features and eliminate the features that do not contribute to the effectiveness of the sentiment classification of reviews [15,16]. The classification of the sentiments of reviews defines the polarity of reviews and classifies them as positive or negative. ML and lexicon-based methods have been used for sentiment analysis. ML methods have achieved high performance in academia as well as in industry. It is a fact that ML algorithms make high classification performance achievable, but data quality is important as well; data quality can limit the performance of any ML algorithm regardless of how much data are used to train the ML classifiers [17].

2. Related Work. There are two types of user reviews: high-quality and low-quality. A high-quality review helps to participate in decision making, while a low-quality one reduces helpfulness concerning serving users.
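As an aside, the TF-IDF weighting mentioned earlier in this section is straightforward to reproduce; the minimal scikit-learn sketch below uses two invented review strings purely for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "The picture quality is amazing and the battery life is long.",
    "Horrible strap, and the battery dies far too quickly.",
]
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(reviews)       # sparse matrix: one row per review, one column per term
print(vectorizer.get_feature_names_out())   # vocabulary learned from the two reviews
print(X.toarray().round(2))                 # TF-IDF weight of each term in each review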
That is the reason it is necessary to consider the quality of reviews for large data sets. To identify the quality of reviews, many researchers consider high-quality reviews and their helpfulness. Ordinal Logistic Regression (OLR) has been applied to application reviews from Amazon and Google Play with the feature of review length [18]. The Tobit regression analysis model has been applied to the dataset of TripAdvisor and Amazon book reviews using the features of review length and word count [19]. The IMDb movie review dataset is selected for this research and serves as the dataset for sentiment classification. Multiple textual features are extracted using the Word2Vec model trained on reviews and LIWC; in this research, this helps to improve the classification performance of the models. The performance of sentiment analysis has been improved gradually over time by focusing on advanced ML algorithms, novel approaches and DL algorithms. Details are given briefly in Table 1, describing the papers that achieved the best performance concerning review sentiments using advanced algorithms. A DL algorithm, CNN-BLSTM, was applied to the dataset of IMDb reviews and compared with experiments on single CNN and BLSTM performance; in that dataset, words were converted into vectors and passed to the DL model [20]. Linear discriminant analysis on Naive Bayes (NB) was implemented and achieved lower accuracy using only the feature of sentiwords [21]. The Maximum Entropy algorithm was applied to the movie review dataset with features extracted by a hybrid feature-extraction method, and achieved the highest accuracy compared to K Nearest Neighbor (KNN) and Naive Bayes (NB); the features used are just lexicon features, positive word count and negative word count [22]. The highest accuracy achieved for the IMDb dataset of online movie reviews was 89%, because fewer data were used: 250 movie reviews (text documents) for training purposes and 100 movie reviews for testing purposes.

Table 1. Summary of accuracy achieved on the IMDb dataset. (1) CNN, BLSTM, CNN-BLSTM hybrid [20]; features: word embedding into vectors; dataset: IMDb reviews; accuracy: 82% without the pretrained model. (2) LDA on Naive Bayes [21]; features: SentiWordNet; dataset: IMDb reviews. (3) Maximum Entropy [22]; features: sentiment words with TF-IDF; dataset: IMDb reviews. (4) Naive Bayes [23]; features: heterogeneous features; dataset: movie reviews; accuracy: 89%. (5) Naive Bayes, KNN [2]; features: word vectors, sentiwords; dataset: movie reviews. (6) Entailment as Few-Shot Learner [1]; features: word embedding into vectors; dataset: IMDb reviews (pretrained model). (7) Deep Convolution Neural Network [24]; features: vector features; dataset: IMDb movie reviews. (8) LSTM [25]; features: vector features; dataset: IMDb movie reviews. (9) Neural Network [26]; features: lexicon features; dataset: IMDb reviews; accuracy: 86%.

Heterogeneous features were extracted from the movie review dataset to achieve the best performance for Naive Bayes [23]. There are also some other Amazon datasets publicly available with many non-textual features. Furthermore, many researchers have also worked on Amazon datasets, analysing reviews using non-textual features, which include product features, user features and ratings [27,28]. The above literature leads to the conclusion that, to improve the performance of a model, the features and the size of the dataset play the more important role; the use of an efficient algorithm alone is not sufficient to improve performance. In one experiment, a dataset of 5331 positive and 5331 negative processed snippets or sentences was used, with the sentences labelled according to their polarity. The total number of sentences used for training purposes is 9595 sentences or snippets, and 1067 sentences are used to test the model.
First, the pretrained Word2Vec model is used for feature extraction and then a Convolutional Neural Network (CNN) is applied to the features extracted from Word2Vec. The Google News dataset, which contains 3 million words, is used to train Word2Vec to obtain the embedding of words into vectors, and testing accuracy is reported on the test dataset [24].
In another study, three datasets are used. The first dataset consists of 50 thousand reviews, 25 thousand positive and 25 thousand negative. The data are already separated into training and testing reviews, with the same ratio of positive and negative reviews. The first drawback of that experimentation is that the training and testing split is not randomized, which introduces bias. The second dataset consists of 200 movies, each having ten categories, from Douban Movies. Movie ratings range from 0 to 5: a rating of 1 to 2 was treated as a negative review and a rating of 3 to 5 as a positive review, while comments with a rating of 3 were ignored as neutral. After removing neutral reviews, 12,000 comments remained, of which 6000 were used for training and the other 6000 for testing. The second drawback is that the split ratio is 50:50, whereas most references indicate that 80:20 or 70:30 is the better ratio for splitting a dataset. For evaluating classification performance, three classifiers are used for sentiment classification: NB, an extreme learning machine and LSTM. Before classification, the dataset is passed through Word2Vec for word embedding, and the word vectors are sent to LSTM; the results show that LSTM performed better than the other classifiers in terms of F-Measure [25]. The last reference in Table 1 shows that the accuracy achieved by a neural network using lexicon features is 86%. In the IMDb movie review dataset used in that research, reviews are normalized with the following steps: all words are converted from upper case to lower case; numbers, special characters, punctuation marks and other diacritics are removed; white spaces are removed; and, finally, abbreviations are expanded and stop words are removed. All the review processing involved in that paper is described above [26].
Word Embedding Using the Word2Vec Approach
When representing a word, the context in which the word is used matters a lot, because it changes the semantics of the word. For example, consider the word 'bank'. One meaning of 'bank' is a financial institution, and another is the land alongside a body of water. If the word 'bank' is used in a sentence with words such as treasury, government, interest rates or money, we can infer its actual meaning from these context words. In contrast, if the context words are water, river and so on, the actual meaning is the land sense. Word embedding is one of the emerging and most effective techniques for representing text with different models and is used in many fields such as NLP, the biosciences and image processing.
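The following minimal sketch illustrates this context effect with the gensim implementation of Word2Vec; the toy corpus and parameter values are illustrative assumptions, not the corpus or settings used in this work:

from gensim.models import Word2Vec

# Toy corpus: 'bank' appears in financial and in river contexts (illustrative only).
sentences = [
    ["bank", "treasury", "interest", "money", "loan"],
    ["bank", "river", "water", "shore", "boat"],
]

# sg=1 selects the Skip-Gram training method.
model = Word2Vec(sentences, vector_size=50, window=5, sg=1, min_count=1, epochs=100)

# Neighbors of 'bank' reflect the contexts that dominate the training corpus.
print(model.wv.most_similar("bank", topn=5))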
The results obtained with word embedding in other fields are shown in Table 2.

Table 2. Word2Vec results in other fields.
Field | Results
Image Processing [29] | 90% accuracy
Natural Language Processing Tasks [30] | More than 90% accuracy
Recommendation Tasks [31] | Up to 95% accuracy
Biosciences [32] | More than 90% accuracy
Semantics Tasks [33] | More than 90% accuracy
Malware Detection Tasks [34] | Up to 99% accuracy

Word embedding is nowadays one of the most important and efficient ways of representing text as vectors without losing its semantics. Word2Vec can capture the context of a word, semantic and syntactic similarity, relations with other words and so on. Word2Vec was presented by Tomas Mikolov at Google in 2013 [35]. Word2Vec represents words in a vector space: the words of a review are represented as vectors placed so that words with dissimilar meanings are located far apart while words with similar meanings appear close together.
Proposed Methodology
For the proposed methodology, the hardware and software environment was set up as needed to perform the experiments. An HP laptop with a 4th-generation Core i5 processor and 8 GB of RAM is used for the experimentation. Google Colab is used as the Integrated Development Environment for the Python language, in which we performed our experiments using recent Python libraries. The research methodology consists of four steps: dataset acquisition, feature engineering, models and evaluation, as shown in Figure 1 below. Figure 1 shows that the data acquired from the IMDb movie review website are preprocessed and then passed to feature engineering, which consists of three blocks, B, C and D. Blocks B, C and D are used independently as well as in hybrid form; the combinations of B with C and of B with D are named Hybrid-1 and Hybrid-2. Block E consists of 10-fold cross-validation, training and testing of different ML models, and the final step is the evaluation of the models. After extraction, each feature is normalized using the Min/Max Normalization technique. On the normalized features, 10-fold cross-validation is applied to remove bias. Machine-learning (ML) and deep-learning (DL) models are trained and tested; these are Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN) and Bidirectional Gated Recurrent Unit (Bi-GRU). The models are applied to the features and evaluated in terms of accuracy, recall, precision and F1 score.
Figure 1. General diagram of the working flow of the research (A: review dataset acquisition; B: Linguistic Inquiry and Word Count features; C: Word2Vec model training and word embedding; D: pretrained Glove model; E: Min/Max normalization, 10-fold stratified cross-validation, the ML/DL models SVM, NB, Random Forest, Logistic Regression, CNN and Bi-GRU, and the evaluation measures accuracy, recall, precision and F1 score).
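A minimal sketch of block E, under the assumption that the extracted features are already available as a matrix X with polarity labels y; the classifier settings and the placeholder data are illustrative, since the exact hyperparameters are not reported here:

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# X: feature matrix (e.g., LIWC or Word2Vec features), y: 0/1 polarity labels (placeholders here).
X = np.random.rand(200, 10)
y = np.random.randint(0, 2, size=200)

models = {
    "SVM": SVC(),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(hidden_layer_sizes=(20, 20, 20), max_iter=1000),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for name, clf in models.items():
    pipe = make_pipeline(MinMaxScaler(), clf)            # Min/Max normalization applied inside each fold
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")
    print(name, scores.mean())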
Dataset Acquisition
The benchmark movie review dataset from IMDb is collected and is publicly available. The main dataset consists of 50,000 reviews with polarity labels. The ground-truth rating is also available in the form of 10-star ratings from different customers: a review with a rating of less than 4 is a negative review, and a review with a score of more than 7 is a positive review. The reviews are equally pre-divided into 25,000 positive and 25,000 negative reviews. Each review is available as a text document, so fifty thousand text documents containing reviews were downloaded.
Preprocessing for Feature Extraction
After downloading, each text document containing a review is preprocessed using the PYCHARM IDE. All the reviews and their polarity are read and written into a Comma-Separated Value (CSV) file with two columns: one column holds the review and the second column holds its polarity. First, the reviews are tokenized from sentences into words, and then all special characters, stop words and extra spaces are removed from each review using the NLP toolkit library available in Python. The preprocessed reviews are written into a separate column of the CSV file for later use.
Data Preprocessing Tool
For data preprocessing, we use the PYCHARM 2018 IDE with Python. The Natural Language Toolkit (NLTK) is used for text processing such as tokenization and stop-word removal. Google Colab is used for implementing the DL algorithms because it provides GPU and TPU support for fast processing.
Feature Extraction Using LIWC
LIWC consists of multiple dictionaries used to analyze text and extract features. LIWC is used to extract the psychological, textual and linguistic features from the movie review dataset. First, the reviews are preprocessed and then passed to LIWC for feature extraction, as described in Figure 2. LIWC compares each word of a review against its dictionaries to check which category the word belongs to. It calculates a percentage by counting the number of words in the review that belong to a specific category, dividing by the total number of words in the review and multiplying by 100, as described in Equation (1):

x = (number of words in the review that belong to the specific category / total number of words in the review) Ɨ 100   (1)

where x denotes a specific subcategory of LIWC features. The features calculated by LIWC are positive emotion (PE), negative emotion (NE), anger (Ang), sadness (Sad), clout, dictionary words (Dic), adverbs (Adv) and adjectives (Adj). PE, NE, Ang, Sad and Clout are categorized by LIWC as psychological features, while Dic, Adv and Adj are categorized as linguistic features.
Figure 2. Feature engineering with LIWC (block B): the preprocessed reviews (lower-casing; removal of stop words, extra spaces and special characters; lemmatization) are passed to LIWC, which extracts linguistic/summary-language features (adjectives, adverbs, dictionary words), psychological features (positive emotion, negative emotion, anger, sadness, clout) and readability features (ARI, CLI, word count).
After the extraction of these features, Min/Max Normalization is applied and the normalized features pass to block E for further processing, including 10-fold cross-validation, training and testing of the ML models and, finally, evaluation.
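The calculation of Equation (1) can be mimicked with a simple dictionary lookup; the sketch below uses an invented mini word list as a stand-in for the proprietary LIWC dictionaries:

# Minimal sketch of Equation (1): percentage of review words belonging to a category dictionary.
# The word list is an invented stand-in for the LIWC dictionaries.
POSITIVE_EMOTION = {"good", "great", "love", "enjoy", "satisfied"}

def category_percentage(review_words, category_words):
    hits = sum(1 for w in review_words if w in category_words)
    return 100.0 * hits / len(review_words) if review_words else 0.0

review = ["the", "action", "was", "great", "and", "i", "love", "the", "ending"]
print(category_percentage(review, POSITIVE_EMOTION))   # about 22.2 for this toy review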
Figure 3. Skip-Gram example for the sentence "In Sci-fiction movie hero play good role ever" with window size 7: the target word is fed to the input layer and its context words are predicted at the output layer through a hidden layer.
Readability Feature Extraction
The readability score of a review reflects the effort required to understand its text. Three readability features are calculated on the preprocessed reviews: ARI, CLI and word count.
ARI is used for measuring the readability of English text and is calculated using the formula given in Equation (2):

ARI = 4.71 Ɨ (C / W) + 0.5 Ɨ (W / S) āˆ’ 21.43   (2)

where C represents characters (the count of letters and digits in the review), W represents the number of words in the review and S represents the number of sentences in the review.
The CLI score indicates how difficult the text is to understand and is calculated using the formula given in Equation (3):

CLI = 0.0588 Ɨ L āˆ’ 0.296 Ɨ S āˆ’ 15.8   (3)

where L represents the average number of letters per 100 words and S represents the average number of sentences per 100 words.
The word count (WC) is calculated by Linguistic Inquiry and Word Count, which consists of multiple dictionaries, using Equation (4):

WordCount = N_allwords āˆ’ N_punctuation āˆ’ N_stopwords āˆ’ N_nonalpha   (4)

where N_allwords is the total number of words in the review text, N_punctuation is the number of punctuation characters, N_stopwords is the number of stop words and N_nonalpha is the number of non-alphabetic terms in the review text.
After extraction, Min/Max Normalization is applied to each readability feature, as described in the next section.
Word Embedding by Review-Based Training of the Word2Vec Model
The features of the movie reviews are extracted by training the Word2Vec neural model. The sequence of this feature-extraction process is given in Figure 4 below. First, the training data for the Word2Vec neural model are prepared from the IMDb dataset of 50 thousand movie reviews; the total number of words in this dataset is 6,142,469. Every review is used in training the Word2Vec neural model, and three different embedding sizes are used in the experiments, 50, 100 and 150, each with a context size of 10. There are two methods for training the Word2Vec neural model: the continuous bag of words (CBOW) and the Skip-Gram Method. We use the Skip-Gram Method, which focuses on less frequent words and gives good word embeddings for them. The Skip-Gram operations are illustrated in Figure 3: the model is trained with a window size of 10 and Skip-Gram computes the word embeddings. Instead of using the context words as input to predict the center word, as the continuous bag of words does, it uses the center word as input and predicts the center word's context words. For example, for "In Sci-fiction movie hero play good role" with context size 7, training instances are created in which "In" is the target word given as input and the context words "Sci-fiction movie hero play good role" are the outputs. The training instances are given in Table 3. Using these training samples to train the neural network, a word embedding is generated for each word in the vocabulary. The trained model is saved, and the movie reviews are passed through it to convert words into vectors. Three different types of vectors, of sizes 50, 100 and 150, are created; for classification, the Word2Vec features produced by the Skip-Gram Method are passed to block E.
Figure 4. Feature-extraction process with the self-trained Word2Vec model (block C): the preprocessed reviews (50 thousand reviews, 6,142,469 words) form the vocabulary on which the Word2Vec neural model is trained with the Skip-Gram Method at embedding sizes 50, 100 and 150 (context size 10); the tested models produce the vectors Vector50, Vector100 and Vector150 that are passed to block E.
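A minimal gensim sketch of this training step, assuming the preprocessed reviews are available as lists of tokens (the two token lists shown are placeholders); the vector sizes, Skip-Gram setting and context size follow the description above, while averaging the word vectors into one review vector is an assumption, since the exact aggregation is not spelled out here:

from gensim.models import Word2Vec
import numpy as np

# tokenized_reviews: token lists produced by the preprocessing step (placeholders here).
tokenized_reviews = [
    ["sci", "fiction", "movie", "hero", "play", "good", "role"],
    ["boring", "plot", "weak", "acting"],
]

review_vectors = {}
for size in (50, 100, 150):
    # sg=1 selects Skip-Gram; window=10 is the context size used in the experiments.
    model = Word2Vec(tokenized_reviews, vector_size=size, window=10, sg=1, min_count=1)
    model.save(f"word2vec_reviews_{size}.model")          # hypothetical file name
    # One fixed-length vector per review: mean of its word vectors (assumed aggregation).
    review_vectors[size] = np.array([
        np.mean([model.wv[w] for w in review if w in model.wv], axis=0)
        for review in tokenized_reviews
    ])

print(review_vectors[50].shape)   # (number of reviews, 50)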
Table 3. Skip-Gram training instances for the target word "In" in the example sentence.
Input | Output
In | Sci-fiction
In | movie
In | hero
In | play
In | good
In | role
In | ever

Word Embedding by the Pretrained Word2Vec (Glove) Model
The Glove Model is an unsupervised learning algorithm used for the vector representation of words. Its training samples are taken from Wikipedia and different books, so the Glove Model is trained on a generalized kind of text. Figure 5 describes the steps for embedding the words into vectors. The first step is to download the Glove Model as a zip file containing 150-dimension and 300-dimension vectors. The pretrained Glove Model is loaded and applied to the preprocessed reviews. Each preprocessed review consists of words and is passed to the model as input; the output is one vector per review, obtained by taking the average of its word vectors, so each review vector has 150 or 300 components. The output vectors are passed to block E for further processing, which includes 10-fold cross-validation, training of the ML models, testing of the ML models and evaluation.
Figure 5. Feature-extraction process with the pretrained Word2Vec (Glove) model (block D): the preprocessed reviews are converted into words, passed through the Glove models (trained on a Wikipedia-plus-books vocabulary) with 150 and 300 dimensions, and turned into review vectors by taking the mean of the word vectors; the resulting 150- and 300-dimension vectors are passed to block E.
Evaluation and Dataset
The dataset selected for the experiments is the IMDb movie review dataset, consisting of 50,000 reviews of different movies with sentiment polarity. This dataset was selected because it contains the largest number of reviews compared with the previously uploaded movie review datasets on the website (accessed on 4 April 2022). A total of 25,000 reviews are positive and the other 25,000 reviews are negative. Each review is stored in a text file, so the zip file contains 50,000 text files whose filenames carry the rating value from 1 to 10.
Feature Exploration and Hypothesis Testing
In this subsection, the linguistic, psychological and readability features extracted from the reviews and used in the sentiment-based review classification are explored. A summary of the descriptive statistics of the features under each category (linguistic, psychological and readability) is provided in Table 4. This summary includes the number of data records (N), mean, median, standard deviation (SD), maximum (Max) and minimum (Min) values of the features under each category. Moreover, the significance of the features in the three categories is examined using hypothesis testing. In order to select the right significance test, the normality of the features is examined. To obtain a sense of the distributions of the features and the outcome variable, histograms and associated distribution curves are plotted, as depicted in Figure 6. It is noteworthy that only CLI has a well-behaved bell-shaped distribution curve, while the other features do not. To confirm this observation, normal probability plots for all features are provided in Figure 7. A normal probability plot shows the deviation of the record distribution from normality. It can be observed that the Adv, Adj and Clout distributions deviate only slightly from the normal distribution, while all other feature distributions except CLI are not normally distributed.
To investigate the association between the input features, a correlation matrix is computed. Since the probability distributions of most features are not Gaussian, it is not appropriate to use the Pearson correlation to check the relationship between features.
In contrast, the Spearman correlation coefficient is an efficient tool for quantifying the monotonic relationship between continuous variables that are not normally distributed [36]. As this is the case for our input features, the Spearman correlation is adopted in this study to quantify the association between the features. A heat map of the Spearman correlation coefficients is created and presented in Figure 8, in which the circle size indicates the strength of the bivariate correlation. The map in Figure 8 reveals a strong relationship between anger and negative emotion and between the ARI and CLI features, a moderate association between NE and sadness and between Dic and the ARI and CLI features, and weaker associations between the other input features. As the outcome, the polarity class, is a categorical variable, the correlation coefficient is not an adequate tool to measure its association with the input features. Therefore, Binomial Logistic Regression (LR) is adopted to investigate this association. Logistic Regression assesses the likelihood of an input feature being linked to a discrete target variable [37]. The input features do not exhibit high multicollinearity, as deduced from the correlation matrix plot in Figure 8, which makes LR a suitable test of association for our problem. Table 5 displays the output of a Binomial Logistic Regression model fitted to predict the outcome from the linguistic, psychological and readability feature values. The p-values and significance levels for each of the regression model's coefficients are listed in Table 5. The asterisks denote the level of each feature's significance: more asterisks imply a higher level of importance, with three, two and one asterisks corresponding to progressively larger p-value thresholds and no asterisk indicating a non-significant p-value. As shown in Table 5, the p-values for PE, NE, Ang, Sad, Clout, Adj and CLI indicate that these features are statistically significant with respect to the polarity class.
Figure 6. Histograms and probability distribution curves for the linguistic, psychological and readability features and the polarity class.
Figure 7. Normal probability plots for the linguistic, psychological and readability features and the polarity class variables.
Table 4. Descriptive statistics summary (mean, median, SD, Max, Min and N = 2000 records per feature) of the linguistic, psychological and readability features and the polarity class (PE, NE, Ang, Sad, Clout, Dic, Adv, Adj, WC, ARI, CLI, Polarity).
Table 5. Significance of the linguistic, psychological and readability features using Binomial Logistic Regression (coefficient, standard error, t-statistic, p-value and significance level for each feature).
Figure 8. Correlation coefficient matrix of the linguistic, psychological and readability features.
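A minimal sketch of these two significance checks using SciPy and statsmodels; the feature columns and values are invented placeholders rather than the actual data summarized in Table 4:

import numpy as np
import pandas as pd
from scipy.stats import spearmanr
import statsmodels.api as sm

# Placeholder feature table: a few of the features used in the paper, with invented values.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "PE": rng.random(200),
    "NE": rng.random(200),
    "CLI": rng.normal(10, 2, 200),
})
polarity = rng.integers(0, 2, 200)

# Spearman rank correlation between two input features (no normality assumption needed).
rho, p = spearmanr(df["NE"], df["PE"])
print(f"Spearman rho={rho:.3f}, p={p:.3f}")

# Binomial logistic regression of polarity on the features; p-values indicate feature significance.
logit = sm.Logit(polarity, sm.add_constant(df)).fit(disp=False)
print(logit.pvalues)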
A Chi-square hypothesis test is conducted to verify the sufficiency of the LR model as a test of feature significance. The null hypothesis of the test, H0, assumes that there is no relationship between the response variable (the polarity) and any of the input features, i.e., that all model coefficients except the intercept are zero. The alternative hypothesis, H1, states that at least one of the predictor coefficients is not zero, in which case the learning model is considered adequate. The Chi-square test of the model, computed with 1988 degrees of freedom on 2000 observations over all features, indicates that the LR model differs statistically from a constant model containing only the intercept term, so it can be considered an adequate test of feature significance. As a result, the null hypothesis can be rejected, and the association between the input features and the polarity of a review is confirmed. As depicted in Table 5, the binomial LR reveals that all psychological features are significant, whereas only Adj from the linguistic features and CLI from the readability features are significant. Therefore, only the significant features are used for review classification in this work.
Evaluation Measure and Performance Comparison
The evaluation of the deep-learning and conventional models is carried out by calculating the performance measures accuracy, precision, recall and F-Measure. These performance measures are calculated on the basis of a confusion matrix, whose details are given below.
Confusion Matrix
A confusion matrix, also known as an error matrix, is used for measuring the performance of a classification model and is represented in Figure 9. When a review is actually negative and the model predicts it as positive, it is called a false positive (FP). When a review is actually positive and the model predicts it as positive, it is called a true positive (TP). When a review is actually positive and the model predicts it as negative, it is called a false negative (FN). When a review is actually negative and the model predicts it as negative, it is called a true negative (TN).
Figure 9. Confusion matrix (predicted versus actual positive and negative classes).
Pretrained Word Embedding
The pretrained Glove word embedding is evaluated with two different word-vector dimensions, 150 and 300. The six ML classifiers are used with the 150-dimension word vectors, and each configuration is tested. The experiments with the 150- and 300-dimension word vectors and their results are shown in Tables 6 and 7.
Table 6. Results of the pretrained model with a vector dimension of 150 (accuracy, precision, recall and F-score for the Multi-Layer Perceptron, K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression and Support Vector Machine classifiers).
Table 7. Results of the pretrained model with a vector dimension of 300 (average training and testing accuracy for CNN and Bi-GRU).
After preprocessing, the movie review dataset is passed to 10-fold stratified cross-validation for unbiased splitting. The Glove pretrained model is used for the feature-engineering process, and its 150-dimension vectors are used as features for the ML models. The six ML algorithms are applied, and SVM achieves the best results with respect to the other algorithms (NB, RF, LR, KNN and MLP) on the evaluation measures accuracy, precision, recall and F-Measure. The highest F-Measure score is achieved by SVM, which reflects the impact of the pretrained Glove Model with 150-dimension feature vectors; the ML algorithms perform better on the 150-dimension vectors. In the MLP, three layers with 20 neurons each are used to predict review polarity.
The impact of the pretrained Glove Model with 300 dimensions is represented in Table 7. The two DL models, CNN and Bi-GRU, are applied to the features with a vector dimension of 300, and the best results are achieved by Bi-GRU in terms of testing accuracy.
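The accuracy, precision, recall and F-Measure values reported in the result tables all derive from these confusion-matrix counts; a minimal sketch with invented predictions is:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Invented ground-truth polarities and model predictions for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy ", accuracy_score(y_true, y_pred))    # (TP + TN) / all predictions
print("precision", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   ", recall_score(y_true, y_pred))       # TP / (TP + FN)
print("F-Measure", f1_score(y_true, y_pred))           # harmonic mean of precision and recall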
The lower dimension of the pretrained model, 150, has a greater impact on the results of the traditional ML algorithms, compared with the 300 dimensions used with the DL models.
Review-Based Trained Word2Vec Model Word Embedding
The reviews are embedded into vectors with three different word-vector dimensions, 50, 100 and 150. The ML and DL algorithms are then applied to each vector size independently and evaluated. The results based on the confusion matrix are shown in Table 8.
Table 8. Evaluation of the model trained on reviews with 50-dimension word vectors (accuracy, precision, recall and F-score for the Naive Bayes, K-Nearest Neighbor, Random Forest, Logistic Regression, Multi-Layer Perceptron and Support Vector Machine classifiers).
The 50-dimension Word2Vec model is self-trained on the movie reviews. The self-trained model is then used to embed the words of the movie reviews into vectors representing their meaning, and the six ML algorithms are applied. SVM achieves the best results compared with the other algorithms (NB, RF, LR, KNN and MLP) on the evaluation measures accuracy, precision, recall and F-Measure. The highest F-Measure score is achieved by SVM with the 50-dimension word embedding, which reflects the impact of the self-trained model with a smaller number of dimensions. In Table 9, the 100-dimension configuration of the self-trained model is evaluated using the confusion matrix.
Table 9. Evaluation of the self-trained model with 100-dimension word vectors (accuracy, precision, recall and F-score for the Naive Bayes, K-Nearest Neighbor, Random Forest, Logistic Regression, Multi-Layer Perceptron and Support Vector Machine classifiers).
The 100-dimension Word2Vec model is self-trained on the movie reviews. After the model is self-trained, it is used to embed the words of the movie reviews into vectors representing their meaning, and the six ML algorithms are applied. SVM again achieves the best results compared with NB, RF, LR, KNN and MLP on accuracy, precision, recall and F-Measure. The highest F-Measure score is achieved by SVM with the 100-dimension word embedding, which reflects the impact of the self-trained model with a higher number of dimensions than the previous configuration. In Table 10, the impact of the 150-dimension self-trained model is evaluated.
Table 10. Evaluation of the model trained on reviews with 150-dimension word vectors, without the psychological, linguistic and readability features (accuracy, precision, recall and F-score for the Naive Bayes, K-Nearest Neighbor, Random Forest, Logistic Regression, Multi-Layer Perceptron and Support Vector Machine classifiers).
The 150-dimension Word2Vec model is self-trained on the movie reviews. First, the context size of the model is set to 10 and the Skip-Gram Method is used to train the Word2Vec model. After the model is self-trained, it is used to embed the words of the movie reviews into vectors representing their meaning, and the six ML algorithms are applied. SVM achieves the best results compared with NB, RF, LR, KNN and MLP on accuracy, precision, recall and F-Measure. The highest F-Measure score is achieved by SVM with the 150-dimension word embedding, which reflects the impact of the self-trained model with a higher number of dimensions than the previous 50- and 100-dimension results. In Table 11, the impact of the 150-dimension self-trained model combined with the psychological, linguistic and readability features is shown. The 150-dimension self-trained model with the proposed features is considered because it shows better results than the pretrained Glove model. The psychological features are extracted using LIWC; the psychological features used in this experiment are positive emotion, negative emotion, anger, sadness, clout and dictionary words.
The CLI readability feature is used because it gave the best result in the previous experiment.
Table 11. Evaluation of the model trained on reviews with 150-dimension word vectors together with the psychological, linguistic and readability features (accuracy, precision, recall and F-score for the Naive Bayes, K-Nearest Neighbor, Random Forest, Logistic Regression, Multi-Layer Perceptron and Support Vector Machine classifiers).
With this feature set, the six ML algorithms are applied. SVM achieves the best results with respect to the other algorithms (NB, RF, LR, KNN and MLP) on the accuracy, precision, recall and F-Measure evaluation measures, and the highest F-Measure score is achieved by SVM. The psychological, linguistic and readability features improve the evaluation measures.
Table 12 shows the impact of the 300-dimension self-trained model on the DL results.
Table 12. Results of the two DL models on the 300-dimension self-trained word vectors without the psychological, linguistic and readability features (average training and testing accuracy for the two-layer CNN and Bi-GRU).
Table 12 gives the evaluation results of the two DL algorithms applied to the 300-dimension vectors without the psychological and readability features. The impact on accuracy of the 300-dimension self-trained model is higher than that of the 300-dimension pretrained model; the results show that context-based embedding gives better results than globally based embedding. The applied models are a CNN with two layers of 32 and 64 neurons, respectively, and Bi-GRU, which has two gates: an update gate, used to retain memory, and a reset gate, used to forget memory. The best results are achieved by Bi-GRU in terms of testing accuracy, compared with the pretrained Glove Model.
The evaluation results of the two DL algorithms applied to the 300-dimension word vectors together with the psychological and readability features are given in Table 13.
Table 13. Results on the 300-dimension word embedding with the psychological, linguistic and readability features (average training and testing accuracy for CNN and Bi-GRU).
Here, the psychological features are again extracted using LIWC (positive emotion, negative emotion, anger, sadness, clout and dictionary words), and the CLI readability feature is included because it gave the best result in the previous experiment. The applied models are the CNN with two layers of 32 and 64 neurons and Bi-GRU with its update and reset gates. Bi-GRU achieves the best results in terms of testing accuracy compared with the pretrained Glove Model. In Table 14, a comparison is given between the proposed work and previous work based on the evaluation measures.
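A minimal Keras sketch of a bidirectional-GRU classifier of the kind described here; reshaping each 300-dimension review vector into a length-300 sequence, the layer widths and the training settings are assumptions made for illustration, since these details are not reported here:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data: one 300-dimension review vector per review, with 0/1 polarity labels.
X = np.random.rand(256, 300).astype("float32")
y = np.random.randint(0, 2, size=256)

# Treat each review vector as a sequence of 300 steps with one feature per step (assumption).
X_seq = X.reshape(-1, 300, 1)

model = keras.Sequential([
    keras.Input(shape=(300, 1)),
    layers.Bidirectional(layers.GRU(64)),    # GRU cells use update and reset gates internally
    layers.Dense(1, activation="sigmoid"),   # positive/negative polarity
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_seq, y, epochs=2, batch_size=32, validation_split=0.2)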
Table 14. Comparison of the F-Measure of the proposed work with previous work.
Embedding Model | Classifier | F-Measure
Review-based trained Word2Vec (proposed) | Support Vector Machine | —
Word2Vec [16] | CNN-BLSTM | —
Word2Vec [22] | LSTM | —
[18] | Maximum Entropy | —

The analysis of the results following the experiments is given below.
• The Word2Vec model self-trained on the movie reviews with the 150-dimension parameter has a higher impact on performance than the pretrained Glove Model.
• The CLI readability feature achieved the highest score compared with ARI and WC.
• The SVM algorithm performs better than the other applied algorithms NB, LR, RF, CNN, KNN and MLP.
• Using the psychological features and the CLI readability feature to classify reviews with the self-trained embedding improves the performance above 86%.
• The smaller word-embedding dimension of 150 performs better with the traditional ML algorithms, while for the DL algorithms the 300-dimension embedding gives the better result.
Conclusions
Classification of reviews by opinion mining remains open research due to the continuous increase in available data. Many approaches have been proposed for the classification of movie reviews. After a critical analysis of the literature, we observe that words are converted into vectors for sentiment classification of movie reviews by different approaches, including TF-IDF and Word2Vec. A pretrained Word2Vec model is commonly used for embedding words into vectors, and mostly generalized data are used to train the Word2Vec model for extracting features from reviews. In contrast, we extract features by training the Word2Vec model on specific data, namely the 50 thousand movie reviews; most researchers have used a generally trained model for review classification instead. This research work extracts features from movie reviews using a review-based trained Word2Vec model and LIWC. The review-based training data have particular characteristics: they comprise about 6 million words in total and are specific to movie reviews, i.e., to the task of sentiment classification of reviews.
The six ML algorithms are applied, and SVM achieves the best F-Measure with respect to the other algorithms NB, RF, LR, KNN and MLP. Two DL algorithms are also applied, CNN and Bi-GRU, and Bi-GRU achieves a higher score than CNN. The results show that a model trained on task-specific data performs better than a model trained on generalized data. For the ML algorithms, the 150-dimension features perform better than the 50- and 100-dimension features on the movie review dataset used, whereas for the DL models the 300-dimension feature vectors give better classification than the 150-dimension vectors. The significant psychological, linguistic and readability features helped to improve the classification performance of the classifiers. SVM achieved its best F-Measure with the 150-dimension word vectors, and Bi-GRU achieved nearly the same F-Measure score using the 300-dimension word vectors. We applied both traditional ML and DL algorithms for the classification of reviews, and both achieved nearly the same results on the performance measures, which indicates that the IMDb dataset of 50,000 movie reviews is not large enough to fully exploit a DL algorithm.
In future work, a larger dataset is needed to apply the DL algorithms and further increase the classification performance of the models.
Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing (original draft preparation), writing (review and editing), supervision, project administration and funding acquisition were carried out jointly by Muhammad Shehrayar Khan, Muhammad Saleem Khan and the co-authors. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data considered for this research are original and were collected by the authors for generating insights. Moreover, the data mining and ML tools used in this research are freely available, and the models were built in accordance with our own experimental setup.
Conflicts of Interest: The authors declare that there is no conflict of interest related to this paper.
References
1. Wang, S.; Fang, H.; Khabsa, M.; Mao, H.; Ma, H. Entailment as Few-Shot Learner. arXiv 2021.
2. Nguyen; Dao. Sentiment Analysis of Movie Reviews Using Machine Learning Techniques. In Proceedings of the Sixth International Congress on Information and Communication Technology, London, UK, 25–26 February 2021; Springer: Berlin, Germany, 2022; pp. 361–.
3. U.; Khan, S.; Rizwan, A.; Atteia, G.; Jamjoom; Samee. Aggression Detection in Social Media from Textual Data Using Deep Learning Models. Appl. Sci. 2022, 12, 5083. [CrossRef]
4. T.; Faisal; Rizwan, A.; Alkanhel, R.; Khan; Muthanna, A. Efficient Fake News Detection Mechanism Using Enhanced Deep Learning Model. Appl. Sci. 2022, 12, 1743. [CrossRef]
5. Rizwan, A.; Iqbal, K.; Fasihuddin, H.; Banjar, A.; Daud, A. Prediction of Movie Quality via Adaptive Voting. IEEE Access 2022, 10, 81581–81596. [CrossRef]
6. A.; Abbas, Y.; Ahmad, T.; Mahmoud; Rizwan, A.; Samee. A Healthcare Paradigm for Deriving Knowledge Using Online Consumers' Feedback. Healthcare 2022, 10, 1592. [CrossRef]
7. A.; Agrawal, A.; Rath. Classification of sentiment reviews using n-gram machine learning approach. Expert Syst. 2016, 57, 117–126. [CrossRef]
8. Mohamed; Haggag. A survey on opinion summarization techniques for social media. Future J. 2018, 3, 82–109. [CrossRef]
9. I.; Varma; Govardhan, A. Preprocessing the informal text for efficient sentiment analysis. Int. J. Emerg. Trends Technol. Comput. Sci. (IJETTCS) 2012, 1, 58–.
10. Shenoy; Mohan. Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web 2017, 20, 135–154. [CrossRef]
11. Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? Sentiment classification using machine learning techniques. arXiv 2002, arXiv:cs/.
12. Bakar; Yaakub. A review of feature selection techniques in sentiment analysis. Intell. Data, 159–189. [CrossRef]
13. M.; Harish, B. A New Feature Selection Method based on Intuitionistic Fuzzy Entropy to Categorize Text Documents. Int. J. Interact. Multimed. Artif. Intell. 2018, 5, 106. [CrossRef]
14. Taboada, M.; Brooke, J.; Tofiloski, M.; Voll, K.; Stede, M. Lexicon-based methods for sentiment analysis. Comput. Linguist., 267–307. [CrossRef]
15. A.; Zhang, D.; Levene, M. Combining lexicon and learning based approaches for concept-level sentiment analysis. In Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining, Beijing, China, 12 August 2012; pp. 1–.
16. L.; Wang, H.; Gao, S. Sentiment feature selection for sentiment analysis of Chinese online reviews. Int. J. Mach. Learn. 2018, 9, 75–84. [CrossRef]
17. S.; Kar; Baabdullah, A.; Al-Khowaiter. Big data with cognitive computing: A review for the future. Int. J. Inf. Manag. 2018, 42, 78–89. [CrossRef]
18. Fink, L.; Rosenfeld, L.; Ravid, G. Longer online reviews are not necessarily better. Int. J. Inf. Manag. 2018, 39, 30–37. [CrossRef]
19. L.; Goh; Jin, D. How textual quality of online reviews affect classification performance: A case of deep learning sentiment analysis. Neural Comput. Appl. 2020, 32, 4387–4415. [CrossRef]
20. Z. Sentiment Analysis of Movie Reviews based on Machine Learning. In Proceedings of the 2020 2nd International Workshop on Artificial Intelligence and Education, Montreal, QC, Canada, 6–8 November 2020; pp. 1–.
21. Karim, M.; Das, S. Sentiment analysis on textual reviews. IOP Conf. Ser. Mater. Sci. Eng. 2018, 396, 012020. [CrossRef]
22. H.; Harish, B.; Darshan, H. Sentiment Analysis on IMDb Movie Reviews Using Hybrid Feature Extraction Method. Int. J. Interact. Multimed. Artif. Intell. 2019, 5, 109–114. [CrossRef]
23. R. Sentiment analysis of movie reviews using heterogeneous features. In Proceedings of the 2018 2nd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech), Kolkata, India, 4–5 May 2018; pp. 1–.
24. Chaurasia, S.; Srivastava. Sentiment short sentences classification by using CNN deep learning model with fine tuned Word2Vec. Procedia Comput. Sci. 2020, 167, 1139–1147. [CrossRef]
25. Liu; Luo, X.; Wang, L. An LSTM approach to short text sentiment classification with word embeddings. In Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING 2018), Hsinchu, Taiwan, 4–5 October 2018; pp. 214–.
26. Z.; Zulfiqar; Xiao, C.; Azeem, M.; Mahmood, T. Sentiment analysis on IMDB using lexicon and neural networks. Appl. Sci. 2020, 2, 1–10. [CrossRef]
27. A.; Mukhopadhyay, S.; Panigrahi; Goswami, S. Utilization of oversampling for multiclass sentiment analysis on Amazon review dataset. In Proceedings of the 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), Morioka, Japan, 23–25 October 2019; pp. 1–.
28. A.; Akhilesh, V.; Aich, A.; Hegde, C. Sentiment analysis of restaurant reviews using machine learning techniques. In Emerging Research in Electronics, Computer Science and Technology; Springer: Berlin, Germany, 2019; pp. 687–.
29. Ghosh; Valveny, E.; Harit, G. Beyond visual semantics: Exploring the role of scene text in image understanding. Pattern Recognit. Lett. 2021, 149, 164–171. [CrossRef]
30. L.; Wang, G.; Zuo, Y. Research on patent text classification based on Word2Vec and LSTM. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; Volume 1, pp. 71–.
31. Q.; Dong, H.; Wang, Y.; Cai, Z.; Zhang, L. Recommendation of crowdsourcing tasks based on Word2Vec semantic tags. Wirel. Commun. Mob. Comput. 2019, 2019, 2121850. [CrossRef]
32. PeƱa; Breis; San RomĆ”n, I.; Barriuso; Baraza. Snomed2Vec: Representation of SNOMED CT terms with Word2Vec. In Proceedings of the 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 5–7 June 2019; pp. 678–.
33. A.; Khatua, A.; Cambria, E. A tale of two epidemics: Contextual Word2Vec for classifying twitter streams during outbreaks. Inf. Process. Manag. 2019, 56, 247–257. [CrossRef]
34. T.; Mao, Q.; Lv, M.; Cheng, H.; Li, Y. Droidvecdeep: Android malware detection based on Word2Vec and deep belief network. KSII Trans. Internet Inf. Syst. (TIIS) 2019, 13, 2180–.
35. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013.
36. C.; Dehon, C. Influence functions of the Spearman and Kendall correlation measures. Stat. Methods, 497–515. [CrossRef]
37. Collett, D. Modelling Binary Data; CRC Press: Boca Raton, FL, USA, 2002.
