From the SelectedWorks of Nader Ale Ebrahim

Ethical and Unethical Methods of Plagiarism Prevention in Academic Writing

This paper discusses the origins of plagiarism and ethical solutions to prevent it. It also reviews some unethical approaches that may be used to decrease the plagiarism rate in academic writing. We propose eight ethical techniques for avoiding unconscious and accidental plagiarism in manuscripts without using online systems such as Turnitin and/or iThenticate for cross-checking and plagiarism detection. The efficiency of the proposed techniques was evaluated by having students apply them individually to five different texts. After the techniques were applied, the texts were checked by Turnitin to produce plagiarism and similarity reports. Finally, the "effective factor" of each method was compared, and the best result was obtained by a hybrid combination of all the techniques. The hybrid of ethical methods decreased the plagiarism rate reported by Turnitin from nearly 100% to an average of 8.4% on the five manuscripts.


Introduction
Academic publishing is a key activity for anyone in academia. Research students, lecturers, professors and researchers in general tend to publish their work and the results of their studies in academic conference proceedings, books, journals or magazines (Hockenos, 2013). Although getting a paper published is very important, the originality and ethics of the published paper's content are just as important. The most common problem in academic manuscripts is plagiarism (Carrol, 2002; Hawley, 1984; Patel, Bakhtiyari, & Taghavi, 2011). Plagiarism has been defined by various sources, but one of the best definitions appears on the Dictionary.com website: "an act or instance of using or closely imitating the language and thoughts of another author without authorization and the representation of that author's work as one's own, as by not crediting the original author" (Dictionary.com, 2013). Etymonline traces the word, in use since 1597, to a term for "kidnapper, seducer, plunderer". The American Heritage dictionary calls it "literary theft", and Encyclopedia Britannica describes it as "fraudulence, forgery, piracy-practices" (Encyclopedia-Britannica, 2013). Plagiarism, then, is the act of copying someone else's text, idea or language and publishing it as one's own work. It is clear from these definitions that plagiarism is against the law, and authors should avoid it in their manuscripts.
Nowadays, plagiarism is growing along with the increase in publications and academic papers. Since publications are one of the most important criteria for evaluating academics, it is important to help authors increase the quality of their publications and to make them aware of the consequences of plagiarism. This paper pays specific attention to authors who plagiarize unconsciously, without being aware of it. The present study therefore proposes ethical techniques that can prevent, or at least decrease, plagiarism. Besides the ethical techniques, we also present the unethical methods of masking plagiarism that are employed by authors who plagiarize consciously. We do not recommend using these unethical methods, which are not accepted in academia, but they are worth mentioning in order to draw the attention of librarians and academic lecturers. To evaluate the efficiency of the proposed ethical techniques, they were applied to five different texts by 16 students. The results showed promising efficiency in reducing plagiarism.

Plagiarism: Literature & Causes
We begin by discussing the causes of plagiarism and analyzing how an author might commit it. There are two types of authors whose manuscripts contain plagiarism. The first type plagiarizes directly: the author duplicates another author's text exactly as it is and presents it as his/her own work, commonly known as copy-pasting. The second type plagiarizes unconsciously. Many factors may cause this problem, including the following: 1) Uncited Ideas & Concepts: Scholars read many articles and build on other methodologies in order to deliver a contribution to science. Using another researcher's idea is considered plagiarism unless that research article is cited properly in the manuscript. Even when authors cite other scholars' research, they should avoid using the same words and language structures as the original text.
2) Many Authors: Many manuscripts involve several authors, and no single author can be certain that the others have been honest and ethical in their writing. In a collaborative work, each author is responsible for what he/she writes, but once the work is published, all the authors are partially responsible for the whole published material. Authors usually trust each other in publications; however, they may not always be honest. As the famous saying goes: "Sometimes too much trust kills you".
3) Accidental Similarity: The next factor arises when an author's text happens, by chance, to be closely similar to another author's. When an idea or a sentence is phrased the way everybody commonly phrases it, that sentence or concept may be flagged as plagiarism by plagiarism detection systems.

4) Fixed Definitions:
In almost all sciences there are fixed definitions, which authors reproduce and cite; modifying them would change or twist their meaning.

5) Cross-Text Plagiarism (Text Recycling):
A piece of writing consists of several parts, such as an Introduction, a Body and a Conclusion. These parts depend on the manuscript's goals, applications and style, and there may be further sections, including an Abstract, an Evaluation, etc. Authors sometimes need to repeat concepts in various parts of the text. For example, the abstract must describe what is presented and discussed in the whole paper, while each part of the abstract is discussed in detail in the body of the manuscript. Similarly, in the methodology, evaluation or even the conclusion, authors inevitably revisit the problems and issues elaborated in other sections (e.g. the Introduction) (IEEE, 2011). Text with the same meaning occurring in various places can cause language and word similarity inside the manuscript. For example, a sentence in the Introduction defining the problem statement might be duplicated elsewhere in the paper (e.g. in the Conclusion) with the exact same language and sentence structure (iThenticate.com, 2011). This is a very common plagiarism issue. The ability to write a sentence with the same meaning in various styles and structures is an art, and it requires practice and authority in both the language and the science. Co-authoring and having the manuscript proofread help significantly to overcome this problem.

Vol. 7, No. 7; 2014

6) Self-Plagiarism (Similarity): Researchers may publish more than one article from their own research. Even though the articles are different, they share many similarities across their sections. Such papers may fall into plagiarism in the following three ways: a. Redundancy and Duplication: Also known as dual publication, this refers to the publication of a single research article in more than one journal.
Schein found that 93 out of 660 studies (more than 14%) from three major surgical journals were cases of dual publication (Schein & Paladugu, 2001).
Dual publication is not always considered plagiarism; in two cases it is accepted and ethical.
i) Summaries and abstracts that have been presented and published in conference proceedings can be published in a journal with some extension of the original paper. This extension should usually be at least 30%.
ii) A published article can be republished in another language, but the second publication should mention and cite the existence of the original (first) article.
b. Salami Slicing: When research has several aspects, authors sometimes generate separate articles from different sections of the research, such as the literature review and the methodology. They may also split up the results (data fragmentation) in order to generate different research papers. In all these papers, the basics, such as the Introduction and Problem Statement, are very similar. Since writing the same concepts in different words is usually difficult, some authors simply copy text from their own previously published papers.
Besides data fragmentation, there is another issue called data augmentation. Data augmentation refers to republishing studies that have already been published after collecting new data to improve the results and the contribution of the research. Mixing the two sets of results together and publishing them can easily mislead readers (malpractice).
c. Copyright Infringement: Copying text from one's own previously published manuscripts is another case of plagiarism. Although the author wrote the original text, it was published under a publisher's copyright. According to copyright rules and regulations, copying the published text, in whole or in part, is considered plagiarism (or self-plagiarism).

7) Metadata: When a document is produced, word processors (such as Microsoft Office Word) store metadata inside it, hidden from view and from printing. This metadata contains much useful information, but it can sometimes cause headaches for the author (Hockenos, 2013; Mehta, Meyer, & Murgai, 2013). For example, the metadata of a Microsoft Word document records the total editing time in minutes, the last 10 authors of the file (the last 10 computer names that edited it) and the company under which the Microsoft Office product is registered; in some cases, with track changes turned on, deleted text can also be found and extracted. Dennis Rader was prosecuted for murder after police identified him through the metadata of a document he had sent, in which his word processor had left identifying information (Anderson, 2013).
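To see what this hidden metadata looks like in practice, the core properties of a .docx file can be read with nothing but the Python standard library, since a .docx file is a ZIP archive containing a docProps/core.xml part. This is an illustrative sketch, not part of the original study; the field names follow the Office Open XML core-properties schema:

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the docProps/core.xml part of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_docx_metadata(source):
    """Return a dict of core metadata fields stored inside a .docx file.

    `source` may be a file path or a file-like object, as accepted by zipfile.
    """
    with zipfile.ZipFile(source) as zf:
        xml_bytes = zf.read("docProps/core.xml")
    root = ET.fromstring(xml_bytes)
    fields = {
        "creator": "dc:creator",              # original author name
        "last_modified_by": "cp:lastModifiedBy",
        "revision": "cp:revision",
    }
    meta = {}
    for name, tag in fields.items():
        node = root.find(tag, NS)
        meta[name] = node.text if node is not None else None
    return meta
```

Authors who wish to scrub such traces before sharing a file can inspect these fields and clear them with their word processor's document-inspection tool.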
Plagiarism and its causes have now been discussed. The main challenge, however, is to provide an answer to the question "How can we avoid plagiarism?"

Plagiarism Prevention Methods and Techniques
In the previous section, the driving factors of plagiarism in a manuscript were discussed. Here, some solutions are presented to avoid unwanted plagiarism. As discussed earlier, there are two groups of authors who plagiarize. The first group consists of authors who are aware of what they are doing; for this group, the only solution is sufficient education on the consequences of ethical and unethical writing. The second group consists of authors who plagiarize unconsciously, without any intention to do so. Accordingly, there are two major classes of methods for dealing with plagiarism: 1) unethical methods and 2) ethical methods.
Unethical methods decrease the reported plagiarism by bypassing the plagiarism detection software: the plagiarism itself is not resolved, but the software reports very low or even 0% plagiarism and similarity. Ethical methods, by contrast, describe how to change the text properly in order to genuinely decrease and avoid plagiarism. The methods discussed in this paper assume that the authors have behaved ethically in their research and have cited all references correctly and properly in their manuscripts. In the following, the solutions, methods and techniques of both the unethical and the ethical approaches are discussed.

Unethical Methods
Unethical methods are techniques that decrease the reported plagiarism rate without any effort to change the original text. They aim to bypass plagiarism detection algorithms and to obtain an apparently reasonable plagiarism result for a highly plagiarized text (Gillam, Marinuzzi, & Ioannou, 2010). They are not decent and are not recommended by any librarian. They are discussed and elaborated in this section to inform authors, lecturers, librarians and students, so that these methods can be recognized and avoided, and so that it can be verified that they have not been applied to manuscripts submitted for publication.
Besides the unethical methods, there is a practice in the publishing and writing industry called "ghost writing": having other people write on someone else's behalf. Although it is legal and common among politicians and famous people, it is prohibited in the academic world. There are ghost writers who, for money, write articles, books, academic assignments or even university theses on behalf of a researcher. Even though using a ghost-writing service seems legal, easy and common in many areas, it is considered unethical behavior in academia.

Mojibake
Mojibake is a Japanese word meaning "character transformation", and here it refers to scrambling the characters of a document. Mojibake is only possible in multi-layer documents such as PDFs. A PDF document has a minimum of two layers. The first layer (top layer) is visual: it is what the reader sees and what is printed, and, like an image, its text cannot be extracted directly. The second layer (bottom layer) is textual, and it contains the text of the visual layer. When text is selected, copied or searched, the textual layer is used, and the result is shown on the visual layer (ISO, 2008).
In a PDF document, it is possible to keep the visual layer, which is image-based, and change the textual layer to meaningless expressions. Figure 1 shows a sample PDF document whose textual layer has been changed to Mojibake (Gillam et al., 2010). When such a PDF document is submitted to plagiarism detection software, the system reads the textual layer and reports a plagiarism rate of 0%. The same method can legitimately be used for copyright protection, to prevent the text of a PDF document from being copied and reused in other documents.
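On the detection side, a scrambled textual layer no longer looks like natural language, so extracted text can be screened with a crude character-level heuristic before a 0% similarity report is trusted. The sketch below is our illustration, not part of the original study, and the 0.6 threshold is an illustrative assumption rather than a calibrated value:

```python
def looks_like_mojibake(text, threshold=0.6):
    """Heuristic check on text extracted from a PDF's textual layer.

    Counts characters that plausibly belong to ordinary English prose
    (ASCII letters/digits, spaces, common punctuation). If that fraction
    falls below `threshold`, the layer has likely been scrambled.
    """
    if not text:
        return True  # an empty textual layer is itself suspicious
    readable = sum(
        1 for ch in text
        if ch.isascii() and (ch.isalnum() or ch in " .,;:!?'\"()-\n")
    )
    return readable / len(text) < threshold
```

A screening tool would run this check on the extracted text and fall back to OCR of the visual layer when the textual layer fails it.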

Cyrillic Letters
Coincidentally, some letters in the Latin and Cyrillic alphabets look the same but are actually different. The letters "a", "o" and "e" exist in both Latin and Cyrillic and look almost identical in both alphabets, yet they have different values in ASCII and Unicode, which means computer applications recognize them as different characters.
As an unethical trick, the letters "a", "o" and "e" can be replaced throughout a text with their similar Cyrillic characters; the words containing the replaced characters become unsearchable, and effectively invisible, to plagiarism detection software (Gillam et al., 2010).
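A checker can defend against this trick by folding the lookalike characters back to Latin before comparing texts. The sketch below is our illustration, not part of the original study, and covers only the three letters named above, in lower and upper case:

```python
# Cyrillic characters visually identical to Latin "a", "o", "e"
# (lower and upper case), mapped back to their Latin counterparts.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
})

def normalize_homoglyphs(text):
    """Fold Cyrillic lookalikes back to Latin before similarity checking."""
    return text.translate(HOMOGLYPHS)

def count_suspicious(text):
    """Count characters that change under normalization: a high count in
    otherwise-English text suggests deliberate character substitution."""
    return sum(1 for a, b in zip(text, normalize_homoglyphs(text)) if a != b)
```

Running the similarity check on the normalized text restores the match that the substitution was meant to hide.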

Space Replacement
In a text, every word is separated from the adjacent words by a simple space; in effect, the spaces define the words.
If a text contains no spaces, it is treated as a single word, even if it has hundreds of characters. This method eliminates the spaces of a text as far as plagiarism detection software is concerned while keeping it readable to humans: if the spaces are replaced with dots or some Chinese characters that are then recolored to white, they are invisible in a rich-text view, but plagiarism detection software, which processes only the plain text and discards rich-text styling, sees the whole text as a single word. Below is an original text and its space-replaced version (Patel et al., 2011).
Original Text: "Before Unicode, it was necessary to match text encoding with a font using the same encoding system. Failure to do this produced unreadable gibberish whose specific appearance varied depending on the exact combination of text encoding and font encoding." With its spaces replaced by a Chinese character, this 39-word text is treated by anti-plagiarism software as a single word with the same number of characters, since the whole text contains no space.
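A detector can counter this trick with a simple sanity check: genuine prose rarely contains very long "words", so a text whose longest whitespace-delimited token is implausibly long has probably had its spaces replaced. This is an illustrative sketch, and the 40-character cutoff is an assumption on our part:

```python
def longest_token(text):
    """Length of the longest whitespace-delimited token in the text."""
    return max((len(t) for t in text.split()), default=0)

def spaces_likely_replaced(text, max_plausible=40):
    """Flag text whose 'words' are implausibly long, which happens when
    spaces have been swapped for invisible or recolored characters."""
    return longest_token(text) > max_plausible
```

A flagged submission could then be re-tokenized on every non-letter character before the similarity check is repeated.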

Automatic Translation
Machine translation systems, such as SysTran, translate text from one language into another using computer systems (Google, 2011; SysTranSoft.com, 2011). These systems are improving every day, and more and more users rely on them. Even though they cannot translate text as accurately as a human does, they are very fast and their results are usable. Using machine translation for an academic manuscript is a double-edged sword with both ethical and unethical uses.
Many research studies are available only in other languages and have never been translated into English. Using machine translation to translate such research and publish it as one's own work is an unethical approach, because what is claimed is not the author's original work.

Synonyms Replacement
Most word processors are equipped with a dictionary or a thesaurus that suggests synonyms and antonyms. This feature is very useful for replacing similar words in a paragraph with their synonyms. Although helpful, it can also be used to mask plagiarism in a copied text: if some words in a copied text are replaced with synonyms, plagiarism detection systems usually fail to detect it. However, even though the final text has changed, the original structure of the content remains the same, so this is considered an unethical method of reducing the reported plagiarism.

Text Image
All available plagiarism detection systems process only the text of a manuscript and simply ignore the images. If authors put text in an image, it is skipped, because the system cannot recognize text inside the images of a manuscript (Patel et al., 2011). Such text is not only difficult for the system to detect, but can also be quite hard for humans to spot manually: a text image in a Word document is fairly easy to identify, but in a PDF document it is not easy to find.

Ethical Methods
Ethical methods focus on changing the words and structures of the text properly, decreasing the similarity percentage as much as possible. These techniques should be applied while citing the references correctly and properly in the manuscript. They do not rely on weaknesses of the plagiarism detection algorithms.

Reading from Many Sources
When researchers intend to read about a topic, there are usually plenty of references to read. They might get the point by reading only one reference, and reading other similar articles might seem a waste of time.
There is a general psychological effect of reading only a single reference. When a person reads one text, he/she tends to respond using the same structures when questioned about it. For example, imagine a person who has watched a movie or read a book and is now retelling it to others. He might unconsciously use the same, or very similar, dialogue as the actors to describe what happened in the movie. Although he does not intend to repeat the exact words and has not memorized them, he repeats what he has heard (or read).
When a person reads the same concept in many sources, however, the brain builds a conceptual structure of it. When he decides to write, his brain draws on this structural model of the concept rather than on the words and language structures of any one source. Reading from many sources fixes the concept of the content in the mind, whereas reading a single text many times fixes the words of the content, not the concept; indeed, reading a single text many times is how people usually memorize a text.

Writing after a Few Days of Reading
It is quite common to write down exactly what you have just read. Writing immediately after reading has the benefit of being accurate and not missing any important part, but it also increases the chance of plagiarism in the text. According to Hermann Ebbinghaus's forgetting curve of human memory, about 20% of what has been read is forgotten within a day (Ebbinghaus, 1913). Figure 2 shows the forgetting curve: after one day, 20% of what we have learnt is lost; without review, almost 40% is gone within 3 days, and within about a week it is almost completely gone. But if we review the material for a few minutes after a day, we can bring it back to 100%, and the slope of the forgetting curve becomes shallower than before: after the review, losing 20% takes about 2 days instead of 1. The chart shows that after 4 periodic revisions, the material is fixed in our minds.
From this curve it can be concluded that it is preferable to write a few days after reading a reference. By then, the exact words and language structures are no longer in your mind, and your brain composes new sentences automatically.
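The forgetting curve is often approximated by a simple exponential decay, R = e^(-t/S), where t is the elapsed time and S is the stability of the memory; S grows with each review, which flattens the curve. The sketch below uses an illustrative stability value of our own choosing, not Ebbinghaus's measured data:

```python
import math

def retention(days, stability=1.84):
    """Simplified exponential model of Ebbinghaus's forgetting curve:
    R = exp(-t/S), where S (memory stability) grows with each review.
    The default stability here is an illustrative assumption, not a
    value fitted to experimental data.
    """
    return math.exp(-days / stability)
```

Under this model, verbatim wording read once (small S) has largely decayed after a few days, while a reviewed concept (larger S) is still retained, which is exactly why writing a few days after reading reduces accidental word-for-word copying.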

Using Thesaurus/Dictionary
A plagiarized sentence is an expression built from duplicated words, so replacing words with synonyms can help authors decrease the amount of similarity in the content. Dictionaries and thesauruses are good tools for this purpose. Besides paper dictionaries, almost all word processing software features an integrated dictionary that suggests word replacements. These tools not only decrease the plagiarism rate but also beautify the text and make it more fluent by varying the repeated words in a paragraph.
This technique is not ethical by itself, because replacing words does not change the sentence structures; it becomes ethical only when combined with the other approaches to avoiding plagiarism.
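The workflow can be sketched as a pass that varies repeated words while leaving each first occurrence alone. The tiny synonym table below is a hand-made illustration; in practice the word processor's thesaurus and the author's own judgment supply the replacements:

```python
import re

# A tiny, hand-made synonym table for illustration only; a real workflow
# would consult a thesaurus and judge each replacement in context.
SYNONYMS = {
    "important": ["significant", "crucial"],
    "show": ["demonstrate", "illustrate"],
}

def vary_repeats(text):
    """Replace repeated occurrences of a listed word with its synonyms,
    keeping the first occurrence untouched. (Case of replacements is not
    preserved in this sketch.)
    """
    seen = {}
    def repl(match):
        word = match.group(0).lower()
        if word not in SYNONYMS:
            return match.group(0)
        n = seen.get(word, 0)
        seen[word] = n + 1
        if n == 0:
            return match.group(0)           # keep the first occurrence
        alts = SYNONYMS[word]
        return alts[(n - 1) % len(alts)]    # cycle through synonyms after that
    return re.sub(r"[A-Za-z]+", repl, text)
```

Used this way, the tool reduces word repetition, but as the text above stresses, the sentence structures still need to be reworked by one of the other methods.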

As Many Proofread as Possible
Many authors submit their work for publication as soon as they finish writing. They may be very good writers or confident in the science, but having the paper proofread by as many people as possible introduces small changes and improvements in the language structure of the paper, and these changes help eliminate plagiarism and reduce the similarity rate.
There are three issues the authors should be aware of. Firstly, they should primarily ask people from the same research discipline to proofread the text, so that the content is modified while the meaning remains the same. Asking various people from various research areas can introduce ambiguities and opacities into the content.
The second issue is that circulating the paper among many people increases the risk of research piracy: a colleague might publish the manuscript before the main author does. Authors should therefore choose people who are trustworthy and reliable.
The third issue is to remember to review the paper after each round of proofreading. Even if the authors trust the knowledge of the colleagues who proofread the paper, they should still check that everything has been done correctly; it is quite possible for an editor to make mistakes or twist the meaning while editing a manuscript.

Using Machine Translators
Some people find it very difficult to write a sentence with the same meaning in several structures (to paraphrase); sometimes they see no solution other than copying from the original source. Finding people to paraphrase for them is not always possible, or is time-consuming. Using machine translators is an easy, though not fast, alternative.
Computer translators use Natural Language Processing (NLP) techniques. NLP is a research area in its own right and is not discussed in this paper, but it is worth understanding how multilingual translators work. They choose one language (usually English) as a pivot and translate every other language from and into it. For example, to translate a text from French into German, the system first translates French into English, and then English into German. The reason lies in graph mathematics: building translation rules between every pair of languages in a multilingual translator creates a mesh network of language-pair rules, which becomes a very large set. A multilingual translator supporting 10 languages needs only 9 sets of rules, between the 9 languages and the pivot, whereas a full mesh for 10 languages requires one rule set per language pair:

sets = n(n - 1)/2 = (10 × 9)/2 = 45    (1)

According to a study in 2011 (Patel et al., 2011), translating a text from one western language into another and back to the source language does not change it much. It is therefore suggested to translate across eastern and western languages before returning to the source language. For example, take a text in English: translate it into an eastern language (e.g. Chinese), then from the eastern language into a western language (e.g. French), then repeat the eastern/western alternation, and finally translate back into the original language, English in this example. The resulting text will be quite different from the original.
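The two topologies compared around Equation 1 can be checked with a pair of one-line functions; this is an illustrative sketch of the counting argument, nothing more:

```python
def hub_rule_sets(n):
    """Hub-and-spoke design: each of the n languages needs rules only
    to and from the single pivot language, so n - 1 rule sets."""
    return n - 1

def mesh_rule_sets(n):
    """Full mesh design: one rule set per unordered pair of languages,
    i.e. n(n - 1)/2, as in Equation 1."""
    return n * (n - 1) // 2
```

For the 10-language example in the text, the hub design needs 9 rule sets while the mesh needs 45, which is why real multilingual translators pivot through a single main language.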
Despite being easy to use, this method comes with a few cautions. Firstly, the generated text may be grammatically incorrect and must be edited; sometimes, though, editing a sentence is easier than paraphrasing a whole sentence or paragraph. Secondly, editing incorrect sentences takes a long time, and it is not as easy or fast as it looks, but it helps a lot.

Periodic Self-Reading
Once the paper is written, the authors should not immediately leave it or submit it. If they reread what they have just written, they will only find grammatical mistakes or missing words; deeper failures will not be caught at that stage. But if the authors read the text a few days later, they can catch more problems, such as unclear sections, and even write new sentences to extend the paper. Reading one's own paper periodically fixes mistakes and improves the quality of the writing. Despite these benefits, over-reading a paper can become obsessive.

Quoting
In some cases, authors find it unavoidable to write the exact words of the original text, for example a famous saying, a scientific abbreviation or a well-known expression. In these cases, quotation is the solution. Quoting someone else's words is always ignored by plagiarism checks and is not considered plagiarism. A quotation can run from a few words to hundreds. The technique is simply to put the text between quotation marks and reference the source at the end of the expression. For example: "A person who never made a mistake never tried anything new" - Albert Einstein.
Besides the ethical use of quoting, opening a quotation mark at the beginning of a manuscript and closing it at the end is an unethical and unacceptable approach.

Realistic & Imaginative Context
The solution (methodology) plays one of the most important roles in the whole journey of a research project: the right solution drives the author to the destination, while the wrong one can lead nowhere. Imagination is where new ideas are born.
Researchers should use their imagination and lead their ideas toward reality, nurturing them by writing, feeding, raising and improving them, and in the end publishing them.
Although an idea born of imagination may look funny or silly at first sight, it has the potential to open a new door to solving the problem. Writing from one's own imagination is basically plagiarism-free, because the ideas are purely the author's own property.

Efficiency Evaluation of Plagiarism Prevention Techniques
In this section, the ethical techniques above are evaluated and the results are compared to check how effectively they decrease plagiarism and similarity in academic writing. A group of 16 students from the National University of Malaysia (Bangi, Malaysia) and the Erican Language Centre (Kajang, Malaysia) was selected. Five texts of 400-500 words each were chosen from Wikipedia, giving an initial similarity index of 95%-100% in Turnitin. In the first stage, each of the ethical techniques was taught to two students, who were asked to apply that specific technique to all 5 texts from Wikipedia; the students were allowed to edit the text as long as the meaning stayed the same. At the end of the first stage, each ethical technique had been applied to all 5 texts by two students, producing 80 texts from the nearly 100% plagiarized originals (copied from Wikipedia). These 80 texts were checked for plagiarism by Turnitin. Figure 3 shows the 80 resulting percentage values from 16 students on 5 different texts. It shows that methods 2 (Writing after a Few Days of Reading), 5 (Using Machine Translators) and 8 (Realistic and Imaginative Context) are the top 3 methods for avoiding plagiarism in a text. To clarify this result, the 2 values for each method on each text are averaged and illustrated in Figure 4. Figures 4 and 5 illustrate the effectiveness of each method: the best method for avoiding plagiarism was method 2 (Writing after a Few Days of Reading). We also decided to measure how these techniques affect the texts when applied all together.
In the final stage, all 16 students were asked to apply the methods they had learnt to the texts again, this time working collaboratively and simultaneously on the 5 texts. At the end, we had 5 texts with all 8 methods applied by 16 students, and these 5 papers were evaluated for their similarity index. Table 1 shows the plagiarism rate of each text after applying all 8 methods. The similarity indices were reduced from nearly 100% to an average of 8.4%, which is an excellent result for the combination of 8 methods and different students.

Conclusion
This paper started by explaining plagiarism and literary piracy in academic manuscripts. It presented a few unethical methods in order to warn authors, students and librarians about them, and then proposed ethical methods to avoid plagiarism and decrease the similarity rate of academic writing. These ethical methods were evaluated on 5 manuscripts by 16 students and checked by Turnitin. The results show that all of the methods can reduce the similarity index significantly; the most effective was the second approach, "Writing after a Few Days of Reading". A hybrid of all the methods can produce an almost plagiarism-free text, down to an average similarity index of 8.4%. Researchers can use these results to improve the quality and originality of their academic writing.