Critically Question 'Questions on Critical Thinking'

This paper examines whether the open-ended questions designed to develop critical thinking (CT) in a college English textbook in China are effective. All 405 questions are examined and categorized into a non-CT group and a CT group, the latter comprising lower-stage CT and higher-stage CT. Findings reveal that most of the questions fall into the non-CT group and are therefore ineffective in enhancing CT. Some of them are non-text-based. Others are unjustifiable. Still others are not hierarchically distributed. Corresponding modifications are then suggested to improve the ineffective questions. The paper concludes that effective questions should be text-based, text-justified, integrated with a reader’s own judgment and hierarchically distributed.


Introduction
In pedagogical research, many studies have proposed tools for constructing reading comprehension (RC) questions that cultivate critical thinking (CT). For example, Wilson and Smetana (2011) proposed the Question as Thinking framework for designing comprehension questions. Degener and Berne (2017) provided tools for constructing questions where complex comprehension is expected. Day and Park (2005) argued that teachers can use the six types of comprehension and five forms of questions as taxonomies to make their own comprehension questions. In short, most previous studies have focused on how to design questions. Whether existing questions are effective has rarely been examined, let alone how ineffective questions should be modified.
In pedagogical practice, cultivating CT in RC has long drawn the attention of educators all around the world. China is no exception. The most recently published College English textbook, New Standard College English (NSCE), has integrated Questions for Developing Critical Thinking (QDCT) into its after-text exercises. The purpose of this paper is to explore the effectiveness and modification of QDCT in NSCE textbooks in the context of College English reading in China. Rather than criticizing China's college English education, it acknowledges that real problems exist and tries to work out solutions. It begins with working definitions of CT and RC. A tentative theoretical framework is then established for categorizing the QDCT. What follows is a close analysis of the effectiveness of these QDCT. Finally, modifications are suggested to make ineffective QDCT effective.
Bloom had a more detailed and systematic taxonomy of CT as early as 1956, which has long been regarded as the cornerstone of CT study in the cognitive domain. He suggested six CT skills from lower stages to higher stages: knowledge, comprehension, application, analysis, synthesis and evaluation. According to Bloom, knowledge means "the recall of specific and isolable bits of information from which more complex and abstract forms are built" (p.186). It is the beginning of the cognitive continuum (p.49), but it is not CT. Comprehension is the understanding or apprehension of the knowledge. It represents the lowest stage of CT (p.190). Application is the skill of applying the knowledge that one comprehends to new situations (p.50). Analysis clarifies the knowledge and indicates how it is organized and how it manages to convey its effects (p.191). Synthesis is putting together elements and parts of knowledge so as to form a whole (p.192). Evaluation is "judgments about the value of knowledge and methods for given purpose" (p.50).
One more perspective for classifying CT skills is well worth mentioning. It is provided by the 46 experts from philosophy, the social sciences and education in the Delphi report. CT skills were identified as "interpretation, analysis, evaluation, inference, explanation, and self-regulation", each of which was expanded into several sub-skills (Facione 1990).
The two CT models have something in common. Both treat CT as a continuum of skills, ranging from the lower to the higher stage, with each stage resting upon the previous one. They also provide complementary perspectives for reading comprehension. For example, while knowledge is the first skill in Bloom's taxonomy, it is not included in the CT skills defined by the Delphi report. There are also overlaps between the two models. One example is that both count analysis and evaluation among CT skills. In both models, CT is identified as a set of skills hierarchically and logically levelled, with CT skills of lower stages serving as the base for those of higher stages (Bloom 1956; Kauchak & Eggen 1998). In Bloom's taxonomy, though knowledge itself is not CT, all five stages of CT are based on it. Without knowledge, CT at all other stages would be impossible. The same holds in the Delphi report, with analysis as the beginning of CT.

Literature on RC
Definitions and the taxonomy of RC are reviewed here.
Like CT, RC has been defined in various ways. In one definition, it is the ability to process the text, understand its meaning, and integrate it with what the reader already knows (William, 2009). In another, it is "the meaning constructed as a result of the complex and interactive processes relating a reader's CT, prior knowledge and inference-making" (Aloqaili, 2012). It can be seen that, first, RC is based on understanding. Second, it requires readers to further process the factual information of the text, drawing on their own opinions and previous knowledge. To cultivate CT in RC therefore means to have readers make their own judgments, based on and justified by the text. Understanding is comprehension. Further processing is analysis, synthesis, evaluation and application. In other words, the reader first understands the factual information in the text. The reader then analyzes and synthesizes it to access information that is implied but not stated. Finally, the reader evaluates the worth of the information, stated or unstated, and applies it to a new situation.
Like CT, RC is also hierarchically categorized. Initially put forward by Herber (1970) and further developed by Vacca and Vacca (1999), the three proposed levels of RC are: literal, interpretive/inferential, and applied/evaluative. Literal reading occurs "on the lines" (Grey, 1960, quoted by Cassany, 2006) and is the lowest level of RC, on which the other two levels of reading are based. At this level, readers comprehend factual information stated verbatim in the text. It involves nothing more than repeating and recalling skills. As a result, literal reading can do little to develop readers' CT. Interpretive/inferential comprehension is concerned more with what is implied than with what is actually said. It refers to the ability to read "between the lines", including inferring the implied meanings of sentences and paragraphs in the text, interpreting the author's intended meaning, distinguishing between central and peripheral information and between facts and opinions, clarifying the author's purpose and tone, identifying relationships between paragraphs and between each paragraph and the whole text, and so forth. In short, the reader at this level is able to go beyond the text to infer other details that are not directly stated in the text (Westwood, 2008 p.32). Interpretive/inferential reading is grounded in, but goes deeper than, literal reading. It is of some help to CT cultivation. The last and highest level is applied/evaluative comprehension. It is reading "beyond the lines". It requires readers to make a judgment or evaluation with justification from what is "on the lines" or "between the lines", or to apply on-the-lines or between-the-lines information to a new situation. Applied/evaluative comprehension is of essential importance in the cultivation of higher-stage CT. Like Bloom's CT taxonomy, the three levels of RC are also hierarchically related. Although the literal level does not reach the height of CT, the other two levels are based on it.
Without literal reading, the two higher levels of reading would be impossible. Readers first comprehend what is literally stated. Then they need to interpret or infer what is communicated. Only after these two steps can they make and justify their own judgment or evaluation of the text. The same is true of application. Only after literal and inferential comprehension can readers apply ideas in the text to different situations. That is to say, "inferred meanings are somehow deeper than literal meanings, and that a critical understanding of a text is more highly valued than a mere literal understanding" in CT cultivation (Alderson, 2005 p.8).
The three-level comprehension skills may have their roots in Bloom's Taxonomy (Chu, 2017). The two are similar in that, as shown in the previous part, both continuums are hierarchically arranged from lower to higher stages. Bloom's CT continuum ranges from knowledge, comprehension, application, analysis and synthesis to evaluation. Similarly, Herber's RC continuum runs from the literal to the inferential and evaluative levels.
This literature paves the way for the working definitions of CT and RC that follow.

Working Definitions of CT & RC.
The existing CT definitions and taxonomies are all very general. While Bloom's taxonomy is education-oriented, the Delphi report was associated with philosophy and was applied to a broad range of educational, personal and civic subjects and issues (Facione, 1990). Neither of them is ready-made for assessing the effectiveness of open-ended QDCT, let alone for modifying them. Clear definitions of CT and RC are therefore needed before the study presented in the later part of this paper.
Drawing on and condensing the CT literature reviewed above, this paper defines CT as hierarchically distributed skills, starting from comprehending the ideas in the text, proceeding to analyzing and synthesizing them, and finally reaching the stages of evaluating them with text-based justification and applying them to new contexts.
Oriented toward CT, RC is hence defined as a hierarchical reading process, starting from understanding/comprehending the literal meanings which are on the lines of a text, proceeding to analyzing and synthesizing them for the inferred meanings between the lines, and aiming at evaluating them with justification which is either on the lines or between the lines and applying them in new contexts beyond the lines.

Tentative Theoretical Framework
Inspired by general CT models and RC levels, a tentative framework for RC-specific CT (see Table 1) is constructed. It covers the three stages of CT. At different stages, different CT skills and RC methods are employed at different RC levels. Like general CT skills, RC-specific skills are inseparable and hierarchically distributed, so higher-stage CT is based on lower-stage CT. Employing higher-stage CT skills necessarily involves lower-stage CT skills. For example, in order to evaluate a text, knowledge of the literal meaning on the lines is far from sufficient. Readers must have a good comprehension and interpretation of the text. Then they must analyze and synthesize it to acquire the inferential meanings between the lines. In short, higher-stage CT cannot be achieved without lower-stage CT.

Present Study
This part first introduces the research context. The research method is then presented, including the research questions and how the researchers categorized the QDCT.

General Introduction on the Research Context
RC in high schools in China has long been treated as training students to get high scores in the nationwide university entrance examination (Gaokao). After students are admitted to universities, they still face two more nationwide English proficiency tests: College English Test (CET) Band Four and Band Six. In all these exams, RC carries decisive weight in the score, which has to some extent reduced RC to a guessing game. This has put Chinese students in an embarrassing situation. According to the survey conducted by the Denmark International Education in 2012, the average SAT (Scholastic Assessment Test) score of Chinese students was approximately 300 points lower than that of their American counterparts, with Chinese students scoring 1213 and American students scoring 1509. The disparity in achievement lies mainly in the reading and writing sections, which are intended to examine students' CT ability. Although the fact that Chinese students are not native speakers of English is partly responsible, the result suggests that Chinese students lag somewhat behind their American counterparts in CT.
A silver lining is that Chinese educators have come to agree that it is high time to enhance college students' CT. One of the immediate actions has been to integrate QDCT into NSCE, the textbook for College English, a compulsory course in the four semesters of the freshman and sophomore years for all non-English majors in China. QDCT is an exercise made up of four or five open-ended questions following each text. There are 405 QDCT in total (see Table 2). In a textbook-based and question-guided class, poorly constructed questions can hardly cultivate students' CT, so it matters a great deal whether these QDCT are effective. The present study focuses on whether the 405 QDCT are effective and how the ineffective ones could be modified.

Research questions.
The present study attempts to answer the following two questions.
(1) Are all QDCT effective for CT cultivation? Why or why not?
(2) If not all QDCT are effective, how could those ineffective ones be modified?

Research process.
The three of us first gathered all 405 QDCT. We then categorized them separately, consulting the theoretical framework. Inter-rater agreement was calculated by comparing the item categorizations of all three researchers and expressing the matches as a percentage. We agreed on the categorization of 392 QDCT, an overall agreement level of 96.8%. When disagreement arose, we first held a discussion in an attempt to reach agreement. When that failed, we applied the principle that the minority is subordinate to the majority. For example, if one researcher categorized a question into the synthesis group while the other two categorized it into the analysis group, we took the majority's opinion and put the disputed question into the analysis group. Disagreement was rare, occurring in just 3.2% of cases.
Our initial attempt was to categorize these questions following the tentative theoretical framework. Unexpectedly, we all found that more than 300 questions could not be put into any of the six categories, so we had to create additional categories. We picked out the uncategorizable questions and found that all of them were non-CT questions, but for different reasons. Besides literal questions, some are not based on the text and others are not justified by the text. We therefore added non-text-based and unjustifiable questions alongside literal questions in the non-CT group. Hence the QDCT of the non-CT group fall into three subcategories: literal, unjustifiable and non-text-based (see Table 3). Non-text-based questions go even further than unjustifiable questions. They are entirely unrelated to the text and require only readers' opinions and previous knowledge from experience. They can be answered without even reading the text, and answering them is more a matter of chance association than of reflective judgment. All four NSCE textbooks are riddled with such questions. Examples include "How important is it to be ambitious in life?" (Book 3, Unit 1, Passage A: Catching Crabs) and "What features of your personal home, and your home, the planet Earth, do you feel are important?" (Book 4, Unit 7, Passage B: Home Thoughts).
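The agreement figure and the majority rule described above can be illustrated with a short sketch. It is not the authors' procedure: the function names and the sample labels are hypothetical, and only the totals (392 of 405) come from the text.

```python
from collections import Counter


def percent_agreement(agreed: int, total: int) -> float:
    """Share of items on which all raters chose the same category, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreed / total


def majority_label(labels):
    """Return the category chosen by a strict majority of raters, or None if there is none."""
    top, count = Counter(labels).most_common(1)[0]
    return top if count > len(labels) / 2 else None


# Totals reported in the text: 392 of 405 questions received identical labels.
print(round(percent_agreement(392, 405), 1))

# Hypothetical disputed item: two raters say "analysis", one says "synthesis".
print(majority_label(["analysis", "synthesis", "analysis"]))
```

With three raters, a strict majority exists whenever at least two labels coincide; a three-way split returns `None`, which corresponds to a case that would need further discussion.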

Discussion
This part answers the two research questions. Besides reporting the number and percentage of each QDCT category, it analyzes why the ineffective QDCT are ineffective and suggests modifications for them.

The Answer to the First Research Question
The answer to the first research question, "Are all QDCT effective for CT cultivation? Why or why not?", is shown in Table 3. The CT-category questions add up to only 76 items, accounting for just 18.77% of the total. Measured against all 405 QDCT, 3.95% are at the comprehension level, 4.69% at the application level, 3.95% at the analysis level, 1.23% at the synthesis level and 4.94% at the evaluation level. Since these QDCT can be of some help to CT cultivation, they can remain intact.
In contrast, 329 QDCT fall into the non-CT category, making up 81.23% of all 405 QDCT. They are further categorized into literal, non-text-based and unjustifiable QDCT. The great majority of these widely used QDCT are thus ineffective. We now turn to why they are ineffective.
Literal QDCT are lopsided, placing too much emphasis on recognizing, repeating and remembering factual information on the lines while rarely involving CT skills. Unjustifiable QDCT encourage readers to give a set of empty opinions without support from the text. Failure to identify the text-based justification destroys the student's opportunity to evaluate the argument, because we cannot determine the worth of an opinion until we identify the reasons (Browne & Stuart, 2005). Non-text-based questions do encourage readers to make their own judgments, but they make processing the information in the text unnecessary, since the answers to these QDCT are unrelated to the text.
As a result, CT cultivation becomes almost impossible if students are guided by such QDCT in RC.
What is worse, students may be led astray into fragmented thinking instead of holistic and systematic CT. They may also form the misconception that CT is nothing more than memorizing literal information, resorting to previous knowledge, or fabricating answers at random just to get by in class. Therefore, immediate modifications are required for the three types of non-CT questions.
elt.ccsenet.org English Language Teaching Vol. 13, No. 6
The following paragraphs answer the second research question, focusing on the modification of QDCT falling into the non-CT category.
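The two headline shares can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper `share` is a hypothetical name, and the counts (76, 329, 405) are the ones reported in the text.

```python
def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


TOTAL = 405  # all QDCT, as reported in the text
counts = {"CT": 76, "non-CT": 329}  # group sizes reported in the text

# The two groups should partition the full set of questions.
assert sum(counts.values()) == TOTAL

for group, n in counts.items():
    print(f"{group}: {n} of {TOTAL} = {share(n, TOTAL):.2f}%")
```

The same helper could be applied to any subcategory count to confirm that the reported percentages are expressed against the full set of 405 questions rather than against a single group.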

The Answer to the Second Research Question
The second research question concerns modifications of ineffective QDCT. It is answered with examples after a general modification principle is elaborated. Following the theoretical framework, effective QDCT must be based on but not limited to the literal information of the text, since CT in reading is not just a process in which readers roughly scan the text and passively and mechanically memorize and accept its ideas, but an analysis, integration and evaluation of the author's views, tendencies and assumptions (Fan, 2008). Therefore, QDCT should be constructed so that readers bring their own opinions to analyzing, synthesizing and evaluating the text-based information and applying it to a new context.
QDCT should also be hierarchically arranged. Research on reading supports a fairly firm boundary between lower-level and higher-level CT in RC (Landi, 2010). Lower-stage CT is regularly used to locate, understand, and recall explicit information in the text. Higher-stage CT is called for when reading tasks become increasingly demanding, for example, when they involve complex inferencing, evaluating the information in the text, or applying the meaning to a new context. Higher-stage CT, however, should be based on lower-stage CT such as the comprehension of the text.
To sum up, well-designed QDCT in RC should integrate readers' own views with the on-the-lines or between-the-lines information of the text, with the latter serving as justification for the former. Inadequate emphasis on either of them is a missed opportunity for readers to strike a balance between their own text-based opinion and the on-the-lines or between-the-lines information of the text, a balance that is indispensable in the cultivation of CT. The following part illustrates with examples how to modify the QDCT in the three non-CT subcategories.

Literal questions.
Of the three non-CT subcategories, literal questions number eight, or 1.98% of all the QDCT. Here are two examples.
Example 1: "What do you think makes a story newsworthy?" (Book 2, Unit 4, Passage A: Making the Headlines).
Example 1 is taken from a passage which explains what makes news newsworthy. The answer is given plainly on the lines: a newsworthy story must be odd, unexpected, related to human nature and immediate.
Example 2: "According to the writer, what has changed in our society within the last generation?" (Book 3, Unit 4, Passage A: Work in Corporate America).
The text from which Example 2 is taken says that American society has changed a great deal. It does not fix or produce anything anymore. Instead, most fathers sit in glass buildings doing work that is absolutely incomprehensible to their children. In the past, if a child asked about his father's job, the father might answer "I fix steam engines" or "I make horse collars". Nowadays, a father may answer "I sell space", "I do market research", or "I am a data processor". It is therefore difficult for a child to figure out what his father actually does. Again, the answer is clearly given on the lines.
Both questions require nothing more than the ability to recognize literal or factual information on the lines of the text. With only such skills as skimming and scanning involved, literal questions can hardly help students access text-implicit meanings. Students can answer such questions without engaging even in lower-stage CT like comprehension, let alone higher-stage CT like analysis, synthesis, application or evaluation. Though it is easy to develop literal questions, to anticipate students' responses and to direct class discussion with them, they are the least effective means of enhancing CT.
Therefore, immediate modifications are called for. Because they ask only for information stated in the text, literal questions should be revised to involve lower-stage or higher-stage CT. The two examples after modification can be: Modified Example 1: "Here is a piece of news from XXX, do you think it is newsworthy? Why or why not?" Modified Example 2: "How do you meet the challenge that has been brought about by the changes in our society described in the text?" In this way, the first example is raised from the literal to the application stage, requiring readers not only to have a good comprehension of the on-the-lines information about what makes news newsworthy, but also to apply it in a new context. The second example is raised from the literal to the synthesis stage. After modification, this question allows readers to synthesize a scenario from the text-based information and to propose solutions accordingly.

Unjustifiable questions.
There are 67 unjustifiable questions, accounting for 16.54% of all QDCT. In answering an unjustifiable question, a reader makes a claim or an assertion about the text but is not required to justify it. Here are two examples.
Example 3: "Do you agree with the advice in the passage?" (Book 3, Unit 1, Passage B: We Are All Dying).
Example 3 is from a text which says that life is short and that we never quite know when we will die. It therefore advises us to do what we want now lest we regret it in the future.
Example 4: "Do Sandy's friends treat her as a 'normal' person?"
Example 4 is from This is Sandy, an extract from a real story about the life of a deaf girl. Her friends like to introduce her to strangers, and one of the boys she has recently made friends with has begun to date her.
Neither of the two examples encourages further processing of the on-the-lines information or text-based justification. Such a question invites nothing more than an empty yes or no. Therefore, text-based justification must be required to guide readers toward comprehension and higher-stage CT. Unjustifiable questions can be modified by adding "why or why not according to the text" to require readers' further text-based justification.
Modified Example 3: "Do you agree with the advice in the passage? Why or why not according to the text?" Modified Example 4: "Do Sandy's friends treat her as a 'normal' person? Why or why not according to the text?" In this way, Modified Example 3 requires, besides a yes or no, text-based justification like "Because as is written in the text, we can't predict when we will die, we should live our life fully. That is, we should make full use of our time and opportunities". In the same way, Modified Example 4 requires a reasoned answer made up of a yes or no judgment plus text-based justification, with comprehension and analysis involved. A possible answer is "yes, because her friends like to introduce her to strangers as if she could hear what they said".
With "why or why not according to the text" added, readers are challenged to identify and comprehend the on-the-lines information of the text, to pick out what can be used to justify their judgment, and to reconstruct it so that their justification is logical and easier to understand. Such justification is never a guessing game. It requires readers' comprehension of the text and the engagement of analysis, synthesis and evaluation skills. Making a justified judgment is clearly more conducive to CT cultivation.

Non-text-based questions.
All four NSCE textbooks are full of non-text-based QDCT. As is shown in Table 3, there are 254 such questions, accounting for 62.72% of all QDCT.
The following examples are from the passage (Book 1, Unit 1, Passage B: Extract from Tis: A Memoir) which tells the story of an Irish immigrant to America. He worked full-time while taking courses part-time at a university in New York. He faced many problems, such as a sense of inferiority, a strong Irish accent, failure to follow the professor and lack of courage to ask questions in class. Four of the five QDCT following this passage are: Example 5: "How did you feel when you first started college?" These questions have nothing to do with the text at all. They accept any plausible response. With no requirement to explore relevant support from the text, readers are unlikely to establish a link between their opinions and the text. Instead, students can concoct answers arbitrarily from their previous experiences without even reading the text. Besides, these questions are not hierarchically constructed, so they cannot guarantee the continuum of CT development from a lower stage to a higher stage suggested by Bloom. Since neither the cultivation nor the development of CT is guaranteed by these QDCT, modification is urgently needed.
QDCT should be based on but not limited to what the text communicates, or they may fall into the literal group. The text is the most eligible source of readers' responses to QDCT. If the questions demand little reading of it, readers' meaning making is not challenged no matter how difficult the text is (Degener & Berne, 2017). QDCT should therefore be built to engage readers in text-based analysis, synthesis, application and evaluation. In other words, the QDCT should encourage readers to process information on the lines and to gain access to information between the lines or beyond the lines.
Besides, effective QDCT should also provide assurance that lower-stage CT has occurred prior to higher-stage CT tasks. Lower-stage CT is required when readers decode textual information to understand the literal and implied meanings and respond to the material, while higher-stage CT is required when readers manage constructive and integrative processes to make complex inferences using text information and prior knowledge and parse a text into idea units to grasp what the text says (Afflerbach et al., 2015). Though higher-stage QDCT contribute more to CT development, they would lose their footing without lower-stage CT.
As a result, QDCT should require the involvement of both lower-stage and higher-stage CT skills. Exposure to a variety of hierarchically different types of QDCT supports continuous progress in students' CT development. That is to say, some of the QDCT should be asked at the comprehension stage, others at the analysis or the synthesis stage, and still others at the application or the evaluation stage.
Modified Example 8: When the professor said "Pilgrims left England to escape religious persecution", did the author agree with him? Why or why not?
Modified Example 5 reaches the comprehension stage. It can be answered only when a reader fully understands the author's feelings at the university, which are on the lines of the text. The reader can then compare the author's feelings with his own.
Modified Example 6 involves the analysis skill. While the former part of the question requires a reader's opinion, the latter part, "…and how do you know from the text", elicits a text-based justification.
Modified Example 7 consists of a text-based supposition and inference. Readers must comb through the text and analyze and synthesize the reactions of the teacher and classmates to what the author had done previously, so that they can infer the likely reaction to him if he asked questions, which was an uncommon occurrence for him.
Modified Example 8 must be answered with a text-based evaluation. Only when a reader understands what the author means by "…pilgrims were the ones who persecuted everyone else, especially the Irish. I'd like to tell the professor how the Irish suffered for centuries under English rule" can he judge whether the author agrees with the professor or not.
After modification, these QDCT are not only based on the comprehension of the literal information of the text, but also hierarchically arranged from lower-stage to higher-stage CT.

Conclusion
Not all QDCT are created equal. While all effective QDCT are alike, each ineffective QDCT is ineffective in its own way. Some are literal. Others are unjustifiable. Still others are non-text-based. All of the ineffective QDCT need modification according to the cornerstone principle that effective QDCT should be text-based but not limited to the text. Besides, answering them should engage readers in integrating their own views with the text to form text-based and text-justified judgments. Finally, QDCT should be constructed hierarchically. The implication for those who develop questions to improve readers' CT is that the questions must be text-based, text-justified, integrated with a reader's own judgment and hierarchically distributed.