Modifying and Translating the Beginning College Survey of Student Engagement for Use in Saudi Arabia: The Difficulty of Validation

Student engagement in the first year of university plays a vital role in retention, learning, and persistence in STEM fields. The Beginning College Survey of Student Engagement (BCSSE) measures student engagement during the first year of university and was constructed and validated for use in English. In Saudi Arabia, the first year of university aims to smooth students’ transition from secondary to higher education. However, many students struggle with this transition and have difficulty adjusting. Therefore, a questionnaire similar to the BCSSE is needed for an exploratory study that measures student engagement during the first year of university in Saudi Arabia. To that end, some items of the BCSSE were modified and translated into Arabic. The process of providing evidence for validity, using a forward-backward translation technique, is outlined here. Initial translation was done by the author. Face validity was obtained using multiple Arabic and English speakers. A total of 71 Saudi students completed the survey, and internal consistency was tested using Cronbach’s α coefficient. Eight Saudi students participated in cognitive interviews to provide additional information regarding the validity of the survey items. This paper discusses some of the problems encountered in each stage of validation. The translated survey was revised to a final Arabic version based on students’ suggestions and will be tested for reliability with a future sample.


Introduction
Student engagement in the first year of university is a key contributing factor to students' retention in later university years and to persistence in science majors (Tinto, 1975; Astin, 1984; Seymour and Hewitt, 1997). Kuh (2009) has defined student engagement as "the time and effort students devote to activities that are empirically linked to desired outcomes of college and what institutions do to induce students to participate in these activities." The Beginning College Survey of Student Engagement (BCSSE) is an instrument that was developed by the Indiana University School of Education and has been used widely in the U.S. to measure student engagement during the first year of university. The BCSSE is a self-report questionnaire that measures first-year students' experiences and expectations along nine engagement scale categories: academic preparation, academic perseverance, academic difficulty, academic help-seeking, collaborative learning, student-faculty interaction, hours studying and working, campus support, and predicted graduation from their current institution. The BCSSE contains 34 question stems with a total of 79 items asking about high school experiences, expected first-year experiences, and demographic information. Cole and Dong (2013) examined the psychometric properties of the BCSSE engagement scales using over 70,000 student records from 120 institutions across the United States. The Cronbach's alpha values for the engagement scales ranged from 0.63 to 0.92. The BCSSE engagement scales were considered appropriate and trustworthy for measuring incoming first-year students' engagement behaviors (Cole & Dong, 2013). Data from the BCSSE is currently used by universities in many ways, including academic advising, retention efforts, first-year program design and evaluation, accreditation self-studies, and faculty and staff development (NSSE - National Survey of Student Engagement, 2020).
For instance, at the University of South Florida, an institution participating in BCSSE assessment, the first-year retention rate increased from 86% to 91%, in part because of changes made in response to the survey results. The University of South Florida used BCSSE data to identify students at risk of leaving college and started intervention efforts in the first few weeks of classes (Bombaugh & Cole, 2019).
In Saudi Arabia, the preparatory year is a mandatory program for first-year students pursuing higher education in Saudi universities. The preparatory year is one full academic year that aims to prepare students for independence and academic life by providing basic courses that bridge the gap between secondary education and higher education (Ministry of Education, 2015; Kamel, 2015). The preparatory courses include mathematics, English proficiency, communication, and statistics. Additionally, students who aim to pursue higher education in STEM fields must also complete introductory science courses in chemistry, physics, and biology. The preparatory year also enhances the development of important skills such as communication and collaboration, and it involves students in academic challenges and scientific practices (Ministry of Education, 2015; Khalil, 2010). Although the preparatory year is designed to smooth the transition from secondary to higher education, many students struggle with this transition, which causes them to leave school after the first year of university or, in other cases, to change their intended majors away from the sciences (Khoshaim, 2017; Khoshaim et al., 2018).
A survey questionnaire is a research tool that provides an objective means of collecting self-reported data about people's beliefs, attitudes, opinions, or behavior. Ensuring the validity of the survey is an important early step before conducting research using that survey. Validation is necessary when developing a new survey as well as when a previously validated survey is to be used in another language. Validity is concerned with the accuracy of the survey, that is, with making sure the items truly measure what they are supposed to measure. There are several lines of evidence that can be used to validate a survey. According to Oluwatayo (2012), face validity involves having the survey reviewed by experts.
The expert reviewers evaluate the appropriateness of the survey to the subject matter and assess the formatting and the clarity of the language used. In addition to expert reviews, validity evidence extends to include internal structure and response processes, according to the joint committee on standards for educational and psychological testing of the American Educational Research Association [AERA], the American Psychological Association [APA], and the National Council on Measurement in Education [NCME] (2014). Internal validity evidence aims to ensure questions are homogeneous and measure a single construct. Validity evidence based on internal structure can be measured through internal consistency, which tests the correlations among questions on the same scale by using Cronbach's α coefficient. In addition to internal structure, response process is another type of validity evidence. Cognitive interviews can be used to obtain validity evidence based on response processes. Cognitive interviews are intended to evaluate participants' understanding of each item in the survey instrument and to ensure that individuals respond to the questions in the way intended (Willis, 2005). Cognitive interviewing also provides essential feedback regarding the wording or language in the survey and identifies difficulties that cannot be observed through statistical methods (Peterson et al., 2017).
Student engagement has been a focus of higher education researchers because of its importance for students' retention and persistence at colleges and universities. No instrument similar to the BCSSE was found in Arabic to measure student engagement. Therefore, there is a need for an exploratory study to measure student engagement during the preparatory year in Saudi universities. This study aims to:
• Translate the BCSSE into Arabic and culturally adapt it following a forward-backward procedure.
• Pilot the translated version of the instrument and assess its internal consistency using standard statistical analysis.
• Assess response process validity using qualitative analysis.
• Create an Arabic version of the survey suitable for final validation.
Results of this study were intended to determine whether it is appropriate to use the translated survey in Saudi Arabia to get a better understanding of the factors that impact student retention in the sciences. However, I faced some specific problems with the translation and validation of the survey between two very different cultures and languages. Those insights are also discussed in this paper.

Method
The goal of this study is to adapt the Beginning College Survey of Student Engagement (BCSSE) instrument into Arabic and to create a valid Arabic version for use in future research. To reach this goal, three lines of validity evidence were considered. 1) The first step is the cultural adaptation of the survey. The adaptation process includes modification and translation of the survey items following a recommended translation technique, forward-backward translation. The translators who helped translate the survey items were Saudi graduate students. This step was used to provide evidence of face validity. 2) The second step is establishing evidence of validity of the translated survey through statistical methods. A pilot test of the translated survey was conducted, and quantitative data was collected through the survey to test internal consistency. The survey was completed by 71 female Saudi students who were in the preparatory year during Fall 2019 at a university in Saudi Arabia. 3) The third step is providing validity evidence based on response processes. Qualitative data was gathered through cognitive interviews. Eight female Saudi students participated in the interviews over two rounds. In the first round, during Fall 2019, three students were from the original group of students who completed the survey, and one student had just finished the preparatory year and moved into her second year of university. In the second round, during Spring 2020, four female students were in the second semester of the preparatory year. Because of the nature of Saudi Arabian culture and gender separation in the university, only female students were invited to participate in this study. No meaning was drawn from students' responses to the survey questions. The intent was only to find evidence to validate the instrument, not to draw conclusions about participants' experiences.

Results and Discussion
Step 1: Instrument translation using a forward-backward procedure
Adapting an existing instrument rather than developing a new one has several advantages. For example, the cost and time required to create a new instrument are avoided. Also, the psychometric properties of the instrument were already established, and the item scales were already considered adequate to represent the constructs for assessing first-year students' expectations and experiences. Figure 1 shows the model of the cultural adaptation process that was performed in this study. This model was suggested by Ali (2016) and was modified to fit the purpose of this study. To begin the adaptation process, ten of the original 34 questions on the BCSSE survey were chosen by the author and reviewers to be translated into Arabic. Those ten questions were chosen because they focus on expected first-year experiences. The other 24 questions were not chosen because they were either not appropriate for Saudi culture or did not fit the goal of this study. The first 12 questions relate to high school experiences, not to higher education, so they were excluded. The following two questions aim to measure hours studying and working, and they were also excluded because of their complexity, as suggested by the review team. The remaining ten questions were excluded because they gather demographic information not pertinent to Saudi culture, such as tuition costs of universities and ethnicity classifications. As part of the evidence for face validity, the selected items were reviewed by a professor in science education and by three Ph.D. students who are originally from Saudi Arabia to ensure that the items fit Saudi education and culture. The selected items were sent to BCSSE at Indiana University to obtain permission for use. After permission was granted, a forward-backward translation technique was followed.
The forward-backward translation technique is recommended by several studies (Harkness et al., 2003; Cha et al., 2007; Ali, 2016). In the forward procedure, the original version of the items is modified and translated from English into Arabic by two translators who work independently. Both translators were fluent in Arabic and English. A review meeting between the two translators was held to discuss and resolve the inconsistencies between their two versions and to produce one agreed Arabic version. In the backward procedure, the Arabic version was translated back into English by another two independent translators who are also fluent in Arabic and English. A review meeting was held to check the final back-translated version. After the forward-backward translation steps were completed, an expert in the educational field, who is a native English speaker, checked the equivalence between the original English version and its back translation. Once equivalence of meaning and adequate evidence for face validity had been established, the Arabic version of the BCSSE was named the M-BCSSE-Arabic (Modified BCSSE Arabic) and then piloted.

Modified BCSSE Arabic version (M-BCSSE-Arabic)
As a result of the forward-backward translation technique, a total of 11 categories with 42 items are found on the Modified BCSSE Arabic version. The M-BCSSE-Arabic version includes nine engagement scale categories: academic perseverance, collaborative learning, student-faculty interaction, academic difficulty, academic help-seeking, academic preparation, importance of campus support, active learning, and predicted graduation from the current institution. Each question on the survey reflects one of the nine engagement scale categories and may contain four to seven items. For example, question one on the survey measures academic perseverance and has six items. Most of the questions use a Likert response format. Questions 5, 10, and 11 are in a multiple-choice format, and no statistics were run on those questions.
Step 2: Pilot testing and internal consistency
Quantitative data was collected to test internal consistency. A total of 71 Saudi students completed the M-BCSSE-Arabic survey on paper. The data was entered manually into IBM SPSS Statistics 25 and analyzed using its statistical procedures. Internal consistency was assessed by testing the correlations among items on the same scale using Cronbach's α coefficient. Internal consistency provides evidence for structural or internal validity in terms of the consistency of scores across a set of items. Table 2 shows the results of the internal consistency tests for the M-BCSSE-Arabic version. Cronbach's alpha values for the students completing the survey ranged from α = 0.549 (lowest) to α = 0.782 (highest). According to the guidelines of DeVellis (1991), a Cronbach's alpha between .80 and .90 is considered very good, between .70 and .79 respectable, between .60 and .69 acceptable, and lower than .60 unacceptable. Based on these guidelines, the two lowest values, 0.549 and 0.58, are below the acceptable level, whereas the other values fall in the acceptable to respectable range.
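For reference, the coefficient reported above is conventionally defined as follows for a scale of k items. The notation is introduced here for illustration and is an assumption of this sketch, not taken from the BCSSE documentation: k is the number of items in the scale, \sigma^{2}_{Y_i} is the variance of item i across respondents, and \sigma^{2}_{X} is the variance of respondents' summed scale scores.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```

When the items of a scale vary together, the variance of the summed score is large relative to the sum of the item variances, and α approaches 1.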
The academic difficulty scale had one of the low internal consistency values (α = 0.58). However, the internal consistency analysis of this scale showed that the alpha value would increase to 0.66 if item four (Making New Friends) were deleted. As such, removal of this item was considered in order to reach an acceptable level of internal consistency. Academic help-seeking is the other scale that had low internal consistency (α = 0.549). The internal consistency analysis of this scale indicated that all six items are mostly correlated and appeared to be worthy of retention. The low internal consistency value of this scale may relate to problems with the wording of some items, as suggested by the interviewees and discussed below.
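As a minimal sketch of the calculations described above, the following Python shows how Cronbach's alpha and the "alpha if item deleted" statistic can be computed for one scale, and how a value can be labeled with the bands quoted in this paper from DeVellis (1991). The analysis in this study was run in IBM SPSS Statistics 25; this code is not what was used. The function names and the small response matrix are hypothetical and are not data from this study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: list of respondents, each a list of k numeric item scores (k >= 2).
    """
    k = len(rows[0])
    item_columns = list(zip(*rows))                 # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(r) for r in rows])    # variance of summed scale scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed k times, each time leaving one item out."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(r) if j != drop] for r in rows])
        for drop in range(k)
    ]


def band(alpha):
    """Label alpha using the bands quoted in this paper from DeVellis (1991)."""
    if alpha < 0.60:
        return "unacceptable"
    if alpha < 0.70:
        return "acceptable"
    if alpha < 0.80:
        return "respectable"
    if alpha <= 0.90:
        return "very good"
    return "above the quoted bands"


# Hypothetical responses (4 respondents x 3 items), NOT data from this study.
responses = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 4]]

overall = cronbach_alpha(responses)
print(f"alpha = {overall:.3f} ({band(overall)})")
for i, a in enumerate(alpha_if_item_deleted(responses), start=1):
    print(f"alpha if item {i} deleted = {a:.3f}")
```

An item whose removal raises alpha noticeably, as reported here for item four of the academic difficulty scale, is a candidate for review. The statistic alone does not explain why the item behaves differently.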

Table 2. The results of internal consistency for the M-BCSSE-Arabic version
Step 3: Cognitive interviewing and response processes validity
In this step, cognitive interviews were employed to assess validity evidence on the basis of response process. This type of evidence was used to identify sources of confusion in the M-BCSSE-Arabic survey as well as to verify that respondents understood the questions as intended. Also, this type of evidence was useful to identify problems that could not be identified through statistical methods.
The sample size for cognitive interviews is typically small. For example, Willis (2005) recommended a sample size ranging from n = 5 to 15. In this study, eight students participated in the cognitive interviews over two rounds, four students in each round. The sample size in each round was determined by the saturation point, at which the same comments were heard repeatedly.
In the first round, a total of four Saudi students participated in the interviews. The cognitive interviews were administered face to face and were based on think-aloud and verbal probing techniques. Students were asked to verbalize their thoughts as they answered the survey questions, and they were asked probing questions to elicit detailed information relevant to the questions. Table 3 shows the questions used in the interviews. The interviews were recorded and then transcribed. I read the transcripts and wrote a report summarizing students' thought processes, their understanding of the questions, and their comments. Based on students' suggestions for modifying, deleting, or adding items, the M-BCSSE-Arabic survey was revised to a final version.
In the second round of the cognitive interviews, another four Saudi students participated via video chat. The revised version of the M-BCSSE-Arabic survey was sent to the students to be assessed. I discussed each question in the survey with the students. Students were asked to provide any comments or thoughts regarding the survey items or the language used, while I took notes on any useful information gained. These interviews were done as a last step to ensure the validity of the M-BCSSE-Arabic survey for use in a future study.
Journal of Education and Practice www.iiste.org ISSN 2222-1735 (Paper) ISSN 2222-288X (Online) Vol.12, No.17, 2021
Interview Questions
1. How would you respond to this question? And why would you respond that way?
2. What do you think we want to know by asking this question?
3. What parts of the question are confusing or not appropriate?
4. Which of these responses seem like they may not really apply? Are there responses that should be there, but are not?
5. Was the scale we used appropriate? Does it make sense?
6. Is there anything about this question that you would change to make it better or more understandable?
Table 3. The questions asked in the cognitive interview
Round one: Results from the cognitive interviews with four Saudi students showed that students generally interpreted survey items as intended. Students were able to read, understand, and answer the survey questions adequately. Students seemed to know what the survey questions were asking about. For example, students thought about teamwork, study groups, and cooperating with friends when answering the questions in the collaborative learning scale. Thoughts such as caring about school and overcoming difficulties were discussed in the academic perseverance scale. Also, students mentioned relationships and communication with teachers in the student-faculty interaction scale.
Most students mentioned that the words "often" and "very often" in the four-point Likert scales were confusing. Students suggested changing or deleting the choice "very often." For this reason, I changed the four-point Likert scale from "never, sometimes, often, and very often" to "never, sometimes, often, and always." The questions with six-point Likert scales took longer to think about and answer. Students spent a significant amount of time trying to distinguish between the numbers in the scales. Students suggested reducing the scale from six points to five or four, so that questions would be easier to process and respond to. They also suggested clarifying the meaning of each number in the scale. To address this problem, I changed the six-point Likert scales to five-point Likert scales in all the survey questions. Also, each number in the scale was labeled with a specific meaning. Figures 2 and 3 show an example of one of the questions in the survey before and after revision. Regarding confusing or inappropriate questions, students suggested ways to reword some items in the following scales: academic perseverance, student-faculty interaction, academic help-seeking, academic difficulty, and campus support. Also, students mentioned that some response items in the academic help-seeking scale did not really apply to them. For example, the word "tutoring" means private lessons, which are not a free service offered by Saudi schools. Therefore, I revised the listed learning services according to their availability to students. Also, students found the word "offices" confusing and did not know exactly what it meant, so I deleted it and instead used the phrase "private lessons or other persons."
Students also suggested using the Arabic term for "academic guide" instead of the more directly translated "academic advisor." Moreover, students suggested adding "Internet" as a response item in the academic help-seeking scale. The information obtained from students offers a possible explanation for the low alpha value in the internal consistency test for the academic help-seeking scale. Students appeared to have difficulty answering items in this scale, and their difficulties may have affected the alpha value, which was used to test the correlation between items in that scale. I therefore revised the items to better reflect students' comments. Figures 4 and 5 show the question intended to measure academic help-seeking, before and after revision.
Figure 4. The academic help-seeking response items before revision
Figure 5. The academic help-seeking response items after revision
Round two: After the M-BCSSE-Arabic survey was revised to a final version based on students' suggestions, four Saudi students participated in cognitive interviews to test the revised version. I found that students were able to understand the survey questions as intended. Students thought the language of the survey was clear and appropriate. Also, students mentioned that the items in the survey mostly did apply to them. In the academic help-seeking scale, students appeared to agree with the new items as offered. Additionally, students found the new scales that were used in the survey to be understandable. One student asked about the number labeling of the scales (1 = very easy, 2 = easy, 3 = not sure, etc.) and whether the numbers have a specific meaning. This point was reviewed and discussed with an expert in education, and the numbers were removed from the applicable questions of the survey.
The final version of the M-BCSSE-Arabic survey is attached in the appendices. Appendix A shows the final Arabic version of the instrument, and Appendix B shows the survey in English.
The Difficulty of Validation
Some difficulties were encountered in obtaining evidence for the validity of the M-BCSSE-Arabic. During the adaptation process, there were some disagreements between the reviewers regarding the relevance of some questionnaire items to Saudi students. For example, there was disagreement about the learning support services that are offered by Saudi universities, such as writing centers, tutoring, and the use of technology. Also, during the forward translation of the survey, there were some inconsistencies between the two translators' versions, which took a long time to discuss and resolve. In addition, the Cronbach's alpha values ranged from low to acceptable, but this test was used as an estimate of correlation among items, and I did not expect students to use all resources in equal amounts. The resources were listed in the original English survey, so I used them in the translated survey, which was administered in another context and country. Also, I did not use the entire instrument but only a subset of items, which may affect how the instrument performs statistically. However, the cognitive interviews were actually more valuable than the quantitative analysis. Saudi students provided useful comments and suggestions that informed me of the actual problems in the survey that may have led to the low statistical results. For instance, the low Cronbach's alpha value of the academic help-seeking scale gave me no indication of what the problem was, but the interviews did, and I was able to make appropriate changes to the scale.
Additionally, a problem arose from the difference between formal written Arabic and informal spoken Arabic. When the Arabic survey was written formally by the translators, Saudi students had some difficulty understanding and distinguishing between items. However, when informal spoken Arabic was used in the interviews to clarify meanings, students had a better understanding of what was intended and also gave suggestions on changing some words to reflect their level of language use. This problem appeared in Arabic but is not really a problem for English language users.

Conclusion
A survey that is valid in its original form will not necessarily be valid when it is translated into another language. The validity of a translated survey should be assessed again, or the translated survey may not measure what it intends to measure. Validating a translated survey is not a simple task. Face validity and expert reviews are not enough to ensure the questions will be relevant to the individuals taking the survey. In addition, it is very important to have more than one person involved in both forward and backward translation, as there were several instances in which disagreements arose over how to translate something and agreement needed to be reached. Internal consistency was also not the best measure of validity in this case. While Cronbach's alpha could inform me that consistency was low between individual items, the quantitative results did not tell me about the problems causing the low alpha values. On the other hand, validity evidence based on the response process made a valuable contribution to the validation process in this paper. Cognitive interviews were a helpful method to inform item revision decisions and improve the survey scales. Therefore, using cognitive interviews is recommended for future research that aims to validate a translated survey. A small number of cognitive interviews was enough to identify problems with the questions. When a survey is heavily modified after the initial cognitive interviews, as it was here, a second round of cognitive interviews is recommended to assess the revised survey. The revised version has been prepared so that the reliability of the M-BCSSE-Arabic survey can be tested using a test-retest format, after which it will be ready for use in a larger-scale study.