An Exploratory Study of the Three Phases of Factor Analysis

In this study, we examine factor analysis as a multivariate statistical tool, tracing its development from Spearman's original approach of 1904 through the three phases of factor analysis. The aim is to determine the similarities and individual contributions of each of the three phases. This was achieved by examining the algorithms used for parameter estimation in the three phases, inputting data into these algorithms, examining their outcomes, and offering recommendations based on the respective findings.


Introduction
Factor analysis is a statistical technique used to describe the variability among observed, correlated variables in terms of a lower number of unobserved variables called factors. It searches for joint variations in response to unobserved latent variables. Onyeagu (2003) describes factor analysis as a multivariate statistical technique which aims to describe, if possible, the covariance relationships among variables in terms of a few underlying but unobservable random quantities called factors. In other words, it can be described as a multivariate statistical method which expresses p observed variables z in terms of q latent variables f. Spearman (1904) published a seminal idea. His celebrated article was titled "'General Intelligence,' Objectively Determined and Measured." This marked the beginning of the quantitative investigation of latent variables. He thought originally that all intercorrelations among mental tests could be explained by assuming one general factor, along with a unique factor in each test. This he called the theory of two factors. Later this was modified to include group factors (Spearman, 1927). Central to Spearman's theory was the conceptualization of the nature of a common factor, the element common to two or more indicators (preferably three or more). Again, the theory highlighted the presence of two classes of factors: general (with one member) and specific (with a potentially infinite number). It also stressed the evaluation of empirical evidence on the tetrad difference criterion (i.e. on patterns in the correlations among manifest variables), with no consideration of diagonal elements.

r_ij = r_iG r_jG    (1)

where G is the general factor and i and j are the specific variables. This theory was later developed extensively by Thurstone (1935, 1947) into what is now the most familiar multivariate procedure used in the behavioural sciences. The field was initially hampered by the lack of adequate statistical computation. Malenie (2004) defined factor analysis as a statistical model which expresses p observed variables z in terms of q unobservable latent variables f, where q < p. As time went on, another statistical model was developed for factor analysis, this time in the form of a linear model given in Onyeagu (2003) as

x_i − μ_i = l_i1 F_1 + l_i2 F_2 + ⋯ + l_ik F_k + ε_i    (2)

for i = 1, 2, …, p and j = 1, 2, …, k, where x_1, x_2, …, x_p is a set of observable random variables with means μ_1, μ_2, …, μ_p, the l_ij are unknown constants (the loadings), F_1, …, F_k are k unobserved random variables, and the ε_i are independently distributed error terms with zero mean and finite variance, which need not be the same for all i. Thus we let

E(ε_i) = 0,  Var(ε_i) = ψ_i    (3)

In matrix terms, we have

X − μ = LF + ε    (4)

If we have n observations, then X is p × n, F is k × n and L is p × k. Each column of X and F denotes the values for one particular observation, and the matrix L does not vary across observations. We also note the following assumptions on F:
I. F and ε are independent;
II. E(F) = 0;
III. Cov(F) = I (to make sure that the factors are uncorrelated).
Any solution of the chosen set of equations satisfying these constraints on F is defined as the factors, and L as the loading matrix.
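The orthogonal factor model above and its implied covariance structure Σ = LL′ + Ψ can be illustrated with a short simulation of model (4); the loading matrix and specific variances below are hypothetical values chosen for demonstration, not estimates from any data set in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

p, k, n = 5, 2, 200_000  # observed variables, factors, observations

# Hypothetical loading matrix L (p x k) and specific variances psi (diag of Psi)
L = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.8],
              [0.0, 0.9]])
psi = np.array([0.19, 0.35, 0.47, 0.35, 0.19])

# Model assumptions: E(F) = 0, Cov(F) = I, F independent of eps, Cov(eps) = Psi
F = rng.standard_normal((k, n))
eps = rng.standard_normal((p, n)) * np.sqrt(psi)[:, None]
X = L @ F + eps                       # X - mu = L F + eps (mu taken as 0)

Sigma_model = L @ L.T + np.diag(psi)  # implied covariance Sigma = L L' + Psi
Sigma_sample = np.cov(X)              # sample covariance of the simulated data

print(np.max(np.abs(Sigma_model - Sigma_sample)))  # small for large n
```

For large n the sample covariance of the simulated X reproduces LL′ + Ψ, which is exactly the covariance relationship the later phases set out to verify from data.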
Factor analysis was later developed through the theory of latent structure analysis proposed by Lazarsfeld (1963), which gave the model as f(x) = ∫ π(x | y) h(y) dy. This was described by Bishop (2009) as one of the simplest forms of latent variable models, based on the mapping x = Wz + μ + ε, so that E(x | z) = Wz + μ, in which W and μ are adaptive parameters. The distribution p(z) is chosen to be a zero-mean, unit-covariance Gaussian N(0, I), while the error model for ε is also a zero-mean Gaussian with a covariance matrix Ψ which is diagonal. Silvia Bianconcini (2012) studied the nature of curiosity by analysing the agreement of junior high school students with a large battery of statements such as "I like to figure out how machinery works" or "I like to try new kinds of food." A factor analysis identified seven factors, three of them measuring enjoyment of problem-solving, learning and reading, and computational technology. Rahn (2019) stated that factor analysis is a useful tool for investigating variable relationships for complex concepts such as socio-economic status, dietary patterns, or psychological scales. He noted that it allows researchers to investigate concepts that are not easily measured directly by collapsing a large number of variables into a few interpretable underlying factors. Here the key concept of factor analysis is that multiple observed variables have similar patterns of responses because they are all associated with a latent variable.
In the study of factor analysis, it is common knowledge that the correlation model has been of prominent use, as has the covariance structure used in the linear model; all these preceded the advent of the latent variable model discovered by Lazarsfeld (1968), which formed the crux of the General Linear Latent Variable Model (GLLVM) developed by Bartholomew and Knott (1999). These make up the three phases of factor analysis, and they all try to answer the questions posed by factor analysis: data reduction, hypothesis testing, generating factor scores, etc. The three models use different approaches to address the basic problem posed by factor analysis.
However, there is a need to study these phases in order to ascertain the most suitable and convenient model fit for any type of data generated from sample surveys used in factor analysis. Hence, the main purpose of the research is to determine the similarities and differences that exist among the three phases of factor analysis, with a view to discovering a more comprehensive approach to the questions that factor analysis seeks to answer. The research also examined the algorithms involved in the development of factor analysis as a multivariate statistical technique. This paper is organised as follows. Section 2 deals with the challenges of factor analysis, Section 3 discusses the methodology, Section 4 deals with data analysis and results, and finally Section 5 contains the summary and conclusions.

Challenges of Factor Analysis
The interpretation of factor analysis is based on an empirical approach, a solution that is convenient even if not completely true. More than one interpretation can be made of the same data factored the same way, and factor analysis cannot identify causality. Factor analysis is not without cost, however. It is mathematically complicated and entails diverse and numerous considerations that demand formal training, and the sum of these is the major cost of factor analysis: most laymen, social scientists, and policy-makers find the nature and significance of the results incomprehensible for their work. Its technical vocabulary includes strange terms such as eigenvalues, rotation, simple structure, orthogonal, loadings, and communality. Its results usually absorb a dozen or so pages in a given report, leaving little room for a methodological introduction or explanation of terms. Philippe (2013), working on the estimation of generalized linear latent variable models, stated that the general linear latent variable model (GLLVM) presents a difficulty for statistical analysis: since the latent variables are not observed, they must be integrated out of the likelihood function, and the calculations involved in this method are enormous. Add to this the fact that students do not ordinarily learn factor analysis as a course while in school.

The Three Phases of Factor Analysis

First phase of factor analysis
According to Yule (1927), the essence of what Spearman needed is contained in the formula for the partial correlation between two variables, i and j say, given a third variable which Spearman calls G. This he gave as

r_ij.G = (r_ij − r_iG r_jG) / √((1 − r_iG²)(1 − r_jG²))

where r_ij.G is the partial correlation between i and j given G. This is called Spearman's approach. The study employed the principle of partial correlation given by Onyeagu (2003). Here the partial covariance of the variables in the first set, given the second set, is the matrix

Σ_11.2 = Σ_11 − Σ_12 Σ_22⁻¹ Σ_21

which is also known as the matrix of partial variances and covariances. Letting σ_ij.q+1,…,p denote the (i, j)th element of Σ_11.2, the matrix defined by the elements

r_ij.q+1,…,p = σ_ij.q+1,…,p / √(σ_ii.q+1,…,p σ_jj.q+1,…,p)

is the matrix of partial correlations, where r_ij.q+1,…,p is the partial correlation between x_i and x_j in the first set of q variables, holding the variables in the second set constant. The partial correlation coefficient allows us to measure the linear dependence of any two variables in the first set after removing the linear association of the variables in the second set with the variables in the first set. This can be shown in the example below:

Suppose x = (x_1, x_2, x_3)′ is partitioned into a first set (x_1, x_2)′ and a second set consisting of x_3 alone. The partial correlation between x_1 and x_2, holding x_3 constant, can be written as

r_12.3 = (r_12 − r_13 r_23) / √((1 − r_13²)(1 − r_23²))
Thus, working in a recursive manner, any desired partial correlation can be obtained using Anderson's (1958) formula. Consequently, we can derive the correlation between x_i and x_j based on the assumption that their correlation can be totally explained by their common dependence on G, implying that r_ij.G = 0 and hence r_ij = r_iG r_jG; the correlation matrix thus has ones on the diagonal and products of the correlations with G off the diagonal. If there are two or more (independent) underlying factors, the off-diagonal elements of Σ become sums of such products, one for each factor. The decision to use the product-moment correlation therefore implies the assumption that item test scores are linearly related to any underlying factor. Incidentally, this became the central idea in the second phase of factor analysis. The major limitation of this phase was that it was rudimentary, in the sense that it was not capable of treating more than one factor at a time. Again, it did not go beyond the level of partial correlation, so that all one needed to study was the pattern of the correlation matrix in order to get a picture of the nature of the relationships between the variables involved. This way of dealing with factor analysis at the early stage of its development was hampered by the lack of adequate computational techniques and computer packages.
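The two partial-correlation routes above, Anderson's recursive formula and the matrix Σ_11.2 = Σ_11 − Σ_12 Σ_22⁻¹ Σ_21, can be sketched as follows; the loadings r_iG = 0.8 and r_jG = 0.7 are hypothetical values used only to demonstrate the vanishing partial correlation of the one-factor case.

```python
import numpy as np

def partial_cov(Sigma, first, second):
    """Matrix of partial variances and covariances:
    Sigma_11.2 = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21."""
    S11 = Sigma[np.ix_(first, first)]
    S12 = Sigma[np.ix_(first, second)]
    S22 = Sigma[np.ix_(second, second)]
    return S11 - S12 @ np.linalg.inv(S22) @ S12.T

def partial_corr(r_ij, r_ik, r_jk):
    """Anderson's (1958) recursive formula for r_ij.k."""
    return (r_ij - r_ik * r_jk) / np.sqrt((1 - r_ik**2) * (1 - r_jk**2))

# Spearman's one-factor situation: if the correlation of items i and j is
# explained entirely by the general factor G, then r_ij = r_iG * r_jG and
# the partial correlation given G vanishes.
r_iG, r_jG = 0.8, 0.7
print(partial_corr(r_iG * r_jG, r_iG, r_jG))  # -> 0.0

# The matrix route gives the same conclusion: build the 3 x 3 correlation
# matrix of (x_i, x_j, G) and partial G out of the first two variables.
R = np.array([[1.0,         r_iG * r_jG, r_iG],
              [r_iG * r_jG, 1.0,         r_jG],
              [r_iG,        r_jG,        1.0 ]])
P = partial_cov(R, [0, 1], [2])
print(P[0, 1])  # -> 0.0 (off-diagonal partial covariance vanishes)
```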

Second phase of factor analysis
The second phase of factor analysis, which in modern notation according to Lawley and Maxwell (1963) supposes that

x_i = l_i1 f_1 + l_i2 f_2 + ⋯ + l_ik f_k + ε_i, for i = 1, 2, …, p    (12)

can be represented in terms of a probability distribution, as in Bartholomew et al. (1999), given that f_j ~ N(0, 1) for j = 1, 2, …, k. The covariance matrix has the form Σ = LL′ + Ψ. Here l_ij is called the loading of the ith variable on the jth factor, the matrix L is the matrix of loadings, and the specific factor ε_i is associated only with the ith response. The p deviations x_1 − μ_1, …, x_p − μ_p are expressed in terms of the p + k random variables f_1, f_2, …, f_k, ε_1, ε_2, …, ε_p. The assumptions are:
a) E(F) = 0 and Cov(F) = I;
b) E(ε) = 0 and Cov(ε) = Ψ, a diagonal matrix;
c) F and ε are independent, so that Cov(ε, F) = 0.
This constitutes the orthogonal factor model, which has covariance Σ = LL′ + Ψ. The covariance structure is of the form

Var(x_i) = l_i1² + ⋯ + l_ik² + ψ_i,  Cov(x_i, x_j) = l_i1 l_j1 + ⋯ + l_ik l_jk

and each x_i is linear in the common factors. Thus the communality is h_i² = l_i1² + ⋯ + l_ik², for i = 1, 2, …, p.
ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.11, No.3, 2021

Estimation of parameters
In this study the researcher examines ways to establish the covariance relationship given a dataset of, say, n observations x_1, x_2, …, x_n on p correlated variables, which addresses the problem posed by factor analysis: to determine whether the factor model X − μ = LF + ε, with its assumptions, can adequately describe the data. This implies verifying the covariance relationship Σ = LL′ + Ψ.
Here the population covariance matrix Σ is estimated by the sample covariance matrix S (or Σ̂). The study used the principal component method (principal factor) and the maximum likelihood method to achieve this. In the principal factor method, we factor the covariance matrix using the spectral decomposition theorem (Onyeagu et al., 2003):

Σ = λ_1 e_1 e_1′ + λ_2 e_2 e_2′ + ⋯ + λ_p e_p e_p′

where (λ_i, e_i) is the ith eigenvalue-eigenvector pair of Σ and λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_p ≥ 0. This factor analysis representation is exact but not very useful, since it does not allow for any variation in the specific factors ε. To overcome this, the contribution of the last p − k terms λ_{k+1} e_{k+1} e_{k+1}′ + ⋯ + λ_p e_p e_p′ to Σ is neglected in the spectral decomposition, and the approximation becomes

Σ ≈ LL′ + Ψ    (18)

where ψ_i = σ_ii − (l_i1² + ⋯ + l_ik²), for i = 1, 2, …, p. In the maximum likelihood method, Bartholomew and Knott (1999) state that if x ~ N(μ, Σ), where the common factors F and specific factors ε are assumed to be normally distributed, the maximum likelihood estimates of the factor loadings and specific variances can be obtained. The likelihood function for observations x_1, …, x_n may be written as L(μ, Σ). This is first maximized with respect to μ; this is a standard problem, and it is readily shown that μ̂ = x̄. This is substituted into L(μ, Σ), which is then maximized with respect to Λ and Ψ, where

Σ = ΛΛ′ + Ψ    (21)

Λ is then properly defined by imposing the uniqueness condition Λ′Ψ⁻¹Λ = Δ, a diagonal matrix. Λ̂ and Ψ̂ are then obtained by maximizing the likelihood function.
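The principal factor step above, retaining the k largest eigenvalue-eigenvector pairs of S so that the jth column of L̂ is √λ_j e_j and Ψ̂ = diag(S − L̂L̂′), can be sketched as follows; the sample covariance matrix S below is a hypothetical example, not data from the study.

```python
import numpy as np

def principal_factor(S, k):
    """Principal factor (principal component) solution: keep the k largest
    eigenvalue-eigenvector pairs of S, so that column j of L_hat is
    sqrt(lam_j) * e_j, and set Psi_hat = diag(S - L_hat L_hat')."""
    lam, E = np.linalg.eigh(S)        # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]   # indices of the k largest eigenvalues
    L_hat = E[:, idx] * np.sqrt(lam[idx])
    psi_hat = np.diag(S) - np.sum(L_hat**2, axis=1)
    return L_hat, psi_hat

# Hypothetical sample covariance (correlation) matrix for illustration
S = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L_hat, psi_hat = principal_factor(S, k=1)
print(L_hat.ravel())
print(psi_hat)
```

By construction the diagonal of L̂L̂′ + Ψ̂ matches the diagonal of S exactly; only the off-diagonal elements carry an approximation error, which shrinks as k grows.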

The likelihood, which involves |Σ|, is evaluated with the use of a computer program; thus the maximum likelihood estimators Λ̂, Ψ̂ and μ̂ = x̄ maximize the likelihood function L(μ, Σ) subject to Λ′Ψ⁻¹Λ being diagonal. The quantities ĥ_i² = l̂_i1² + ⋯ + l̂_ik², for i = 1, 2, …, p, are termed the maximum likelihood estimates of the communalities for the covariance matrix.
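A minimal sketch of maximum likelihood estimation under the normal factor model, using an EM iteration in the style of Rubin and Thayer rather than the exact numerical procedure of Lawley and Maxwell; the one-factor covariance matrix used to exercise it is constructed for the demonstration, not taken from the paper's data.

```python
import numpy as np

def fa_em(S, k, n_iter=2000):
    """ML factor analysis by EM (a sketch under the normal model).
    E-step: posterior weights beta and second moments of the factors.
    M-step: update the loadings L and specific variances psi."""
    p = S.shape[0]
    rng = np.random.default_rng(1)
    L = rng.standard_normal((p, k)) * 0.1   # small random start
    psi = np.diag(S).copy()
    for _ in range(n_iter):
        Sigma = L @ L.T + np.diag(psi)
        beta = L.T @ np.linalg.inv(Sigma)             # E-step
        Ezz = np.eye(k) - beta @ L + beta @ S @ beta.T
        L = S @ beta.T @ np.linalg.inv(Ezz)           # M-step: loadings
        psi = np.diag(S - L @ beta @ S)               # M-step: specifics
    return L, psi

# Build S from a known one-factor model, then check that the fit
# reproduces the covariance relationship Sigma = L L' + Psi.
L_true = np.array([[0.9], [0.8], [0.7], [0.6]])
psi_true = 1 - L_true.ravel()**2
S = L_true @ L_true.T + np.diag(psi_true)

L_hat, psi_hat = fa_em(S, k=1)
print(np.max(np.abs(L_hat @ L_hat.T + np.diag(psi_hat) - S)))
```

Because the covariance matrix here has exact one-factor structure, the fitted L̂L̂′ + Ψ̂ converges to S; the sign of L̂ is indeterminate, as with any factor solution.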
In the second phase of factor analysis the model provided rigorous and detailed answers to the questions posed by factor analysis, as well as making factor scores available for further research work. One major limitation of this phase is that it assumed the variables were all continuous, ignoring the fact that data obtained from research work in most behavioural sciences are discrete, nominal, or a mixture of the two; in this way a lot of useful information is lost as a result of these assumptions.

Third phase of factor analysis
In the third phase of factor analysis the emphasis is laid mostly on latent variables. Latent variable models provide an important tool for the analysis of multivariate data. They offer a conceptual framework within which many disparate methods can be unified and a base from which new methods can be developed. A statistical model specifies the joint distribution of a set of random variables, and it becomes a latent variable model when some of these variables, the latent variables, are unobservable. The model known as the general linear latent variable model (GLLVM), given in Bartholomew, Knott et al. (2011), consists of two parts. The first is the prior distribution of the latent variables y, represented by the density function h(y), from which the joint distribution of the manifest variables is

f(x) = ∫ h(y) ∏_{i=1}^{p} π_i(x_i | y) dy    (24)

The second element in the model is the set of conditional distributions of the manifest variables x_i given the latent variable y. These are denoted π_i(x_i | y), i = 1, 2, …, p, where the subscript i reminds one that the form of the distribution can vary with i. A convenient family of distributions, which turns out to have many useful properties in other branches of statistics, is the one-parameter exponential family. Suppose that the latent variables combine to determine the value of the canonical parameter θ_i; then one may have

π_i(x_i | y) = F_i(x_i) G_i(θ_i) exp{θ_i u_i(x_i)}, for i = 1, 2, …, p    (25)

where θ_i is some function of y. The simplest assumption about the form of this function is to suppose that it is linear, in which case we obtain

θ_i = α_i0 + α_i1 y_1 + ⋯ + α_iq y_q, i = 1, 2, …, p

This is the general linear latent variable model; the term linear refers to its linearity in the α's. The difference between this model and the generalised linear model commonly used in statistics is that the general linear latent variable model has a set of y's rather than a single dependent variable, and here the y's are unobservable. Thus we shall be predicting the y's given the x's.
The exponential family above includes the normal, gamma and Bernoulli distributions as special cases; if x_i and θ_i are allowed to be vector-valued, it also includes the multinomial distribution.
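For the Bernoulli case of the one-parameter exponential family, the canonical parameter is the logit, so with a single latent variable y the GLLVM response probability is P(x_i = 1 | y) = logistic(α_i0 + α_i1 y). A minimal sketch, with hypothetical item parameters:

```python
import math

def item_prob(y, alpha0, alpha1):
    """Bernoulli GLLVM item response: the canonical parameter is
    theta_i = alpha_i0 + alpha_i1 * y, so P(x_i = 1 | y) is the
    logistic function of theta_i."""
    theta = alpha0 + alpha1 * y
    return 1.0 / (1.0 + math.exp(-theta))

# Hypothetical intercept alpha0 = 0 and slope ("discrimination") alpha1 = 1.2
print(item_prob(0.0, 0.0, 1.2))  # 0.5 at the latent-variable origin
print(item_prob(2.0, 0.0, 1.2))  # larger y raises the response probability
```

An item with intercept zero responds positively with probability one half at y = 0, and the probability rises with y at a rate governed by the slope α_i1.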

Maximum likelihood estimation
The log-likelihood of the latent factor model for the mixture distribution is of the form

ln L = Σ_{n=1}^{N} ln { Σ_{k=1}^{K} π_k p(x_n | k) }    (26)

Maximization of this log-likelihood is made complex by the presence of the summation inside the logarithm. A technique for performing the optimization, known as the expectation-maximization (EM) algorithm, can be employed, according to Michael et al. (1999).
An introductory account of EM in the context of the factor model follows. The EM algorithm is based on the observation that if we had a set of indicator variables z_nk specifying which component is responsible for generating each data point x_n, then the log-likelihood would take the form

ln L_c = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ln{ π_k p(x_n | k) }    (29)

and its optimization would be straightforward, with the result that each component is fitted independently to the corresponding group of data points, and the mixing coefficients are given by the fractions of the points in each group. The {z_nk} are regarded as "missing data" and the data set {x_n} is said to be "incomplete"; combining {x_n} and {z_nk}, the corresponding "complete" data set is obtained, with the log-likelihood given above. Though the values of {z_nk} are unknown, they are determined by their posterior distribution given x. As can be seen from the model, the third phase of factor analysis has laid to rest the issue of the nature of the specific variable by using the conditional distribution π(x | y) and combining it with the mixing (or prior) distribution h(y). In this way one needs to know the distribution of y in order to predict the various specific variables; hence the need to search for a more advanced model which can tell us more about the distribution of the specific variables without making use of the prior distribution of y. Taking a step further from the model for the third phase of factor analysis, which is of the form

f(x) = ∫ h(y) ∏_{i=1}^{p} π_i(x_i | y) dy

where h(y) is the prior (or mixing) distribution and π_i(x_i | y) is the conditional distribution of x_i given y, and using the principle of Bayes' theorem, which states that h(y | x) = h(y) π(x | y) / f(x), applying the same to the factor model and integrating the mixing distribution over the range of y, we obtain the marginal distribution of the manifest variables.
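The E-step/M-step cycle described above can be sketched for the simplest mixture case, a two-component Gaussian mixture in one dimension; the component parameters and simulated data are illustrative, not drawn from the study.

```python
import numpy as np

def em_mixture(x, n_iter=200):
    """Minimal EM sketch for a two-component 1-D Gaussian mixture.
    E-step: responsibilities r = posterior of the 'missing' indicators z.
    M-step: refit the mixing fraction and each component's moments to the
    responsibility-weighted points."""
    pi = 0.5
    mu = np.array([x.min(), x.max()])  # crude starting values
    sd = np.array([1.0, 1.0])
    for _ in range(n_iter):
        # E-step (normalizing constants cancel in the ratio)
        d0 = np.exp(-0.5 * ((x - mu[0]) / sd[0])**2) / sd[0]
        d1 = np.exp(-0.5 * ((x - mu[1]) / sd[1])**2) / sd[1]
        r = pi * d1 / ((1 - pi) * d0 + pi * d1)
        # M-step
        pi = r.mean()
        mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
        sd = np.sqrt(np.array([np.average((x - mu[0])**2, weights=1 - r),
                               np.average((x - mu[1])**2, weights=r)]))
    return pi, mu, sd

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu, sd = em_mixture(x)
print(pi, mu, sd)
```

The responsibilities r play the role of the posterior distribution of the missing indicators z; the M-step then refits each component and the mixing fraction exactly as the complete-data argument above suggests.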
In this way the parameter estimates are obtained by fitting a distribution to y and obtaining the distribution of the specific variables, using the R console program.

Results and Analysis
In this section the parameter estimates obtained for the three phases of factor analysis are evaluated using different sample sizes. The computational analysis of the simulated data for the parameters of the three phases of factor analysis was performed, starting with the first phase.

The first phase of factor analysis
For this phase of factor analysis, with sample size n = 379 and x = 4, from a Binomial distribution, the partial correlation matrices can be obtained manually via eq. (8). The partial correlation estimates give us an insight into the contribution of the various variables to the general factor G. Table 4.1 shows the results obtained using SPSS version 20 for the first phase of factor analysis. From the correlation matrix it can be seen that items 3 and 4 are highly correlated, followed by items 2 and 3. Thus all the information about the model is contained in the correlation matrix. Table 4.2 shows the total variation explained by each of the components. Component one explained about 72.1% of the variance, followed by component two (12.9%) and lastly component 4, which explained approximately 6.8% of the variance. This is also confirmed by the scree plot (Fig. 1: scree plot of the total variance explained), from which it is seen that only one component appears significant, which makes for the selection of one component only. The results obtained using the third phase are shown in Table 4.3, which gives the individual distribution of factor scores, the expected value E(X), and the standard error (s.e.) of the posterior distribution of the data imputed. The analysis for the second data set proceeds likewise, starting with the first phase of factor analysis, which deals with partial correlation. From the correlation matrix obtained for the first phase, it can be seen that the items are loosely correlated; the model does not provide much information concerning the number of factors, since it is a one-factor model and thus assumes that all of the variables (items) are grouped under one factor. But further analysis with the other models will prove otherwise. Fig. 2 shows the scree plot of the communalities. From the factor matrix we are able to obtain information about the two factors extracted from the model and their relationship with the other items. With this, further analysis such as regression analysis can be carried out on the reduced data obtained from this factor analysis model.

Summary of findings

First phase of factor analysis
In the first phase, the model was able to derive the correlation matrices for the two data sets obtained by simulation using Minitab 16. The results shown in Tables 4.1 and 4.2 led to the conclusion that the first phase of factor analysis is peripheral and lacks much content, because it starts and ends with the correlation matrix.

Second phase of factor analysis
This phase of factor analysis, which is a model-based approach, goes beyond the provision of the correlation matrix to give a detailed explanation of the patterns in the matrix by providing an ANOVA table of the factor variances, a table of communalities (which deals with the uniqueness of the error variances), and the scree plot diagram. The aspects of data reduction and the provision of factor scores were also adequately treated in this second phase of factor analysis. Thus this model-based approach may be considered most appropriate, since it provides rigorous methods for answering the traditional questions addressed by factor analysis.

The third phase of factor analysis
The third phase of factor analysis was able to accommodate and treat adequately the various types of data (continuous, discrete and mixed) available. It also handled the issue of data reduction, which is the main goal of latent variable analysis. In the third phase the posterior distribution is used to give a picture of the specific variables by providing estimates of the expectations, the standard errors and the standardized values of the factor scores, using the logit/normal model. In summary, the study explored the three phases of factor analysis by bringing into focus the synergy that exists among the three phases, as can be seen from their individual correlation matrices. Furthermore, phases two and three of factor analysis share much in common in their model attributes, such as the uniqueness of the error variances, data reduction, and the provision of factor scores. Though the factor scores of phases two and three are not exactly the same, they nevertheless pursue the same objective, which is providing data for further analysis. Both models also used eigenvalues to reduce the number of factors involved. Again, the research was able to show the different algorithms employed by the three phases and how they were put to use, in order to ascertain their relevance to the study.

CONCLUSION
The three phases of factor analysis have one goal in common: to address the questions posed by factor analysis. This they all do, and it still boils down to the issue of factor scores and their implementation.
However, based on the computational results from the three phases of factor analysis, it can be said that the rigorous approach of the second phase makes room for a better understanding of factor analysis and for making data available for further analysis. It does not, however, adequately address the issue of the nature of the data involved, which it treats as continuous; the third phase treats this issue with much caution, hence the edge it has over the second-phase model. Thus, of the three phases, the third is the most advanced model, as it is all-encompassing, especially in the proper manner in which it addresses each variable in the model.