On a Statistical Approach to Automated Normal Systolic Blood Pressure Detection in Continuously Monitored Blood Pressure Data

This paper considers a statistical approach for detecting the normal systolic blood pressure pattern from continuously acquired systolic blood pressure data. A blood pressure monitoring system able to detect subtle changes in physiological vital signs well in advance of clinical emergencies requires knowledge of the normal blood pressure pattern. Nevertheless, normal data are not always available for pragmatic learning, so the ability to learn the normal pattern of systolic blood pressure data is a significant element in the development of a robust blood pressure monitoring system. This paper builds on the kernel density approach, using statistics obtained from novelty scores of the density estimates. The methods are illustrated using simulations and a real, continuously acquired systolic blood pressure dataset from Biofourmis Singapore Pte., with a detection accuracy of 98%.

training dataset" has not been addressed in the literature. This paper hypothesizes that statistics derived from the probability density function of training dataset are an attractive source for building automated normal vital signs detector. Thus, a rich source for building statistical procedures which are able to learn from available training data, an appropriate physiology specific normal pattern for contextual and pragmatic detection of subtle changes in physiological vital signs.
The probability density function (pdf) of data provides information on vital features of the data and is thus an essential source for the computation of many statistics, for example the sample mean and sample variance. If the expected value of a given random variable is taken as an appropriate measure of the center of the data, then it can be employed as a basis for exploring the contribution of each observation to the center of the distribution of the random variable under consideration. In particular, the density values of the observations can be seen as weights assigned to each observation, and hence as their respective contributions to the center of the probability distribution of the associated random variable. An observation spatially close to the center has a relatively higher contribution than one reasonably far from that center. It is also well known that the probability density function underpins several inferential methods, in which varied views are considered for its construction and estimation; in the likelihood approach to inference, for example, it serves as the fundamental pillar. Recall that a statistic is a measurable function of the observations and hence summarizes the data. Statistics obtained from the probability density function of the data can therefore provide a standard way of exploring data and developing attractive statistical methods for solving scientific problems. This is the direction we propose for addressing the problem of self-learning of normal systolic blood pressure measurements in the context of blood pressure monitoring.
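To make the weight interpretation concrete, the short sketch below (illustrative Python, not part of the paper's analysis; the data, bandwidth rule and library calls are assumptions for illustration only) evaluates a kernel density estimate at each observation of a hypothetical blood-pressure-like sample and confirms that observations near the sample mean receive larger density values, i.e. larger weights, than those far from it.

    # Illustrative sketch: density values at the observations act as weights,
    # so points near the center of the distribution carry higher weight.
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(42)
    y = rng.normal(loc=125.0, scale=10.0, size=500)   # hypothetical SBP-like sample (mmHg)

    kde = gaussian_kde(y)          # kernel density estimate (default bandwidth rule)
    weights = kde(y)               # density value at each observation

    order = np.argsort(np.abs(y - y.mean()))
    print("weight of observation nearest the center :", weights[order[0]])
    print("weight of observation farthest from center:", weights[order[-1]])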
The rest of the paper is organized as follows. Section 2 formally introduces the statistical methodologies considered. Section 3 provides details of the implementation protocols based on the developed methods. Section 4 briefly discusses the performance evaluation, with illustrative examples using both real data and simulations, and Section 5 concludes.

Probability density function of data
Kernel density estimation provides a standard approach to density estimation for both multivariate and univariate sample data (Pimentel et al., 2013). The probability density function of a sample of size $m$, say $y_1, y_2, \ldots, y_m$, is estimated based on the model

$$\hat{f}(y) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left\{ -\frac{(y - y_i)^{2}}{2\sigma^{2}} \right\}. \qquad (1)$$

Model (1) is a normal mixture model with $m$ components. The components are univariate normal distributions, weighted equally, with each observation of the sample acting as a mean. The unknown parameter here is $\sigma$, termed the smoothing parameter or bandwidth in the context of kernel density estimation; it controls the smoothness of the underlying probability density function.
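For illustration, the following sketch (Python; not the authors' implementation) evaluates model (1) directly as an equal-weight normal mixture with one component per observation and a common bandwidth $\sigma$; the sample values and bandwidth are hypothetical.

    # Direct evaluation of the kernel density estimate in model (1).
    import numpy as np

    def kde_pdf(x, sample, sigma):
        """Evaluate f_hat(x) as an equal-weight mixture of N(y_i, sigma^2) components."""
        x = np.atleast_1d(x)[:, None]          # shape (n_eval, 1)
        sample = np.asarray(sample)[None, :]   # shape (1, m)
        kernels = np.exp(-0.5 * ((x - sample) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        return kernels.mean(axis=1)            # equal weights 1/m

    sbp = np.array([118.0, 122.0, 125.0, 130.0, 135.0, 140.0])   # hypothetical SBP values
    print(kde_pdf(120.0, sbp, sigma=5.0))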

Smoothing parameter estimation
The choice of smoothing parameter is central to the performance of kernel density estimation (Bishop, 1994). Thus, there is a need for a pragmatic approach to its estimation or learning from empirical data. Bishop (1994) proposed estimating $\sigma$ as the average Euclidean distance to the 10 nearest neighbours, for multivariate kernel density estimation. Mensah, Assabil & Eyiah-Bediako (2019) proposed an empirical scheme for learning a suitable smoothing parameter value for univariate density estimation based on weighted distances. Following Mensah et al. (2019), for a single variate $y$, we consider an estimator whose value is mined from the data. In other words, it enables contextual learning, making use of the underlying physiological features, so that the estimated value, say $\hat{\sigma}^{2}$, is neither too small nor too large relative to the data. Here we write $y_{(i)}$ to mean the remaining data after taking out the $i$th observation (Mensah et al., 2019).
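For concreteness, the sketch below (illustrative Python) shows the nearest-neighbour idea attributed to Bishop (1994), adapted here to a single variate: the bandwidth is taken as the average distance to the 10 nearest neighbours. It is a stand-in only and does not reproduce the weighted-distance estimator of Mensah et al. (2019) used in this paper.

    # k-nearest-neighbour bandwidth heuristic (Bishop, 1994), univariate version.
    import numpy as np

    def knn_bandwidth(sample, k=10):
        sample = np.asarray(sample, dtype=float)
        dists = np.abs(sample[:, None] - sample[None, :])   # pairwise |y_i - y_j|
        dists.sort(axis=1)                                   # each row in ascending order
        return dists[:, 1:k + 1].mean()                      # skip the zero self-distance

    sbp = np.array([118., 122., 124., 125., 126., 128., 130., 132., 135., 140., 150.])
    print("estimated bandwidth:", knn_bandwidth(sbp, k=10))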
The statistic $T(y)$ takes values whose sign carries direct meaning and thus can be related to physiology in practice. In particular, regarding human systolic blood pressure, positive values correspond to the normal setting, since the individual is alive. Concentrating on the positive values, an appropriate threshold can be set to learn plausible normal systolic blood pressure measurements for building effective systolic blood pressure monitoring systems.
The rationale for the use of the statistic $T(y)$ in (5) is the asymptotic normality of $W(y)$, justified by the Central Limit Theorem. For a sufficiently large dataset of size, say, $m$,

$$W(y) \sim N(0, 1).$$
Thus, it can be utilized to assess which values of $y$ are likely to be normal. The set of highly likely values will cluster around the zero line (i.e. within $-1$ and $1$, say). Systolic blood pressure measurements which are highly likely to be normal for a given physiology are located in the high-density regions of the empirical density function, and will therefore cluster around regions of low $Z(y)$ values when considered in that feature space.
Similarly, deteriorating, abnormal or suboptimal systolic blood pressure values will be located within the low-density regions of the density function and will thus generate high $Z(y)$ feature values. When translated into the statistic $W(y)$, normal values of systolic blood pressure cluster around 0. It follows that the statistic $T(y)$ is informative about the nature of the systolic blood pressure values under the underlying probability density function. As a result, $T(y)$ can serve as an important element in developing a principled approach for learning, from training data, an appropriate normal systolic blood pressure setting in situations where labelled normal data are not available.
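The sketch below (illustrative Python, under explicit assumptions) conveys the detection idea: $Z(y)$ is taken here as the negative log of the fitted density, $W(y)$ as its standardized version, and $T(y) = 1 - |W(y)|$, so that $T(y)$ is positive exactly when $W(y)$ lies within $-1$ and $1$. The exact definitions in equations (2)-(5) of the paper may differ.

    # Assumed novelty-score pipeline: density -> Z(y) -> W(y) -> sign of T(y).
    import numpy as np
    from scipy.stats import gaussian_kde

    def detect_normal(sample):
        kde = gaussian_kde(sample)         # fitted density of the training data
        z = -np.log(kde(sample))           # novelty score: high in low-density regions
        w = (z - z.mean()) / z.std()       # standardized score, roughly N(0, 1)
        t = 1.0 - np.abs(w)                # assumed form of T(y)
        return t > 0                       # declare "normal" where T(y) is positive

    rng = np.random.default_rng(0)
    sbp = np.concatenate([rng.normal(125, 8, 480), rng.normal(180, 5, 20)])
    print("fraction flagged normal:", detect_normal(sbp).mean())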

Implementation
Because the probability density function of real systolic blood pressure data is likely to be naturally skewed, it is vital to apply the proposed methods to an appropriate subset of the data. From our experimentation, we noticed that direct application of the proposed method to the training dataset, without considering the nature of the density, affects the performance. As a result, we outline two implementation schemes and compare them with the direct method, which we term the direct probability density (DPDF) approach. Under the DPDF implementation, no appropriate subset is mined from the given training dataset for further processing; model (1) is fitted directly to the training dataset, $y$, followed by application of the statistic in (5) to the corresponding fitted results. The two variant implementation schemes are discussed briefly next.

Single-fit probability density (SFPDF) based approach
With this implementation scheme, the estimated probability density function of the training data is applied only once. The process can be detailed as follows: (1) fit model (1) to the training dataset, $y$; (2) use the fitted density to mine an appropriate subset, $y^{*}$, of the training data; (3) apply equation (5) to the results in (2).

Double-fit probability density (DFPDF) based approach
Under this scheme, the estimated probability density function of the data is used twice. First, it is employed as a preprocessing tool for mining a region of appropriate data for further processing, using density-based robust statistics. Second, the appropriate data are then subjected to further analysis. The flow of the scheme can be summarized in the following steps: (1) fit model (1) to the training dataset, $y$, and mine an appropriate subset, $y^{*}$, using density-based robust statistics; (2) re-fit model (1) to $y^{*}$; (3) apply equation (5) to the results in (2). A schematic comparison of the three schemes is sketched below.
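The sketch below (illustrative Python) contrasts the structure of the three schemes; the subset-mining rule used here (keep observations whose fitted density is at least the median density) is a stand-in assumption and not the robust statistics actually used in the paper.

    # Structural comparison of DPDF, SFPDF and DFPDF (stand-in mining rules).
    import numpy as np
    from scipy.stats import gaussian_kde

    def dpdf(y):
        """Direct scheme: fit the density to all of y and score y directly."""
        return gaussian_kde(y)(y)

    def sfpdf(y):
        """Single-fit: one density fit on y, used both to mine y* and to score it."""
        kde = gaussian_kde(y)
        subset = y[kde(y) >= np.median(kde(y))]     # stand-in mining rule
        return subset, kde(subset)

    def dfpdf(y):
        """Double-fit: a first fit mines y*; a second fit on y* is used for scoring."""
        first = gaussian_kde(y)(y)
        subset = y[first >= np.median(first)]       # stand-in density-based rule
        return subset, gaussian_kde(subset)(subset)

    rng = np.random.default_rng(1)
    y = rng.gamma(shape=9.0, scale=14.0, size=600)  # right-skewed, SBP-like synthetic data
    print(len(sfpdf(y)[0]), len(dfpdf(y)[0]))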

Performance Assessment
This section focuses on the implementation and performance assessment of the proposed methods for automated normal systolic blood pressure learning. We consider three examples: a real physiological vital signs data example and two simulations based on random perturbation of the real data.
In all our examples, we used the natural threshold informed by the test statistic, $T(y)$, for the detection of normal systolic blood pressure observations. In particular, the algorithm declares an observation as normal if its corresponding test statistic value is positive. First, we briefly introduce the performance measures considered for the evaluation before applying them to our examples. The assessment is based on statistical measures of performance used for assessing binary classifiers, namely sensitivity, specificity and accuracy. These measures are usually computed for binary classifiers from the contingency table shown in Table I.
Sensitivity (true positive rate) measures the proportion of the actual positives that are correctly identified as such by the algorithm. The specificity (true negative rate) of an algorithm measures the proportion of the actual negatives that are correctly identified as such. Based on Table I, sensitivity (SS), specificity (SP) and accuracy (ACC) can be expressed as

$$SS = \frac{TP}{TP + FN}, \qquad (6)$$

$$SP = \frac{TN}{TN + FP}, \qquad (7)$$

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}. \qquad (8)$$

It is clear from (6) and (7) that sensitivity and specificity each use only two of the four cell counts; from (8), it is apparent that accuracy addresses this by making use of all four numbers. Accuracy is therefore more balanced, representative and comprehensive than sensitivity and specificity (Vihinen, 2012). The measure of overall performance of the methods will be based on accuracy.
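A short helper (illustrative Python, with hypothetical counts) computing (6)-(8) from the four cells of the contingency table:

    # Sensitivity, specificity and accuracy from contingency-table counts.
    def sensitivity(tp, fn):
        return tp / (tp + fn)

    def specificity(tn, fp):
        return tn / (tn + fp)

    def accuracy(tp, tn, fp, fn):
        return (tp + tn) / (tp + tn + fp + fn)

    tp, fn, tn, fp = 95, 5, 80, 20   # hypothetical counts, not the paper's results
    print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))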
We consider a two-level assessment for the performance evaluation. First, a true systolic blood pressure annotation was performed with the help of a medical practitioner, based on the current standard physiological settings. Second, the annotation produced by the developed algorithms (schemes) is compared with the true annotation using the performance measures discussed above.

Example 1: Real Systolic blood pressure data application
This application is based on real physiological vital signs data collected by the Singapore Heart Foundation for Biofourmis Private Company Limited, Singapore. The dataset is de-identified and of size $845 \times 5$, covering systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), pulse rate (PP) and heart rate (HR). The data were obtained from adult Asians in the age group 25-62 years, enrolled in a SingHeart study. The subjects were continuously monitored over a period covering varied physiological states, for example when the individual was sleeping, walking, exercising or sitting. We applied the proposed schemes to the SBP, denoted $y$.

Figure 1 shows a plot of the data and its distribution in terms of a boxplot. The green lines denote the current conventional normal systolic blood pressure range (120-139 mmHg) for adults. The varied physiological states characterizing the dataset are clearly evident, with the distribution exhibiting right skewness. There also appear to be outlying observations. The skewness may be due to these potential outliers, which may have been driven by factors such as activity levels, diet and location. Learning the normal pattern, or applying detection methods, without consideration for potential outliers may lead to poor performance and consequently misleading conclusions.

First, we examine the ability of the methods to mine the training dataset for an appropriate subset for further processing. Figure 2 shows a plot of $V(y)$ against $y$ (left) and a boxplot of $y^{*}$, the subset data obtained from the training dataset using SFPDF and DFPDF (right). The red line is the median. The utility of the statistic $T(y)$ in the normal systolic blood pressure detection process is shown in Figure 3, where the red line denotes the natural threshold. It can be seen that positive values of $T(y)$ naturally correspond to likely normal systolic blood pressure measurements. Furthermore, there are differences in the normal SBP mining capabilities of the methods: the narrower the curve, the better the detection, with both SFPDF and DFPDF able to learn the physiology much better than DPDF, leading to suitable data-driven normal values of SBP. The potential of the proposed methods for learning physiology-specific normal systolic blood pressure observations from training data is shown in Table II. Clearly, SFPDF and DFPDF exhibit the same performance and both outperform DPDF. Overall, we are able to detect 98% of the normal systolic blood pressure measurements in the training dataset.

Example 2: Perturbed systolic blood pressure data
We consider a second example in which the true systolic blood pressure data were perturbed randomly by additive noise at the level of variability of the real dataset. In particular, the simulation followed a data generative process of the form

$$\tilde{y}_i = y_i + \epsilon_i, \qquad i = 1, \ldots, m,$$

where $y_i$ denotes the original systolic blood pressure measurement and the noise variance $\sigma^{2}$ is set to the variance of the original dataset. The nature of the synthetic data is shown in Figure 4, and its appropriateness for normality pattern learning is evident. The results obtained from applying the developed schemes to $\tilde{y}$ are not very different from those obtained in Example 1. Figure 5 illustrates the automated training data preprocessing potential of SFPDF and DFPDF in terms of the acquisition of $y^{*}$. Again, the use of the statistic $V(y)$ allows appropriate selection of a subset of training data for efficient SBP normal pattern detection, and both schemes yield the same subset.
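A minimal sketch of this perturbation (illustrative Python), assuming zero-mean Gaussian noise with standard deviation equal to that of the original series; the text specifies the noise variance but not its exact distribution.

    # Additive perturbation of the original SBP series at its own level of variability.
    import numpy as np

    def perturb(y, seed=0):
        rng = np.random.default_rng(seed)
        return y + rng.normal(0.0, np.std(y), size=len(y))   # assumed Gaussian noise

    y = np.array([118., 122., 125., 130., 135., 140., 128., 126.])   # hypothetical SBP values
    y_tilde = perturb(y)
    print(np.round(y_tilde, 1))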
The underlying characteristics of the proposed statistic $T(y)$ and its detection potential for the simulated systolic blood pressure data are demonstrated in Figure 6, where the intrinsic differences among the three schemes are clearly seen. Table III shows the sensitivity, specificity and accuracy of the three schemes for the simulated dataset. Again, both SFPDF and DFPDF substantially outperform DPDF. However, a slight performance improvement is recorded for SFPDF over DFPDF in terms of specificity and accuracy.

 
Example 3: Synthetic systolic blood pressure data
Let $y$ denote the vector of synthetic blood pressure measurements. We apply the developed schemes to $y$, using a smoothing parameter value estimated as in Examples 1 and 2 (subsections 4.1 and 4.2, respectively). Figure 7 shows a plot of the data, and Figure 8 shows the distribution of the subset, $y^{*}$, of the simulated dataset $y$ considered appropriate by SFPDF and DFPDF. Figure 9 illustrates the behaviour of the three schemes in terms of the statistic $T(y)$, and Table IV gives their corresponding detection performances. DFPDF exhibits a marginal improvement over its SFPDF counterpart.

Conclusion
In this paper, we have presented novel non-parametric statistical approaches for automated normal systolic blood pressure detection from training data, based on statistics derived from the probability density function (pdf) of the training data, in the absence of a known normal pattern. The pdf of the given training data is estimated via kernel density estimation and used to obtain an appropriate subset of the training data, based on robust statistics of either the pdf or the data itself. Furthermore, incorporating appropriate data pre-processing modules into the developed artificial intelligence (AI) algorithms improves detection performance. In particular, the real data application recorded performance improvements of 35% in sensitivity for both SFPDF and DFPDF, 41% in specificity for SFPDF and 39% for DFPDF; in terms of accuracy, SFPDF recorded a 39% improvement while DFPDF registered 38%. The perturbed data application also registered substantial improvements in sensitivity (40%), specificity (47%) and accuracy (45%). If integrated into prototype devices used in existing blood pressure monitoring systems, especially those involved in mobile health monitoring, the proposed methods have the potential to transform healthcare by enabling robust automated systolic blood pressure monitoring for pragmatic and timely surveillance, making healthcare more accessible through reduced cost, reduced health insurance claims and remote monitoring by physicians anytime, anywhere.