Acute Stress State Classification Based on Electrodermal Activity Modeling

Acute stress is a physiological condition that may induce several neural dysfunctions with a significant impact on life quality. Accordingly, it would be important to monitor stress in everyday life unobtrusively and inexpensively. In this paper, we presented a new methodological pipeline to recognize acute stress conditions using electrodermal activity (EDA) exclusively. Particularly, we combined a rigorous and robust model (cvxEDA) for EDA processing and decomposition, with an algorithm based on a support vector machine to classify the stress state at a single-subject level. Indeed, our method, based on a single sensor, is robust to noise, applies a rigorous phasic decomposition, and implements an unbiased multiclass classification. To this end, we analyzed the EDA of 65 volunteers subjected to different acute stress stimuli induced by a modified version of the Trier Social Stress Test. Our results show that stress is successfully detected with an average accuracy of 94.62 percent. Besides, we proposed a further 4-class pattern recognition system able to distinguish between non-stress condition and three different stressful stimuli achieving an average accuracy as high as 75.00 percent. These results, obtained under controlled conditions, are the first step towards applications in ecological scenarios.

distress to more serious and rare issues that can lead to heart attacks, arrhythmias, or even sudden death [9], [10], [11]. If acute episodes proliferate and persist as in a chronic condition even the milder stress symptoms can cause extensive damage, strongly reducing the quality of life [10]. Indeed, it has been identified as a risk factor for hypertension and coronary disease [9], [12], irritable bowel syndrome, gastroesophageal reflux disease [13], and mental diseases, e.g., anxiety disorder and depression [14]. All this also has inevitable economic consequences by increasing absenteeism, staff turnover, presenteeism, and tardiness [14]. Accordingly, the development of early stress-detection methods would be extremely important.
The measure of psychological stress is conventionally based on three main methods. Most of the psychiatric or psychological diagnoses of stress are based on interviews, questionnaires and self-assessment tests, including the State-Trait Anxiety Inventory test (STAI). Although these methods have been validated, there is still the issue of subjective response bias that leads to the need for objective measures. In the clinical practice, this latter often consists in invasive methods, i.e., blood analysis or cortisol samples, that do not apply to the purpose of long-term monitoring in ecological scenarios, e.g., in the workplace [15]. To overcome these issues, in recent years different methods have been proposed to measure human psychological stress using physiological data, and more specifically Autonomic Nervous System (ANS) peripheral correlates. Indeed, extant studies have demonstrated that stress stimuli (stressors) trigger the hypothalamus brain area (hypothalamus-pituitary-adrenal (HPA) axis) that is involved in the hormone secretion procedure and the emotion processing [16]. The HPA axis activates the sympathetic ("fight-orflight" response) and parasympathetic ("rest-and-digest" response) branches of the ANS maintaining the homeostasis of the human body. The action of biological agents upon the parasympathetic and sympathetic nervous system (PNS and SNS respectively) produces specific reactions on peripheral ANS signals [16]. Particularly, the SNS activity induces pupil dilation and a sudden increase in the sweat gland activity, during the initial alarm response, and in the breath rate [16], [17], [18], [19]. At the cardiovascular level, the sympathovagal balance moves toward sympathetic predominance determining an increase in the mean heart rate and a shift of the power spectrum toward the low-frequency band [20]. Furthermore, stress-induced vasopressin secretion leads to blood volume, pressure, and peripheral blood vessel resistance increase [17].
Accordingly, typical applications of physiological data analysis for stress state recognition have concerned a combination of different noninvasive physiological signals such as heart rate variability (HRV), respiration activity, electrodermal activity (EDA), skin temperature, photoplethysmogram (PPG) or electromyography (EMG). Most of previous studies have relied on a similar analysis chain, but with striking differences at each node. First, they have induced stress using specific experimental paradigms designed to acquire multiple physiological data in laboratory-controlled conditions [21], [22], [23], [24], [25] or, less commonly, during simulated ecological scenarios using wearable devices [26], [27], [28]. Typical stimuli have involved the Trier Social Stress Test [29], [30], Stroop test [31], video stimuli, or arithmetic tasks [32]. Afterward, a variety of features have been extracted and then used as input of a machine learning model, e.g., support vector machines [30], [33], random forest [33], neural networks [34], Bayesian networks [35] or fuzzy logic [22], [36] to automatically discriminate stress states. Of note, only a few studies have shown a classification of multiple levels of stress, e.g., [23], [24], [33], usually related to the intensity of a single kind of stressor.
The aforementioned studies suggest the need for a large number of sensors to detect stress, making it difficult to design practical or commercial systems for real-world applications due to their obtrusiveness and computational cost. Indeed, a significant part of the literature considers a multivariate, multi-organ approach necessary for a comprehensive characterization of the ANS activity [37]. However, it is often indispensable to find a compromise between a deep ANS characterization and the possibility to export the results in daily life scenarios. For this reason, research should be directed towards the development of an effective method to detect stress by using the smallest practical number of sensors, without negatively affecting the quality of stress detection. Healey et al. [23] have pointed out the relevance of EDA and HRV as the best correlates in stress detection tasks among other physiological signals. Particularly, EDA could provide a reliable metrics of stress and it has been used as a ground-truth to compare the performance of other signals [38].
In this study, we focused on stress classification using solely EDA analysis. Electrodermal signal describes the alterations of skin electrical properties due to psychologically-induced eccrine sweat. It can be decomposed into its phasic and tonic components, which are characterized by different time scales and contain complementary psychophysiological information. Phasic electrodermal response (EDR) is the event-related relatively fast-changing component, whereas the tonic component represents the slowvarying baseline. EDA can be considered as a robust indicator of the ANS activity and is a widely used measurement in the affective computing field [39], [40], [41], [42]. Of note, unlike other physiological signals, which are affected by both the sympathetic and parasympathetic branches of ANS (e.g., HRV), EDA reflects only the SNS activity [43]. Furthermore, EDA has also been applied to investigate different stress disorders such as psychosocial stress [44], social anxiety disorders on children [45], or acute stress in autism spectrum disorder [46]. Some studies have already used EDA as the only input of a supervised classifier to discriminate between stress and non-stress or to differentiate between cognitive load and stressful tasks [47], [48], [49], [50], [51]. Despite the efforts made, there are methodological barriers that prevent the export of the aforementioned studies based solely on EDA to real-life scenarios. First, in daily use, EDA signals are frequently contaminated by noise and motion artifacts whose frequency spectrum is often overlapped to the EDA frequency band. Second, there is a high probability of multiple uncontrollable stressful stimuli with very short interstimulus intervals (ISI), which leads to overlapped EDRs. This overlapping is probably the main limitation in a set of factors regarding the decomposition of EDA into its phasic and tonic components inducing an inaccurate estimation of their psychophysiological information. Third, the computational cost and the overfitting risk of the classification should be mitigated. Hence, it is necessary to perform an unbiased subject-independent classification and to control the number of features extracted. Lastly, given the multiple daily sources and different types of stress, it would be important to distinguish multiple stressors. Nevertheless, in the previously mentioned literature, most of these issues have not been addressed or authors have applied heuristic methods and ad-hoc analyses. On the other hand, empirical evidence suggests that the model-based approach for EDA analysis has greater validity when compared to conventional methods [52]. The former makes explicit accessible assumptions, enabling probabilistic model inversion to suppress measurement noise and increase the sensitivity of statistical inference. Accordingly, model-based methods have, in general, increased sensitivity and replicability compared to operational approaches [52], [53], [54], [55], [56].
In this paper, we propose a processing chain for acute stress states classification based on EDA modeling, a proper learning model and a feature selection algorithm aimed at improving robustness, replicability, interpretability and subject-independence. EDA modeling is based on a previously validated model, called cvxEDA, providing a robust and rigorous method to estimate the latent variable, i.e., the SNS activity directed to sweat glands, from the observed EDA [40], [42], [55], [57]. Our approach will be tested on a modified Trier Social Stress Test (TSST), which includes different types of stress stimuli [58], [59]. See details in the next sections that are organized as follows: first, in the Section 2, we describe the TSST protocol and cvxEDA methods, feature extraction, statistical analysis, and classification, then, we present experimental results and a discussion on them in Sections 3.1 and 4 respectively.

Experimental Protocol and Data Collection
We enrolled 65 healthy young participants in the study. The group of subjects was a socio-demographic homogeneous sample with an average age of 21:76 AE 4:48 years old. The local Ethics Committee approved the experimental protocol and each volunteer signed an informed consent. The subjects were instructed to avoid the consumption of alcohol, tobacco, or any kind of psychotropic substances. They also had to avoid practicing physical exercise for 24 hours before the experiment, to wake up two hours before, and to have a light breakfast without coffee or tea.
Each subject underwent on two experimental sessions in two different days. The first session consisted of a 35-min resting-state phase. The second session aimed to induce emotional stress through a modified Trier Social Stress Test (TSST) [58], [60], which is currently widely used and accepted in experimental psychology as a robust and reliable acute stressor [60], [61]. In the classical TSST, participants are first requested to prepare a speech to be presented in front of a panel of assessors, and then to perform a mental arithmetic task. The combination of public speaking, mental arithmetic task, anticipation, and social evaluation have been shown to produce consistent moderate stress response [62]. Several adaptations of the TSST employing stressor tasks associated with social-evaluative threat and uncontrollability have been successfully proposed in the literature [62]. In our study, to emphasize the stressful effect, a pool of psychologists have proposed to replace the public speaking task with a speaking memory task, which was video recorded and then displayed interleaved with the video of an actor correctly performing the memory task. This modification has been applied maintaining the philosophy and general structure of the original TSST. More in detail, the presented stress session was made up of the following stages (see Fig. 1): 1) Baseline stage (BS) of relaxing audition for 10 minutes. 2) Story-telling (ST): the subject listens to three stories with many details and the subject is requested to remember as many details as possible, demanding a great amount of attention. 3) Memory task (MT): the subject is requested to repeat each story aloud as detailed as he/she can remember. 4) Stress anticipation (SA): the subject waits 10 minutes in a room for the memory test evaluation. 5) Video exposition (VE): a projection of a video with the subject's performance in the memory test is shown. The video shows twice each one of the three stories. First, an actor correctly repeats the story, trying to make the subjects believe that this is the common case. Subsequently, the subject (recorded during the MT stage) repeating the story is displayed. 6) Arithmetic task (AT): within 5 minutes, the subject is asked to perform a count down subtraction starting from 1022, repetitively subtracting 13 and giving the answer aloud each time. If a mistake is made, the subject is asked to start over, from 1022. For the total duration of both resting and stress sessions, EDA was recorded at a sampling rate of 250 Hz through the Encephalan-EEGR-19/26 device (Medicom MTD ltd). Two electrodes were placed on the distal phalanx of the index and ring fingers of the non-dominant hand. Before the beginning of the experiment, the subjects were asked, if possible, to avoid just sudden hand movements, but we did not prevent or control any natural gesture.

Stress Assessment: Psychometric Tests
The stress reference variables were estimated from psychometric tests. More specifically, after each session, the subjects filled out the Perceived Stress Scale (PSS) [63], the Visual Analog Scale (VASS) [64] and the STAI [65]. The PSS measures the degree of overall stress of the individual or the extent to which life situations are appraised as stressful.
The VASS records the self-perceived stress level. The STAI evaluates anxiety from two different points of view: as a measure of the subject state (STAI-s) at a given time, and as the trait (STAI-t) or stable tendency of the individual to respond by increasing his/her level of anxiety in stressful situations.

EDA Processing Using cvxEDA Algorithm
The EDA signal has been recorded by applying an electrical potential of 0.5 V between the distal phalanges of the index and ring fingers of the non-dominant hand and then estimating the skin conductance. It consists of two different components, tonic and phasic, within the frequency range of 0À2Hz. The tonic component, which refers to the baseline slow variations of the EDA, i.e electrodermal level (EDL), contains information about the subject's general psychophysiological state and his autonomic regulation [66]. On the other hand, the phasic component reflects the short-time response directly induced by an external stimulus (electrodermal response, EDR). The shape of a single EDR is characterized by a rapid ascension from the baseline signal succeeded by an asymptotic exponential decrease towards the baseline. In the case of an ongoing sustained elicitation, or, more generally, when the phasic response can not be linked to a specific stimulus, it can reflect spontaneous fluctuations; in this case, we refer to the single EDR as non-specific EDR (NS-EDR). Variations in the EDA signal are strictly related to sweat production and diffusion process, which are directly controlled by the SNS. Therefore, EDA can be considered an effective way to indirectly monitor the SNS activity.
The main difficulty of the decomposition procedure is represented by the overlapping of two consecutive EDRs (or NS-EDR), which occurs when the time interval between consecutive stimuli is shorter than EDR recovery time. This generates an inaccurate estimation of the information contained in the two components. To overcome this problem different model approaches have been presented in the literature (see e.g., [52], [55], [67], or [56]).
In this paper, we applied the convex optimization model (cvxEDA) presented in [55]. It proposes a representation of the electrodermal signal as the output of a linear timeinvariant system to accurately decompose consecutive overlapped EDRs. More in detail, cvxEDA directly estimates the unobservable process, i.e., the neural sympathetic activity, that underlies the recorded electrodermal signal. This model is grounded on Bayesian statistics and on a simple yet physiologically sound representation of the observed EDA as the sum of three components: the tonic (t) signal, the phasic (r) signal, and an independent identically distributed zero-average Gaussian noise term (), which incorporates all measurement errors and artifacts: The phasic activity is defined as the convolution between the neural activity of the SNS p (more specifically the sudomotor nerve activity, SMNA) and a bi-exponential function hðtÞ (known as Bateman function) which models the sweat diffusion process [54]: where t 1 and t 2 are the two time constants that determine the shape of the single phasic response (with t 1 > t 2 ), and uðtÞ is the unitary stepwise function. The discrete-time approximation (with sampling time d) of the Laplace transform of hðtÞ function defines an autoregressive moving average (ARMA) model, which is represented by H ¼ MA À1 . M and A are tridiagonal matrices with the MA and AR coefficients along the diagonals. Consequently, the phasic component. r, is given by: where q is an auxiliary variable defined as q ¼ A À1 p. Concerning the tonic component, according to its smooth dynamics and frequency domain ( 0:05Hz), t is modeled by means of cubic B-spline functions with equally-spaced knots every 10s, and an offset term: where B is a tall matrix of which columns are cubic B-spline basis functions, ' is the vector of spline coefficients, C is a N Â 2 matrix with C i;1 ¼ 1, C i;2 ¼ i=N and N is the length of the EDA time series, d is a 2 Â 1 vector with the offset and slope coefficients for the linear trend.
In conclusion, substituting both the phasic and tonic terms, the final EDA model is the following: Once the observation model is defined, the goal is to estimate the maximum a posteriori tonic signal (t) and the neural bursts (p), parametrized by ½q; '; d, for the measured EDA signal (y): Considering the three parameters, representing the phasic and tonic activity and the drift, as independent and applying Bayes' theorem, we obtain: Unlike other approaches in the literature, cvxEDA relies only on the definition of the three priors -which are described in detail in [55] -without the need for pre-processing or post-processing heuristics procedures (e.g., to comply with negative neural activations). Finally, to solve the MAP problem and obtain the neural phasic activity and the smooth tonic signal, cvxEDA converts the Bayesian problem (7) into a constrained minimization quadratic programming convex problem (see [55] for further details): subj. to Aq ! 0: The parameters a and g control the strength of the penalty for the phasic and tonic components, respectively.
In Fig 2 an example of the resulting signals of the decomposition algorithm is reported.

Feature Extraction
In this study, we analyzed only the stress stages during which the subjects were not talking to avoid any interaction between EDA signals and speech [66]. Indeed, speech induces physiological irregular respiration that activates the sympathetic reflex and consequently affects the sweat gland dynamics and the related EDA signal [66], [68]. Accordingly, we selected three stress stages: ST, SA, VE in addition to the basal stage (BS). For each of these stages, to estimate the sympathetic activity, several features in the time and frequency domains were extracted from both the phasic and tonic components. A summary of the extracted features is shown in Table 1. Of note, EDASymp is considered a reliable index of the sympathetic activity, which has been recently introduced by Posada-Quintero [69]. EDASymp was calculated by computing the power spectral density in the frequency range of 0:045À0:25Hz of the EDA signal, i.e the sum of r and t components.

Statistical Analysis
To evaluate psychometric changes induced by the stressful protocol, we performed a paired Wilcoxon statistical test between the psychometric test scores obtained on the first day (basal session) and those obtained on the second day (stress session).
Concerning the EDA analysis, features extracted during the first resting state session were compared to those extracted during the baseline stage of the second acquisition session. Afterward, the three stress stages ST, SA, VE together with the BS were statistically compared using the non-parametric Friedman's test for each feature. The choice of this specific statistical test is justified by the non-normal distribution of the dataset. Moreover, a posthoc analysis was performed using the non-parametric Wilcoxon signed-rank test with Bonferroni correction: the significance level a initially set to 0.05, was reduced to a new value computed as a bon ¼ 0:05=m, where m is the number of comparisons.

Classification -Stress and Stressor Recognition
The EDA feature-set was used to discriminate both the stress/non-stress condition and the four stressors selected from the TSST protocol. To this aim, two supervised binary and multiclass classifications were performed by using a Support Vector Machine with Recursive Feature Elimination algorithm (SVM-RFE), recently proposed by Yan and Zhang [70]. This learning algorithm implements an embedded method (EM) as a feature selection strategy, which allows not only to maximize the classification accuracy but also to facilitate data understanding and to reduce the risk of overfitting. In EMs, the search of the optimal subset is not separated from the learning part, but it is built into the classifier construction [71], therefore the structure of the classifier plays a crucial role. Due to their characteristics, EMs have the important advantages to be less computationally expensive and less prone to overfitting than other feature selection strategies [72]. In our study, we tested three common classifier categories whose structure allows having a built-in ability to select features [71], i.e., SVM, Random Forest, and Linear Discriminant Analysis (LDA). SVM-RFE outperformed with higher recognition accuracy, both Random Forest and LDA. This could be partially explained by the ability of SVM-RFE to handle sparse features (as those related to the phasic EDA component), and to always find the optimal margin solution. Furthermore, the adopted SVM-RFE algorithm presents the unique characteristic of incorporating an embedded correlation bias reduction [70]. More specifically, we used a nonlinear nSVM-RFE employing a radial basis function (RBF) kernel, with the following parameters: n equal to 0.5; g (of the RBF kernel) equal to 0.1; and tolerance of termination criterion equal to 0.001. The RFE method [73] selects a subset of size r among m features (r < m), which maximizes the performance of the SVM classifier. The method is based on a backward sequential selection. Specifically, at each iteration of the RFE procedure, the feature that has the least influence on the SVM weight-vector norm is removed. Of note, in the multiclass  case, as suggested by the authors of [70], to rank the features, we simply add the feature weights of each binaryclass sub-problems.
To assess the out-of-sample predictive accuracy of the system we adopted a Leave-One-Subject-Out crossvalidation (LOSO). More in detail, the LOSO procedure used, at each iteration, the data records of a single subject as the validation set and the remaining observations of n À 1 subjects as the training set (where n is the total number of subjects). Accordingly, we ranked the features and repeated the SVM classification m times removing the last ranked features at each repetition [73]. The accuracy level of each repetition was calculated in compliance with the LOSO cross-validation to avoid biased performance estimation.
We implemented the classification code in Matlab Ò using the LIBSVM library [74]. An overall block diagram of the mentioned method is presented in Fig. 3.

EXPERIMENTAL RESULTS
In this section, we show the results achieved through each of the previously described analytical methods.
Psychometric scores confirmed the difference in the perceived stress between the basal condition and the stress elicitation. This difference was strongly reflected also by most of the EDA features, which showed statistically significant variations between the stress stages and the initial resting state. Furthermore, MedianTonic, aucTonic, and EDASymp significantly differed even among different kinds of stressors.
The devised pattern recognition systems achieved good accuracy for both classification analyses (binary and multiclass). Particularity, our methodological pipeline was able to recognize the stress condition with an average accuracy of 94.62 percent and to distinguish among the 4 experimental stages with an average accuracy of 75.00 percent, selecting 8 and 6 features from both phasic and tonic groups, respectively.

Group Differences in Psychometric Scores and EDA Features
The statistical analysis computed on the four psychometric tests showed a significant statistical increase, after the experimental session, of the STAI-s and VASS scores (see Table 2). On the other hand, the STAI-t subscale as well as the PSS did not show significant changes. Concerning the statistical comparison of EDA features among the four stages, some of both phasic and tonic features resulted to be effective in distinguishing, not only between the baseline and stress stages, but often also among the different stressors (ST, SA, VE), see Fig. 4. More in detail, tonic features (except for stdTonic) and maxPhasic showed significant statistical differences between the stages ST, SA, VE. However, only EDASymp discriminated among all the stressors and the baseline. On the other hand, the remaining phasic features were capable to discriminate between BS and each of the other stress stages with a p < 0:001.

Single-Subject Classification and Feature Selection
After the exploratory statistical analysis, we performed the automatic classification of stress and stressors using the nonlinear SVM-RFE algorithm with LOSO cross-validation, as described in Section 2.6. Figs. 5 and 6 show the average accuracy at each iteration of the SVM-RFE algorithm for the binary and 4-class classifications, respectively. The peak of the accuracy trend for the stress/non-stress problem was 94.62 percent and was obtained when the first eight most relevant features, according to the RFE criterion, were considered. It is worthwhile noting that the difference between the minimum and the maximum accuracy level was less than 2 percent. Instead, concerning the 4-class problem, the accuracy trend was much more variable and we achieved a maximum of 75.00 percent with the first six selected features. The features were ranked as shown in Table 5. In addition, classification results obtained through the subset of features that achieved the maximum average accuracy are shown also in the form of a confusion matrix and reported in Tables 3 (binary) and 4 (multiclass). The diagonal of the two tables represent the percentage of true positive of each class. As is shown in Table 4, we achieved an average accuracy of 75 percent for the 4-class problem. Stages BS and SA showed the best correct detection rates, i.e., 90.77 and 81.54 percent, respectively. The accuracy decreased considering, instead,  concerning the most accurate result of the binary classification, the confusion matrix shows a balanced accuracy as high as 91.03 percent (i.e., the average of the specificity and sensitivity of the confusion matrix). Of note, in this case, the balanced accuracy differs from the average one because the two classes did not comprise the same number of observations. In Table 3, we can observe a false positive rate (error type I) of 15.38 percent and a very low false negative rate of 2.56 percent (error type II). Considering the relevance of the tonic-related features, we performed a further classification analysis using only this feature subset as input. Results confirmed a high accuracy of 93.46 percent in the binary case, however, for the multi-class problem, the balanced accuracy significantly decreases to 65.38 percent. Therefore, the pattern      aucTonic AmpSum NS-EDR-freq recognition system guaranteed good performance even without the phasic-related information when the classification was quite simple, as the stress/non-stress detection, but it showed the relevant role of both components once that the recognition problem increased its complexity.

DISCUSSION AND CONCLUSION
In the current scientific literature about stress detection, there are methodological barriers that prevent the replicability of laboratory results in real-life scenarios.
Here, we presented a methodological pipeline that aims at creating the conditions to export our promising results outside the laboratory setting. Particularly, following a specific analysis strategy, we tried to overcome some of the common issues in the scientific literature, as it follows. First, we used a single sensor, i.e., the EDA sensor, selecting the one whose correlation with psychometric stress metrics has been proven to outperform the other ANS correlates frequently used in previous laboratory studies: ECG, electromyogram, and respiration [23]. EDA is particularly advantageous for stress monitoring because the sweat glands are exclusively innervated by the SNS, whereas most other organs are under the influence of both autonomic branches [75]. Moreover, EDA sensors have become a popular form of stress detection since they are non-obtrusive and easy to use due to a little cumbersome setup [10], [15], [76].
Second, we analyzed the recorded EDA signals using the cvxEDA method. This is a rigorous and robust model [55] to decompose the EDA into the phasic and tonic components automatically removing the noise and measurement errors and overcoming the common issues related to overlapped phasic responses both characteristic of real-life recordings [55]. In addition, cvxEDA can estimate the latent SNS activity signal. While EDA intrinsically shows very large variations among subjects, the estimated SNS activity, due to its neutral nature, has a reduced variability among subjects helping the statistical analysis. Indeed, the statistical comparison between the two relaxation sessions recorded on two different days, which showed no significant differences, gave the first evidence of the replicability of our method.
Finally, the last link in the processing chain lays in the pattern recognition system. Our approach combined the ability to reduce the number of features identifying the most relevant subset and performing a subject-independent stress recognition. Both two characteristics are extremely relevant in ecological scenarios. Indeed, they (i) reduce the computational cost, which is a critical factor for wearable systems, (ii) make our result more interpretable, identifying which signal properties are important, and (iii) make the estimated accuracy applicable to any new single-subject observation with no need of history from previous recording (thanks to the LOSO cross-validation). In addition, in real-world scenarios, it might be important to discriminate, along with different levels of stress, also different stressors. Accordingly, we tested whether our approach could distinguish the four experimental stages of our TSST, even though these stressors might be different and/or a subset of those encountered in real-life.
Indeed, although the final aim is to export results in ecological scenarios, the first step to investigate the psychophysiological response to acute stress needs a reliable and valid stressor, able to robustly induce an acute stress response under controlled conditions. The early detection of acute stress episodes can be a valid strategy to control their proliferation and prevent the transition to chronic stress. To this aim, we designed an experimental paradigm where we maintained the essential structure of the TSST, but we modified the stressors including social-evaluative threat and uncontrollability to induce more consistent stress responses [17], [62]. Analyses of psychometric measures, physiological signals, saliva cortisol change, blood copeptin, and prolactin, performed on a subset of subjects, have validated the effectiveness of our experimental protocol in inducing stress, as shown in [17]. Particularly, the psychometric test demonstrated a significant increase of the STAI-s and VAS scores among subjects after the TSST experiment, confirming that the TSST may produce measurable stress in the participants. In addition, as expected, the STAI-t and the PSS were not affected by the TSST. In fact, while questions related to state (STAIs) refer to the present moment, questions related to the trait (STAIt) refer to the subject's general tendency to perceive many situations as threatening. Therefore, specific time-limited acute stress elicited by the TSST was expected to cause changes in STAIs but not in STAIt [77]. Likewise, PSS is based on the reported feelings and thoughts during the last week, so it is supposed not to be affected by the stress session.
To date, only a few studies have proposed a stress detection based only on EDA, showing methodological limitations that reduce the application to real-life scenarios. In fact, despite satisfactory accuracies, they have often shown limitations in the EDA decomposition method, which affect the reliability of the estimated sympathetic response [47], [48], [50], [51]. More specifically, they have applied troughto-peak algorithms (i.e., without decompose into tonic and phasic components) [48], [50] or filtering approaches [47], [51], achieving an average accuracy between 74.19 and 85.5 percent in binary classification. These two methods have been extensively demonstrated to underestimate the sympathetic response in the frequent case of overlapped skin conductance phasic responses compared to a model-based approach (e.g., cvxEDA) [52], [54], [55], [66] (and others).
Both our statistical and classification results support the goodness of our methodological approach. In fact, as a preliminary step, we performed a statistical comparison between EDA features recorded at rest the day before the TSST protocol, and the baseline stage prior to the stress stimulation. No statistical differences were found between these two resting conditions suggesting that the subjects started the experiment from a non-stress condition.
A further monovariate statistical analysis was performed to compare the EDA features extracted during the TSST stress stages with those acquired at baseline. Results showed that most of the proposed time and frequency domain analysis provide significant features to distinguish not only the BS from the stress stages but also the different sources of stress: ST, SA, and VT. More in detail, we found that features medianTonic and aucTonic were able to differentiate among the stages pairs, except for ST-VE one, for which none of the time-domain features was found statistically different. In sight of this, we analyzed EDA in the frequency-domain, calculating the EDASymp index introduced by [69]. EDASymp detected significant variations between all stress stages, including ST and VE. EDASymp, as an SNS indicator, confirmed that during an acute stress condition, the sympathetic activity significantly increased and it may be modulated by the level of stress and the kind of stressors.
Afterward, to investigate whether EDA solely was able to provide accurate discrimination of stress states, we employed an automatic classification algorithm. Specifically, we chose the well-known SVM-RFE method of classification due to its increasingly widespread use in biomedical literature. A previous study on this dataset [17] has combined multiple physiological signals (i.e., skin temperature, heart rate, and pulse wave signals) and identified five statistically different stress levels induced by the different experimental tasks.
Through a LOSO validation, the present study successfully develops a single-subject level classification of stress state-based on EDA dynamics alone. Indeed, we obtained an accuracy of 94.62 percent from a binary classification of the stress/non-stress condition. Note that with a single feature (i.e., meanTonic) we already achieved an accuracy of over 92 percent. Moreover, we performed a further classification to distinguish among the three selected stressors together with the BS. The 4-class classification yielded an average accuracy of 75.00 percent. The lower detection rates were 64.61 and 63.07 percent for ST and VE, respectively. Meanwhile, the higher rates were obtained for BS and ST with 90.77 and 81.54 percent, respectively. It is worthwhile noting that the classification results demonstrated that stages VE and ST generate a distinguishable physiological response even showing slight similarities in the stress elicitation.
The feature-selection stage did not give a big contribution to improving binary accuracy. The maximum was achieved with eight features out of eleven and the difference among the RFE iterations was not significant. Nevertheless, it may perhaps be observed that the majority of the most relevant features were related to the tonic activity. Instead, when the 4-class problem is considered, the feature-selection procedure strongly improved the prediction performance. The most performing subset of features is a nonlinear combination of tonic, phasic and frequency-domain features. This result demonstrates that an effective, comprehensive assessment of ANS dynamics should include multivariate measures.

Limitations
Despite the promising results, results, some limitations have to be considered. First, the accuracy is still estimated on data acquired in controlled conditions. In ecological scenarios, the stress/non-stress conditions may not be such a clear-cut distinction or, at least, they may coexist simultaneously with other relevant psychophysiological conditions Therefore, it will be necessary to test our methodological approach also on new datasets acquired in the real environment. In this context, additional precautions would be likely considered, e.g., signal portions irremediably corrupted by macroscopic hand movements should be identified and excluded from the successive processing phase. The automatic identification of such segments can be performed by combining EDA sensors with an accelerometer, which is a common strategy in most of the commercial EDA devices (e.g., Empatica E4 or Shimmer GSR+). Also, the disadvantages due to a continuous, totally uncontrolled measure of the EDA should be avoided. Instead, it could be better to acquire data only at specific intervals of the day where confounding factor effects are mitigated (e.g., sitting in front of a desk for 30 sec). A second limitation concerns the TSST design, although we followed the same strategy of canonical TSST paradigms [61], [62], [78], the fixed-order stimulation might affect the perception of the stress induced at each different stage. Moreover, we could not administrate the psychometric stress tests after each TSST stage to verify the perceived level of stress. Third the sample population is very homogeneous in age. Since there might be an effect of age on the physiological response to TSST [62] as well as on ANS functioning [79], [80] and skin conductance measurement [81], having a homogeneous sample population is important when dealing with reduced sample sizes. However, these results might not be extrapolated to other agegroups, and further studies will be needed.
Furthermore, it is important to note that acute stress also induces (gender dependent) emotional response such as increases in anxiety and irritability, which may occur independently of SNS physiological response [82]. There is an open debate about whether the sympathetic, HPA axis, and affective components of the stress response are distinct events or elements of a coordinated integrated response [83], [84], [85]. In the literature, the TSST has repeatedly been demonstrated to induce reliable HPA responses, but less attention has been paid to their subjective-psychological (affective) concomitants [85]. In the current study, there are nuances in terms of the affective response that the different stressors could produce: e.g., SA and, especially, VE might induce a more affective response than the other TSST stages. However, here, we have not investigated this debated relationship and we have not performed a specific record of the affective response limiting our study on the recognition of different physiological (sympathetic) reactions to different stressors. Thus, further studies simultaneously evaluating both the physiological and psychological components of the stress response are needed to investigate their interrelationships and to understand how variation in stress responses, including sex differences, might influence the progression toward stress-related disorders [82], [85].

CONCLUSION
In conclusion, EDA was confirmed to be a good marker of stress by means of its phasic and tonic components. Accordingly, we proposed a pattern recognition system based on cvxEDA model and SVM algorithm able to automatically recognize stress at a single-subject level using only EDA correlates. This study is a further demonstration of reliable monitoring of acute stress levels using physiological signals and responds to the growing request to control this alarming disorder also in a free-living condition. EDA can be seen as an alternative to the more common ECG measure. Indeed, ECG acquired during the same experimental protocol has been already successfully analyzed in recent studies showing good results for the stress/non-stress assessment and the advantages in developing wearable systems with a limited number of sensors [58], [86]. However, our results outperform stress recognition performance shown using ECG exclusively, paving the path towards an unobtrusive and wearable device to assess stress, e.g., in working environments or during rehabilitative procedures in a natural fashion. Indeed, although many improvements are still needed in terms of the quality of the signal provided, several portable EDA sensors have been already developed. Some of them have been also integrated into wearable devices that are very popular in the everyday life (e.g., smartwatches). Accordingly, future endeavors will be directed to implement our approach into a portable device, which integrates EDA sensors. Particularly, such an EDA sensor will be able to implement a pseudo-real-time EDA deconvolution based on the cvxEDA approach. This will automatically and in pseudo-real-time filter the measurement noise and estimate the SNS neural activity addressing the issue caused by overlapped phasic responses.