SELECTING FEATURES BY UTILIZING INTUITIONISTIC FUZZY ENTROPY METHOD

Abstract: Feature selection is a significant pre-processing activity that aims to reduce data dimensionality in order to enhance the machine learning process. The evaluation of feature selection must consider classification performance, efficiency, stability, and many other factors. Uncertainty commonly arises in the feature selection process owing to time limitations, imprecise information, and the subjectivity of human judgment. Moreover, the theory of intuitionistic fuzzy sets has proven to be an extremely valuable tool for tackling the uncertainty and ambiguity that arise in many practical situations. Thus, this study introduces a novel feature selection framework using intuitionistic fuzzy entropy. In this regard, a new entropy for IFSs is proposed first and then compared with some of the previously developed entropy measures. As entropy measures the uncertainty present in the data (features), features with higher entropy values are filtered out, and the remaining features with lower entropy values are used to classify the data. To verify the effectiveness of the proposed entropy-based feature selection, experiments are conducted on ten standard benchmark datasets employing support vector machine, K-nearest neighbor, and Naïve Bayes classifiers. The outcomes of the study validate that the proposed entropy-based filter feature selection is more feasible and effective than existing filter-based feature selection methods.


Introduction
Overfitting and the curse of dimensionality are the two biggest problems in machine learning. "Feature selection (FS)" helps to avoid both of these concerns by decreasing the number of features in the model and optimizing the framework's performance. In addition, FS offers the benefit of model interpretability: with fewer features, the output model becomes easier to understand and more reliable for a human to trust the forecasts made by the model. Existing studies have classified FS methods into three categories: filter, wrapper, and embedded techniques (Blum and Langley, 1997). The filter method chooses features independently of any classification algorithm (Parlak and Uysal, 2021). This method has been widely used for FS on text data owing to its simplicity and because it is not prone to overfitting (Revanasiddappa and Harish, 2018). The wrapper method selects features based on the learning algorithm (Kohavi and John, 1997). It is more computationally expensive and slower than the filter method, because the learning algorithm must be called for every feature set considered (Blum and Langley, 1997), which can make it impractical. The embedded method, on the other hand, works as a part of the classification algorithm and ranks the features during the learning stage. Compared to wrapper and embedded techniques, filter-based techniques operate independently of any learning algorithm (undesirable features are filtered out of the data before induction commences) and are thus more adaptable and less computationally intensive across diverse datasets and classifiers (Parlak and Uysal, 2021).
Uncertainty is an inherent feature of information. In several scientific and industrial applications, we make decisions in environments with diverse kinds of uncertainty. The "fuzzy set theory (FST)" invented by Zadeh (1965) has been successfully employed in varied areas and has demonstrated its powerful ability to treat vague and uncertain information. In the literature, several doctrines and principles based on FST have been studied (Precup et al., 2020; Kumar et al., 2022; Bairagi et al., 2022). Further, Atanassov (1986) extended the FST to the "intuitionistic fuzzy set (IFS)", which deals with uncertain and ambiguous information more accurately. In IFSs, each object is defined with degrees of membership and non-membership. The theory of IFSs is one of the most powerful and suitable tools to cope with the vagueness present in numerous realistic decision-making applications. Research on IFS theory and its applications in different settings is developing rapidly, and several significant outcomes have been obtained (Kushwaha et al., 2020; Tripathi et al., 2022a, b; Hezam et al., 2022).
The notion of entropy provides a measure of the information gained by comparing dissimilar attributes as well as a measure of the randomness of data. In the FS process, entropy can quantify the amount of uncertainty and the quality of the data's information, thereby improving a model's accuracy (Zhao et al., 2020; High et al., 2021). The Shannon entropy is usually considered the standard and most natural way to measure the expected value of information. To deal with uncertain and vague information, the classic fuzzy entropy measure based on Shannon entropy has been widely applied to the FS process (Tran et al., 2021). In the past, Lee et al. (2001) employed fuzzy entropy to assess the information of pattern distributions by dividing the data into non-overlapping decision regions. Further, Luukka (2011) proposed a feature selection approach using fuzzy entropy. As an extended version of FST, the notion of IFSs (Atanassov, 1986) captures both the degrees of membership and non-membership along with the hesitancy margin of its elements. Thus, it has a stronger capability to describe the vagueness of data in comparison with the FST. Very few authors have studied FS methods based on intuitionistic fuzzy entropy (Revanasiddappa and Harish, 2018; Tiwari et al., 2019; Singh et al., 2019).
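For reference, the Shannon entropy of a discrete probability distribution $p = (p_1, \ldots, p_n)$, which the fuzzy and intuitionistic fuzzy entropies considered in this work generalize, is
$$H(p) = -\sum_{i=1}^{n} p_i \log p_i.$$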
Based on the earlier studies, we identify the following research challenges: (i) Entropy, as an information measure, has gained much attention in many fields, such as deep learning, data science, machine learning, image segmentation, and texture analysis. In the recent past, the notion of IFSs has proven to be a more flexible and superior way to model the vagueness and imprecision of complex real-life applications. Several authors have proposed different intuitionistic fuzzy entropy measures, but these measures exhibit some counter-intuitive cases. (ii) In the literature, there is hardly any study on the use of intuitionistic fuzzy entropy for feature selection.
Thus, the key contributions of the article are as follows:
• A new entropy for intuitionistic fuzzy sets is developed and compared with the extant entropies under IFSs.
• The proposed entropy is utilized to choose the features that contain lower uncertainty (higher information) in the data.
This work employs the "support vector machine (SVM)", "K-nearest neighbor (KNN)" and "Naïve Bayes" classifiers from the classification domain for investigating the quality of solutions obtained by means of the initial preprocessing and the information indicated by the intuitionistic fuzzy entropy. The rest of the paper is organized as follows: Section 2 discusses related work on the FS process. Section 3 presents the proposed entropy for IFSs and shows its validity by comparison with extant entropies. Section 4 introduces a novel FS framework using the developed IF-entropy. Section 5 explains the experimental work of the proposed algorithm. Section 6 presents the conclusions along with potential research directions.

Literature review
FS is defined as "one of the well-known dimensionality reduction methods, which can select a small subset of significant and non-redundant features from the original information systems" (Omuya et al., 2021). The main aim of FS is to remove redundant and/or irrelevant features, improve the learning algorithm's performance, reduce the cost of computation, and offer a more explicit and concise description of the data (Murugesan et al., 2021). Several notions of FS have been presented in machine learning, data mining, bioinformatics, text categorization, signal processing and other fields (Kim and Zzang, 2019; Álvarez et al., 2019; Ruan et al., 2021; Pintas et al., 2021). In general, FS techniques are partitioned into filter, wrapper and embedded methods (Tang et al., 2016; Rehman et al., 2017). A filter method performs statistical analysis to distinguish the more relevant features from the less significant ones. Consequently, it is independent of the classification algorithm (Revanasiddappa and Harish, 2018). The wrapper method uses the particle swarm optimization algorithm (Ji et al., 2020), the genetic algorithm (Rostami et al., 2021) or other search algorithms (Chen et al., 2021) to find an optimum feature set, while the embedded method selects the features during model training. Compared to the wrapper and embedded techniques, the filter-based techniques are more flexible, faster and have lower computational complexity.
Owing to the advantages of filter-based methods over the other methods, numerous research efforts have been devoted to them in the literature. Existing studies have presented diverse filter-based methods, including document frequency (Kim and Zzang, 2019), term frequency-inverse document frequency (Thakkar and Chaudhari, 2020), mutual information (Gao and Wu, 2020), entropy-based FS (Zhang et al., 2021a) and others (Bahassine et al., 2020; Omuya et al., 2021). Among these FS methods, the entropy-based FS technique calculates the amount of uncertainty and the quality of information. Several authors have focused their attention on entropy-based FS techniques. For illustration, Sun et al. (2012) adopted the concept of Shannon entropy to develop a universal FS method to select relevant and significant features. Inspired by the idea of entropy, Jaganathan and Kuppuchamy (2013) discussed an innovative FS method and its application in medical database classification. Zhang et al. (2016) formulated a hybrid FS model by combining entropy and the fuzzy rough set for managing mixed datasets. Inspired by Luukka (2011), Lohrmann et al. (2018) offered an FS technique based on fuzzy entropy and the degree of similarity to distinguish the appropriate features. In a study, Sun et al. (2019) used the notions of Lebesgue and entropy measures to introduce a hybrid FS model for mixed and incomplete neighbourhood decision systems. Recently, Qu et al. (2020) studied an innovative FS model based on non-unique decision differential entropy for handling nominal data. Based on correlation and relative entropy, Aremu et al. (2020) pioneered a novel feature engineering method for raw asset data. Later, Sun et al. (2021) recommended a hybrid fuzzy neighborhood entropy-based FS model for heterogeneous datasets. Motivated by the fuzzy-neighborhood relative decision entropy, Zhang et al. (2021b) designed an integrated feature ranking method and verified its validity through a data experiment and a numerical example. As an extended version of the fuzzy set, IFSs (Atanassov, 1986) are more reliable and efficient for tackling the uncertainty in real applications. Feature selection models based on intuitionistic fuzzy entropy can be more suitable for extracting features in data mining, machine learning and text categorization. However, very few studies in the literature have considered intuitionistic fuzzy entropy-based FS methods for classifiers (Singh et al., 2019; Tiwari et al., 2019).

Proposed intuitionistic fuzzy entropy
The current section first presents the basic definitions. Then, the new entropy for IFSs is proposed together with its desirable properties.

Preliminaries
This subsection presents the basic concepts of IFSs on which the decision information and the proposed entropy are based.
In the doctrine of FSs, the membership function (MF) of an element is represented by a number in the interval [0, 1], whereas the non-membership function (NF) is essentially its complement. In practice, however, this hypothesis does not always match human opinions. Hence, Atanassov (1986) defined IFSs as follows.

Definition 1 (Zadeh, 1965). A fuzzy set $F$ on a fixed universal set $Y = \{y_1, y_2, \ldots, y_n\}$ is defined mathematically as
$$F = \{\langle y_i, \mu_F(y_i)\rangle : y_i \in Y\},$$
where $\mu_F(y_i) \in [0, 1]$ denotes the grade of membership of $y_i$ in $F$.

Definition 2 (Atanassov, 1986). An IFS $F$ on $Y$ is defined as
$$F = \{\langle y_i, \mu_F(y_i), \nu_F(y_i)\rangle : y_i \in Y\},$$
where $\mu_F(y_i), \nu_F(y_i) \in [0, 1]$ represent the grades of membership and non-membership of $y_i$ in $F$, respectively, with $0 \le \mu_F(y_i) + \nu_F(y_i) \le 1$. For each $y_i \in Y$, the indeterminacy (hesitancy) degree is defined by $\pi_F(y_i) = 1 - \mu_F(y_i) - \nu_F(y_i)$. The pair $(\mu_F(y_i), \nu_F(y_i))$ is called an "intuitionistic fuzzy number (IFN)" and denoted by $\alpha = (\mu_\alpha, \nu_\alpha)$.

Definition 3 (Atanassov, 1986). For any two intuitionistic fuzzy numbers $\alpha = (\mu_\alpha, \nu_\alpha)$ and $\beta = (\mu_\beta, \nu_\beta)$ and any $\lambda > 0$, the basic operational laws are
$$\alpha \oplus \beta = (\mu_\alpha + \mu_\beta - \mu_\alpha\mu_\beta,\; \nu_\alpha\nu_\beta), \qquad \alpha \otimes \beta = (\mu_\alpha\mu_\beta,\; \nu_\alpha + \nu_\beta - \nu_\alpha\nu_\beta),$$
$$\lambda\alpha = (1 - (1 - \mu_\alpha)^\lambda,\; \nu_\alpha^\lambda), \qquad \alpha^\lambda = (\mu_\alpha^\lambda,\; 1 - (1 - \nu_\alpha)^\lambda).$$

Definition 4. A mapping $e : IFS(Y) \to [0, 1]$ is called an entropy for IFSs if it satisfies the following postulates:
(a1) $e(F) = 0$ iff $F$ is a crisp set;
(a2) $e(F) = 1$ iff $\mu_F(y_i) = \nu_F(y_i)$ for all $y_i \in Y$;
(a3) $e(F) = e(F^c)$, where $F^c$ denotes the complement of $F$;
(a4) $e(F) \le e(G)$ if $F$ is less fuzzy than $G$, i.e., $\mu_F(y_i) \le \mu_G(y_i)$ and $\nu_F(y_i) \ge \nu_G(y_i)$ whenever $\mu_G(y_i) \le \nu_G(y_i)$, or $\mu_F(y_i) \ge \mu_G(y_i)$ and $\nu_F(y_i) \le \nu_G(y_i)$ whenever $\mu_G(y_i) \ge \nu_G(y_i)$;
(a5) $e(G) \le e(F)$ iff the hesitancy index of the elements of $G$ is less than the hesitancy index of the elements of $F$, i.e., $\pi_G(y_i) \le \pi_F(y_i)$ for all $y_i \in Y$.
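As a brief numerical illustration (the numbers here are chosen for exposition and do not come from the paper), consider two IFNs and their hesitancy degrees and sum:
$$\alpha = (0.6, 0.3), \qquad \beta = (0.5, 0.4),$$
$$\pi_\alpha = 1 - 0.6 - 0.3 = 0.1, \qquad \pi_\beta = 1 - 0.5 - 0.4 = 0.1,$$
$$\alpha \oplus \beta = \bigl(0.6 + 0.5 - 0.6 \times 0.5,\; 0.3 \times 0.4\bigr) = (0.8,\, 0.12).$$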

For any $F \in IFS(Y)$, the proposed entropy measure for IFSs, $e(F)$, is defined by Eq. (7).

Theorem 1. The mapping $e(F)$ given by Eq. (7) is a valid IF-entropy.

Proof:
To verify the theorem, the mapping $e(F)$ must satisfy the postulates (a1)-(a5) of Definition 4.
(a1). For each $y_i \in Y$, the left-hand side of Eq. (8) cannot exceed 1, and the condition is satisfied only when the left-hand side equals 1 for all $y_i \in Y$. Therefore, the expression in Eq. (7) can vanish only when $\mu_F(y_i) + \nu_F(y_i) = 1$ with one of the two grades equal to zero, i.e., $\mu_F(y_i) = 1, \nu_F(y_i) = 0$ or $\mu_F(y_i) = 0, \nu_F(y_i) = 1$, so that $F$ is a crisp set. Similarly, the left-hand side of Eq. (9) attains its minimum value of $-1$, and the corresponding postulate is satisfied only when the left-hand side equals $-1$ for all $y_i \in Y$. Eq. (10) is obtained in the same manner, and the remaining postulates follow analogously. This completes the proof.

Performance of the proposed entropy for IFS with extant entropies
In the following, we first recall some of the extant intuitionistic fuzzy entropy measures in order to show the usefulness and utility of the proposed entropy for IFSs.
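As one representative example of such a measure (given here for illustration; it is not necessarily among the entropies compared in Table 1), the well-known ratio-type entropy of Szmidt and Kacprzyk has the form
$$e_{SK}(F) = \frac{1}{n}\sum_{i=1}^{n}\frac{\min\{\mu_F(y_i), \nu_F(y_i)\} + \pi_F(y_i)}{\max\{\mu_F(y_i), \nu_F(y_i)\} + \pi_F(y_i)},$$
which equals 1 when $\mu_F(y_i) = \nu_F(y_i)$ for all $y_i$ and 0 for crisp sets, in line with the postulates of Definition 4.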

Calculation of proposed intuitionistic fuzzy entropy for the dataset
The current section contains an algorithm for the calculation of the IF-entropy of the attributes of a dataset. For calculating the membership and non-membership functions, we have used the bell-shaped function implemented by Rangasamy (2021). We have taken datasets whose features bear uncertainty, and we need to filter out the higher-uncertainty features before performing the classification task. As entropy is utilized as a measure of uncertainty (Zhang et al., 2021a), we calculate the entropy of the features using the proposed intuitionistic fuzzy entropy, filter out the features with higher entropies, and provide the dataset (with the filtered attributes) to the SVM, KNN and Naïve Bayes classifiers. Then, we measure the classification accuracy with the raw data as well as with the filtered data. In most cases, better accuracy is obtained with the reduced feature set (as discussed below in Section 5).
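The filtering step can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the paper's exact procedure: the precise bell-shaped membership construction of Rangasamy (2021) and the proposed entropy of Eq. (7) are not reproduced here, so a generic bell function and the Szmidt-Kacprzyk ratio entropy shown above stand in for them, and the `hesitancy` parameter is an assumption of this sketch.

```python
# A minimal sketch of the entropy-based filter described above (assumptions:
# a generic bell-shaped membership, nu derived from mu with a fixed hesitancy
# factor, and the Szmidt-Kacprzyk entropy standing in for Eq. (7)).
import numpy as np

def bell_membership(x, centre, width):
    # Bell-shaped membership function centred on the feature's mean.
    return 1.0 / (1.0 + ((x - centre) / width) ** 2)

def if_entropy(mu, nu):
    # Ratio-type intuitionistic fuzzy entropy (Szmidt-Kacprzyk form).
    pi = 1.0 - mu - nu
    return np.mean((np.minimum(mu, nu) + pi) / (np.maximum(mu, nu) + pi))

def rank_features_by_entropy(X, hesitancy=0.2):
    # Rank feature indices from highest to lowest IF-entropy (filter-out order).
    entropies = []
    for j in range(X.shape[1]):
        col = X[:, j]
        mu = bell_membership(col, col.mean(), col.std() + 1e-12)
        nu = (1.0 - mu) * (1.0 - hesitancy)  # keeps mu + nu <= 1
        entropies.append(if_entropy(mu, nu))
    return np.argsort(entropies)[::-1]

def filter_features(X, n_drop, hesitancy=0.2):
    # Drop the n_drop features with the highest IF-entropy.
    order = rank_features_by_entropy(X, hesitancy)
    keep = np.setdiff1d(np.arange(X.shape[1]), order[:n_drop])
    return X[:, keep], keep
```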

Experimentation
For evaluating the proposed approach, we have used 10 benchmark datasets. All datasets except the cancer patients dataset are obtained from the "UCI (University of California, Irvine) machine learning repository" (Dua and Graff, 2019) for our proposed algorithm's qualitative and quantitative analysis. The indicators within each dataset have a diversity of characteristics (i.e., several binary/discrete and several continuous). Table 2 contains the details of the benchmark datasets used for the experimentation. Table 3 presents the features selected for filtering out by the various filter-based methods and our proposed IFE-FS method on the benchmark datasets. Here, the rank represents the order in which the feature selection methods filter the features out, i.e., if the threshold number is 2, then the features at rank 1 and rank 2 will be filtered out; if the threshold number is 3, then three features (at rank 1, rank 2 and rank 3) will be filtered out. For each dataset, we have numbered the features according to their presence in the dataset. The threshold number is decided experimentally: we tried different numbers of features and selected the number giving the highest accuracy. For lack of space, only the first three ranks are shown in Table 3. As can be seen in the first row of the table (Cancer Coimbra dataset), our proposed IFE-based feature selection (IFE-FS) method places p5 at the first rank, while FE-FS (fuzzy entropy-based feature selection), ReliefF, mRMR and Mutinf all place the same feature, p8, at the first rank; the Lasso method selects p9 and the Laplacian method selects p2. For each dataset, three rows list the first three features to be filtered out.
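Under the same assumptions as the sketch above, the experimental threshold choice can be expressed as a simple sweep (scikit-learn cross-validation is used here merely as a stand-in for the MATLAB Classification Learner workflow described below):

```python
# Sweep the number of filtered-out features and keep the count with the best
# cross-validated accuracy (a stand-in for the paper's experimental choice).
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def choose_threshold(X, y, max_drop=5):
    best_drop = 0
    best_acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    for n_drop in range(1, max_drop + 1):
        X_red, _ = filter_features(X, n_drop)  # from the sketch above
        acc = cross_val_score(SVC(kernel="linear"), X_red, y, cv=5).mean()
        if acc > best_acc:
            best_drop, best_acc = n_drop, acc
    return best_drop, best_acc
```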
To prove the performance of the introduced technique, we have provided the selected features (after removing/filtering out the higher-entropy features) based on our proposed IFE-FS method and six other filter methods, i.e., fuzzy entropy-based feature selection (FE-FS) (Luukka, 2011), ReliefF (Eiras-Franco and Guijarro-Berdiñas, 2021), Lasso (Coelho et al., 2020), Laplacian (He et al., 2005), mRMR (Jo et al., 2019) and Mutinf (Estévez et al., 2009), to three types of classifiers, i.e., the SVM, KNN and Naïve Bayes classifiers. For all experimental work, we have used MATLAB 2020; for classification, we have used the Classification Learner App available in MATLAB 2020, with the Linear SVM, Fine KNN and Naïve Bayes classifiers. Tables 4(a)-4(c) summarize the accuracy of the SVM, KNN and Naïve Bayes classifiers on the benchmark datasets using the raw data as well as the selected features. Brief introductions to these classifiers are given below.
Naïve Bayes: "The Naïve Bayesian classifier is a statistical supervised machine learning algorithm that predicts class membership probabilities. NB achieves high accuracy and speed when applied to large datasets (Shah and Jivani, 2013), but it also works very well on small datasets (Delizo et al., 2020). The Naïve Bayes algorithm assumes that each feature does not depend on the presence of the other parameters, and that is why it is called naïve. Naïve Bayes is an eager classifier and offers interpretability (Zaidi et al., 2013)".
Support Vector Machine: "The support vector machine (SVM) is preferred by data scientists because it can achieve good generalization performance without the need for prior knowledge or experience (Chapelle et al., 2018). The SVM algorithm makes use of a hyperplane that separates the instances, putting the same classes in the same division while maximizing each group's distance from the dividing hyperplane".
k-Nearest Neighbors: ""k-Nearest Neighbors (k-NN)" is one of the simplest and oldest supervised machine learning algorithms used in classification; it classifies a given instance via the majority of the classes among its k nearest neighbors found in the dataset (Sun et al., 2010). This algorithm relies on the distance metric used to determine the nearest neighbors of the given instance. The two primary benefits of the k-Nearest Neighbors algorithm are efficiency and flexibility".
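For reproduction outside MATLAB, approximate scikit-learn counterparts of the three Classification Learner models might look as follows (an approximation of the setup, not the original MATLAB code; MATLAB's Fine KNN corresponds to a single nearest neighbor):

```python
# Approximate scikit-learn counterparts of the MATLAB 2020 Classification
# Learner models used in the experiments (Linear SVM, Fine KNN, Naive Bayes).
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

CLASSIFIERS = {
    "Linear SVM": SVC(kernel="linear"),
    "Fine KNN": KNeighborsClassifier(n_neighbors=1),
    "Naive Bayes": GaussianNB(),
}

def evaluate(X, y, cv=5):
    # Mean cross-validated accuracy per classifier, on raw or filtered data.
    return {name: cross_val_score(clf, X, y, cv=cv).mean()
            for name, clf in CLASSIFIERS.items()}
```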
It is evident from Table 4(a) that the classification accuracy of the SVM classifier with the selected features obtained by our proposed method is enhanced or maintained in all of the cases (for all benchmark datasets) compared with the raw data. For 60% of the benchmark datasets, SVM has achieved the highest or close-to-highest accuracy with the features selected by our proposed method; the exceptions (40% of cases) are the Cancer Coimbra, Sonar, Heart and Glass datasets. For Cancer Coimbra, the highest accuracy is 76.7% (Lasso method), whereas in our case it is 72.4%. For the Heart dataset, the highest accuracy is 83.2% (Laplacian method), while in our case it is 82.8%. For the Sonar dataset, the highest accuracy is 78.4% (mRMR method), while in our case it is 76%. For the Glass dataset, the highest accuracy is 64.5% (mRMR method), while in our case it is 64%. Moreover, the average accuracy obtained by the SVM classifier is the highest, i.e., 78.83%, using the features selected by our proposed method. It is evident from Table 4(b) that the classification accuracy of the KNN classifier with the selected features obtained by our proposed method is enhanced or maintained in all of the cases (for all benchmark datasets) compared with the raw data. For 80% of the benchmark datasets, KNN has achieved the highest or close-to-highest accuracy with the features selected by our proposed method; the exceptions (20% of the benchmark datasets) are the Seeds and Sonar datasets. For the Seeds dataset, the highest accuracy is 93.3% (mRMR method), whereas in our case it is 92.4%. For the Sonar dataset, the highest accuracy is 88.9% (mRMR method), while in our case it is 85.6%. However, the average accuracy obtained by the KNN classifier is the highest, i.e., 79.13%, using the features selected by our proposed method.
It is evident from Table 4(c) that the classification accuracy of the Naïve Bayes classifier with the selected features obtained by our proposed method is enhanced or maintained in all of the cases (for all benchmark datasets) when compared with the raw data. For 80% of the benchmark datasets, Naïve Bayes has achieved the highest or close-to-highest accuracy with the features selected by our proposed method; the exceptions (20% of the benchmark datasets) are the Cancer Coimbra and Sonar datasets. For the Cancer Coimbra dataset, the highest accuracy is 68.1% (Lasso method), whereas in our case it is 67.2%. For the Sonar dataset, the highest accuracy is 68.8% (Mutinf method), while in our case it is 68.3%. However, the average accuracy obtained by the Naïve Bayes classifier is the highest, i.e., 75.03%, using the features selected by our proposed method.

Conclusions
This study proposes a new intuitionistic fuzzy entropy-based algorithm for feature selection prior to classification tasks in an information system. For this purpose, a new intuitionistic fuzzy entropy has been developed to measure feature entropy (uncertainty) as the criterion for feature selection. The experimental results are compared with those of existing filter-based techniques and demonstrate that the proposed technique fits the hidden information well. The developed model was executed on ten real benchmark datasets, and the classification accuracies of diverse classifiers, i.e., SVM, KNN and Naïve Bayes, were evaluated. Based on the experimental results, the average accuracy was 78.83% with the SVM classifier using the features selected by our proposed IFE-FS, which is the highest among all the accuracies obtained using features selected by the other filter methods, i.e., fuzzy entropy-based feature selection (75.82%), ReliefF (77.26%), Lasso (76.50%), Laplacian (77.22%), mRMR (77.28%) and Mutinf (75.76%), while with the raw data it was 76.77%. In addition, we have noticed that the average performance of the SVM classifier with the features obtained by the proposed IFE-FS model shows an increment of 2.09% compared to the raw data, which is the highest increment (ReliefF gives 0.52%, Laplacian 0.48% and mRMR 0.54%), while the fuzzy entropy-based feature selection algorithm, the Lasso feature selection method and Mutinf show decrements of 0.24%, 0.92% and 0.98%, respectively. The average accuracy was 79.41% with the KNN classifier using the features selected by the proposed IFE-FS, which is the highest among all the accuracies obtained using features selected by the other filter methods, i.e., fuzzy entropy-based feature selection (74.32%), ReliefF (74.93%), Lasso (74.83%), Laplacian (75.48%), mRMR (75.99%) and Mutinf (73.10%), while with the raw data it was 76.41%. In addition, we have noticed that the average performance of the KNN classifier with the features selected by the proposed IFE-FS method shows an increment of 2.72%, while with the features selected by the other filter methods, i.e., the fuzzy entropy-based method, ReliefF, Lasso, Laplacian, mRMR and Mutinf, it shows decrements of 2.09%, 1.48%, 1.57%, 0.93%, 0.42% and 3.31%, respectively, compared to the raw data. A similar observation has been made with the Naïve Bayes classifier. It has been found that the reduced datasets perform considerably better than the unreduced datasets in terms of the resulting classification accuracy. While comparing FE-FS and some other filter-based feature selection methods, we have obtained different features at the same rank. By filtering out the features with higher entropies and providing the remaining features to the classifiers, we have obtained better accuracy in most cases with our proposed method than with the other filter-based feature selection methods. Thus, we can conclude that our technique is more efficient in dealing with the uncertainties of features and is well suited to locating the hidden information while doing FS.
Certain limitations of the developed framework are important to be aware of. A practical difficulty is that experts must be trained in the preference style in order to properly use the flexibility and potential of intuitionistic fuzzy numbers. In addition, this work is limited in dealing with more uncertain decision-making problems because of the constraint condition of the intuitionistic fuzzy set. Future research will try to handle the limitations of this work. Moreover, it would be exciting to use the presented model for feature selection with high-dimensional regression in the future. Also, the introduced technique can be generalized to different uncertain contexts such as the spherical fuzzy set, Pythagorean fuzzy set, neutrosophic set, q-rung orthopair fuzzy rough set and linguistic generalized orthopair fuzzy set.

Figure 1. Flowchart of the proposed methodology

Figure 1 presents the utilization of the proposed IFE for feature selection, as described in Section 4.

Figure 2. Average classification accuracy of the developed technique with different classifiers (Figure 2(c): average classification accuracy with Naïve Bayes)

Table 1. Results by different entropies

Now, we compare the outcomes achieved by the developed and extant IF-entropies. The entropy measures follow the pattern of Eq. (23).

Table 2. Benchmark datasets for the experimental work

Table 3. Features ranked for filtering out by various filter feature selection algorithms

Table 4. Comparative results showing the performance of the SVM, KNN and Naïve Bayes classifiers