PeerJ Computer Science - 雨日青学习小站

Vehicle detection and classification using an ensemble of EfficientDet and YOLOv8

Lv C, Mittal U, Madaan V and Agrawal P

With the rapid increase in vehicle numbers, efficient traffic management has become a critical challenge for society. Traditional methods of vehicle detection and classification often struggle with the diverse characteristics of vehicles, such as varying shapes, colors, edges, shadows, and textures. To address this, we propose an ensemble method that combines two state-of-the-art deep learning models, EfficientDet and YOLOv8. The proposed work uses the Forward-Looking Infrared (FLIR) dataset, which provides both thermal and RGB images. To enhance model performance and address class imbalance, we applied several data augmentation techniques. Experimental results demonstrate that the proposed ensemble model achieves a mean average precision (mAP) of 95.5% on thermal images, outperforming EfficientDet and YOLOv8 individually, which achieved mAPs of 92.6% and 89.4%, respectively. Additionally, the ensemble model attained an average recall (AR) of 0.93 and an optimal localization recall precision (oLRP) of 0.08 on thermal images. For RGB images, the ensemble model achieved an mAP of 93.1%, an AR of 0.91, and an oLRP of 0.10, consistently surpassing the performance of its constituent models. These findings highlight the effectiveness of the proposed ensemble approach in improving vehicle detection and classification. The integration of thermal imaging further enhances detection capabilities under various lighting conditions, making the system robust for real-world applications in intelligent traffic management.

View more:

Pubmed

PeerJ Comput Sci

PMC Article

Representation of negative numbers: point estimation tasks using multi-reference sonification mappings

Putra ZP and Setiawan D

In this study, we examine different approaches to the presentation of coordinates in mobile auditory graphs, including the representation of negative numbers. The experiments involved both normally sighted and visually impaired users, as there are applications where normally sighted users might employ auditory graphs, such as the unseen monitoring of stocks or of fuel consumption in a car. Multi-reference sonification schemes are investigated as a means of improving the performance of mobile non-visual point estimation tasks. The results demonstrate that both populations are able to carry out point estimation tasks with a good level of performance when presented with auditory graphs using multiple reference tones. Additionally, visually impaired participants performed better on graphs represented in this format than normally sighted participants. This work also implements the component representation approach for negative numbers, which uses the same positive mapping reference for the digit and adds a sign before the digit, leading to better accuracy in identifying the polarity sign. This work contributes to the design process of mobile auditory devices in human-computer interaction and proposes a methodological framework for improving auditory graph performance in graph reproduction.


A feature-enhanced knowledge graph neural network for machine learning method recommendation

Zhang X and Guo J

The large number of machine learning methods with condensed names makes it challenging for researchers to select a suitable approach for a target dataset in academic research. Although graph neural networks based on knowledge graphs have proven helpful in recommending a machine learning method for a given dataset, the issues of inadequate entity representation and over-smoothing of embeddings still need to be addressed. This article proposes a recommendation framework that integrates a feature-enhanced graph neural network and an anti-smoothing aggregation network. In the proposed framework, in addition to utilizing the textual description of the target entities, each node is enhanced with its neighborhood information before participating in the higher-order propagation process. In addition, an anti-smoothing aggregation network is designed to reduce the influence of central nodes in each information aggregation through an exponential decay function. Extensive experiments on a public dataset demonstrate that the proposed approach exhibits substantial advantages over strong baselines in recommendation tasks.


Detecting rumors in social media using emotion based deep learning approach

Sharma D and Srivastava A

Social media, an undeniable facet of the modern era, has become a primary pathway for disseminating information. Unverified and potentially harmful rumors can have detrimental effects on both society and individuals. Owing to the plethora of content generated, it is essential to assess its factual accuracy and determine its veracity. Previous research has explored various approaches, including feature engineering and deep learning techniques, that leverage propagation theory to identify rumors. In our study, we place significant importance on examining the emotional and sentimental aspects of tweets using deep learning approaches to improve our ability to detect rumors. Leveraging the findings from the previous analysis, we propose a Sentiment and EMotion driven TransformEr Classifier method (SEMTEC). Unlike existing studies, our method extracts emotion and sentiment tags alongside content-based information from the textual modality, i.e., the main tweet. This semantic analysis allows us to measure the user's emotional state, leading to an accuracy of 92% for rumor detection on the "PHEME" dataset. The validation is carried out on a novel dataset named "Twitter24". Furthermore, SEMTEC exceeds the accuracy of standard methods by around 2% on the "Twitter24" dataset.


Anonymous group structure algorithm based on community structure

Kuang L, Si K and Zhang J

A social network is a platform on which users share data over the internet. As social networks become ever more intertwined with daily life, the amount of personal privacy information they accumulate is steadily growing, and the exposure of such data could lead to disastrous consequences. To mitigate this problem, an anonymous group structure algorithm based on community structure is proposed in this article. First, a privacy protection scheme model is designed, which can be adjusted dynamically according to the network size and user demand. Second, based on the community characteristics, the concept of fuzzy subordinate degree is introduced, and three community structure mining algorithms are designed: the fuzzy subordinate degree-based algorithm, the improved Kernighan-Lin algorithm, and the enhanced label propagation algorithm. Finally, according to the level of privacy, different anonymous graph construction algorithms based on community structure are designed. The simulation experiments show that the three community division methods can divide the network into communities effectively and can be utilized at different privacy levels. In addition, the scheme can satisfy the privacy requirement with minor changes.


Enhancing intrusion detection performance using explainable ensemble deep learning

Ben Ncir CE, Ben HajKacem MA and Alattas M

Given the exponential growth of available data in large networks, an accurate and explainable intrusion detection system has become essential for effectively discovering attacks in such networks. To deal with this challenge, we propose a two-phase Explainable Ensemble deep learning-based method (EED) for intrusion detection. In the first phase, a new ensemble intrusion detection model using three one-dimensional long short-term memory (LSTM) networks is designed for accurate attack identification. The outputs of the three classifiers are aggregated using a meta-learner algorithm, resulting in refined and improved results. In the second phase, the interpretability and explainability of EED outputs are enhanced by leveraging the capabilities of SHapley Additive exPlanations (SHAP). Factors contributing to the identification and classification of attacks are highlighted, which allows security experts to understand and interpret the attack behavior and then implement effective response strategies to improve network security. Experiments conducted on real datasets have shown the effectiveness of EED compared to conventional intrusion detection methods in terms of both accuracy and explainability. The EED method identifies and classifies attacks with high accuracy while providing transparency and interpretability.


Dynamic stacking ensemble for cross-language code smell detection

Aljamaan H

Code smells refer to poor design and implementation choices by software engineers that might affect the overall software quality. Code smell detection using machine learning has become a popular area of research aimed at building effective models capable of detecting different code smells in multiple programming languages. However, the process of building such effective models has not reached a state of stability, and most of the existing research focuses on Java code smell detection. The main objective of this article is to propose dynamic ensembles using two strategies, namely greedy search and backward elimination, which are capable of accurately detecting code smells in two programming languages (i.e., Java and Python), and which are less complex than full stacking ensembles. The detection performance of dynamic ensembles was investigated within the context of four Java and two Python code smells. The greedy search and backward elimination strategies yielded different lists of base models to build dynamic ensembles. In comparison to full stacking ensembles, dynamic ensembles yielded less complex models when they were used to detect most of the investigated Java and Python code smells, with the backward elimination strategy resulting in the less complex models. Dynamic ensembles were able to perform comparably against full stacking ensembles with no significant detection loss. This article concludes that dynamic stacking ensembles were able to facilitate the effective and stable detection performance of Java and Python code smells over all base models and with less complexity than full stacking ensembles.


Enhancing geotechnical damage detection with deep learning: a convolutional neural network approach

de Araujo TMA, Teixeira CAM and Francês CRL

Most natural disasters result from geodynamic events such as landslides and slope collapse. These failures cause catastrophes that directly impact the environment and cause financial and human losses. Visual inspection is the primary method for detecting failures in geotechnical structures, but on-site visits can be risky due to unstable soil. In addition, the body design and the hostile and remote installation conditions make monitoring these structures unviable. When a fast and secure evaluation is required, analysis by computational methods becomes feasible. In this study, a convolutional neural network (CNN) approach to computer vision is applied to identify defects in the surface of geotechnical structures, aided by unmanned aerial vehicles (UAVs) and mobile devices, aiming to reduce the reliance on human-led on-site inspections. However, computer vision algorithms remain underexplored in this field due to particularities of geotechnical engineering, such as limited public datasets and redundant images. Thus, this study obtained images of surface failure indicators from slopes near a Brazilian national road, assisted by UAVs and mobile devices. We then proposed a custom, low-complexity CNN architecture to build an image-based binary classifier that detects faults in geotechnical surfaces. The model achieved a satisfactory average accuracy rate of 94.26%. An AUC score of 0.99 from the receiver operating characteristic (ROC) curve and the confusion matrix on a testing dataset show satisfactory results. The results suggest that the capability of the model to distinguish between the classes 'damage' and 'intact' is excellent, which enables the identification of failure indicators. Early detection of failure indicators on the surface of slopes can facilitate proper maintenance and alarms and prevent disasters, as the integrity of the soil directly affects the structures built around and above it.


Terrorism group prediction using feature combination and BiGRU with self-attention mechanism

Abdalsalam M, Li C, Dahou A and Kryvinska N

The world faces the ongoing challenge of terrorism and extremism, which threaten the stability of nations, the security of their citizens, and the integrity of political, economic, and social systems. Given the complexity and multifaceted nature of this phenomenon, combating it requires a collective effort, with tailored methods to address its various aspects. Identifying the terrorist organization responsible for an attack is a critical step in combating terrorism. Historical data plays a pivotal role in this process, providing insights that can inform prevention and response strategies. With advancements in technology and artificial intelligence (AI), particularly in military applications, there is growing interest in utilizing these developments to enhance national and regional security against terrorism. Central to this effort are terrorism databases, which serve as rich resources for data on armed organizations, extremist entities, and terrorist incidents. The Global Terrorism Database (GTD) stands out as one of the most widely used and accessible resources for researchers. Recent progress in machine learning (ML), deep learning (DL), and natural language processing (NLP) offers promising avenues for improving the identification and classification of terrorist organizations. This study introduces a framework designed to classify and predict terrorist groups using bidirectional gated recurrent units and self-attention mechanisms, referred to as BiGRU-SA. This approach utilizes the comprehensive data in the GTD by integrating textual features extracted by DistilBERT with features that show a high correlation with terrorist organizations. Additionally, the Synthetic Minority Over-sampling Technique with Tomek links (SMOTE-T) was employed to address data imbalance and enhance the robustness of our predictions. The BiGRU-SA model captures temporal dependencies and contextual information within the data. By processing data sequences in both forward and reverse directions, BiGRU-SA offers a comprehensive view of the temporal dynamics, significantly enhancing classification accuracy. To evaluate the effectiveness of our framework, we compared ten models, including six traditional ML models and four DL algorithms. The proposed BiGRU-SA framework demonstrated outstanding performance in classifying 36 terrorist organizations responsible for terrorist attacks, achieving an accuracy of 98.68%, precision of 96.06%, sensitivity of 96.83%, specificity of 99.50%, and a Matthews correlation coefficient of 97.50%. The proposed model also outperformed state-of-the-art methods, confirming its effectiveness and accuracy in the classification and prediction of terrorist organizations.
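
The metrics reported in the abstract above (accuracy, precision, sensitivity, specificity, and the Matthews correlation coefficient) all derive from confusion-matrix counts. The following is a minimal sketch of the standard binary formulas; the function name `binary_metrics` and the example counts are illustrative assumptions, and the abstract does not say how the authors aggregated the metrics over the 36 classes.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from one-vs-rest confusion counts (illustrative helper)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Made-up counts, not data from the study.
example = binary_metrics(tp=90, fp=5, tn=100, fn=5)
```

For a multi-class task, one common option is to compute these values per class in a one-vs-rest fashion and then average them.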


A novel approach to secure communication in mega events through Arabic text steganography utilizing invisible Unicode characters

Khan EA

Mega events attract mega crowds, and many data exchange transactions are involved among organizers, stakeholders, and individuals, which increases the risk of covert eavesdropping. Data hiding is essential for safeguarding the security, confidentiality, and integrity of information during mega events. It plays a vital role in reducing cyber risks and ensuring the seamless execution of these extensive gatherings. In this paper, a steganographic approach suitable for mega event communication is proposed. The proposed method utilizes the characteristics of Arabic letters and invisible Unicode characters to hide secret data, where each Arabic letter can hide two secret bits. The secret messages hidden using the proposed technique can be exchanged via emails, text messages, and social media, as these are the main communication channels in mega events. The proposed technique demonstrated notable performance, with a high capacity ratio averaging 178% and a perfect imperceptibility ratio of 100%, outperforming most of the previous work. In addition, its security performance is comparable to previous approaches, with an average ratio of 72%. Furthermore, it is more robust than all related work, withstanding 70% of the possible attacks.
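
The idea of hiding two bits per Arabic letter with invisible Unicode characters can be sketched as follows. This is an illustration only: the four code points, the letter range, and the helper names (`embed`, `extract`, `strip_hidden`) are assumptions, not the mapping used in the paper, and the sketch ignores how such characters might affect cursive joining when rendered.

```python
# Hypothetical 2-bit alphabet of invisible code points (an assumption, not the paper's mapping).
INVISIBLE = ["\u200b", "\u200c", "\u200d", "\u2060"]
DECODE = {ch: format(i, "02b") for i, ch in enumerate(INVISIBLE)}

def is_arabic_letter(ch):
    # Approximation: treat U+0621..U+064A as the carrier letters.
    return "\u0621" <= ch <= "\u064a"

def embed(cover, bits):
    """Append one invisible character (two bits) after each Arabic letter."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    out, pos = [], 0
    for ch in cover:
        out.append(ch)
        if pos < len(bits) and is_arabic_letter(ch):
            out.append(INVISIBLE[int(bits[pos:pos + 2], 2)])
            pos += 2
    if pos < len(bits):
        raise ValueError("cover text too short for the message")
    return "".join(out)

def extract(stego):
    """Read the hidden bits back in order of appearance."""
    return "".join(DECODE[ch] for ch in stego if ch in DECODE)

def strip_hidden(stego):
    """Remove the invisible carriers to recover the visible cover text."""
    return "".join(ch for ch in stego if ch not in DECODE)

cover = "سلام عليكم"
secret_bits = "0110110001"
stego = embed(cover, secret_bits)
recovered = extract(stego)
```

In this sketch the cover text must not already contain any of the four carrier characters, since `extract` would otherwise read them as payload.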


Improving synthetic media generation and detection using generative adversarial networks

Zia R, Rehman M, Hussain A, Nazeer S and Anjum M

Synthetic images, referred to as deepfakes, are created using computer graphics modeling and artificial intelligence techniques. They modify human features by using generative models and deep learning algorithms, posing risks such as violations of social media regulations and the spread of false information. To address these concerns, the study proposes an improved generative adversarial network (GAN) model that more accurately differentiates between real and fake images, focusing on data augmentation and label smoothing strategies for GAN training. The study utilizes a dataset containing human faces and employs DCGAN (deep convolutional generative adversarial network) as the base model. In comparison with traditional GANs, the proposed GAN performs better in terms of frequently used metrics, i.e., Fréchet Inception Distance (FID) and accuracy. The model's effectiveness is demonstrated through evaluation on the Flickr-Faces Nvidia dataset and the Fakefaces dataset, achieving an FID score of 55.67, an accuracy of 98.82%, and an F1-score of 0.99 in detection. This study fine-tunes the model parameters to reach optimal settings, thereby reducing risks in synthetic image generation. The article introduces an effective framework for both image manipulation and detection.
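
Label smoothing in GAN training is often applied in a one-sided form, where the discriminator's "real" target is lowered from 1.0 to a softer value. The sketch below assumes that variant and a target of 0.9; neither detail is stated in the abstract, and the helper names are illustrative.

```python
import math

def smooth_labels(labels, real_target=0.9):
    """One-sided smoothing: real (1) labels become real_target, fake (0) labels stay 0."""
    return [real_target if y == 1 else 0.0 for y in labels]

def bce(preds, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and targets."""
    total = 0.0
    for p, t in zip(preds, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(preds)

labels = [1, 1, 0, 0]                # 1 = real image, 0 = generated image
preds = [0.99, 0.80, 0.10, 0.01]     # made-up discriminator outputs
hard_loss = bce(preds, labels)
soft_loss = bce(preds, smooth_labels(labels))
```

With smoothed targets, a discriminator output of 0.99 on a real image is penalized more than an output of 0.9, which discourages overconfident predictions.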


Towards an automated classification phase in the software maintenance process using decision tree

Alturki S and Almoaiqel S

The software maintenance process is costly, accounting for up to 70% of the total cost in the software development life cycle (SDLC). The difficulty of maintaining software increases with its size and complexity, requiring significant time and effort. One way to alleviate these costs is to automate parts of the maintenance process. This research focuses on the automation of the classification phase using decision trees (DT) to sort, rank, and accept/reject maintenance requests (MRs) for mobile applications. Our dataset consisted of 1,656 MRs. We found that DTs could automate sorting and accepting/rejecting MRs with accuracies of 71.08% and 64.15%, respectively, though ranking accuracy was lower at 50%. While DTs can reduce costs, effort, and time, human verification is still necessary.


Joint coordinate attention mechanism and instance normalization for COVID online comments text classification

Zhu R, Gao HH and Wang Y

The majority of extant methodologies for text classification prioritize the extraction of feature representations from texts with high degrees of distinction, a process that may result in computational inefficiencies. To address this limitation, the current study proposes a novel approach by directly leveraging label information to construct text representations. This integration aims to optimize the use of label data alongside textual content.


Art design integrating visual relation and affective semantics based on Convolutional Block Attention Mechanism-generative adversarial network model

Shen J and Wang J

Scene-based image semantic extraction and its precise sentiment expression significantly enhance artistic design. To address the incongruity between image features and sentiment features caused by non-bilinear pooling, this study introduces a generative adversarial network (GAN) model that integrates visual relationships with sentiment semantics. The GAN-based regularizer is utilized during training to incorporate target information derived from the contextual information into the process. This regularization mechanism imposes stronger penalties for inaccuracies in subject-object type predictions and integrates a sentiment corpus to generate more human-like descriptive statements. The capsule network is employed to reconstruct sentences and predict probabilities in the discriminator. To preserve crucial focal points in feature extraction, the Convolutional Block Attention Mechanism (CBAM) is introduced. Furthermore, two bidirectional long short-term memory (LSTM) modules are used to model both target and relational contexts, thereby refining target labels and inter-target relationships. Experimental results highlight the model's superiority over comparative models in terms of accuracy, BiLingual Evaluation Understudy (BLEU) score, and text preservation rate. The proposed model achieves an accuracy of 95.40% and the highest BLEU score of 16.79, effectively capturing both the label content and the emotional nuances within the image.


Predicting social media users' indirect aggression through pre-trained models

Zhou Z, Yu M, Peng X and He Y

Indirect aggression has become a prevalent phenomenon that erodes the social media environment. Due to the expense and the difficulty of determining objectively what constitutes indirect aggression, the traditional self-report questionnaire is hard to employ in the current online environment. In this study, we present a model for predicting indirect aggression online based on pre-trained models. Building on Weibo users' social media activities, we constructed basic, dynamic, and content features and classified indirect aggression into three subtypes: social exclusion, malicious humour, and guilt induction. We then built the prediction model by combining it with large-scale pre-trained models. The empirical evidence shows that this prediction model (ERNIE) outperforms the pre-trained models and predicts indirect aggression online much better than the models without extra pre-trained information. This study offers a practical model to predict users' indirect aggression. Furthermore, this work contributes to a better understanding of indirect aggression behaviors and can support social media platforms' organization and management.


TechMark: a framework for the development, engagement, and motivation of software teams in IT organizations based on gamification

Obaid I and Farooq MS

In today's fast-moving world of information technology (IT), software professionals are crucial for a company's success. However, they frequently experience low motivation as a result of competitive pressures, unclear incentives, and communication gaps. This underscores the critical need to handle internal marketing challenges such as employee motivation, development, and engagement in IT organizations. Internal marketing, the practice of attracting, engaging, and motivating employees as internal customers to utilize their quality services, has become increasingly important. Gamification has emerged as a significant trend over recent years. Despite the expanding use of gamification in the workplace, there is still a lack of focus on internal marketing tactics that incorporate gamification approaches. Therefore, as its principal contribution, this research presents a comprehensive framework designed to implement gamified solutions for software teams in IT organizations. This framework has been tailored to address the challenges posed by internal marketing by optimizing motivation, development, and engagement. Moreover, the framework is applied to design and implement a gamified work portal (GWP) through a systematic process, including the design of low-fidelity and high-fidelity prototypes. Additionally, the GWP is validated through a quasi-experiment involving IT professionals from different IT organizations to confirm the effectiveness of the framework. Finally, the outstanding results obtained by the gamification-based GWP highlight the effectiveness of the proposed gamification approach in enhancing development, motivation, and engagement while fostering the ongoing knowledge of employees.


Hybrid computing framework security in dynamic offloading for IoT-enabled smart home system

Khan S, Jiangbin Z, Ullah F, Pervez Akhter M, Khan S, Awwad FA and Ismail EAA

In the distributed computing era, cloud computing has completely changed organizational operations by facilitating simple access to resources. However, the rapid development of the IoT has led to collaborative computing, which raises scalability and security challenges. To fully realize the potential of the Internet of Things (IoT) in smart home technologies, there is still a need for strong data security solutions, which are essential in dynamic offloading in conjunction with edge, fog, and cloud computing. This research on smart home challenges covers in-depth examinations of data security, privacy, processing speed, storage capacity restrictions, and analytics inside networked IoT devices. We introduce the Trusted IoT Big Data Analytics (TIBDA) framework as a comprehensive solution to reshape smart living. Our primary focus is mitigating pervasive data security and privacy issues. TIBDA incorporates robust trust mechanisms, prioritizing data privacy and reliability for secure processing and user information confidentiality within the smart home environment. We achieve this by employing a hybrid cryptosystem that combines Elliptic Curve Cryptography (ECC), Post Quantum Cryptography (PQC), and Blockchain technology (BCT) to protect user privacy and confidentiality. Additionally, we comprehensively compared four prominent Artificial Intelligence anomaly detection algorithms (Isolation Forest, Local Outlier Factor, One-Class SVM, and Elliptic Envelope). We utilized machine learning classification algorithms (random forest, k-nearest neighbors, support vector machines, linear discriminant analysis, and quadratic discriminant analysis) for detecting malicious and non-malicious activities in smart home systems. 
Furthermore, as the main part of this research, the TIBDA framework uses a dynamic artificial neural network (ANN) algorithm to design a hybrid computing system that integrates edge, fog, and cloud architectures and efficiently supports numerous users while processing data from IoT devices in real time. The analysis shows that TIBDA significantly outperforms the compared systems across various metrics. In terms of response time, TIBDA demonstrated a reduction of 10-20% compared to the other systems under varying user loads, device counts, and transaction volumes. Regarding security, TIBDA's AUC values were consistently higher by 5-15%, indicating superior protection against threats. Additionally, TIBDA exhibited the highest trustworthiness, with an uptime percentage 10-12% greater than its competitors. TIBDA's Isolation Forest algorithm achieved an accuracy of 99.30%, and the random forest algorithm achieved an accuracy of 94.70%, outperforming other methods by 8-11%. Furthermore, our ANN-based offloading decision-making model achieved a validation accuracy of 99% and reduced loss to 0.11, demonstrating significant improvements in resource utilization and system performance.


Decoding Bitcoin: leveraging macro- and micro-factors in time series analysis for price prediction

Jung HS, Kim JH and Lee H

Predicting Bitcoin prices is crucial because they reflect trends in the overall cryptocurrency market. Owing to the market's short history and high price volatility, previous research has focused on the factors influencing Bitcoin price fluctuations. Although previous studies used sentiment analysis or diversified input features, this study's novelty lies in its utilization of data classified into more than five major categories. Moreover, the use of data spanning more than 2,000 days adds novelty to this study. With this extensive dataset, the authors aimed to predict Bitcoin prices across various timeframes using time series analysis. The authors incorporated a broad spectrum of inputs, including technical indicators, sentiment analysis from social media, news sources, and Google Trends. In addition, this study integrated macroeconomic indicators, on-chain Bitcoin transaction details, and traditional financial asset data. The primary objective was to evaluate extensive machine learning and deep learning frameworks for time series prediction, determine optimal window sizes, and enhance Bitcoin price prediction accuracy by leveraging diverse input features. Consequently, employing the bidirectional long short-term memory (Bi-LSTM) yielded significant results even without excluding the COVID-19 outbreak as a black swan outlier. Specifically, using a window size of 3, Bi-LSTM achieved a root mean squared error of 0.01824, mean absolute error of 0.01213, mean absolute percentage error of 2.97%, and an R-squared value of 0.98791. Additionally, to ascertain the importance of input features, gradient importance was examined to identify which variables specifically influenced prediction results. Ablation test was also conducted to validate the effectiveness and validity of input features. 
The proposed methodology provides a varied examination of the factors influencing price formation, helping investors make informed decisions regarding Bitcoin-related investments, and enabling policymakers to legislate considering these factors.
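
To make the windowing and the error metrics in the abstract above concrete, here is a minimal sketch that builds sliding windows of size 3 and computes RMSE, MAE, MAPE, and R-squared. The toy series and the "repeat the last value" baseline are assumptions for illustration; they are not the study's data or its Bi-LSTM model.

```python
import math

def make_windows(series, window=3):
    """Pair each run of `window` consecutive values with the value that follows it."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (in percent), and R-squared for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1 - ss_res / ss_tot,
    }

prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]   # toy series, not real Bitcoin prices
windows = make_windows(prices, window=3)
targets = [target for _, target in windows]
naive_preds = [inputs[-1] for inputs, _ in windows]  # baseline: repeat the last value
scores = regression_metrics(targets, naive_preds)
```

A negative R-squared, as this naive baseline produces on the toy series, means the predictions fit worse than simply predicting the mean of the targets.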


Design of compensation algorithms for zero padding and its application to a patch based deep neural network

Ullah S and Song SH

In this article, compensation algorithms for zero padding are suggested to enhance the performance of deep convolutional neural networks. By considering the characteristics of convolving filters, the proposed methods efficiently compensate for convolutional output errors caused by zero-padded inputs in a convolutional neural network. The algorithms are primarily developed for patch-based SRResNet for Single Image Super Resolution, and the performance comparison is carried out using the SRResNet model. Because the padding algorithms are general in nature, their efficacy is also tested in U-Net for Lung CT Image Segmentation. The proposed algorithms show better performance than the recently developed existing algorithm called partial convolution based padding (PCP).


Machine learning and natural language processing to assess the emotional impact of influencers' mental health content on Instagram

Merayo N, Ayuso-Lanchares A and González-Sanguino C

This study aims to examine, through artificial intelligence, specifically machine learning, the emotional impact generated by disclosures about mental health on social media. In contrast to previous research, which primarily focused on identifying psychopathologies, our study investigates the emotional response to mental health-related content on Instagram, particularly content created by influencers/celebrities. This platform, especially favored by the youth, is the stage where these influencers exert significant social impact, and where their analysis holds strong relevance. Analyzing mental health with machine learning techniques on Instagram is unprecedented, as all existing research has primarily focused on Twitter.


SPCANet: congested crowd counting strip pooling combined attention network

Yuan Z

Crowd counting aims to estimate the number and distribution of the population in crowded places, which is an important research direction in object counting. It is widely used in public place management, crowd behavior analysis, and other scenarios, showing its robust practicality. In recent years, crowd-counting technology has been developing rapidly. However, in highly crowded and noisy scenes, the counting effect of most models is still seriously affected by the distortion of view angle, dense occlusion, and inconsistent crowd distribution. Perspective distortion causes crowds to appear in different sizes and shapes in the image, and dense occlusion and inconsistent crowd distributions result in parts of the crowd not being captured completely. This ultimately results in the imperfect capture of spatial information in the model. To solve such problems, we propose a strip pooling combined attention (SPCANet) network model based on normed-deformable convolution (NDConv). We model long-distance dependencies more efficiently by introducing strip pooling. In contrast to traditional square kernel pooling, strip pooling uses long and narrow kernels (1×N or N×1) to deal with dense crowds, mutual occlusion, and overlap. Efficient channel attention (ECA), a mechanism for learning channel attention using a local cross-channel interaction strategy, is also introduced in SPCANet. This module generates channel attention through a fast 1D convolution to reduce model complexity while improving performance as much as possible. Four mainstream datasets, Shanghai Tech Part A, Shanghai Tech Part B, UCF-QNRF, and UCF CC 50, were utilized in extensive experiments, and mean absolute error (MAE) exceeds the baseline, which is 60.9, 7.3, 90.8, and 161.1, validating the effectiveness of SPCANet. Meanwhile, mean squared error (MSE) decreases by 5.7% on average over the four datasets, and the robustness is greatly improved.
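
The contrast between square-kernel pooling and strip pooling can be illustrated on a single 2D feature map. The sketch below averages along 1×N (row) and N×1 (column) strips and uses the combined result as a sigmoid gate. It is a simplification under stated assumptions: the learned convolutions of a real strip pooling module are omitted, and the function names are illustrative rather than taken from SPCANet.

```python
import math

def strip_pool(feature):
    """Average each row (1xN strip) and each column (Nx1 strip) of a 2D map."""
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]
    col_means = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means

def strip_attention(feature):
    """Gate every position by a sigmoid of its row mean plus its column mean."""
    row_means, col_means = strip_pool(feature)
    gated = []
    for i, row in enumerate(feature):
        gated.append([
            x * (1 / (1 + math.exp(-(row_means[i] + col_means[j]))))
            for j, x in enumerate(row)
        ])
    return gated

feature = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]   # toy 2x3 feature map
row_means, col_means = strip_pool(feature)
gated = strip_attention(feature)
```

Because each gate depends on an entire row and an entire column, a position is influenced by distant positions along both axes, which is the long-range dependency that strip pooling targets.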
