📝 LumiNet: The Bright Side of Perceptual Knowledge Distillation 🔭 "By meticulously analyzing intra-class dynamics, LumiNet reconstructs more granular inter-class relationships, enabling the student model to learn a richer breadth of knowledge." [gal30b+] 🤖 #CV ⚙️ https://github.com/ismail31416/LumiNet 🔗 https://arxiv.org/abs/2310.03669v1 #arxiv https://creative.ai/system/media_attachments/files/111/193/713/380/180/941/original/49c3a31c8b53cc9e.jpg https://creative.ai/system/media_attachments/files/111/193/713/442/273/168/original/7d8252df7460f31e.jpg https://creative.ai/system/media_attachments/files/111/193/713/506/562/426/original/8ab495b2d18e4777.jpg https://creative.ai/system/media_attachments/files/111/193/713/582/018/941/original/b91ed9dcc95f6819.jpg
📝 Towards Unified Deep Image Deraining: A Survey and a New Benchmark 🔭 "A novel benchmark named HQ-RAIN is constructed to further conduct extensive evaluation, consisting of 5000 paired high-resolution images with higher harmony and realism." [gal30b+] 🤖 #CV ⚙️ https://github.com/PaddlePaddle/PaddleDetection 🔗 https://arxiv.org/abs/2310.03535v1 #arxiv https://creative.ai/system/media_attachments/files/111/193/241/587/225/577/original/28115cd0ec82573c.jpg https://creative.ai/system/media_attachments/files/111/193/241/652/778/551/original/f331a60d884c20e6.jpg https://creative.ai/system/media_attachments/files/111/193/241/745/789/948/original/34d9c681ec77f02a.jpg https://creative.ai/system/media_attachments/files/111/193/241/832/791/378/original/f2a06b4336480be8.jpg
📝 PrototypeFormer: Learning to Explore Prototype Relationships for Few-Shot Image Classification 🔭 "PrototypeFormer uses a transformer to build a prototype extraction module and then applies contrastive loss to optimize prototypes for better feature representation in few-shot image classification tasks." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.03517v1 #arxiv https://creative.ai/system/media_attachments/files/111/192/828/640/470/676/original/9da50a018d7558a8.jpg https://creative.ai/system/media_attachments/files/111/192/828/691/691/088/original/20f2f0508d7e9178.jpg https://creative.ai/system/media_attachments/files/111/192/828/766/900/454/original/da7c3cc171a413b6.jpg https://creative.ai/system/media_attachments/files/111/192/828/816/860/611/original/88f67fd914278e3a.jpg
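A minimal sketch of the prototype-then-contrast idea, assuming plain class-mean prototypes in place of PrototypeFormer's transformer extraction module and a softmax over negative squared distances as the contrastive loss. The function names are hypothetical, not the paper's code.

```python
import math

def class_prototypes(support, labels):
    """Average each class's support embeddings into one prototype vector
    (a stand-in for a learned transformer prototype extractor)."""
    sums, counts = {}, {}
    for vec, lab in zip(support, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_loss(query, true_label, protos, temperature=1.0):
    """Cross-entropy over classes with logits = -distance / temperature:
    pulls the query toward its own prototype and away from the others."""
    logits = {lab: -sq_dist(query, p) / temperature for lab, p in protos.items()}
    m = max(logits.values())
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits.values()))
    return log_norm - logits[true_label]

def predict(query, protos):
    """Nearest-prototype classification."""
    return min(protos, key=lambda lab: sq_dist(query, protos[lab]))

support = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 4.8]]
labels = ["cat", "cat", "dog", "dog"]
protos = class_prototypes(support, labels)
print(predict([0.1, 0.1], protos))  # cat
```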
📝 Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery 🔭 "Uses the Self-Distillation with No Labels (DINO) method to pre-train on unlabeled SAR imagery and then fine-tunes the pre-trained model on labeled data to predict land cover maps." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.03513v1 #arxiv https://creative.ai/system/media_attachments/files/111/192/356/737/498/497/original/7b64211778b0216c.jpg https://creative.ai/system/media_attachments/files/111/192/356/831/822/561/original/68fd262679ed9b3a.jpg https://creative.ai/system/media_attachments/files/111/192/356/891/690/060/original/fb5be8e94a45a541.jpg https://creative.ai/system/media_attachments/files/111/192/356/955/149/258/original/0041243aa15ac627.jpg
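A sketch of the generic DINO self-distillation step (teacher centering and sharpening, cross-entropy against the student, EMA teacher update). The SAR paper's exact settings are not in the post; the temperatures and momenta below are assumed defaults, and the helpers operate on plain lists rather than a real network.

```python
import math

def log_softmax(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def softmax(logits, temperature):
    return [math.exp(v) for v in log_softmax(logits, temperature)]

def dino_loss(student_logits, teacher_logits, center, tau_s=0.1, tau_t=0.04):
    """H(p_t, p_s): the teacher output is centered, then sharpened by a low
    temperature; in training no gradient would flow through the teacher."""
    p_t = softmax([t - c for t, c in zip(teacher_logits, center)], tau_t)
    log_p_s = log_softmax(student_logits, tau_s)
    return -sum(pt * lps for pt, lps in zip(p_t, log_p_s))

def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track an exponential moving average of the student's."""
    return [momentum * t + (1 - momentum) * s for t, s in zip(teacher_params, student_params)]

def update_center(center, teacher_batch, momentum=0.9):
    """Running mean of teacher outputs, subtracted to discourage collapse."""
    n = len(teacher_batch)
    batch_mean = [sum(col) / n for col in zip(*teacher_batch)]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, batch_mean)]
```

Centering pushes the teacher away from one dominant output while sharpening pushes away from a uniform one; the two together are what let DINO train without labels.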
📝 Kandinsky: An Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion 🔭 "Kandinsky1 is an exploration of latent diffusion architecture, with a modified MoVQ implementation serving as the image autoencoder component and an image prior model trained to map text embeddings to image embeddings of a pre-trained CLIP model." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.03502v1 #arxiv https://creative.ai/system/media_attachments/files/111/191/825/938/789/398/original/1c3887def543d83a.jpg https://creative.ai/system/media_attachments/files/111/191/825/992/525/152/original/7d279e0462f5a271.jpg https://creative.ai/system/media_attachments/files/111/191/826/050/820/826/original/b6e4512875ea76db.jpg https://creative.ai/system/media_attachments/files/111/191/826/103/506/646/original/7014e413febb3159.jpg
📝 Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization 🔭🧠 "MRAV-FF fuses audio-visual features across different temporal resolutions using a hierarchical gated cross-attention mechanism that weighs the importance of audio information at diverse temporal scales." [gal30b+] 🤖 #CV #LG #MM 🔗 https://arxiv.org/abs/2310.03456v1 #arxiv https://creative.ai/system/media_attachments/files/111/191/177/184/045/779/original/605abc8ce63e34db.jpg https://creative.ai/system/media_attachments/files/111/191/177/248/872/695/original/18dffdce61fc1611.jpg
📝 Denoising Diffusion Step-Aware Models 🔭 "DDSM employs a spectrum of neural networks whose sizes are adapted to the importance of each generative step, determined through evolutionary search (Fig 1), thus effectively circumventing redundant computational efforts, particularly in the less critical steps." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.03337v1 #arxiv https://creative.ai/system/media_attachments/files/111/190/705/125/838/498/original/94c31b372081b370.jpg https://creative.ai/system/media_attachments/files/111/190/705/190/855/221/original/4a9a0dd9f441c7d5.jpg https://creative.ai/system/media_attachments/files/111/190/705/247/663/314/original/9075bdaaf88afb60.jpg https://creative.ai/system/media_attachments/files/111/190/705/307/077/330/original/72025607a3f315d4.jpg
📝 Investigating the Limitation of CLIP Models: The Worst-Performing Categories 🔭🧠 "Proposes the Class-wise Matching Margin (CMM) to measure the inference confusion and find the worst-performing categories of CLIP models without any manual prompt engineering, laborious optimization, or access to labeled validation data." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/openai/CLIP 🔗 https://arxiv.org/abs/2310.03324v1 #arxiv https://creative.ai/system/media_attachments/files/111/190/292/457/863/039/original/5685dbbeaab8ee65.jpg https://creative.ai/system/media_attachments/files/111/190/292/519/546/649/original/9e41141bf3007451.jpg https://creative.ai/system/media_attachments/files/111/190/292/579/494/295/original/0f492461e37f9023.jpg https://creative.ai/system/media_attachments/files/111/190/292/647/778/170/original/643da9649d77e274.jpg
📝 Can Pre-Trained Models Assist in Dataset Distillation? 🔭 "Pre-trained Models transfer knowledge to synthetic datasets to guide Dataset Distillation accurately by selecting optimal options, including initialization parameters, model architecture, training epoch and domain knowledge." [gal30b+] 🤖 #CV ⚙️ https://github.com/yaolu-zjut/DDInterpreter 🔗 https://arxiv.org/abs/2310.03295v1 #arxiv https://creative.ai/system/media_attachments/files/111/189/643/534/842/220/original/76810c93edcc7e6f.jpg https://creative.ai/system/media_attachments/files/111/189/643/590/016/456/original/8b76124ac901d4af.jpg https://creative.ai/system/media_attachments/files/111/189/643/646/144/256/original/94e004ae4734a57e.jpg
📝 SimVLG: Simple and Efficient Pretraining of Visual Language Generative Models 🔭 "SimVLG is a streamlined framework for the pre-training of computationally intensive vision-language generative models, leveraging frozen pre-trained large language models (LLMs)." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.03291v1 #arxiv https://creative.ai/system/media_attachments/files/111/189/171/719/622/132/original/9baa6b1e7364a804.jpg https://creative.ai/system/media_attachments/files/111/189/171/786/110/671/original/9351db94426fbce3.jpg https://creative.ai/system/media_attachments/files/111/189/171/843/593/607/original/5650d619f8e63b15.jpg https://creative.ai/system/media_attachments/files/111/189/171/916/171/613/original/707ae4d63a18fbc2.jpg
📝 Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning 🔭🧠 "Works by maximizing the attention mask of the image region best represented by a single latent vector corresponding to the attention mask, and by minimizing reconstruction loss between the input image and decoded component image." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2310.03273v1 #arxiv https://creative.ai/system/media_attachments/files/111/188/699/861/727/514/original/843cfd6f45596084.jpg https://creative.ai/system/media_attachments/files/111/188/699/914/071/258/original/97fe449a2f2c06b3.jpg https://creative.ai/system/media_attachments/files/111/188/699/964/511/039/original/5f3583398545f665.jpg https://creative.ai/system/media_attachments/files/111/188/700/017/124/899/original/a7aa3da631cdfd2c.jpg
📝 EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models 🔭 "Leverages a quantization-aware variant of the low-rank adapter to transfer the denoising capabilities of full-precision models to their quantized counterparts, eliminating the need for training data." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.03270v1 #arxiv https://creative.ai/system/media_attachments/files/111/188/346/006/898/454/original/3eca0a9d35fac333.jpg https://creative.ai/system/media_attachments/files/111/188/346/066/093/680/original/054b3e5676f82ae8.jpg https://creative.ai/system/media_attachments/files/111/188/346/123/754/211/original/463fe94342dc5f88.jpg https://creative.ai/system/media_attachments/files/111/188/346/190/353/632/original/4e3da9afeb7f955e.jpg
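A minimal sketch of the quantization-aware low-rank adapter idea: merge the LoRA update into the frozen weight and fake-quantize the result, so fine-tuning would see the same low-bit weights used at inference. The symmetric per-tensor quantizer and the function names are assumptions for illustration; EfficientDM's actual quantizer and training details are not in the post.

```python
def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def fake_quantize(matrix, bits=4):
    """Symmetric per-tensor uniform quantization: snap to the integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for row in matrix for v in row) or 1.0
    scale = max_abs / qmax
    return [[max(-qmax, min(qmax, round(v / scale))) * scale for v in row] for row in matrix]

def quantized_lora_weight(w, lora_b, lora_a, bits=4):
    """Hypothetical helper: quantize (W + B @ A) rather than W alone, so the
    low-rank adapter is trained against the quantization error it must absorb."""
    delta = matmul(lora_b, lora_a)
    merged = [[wi + di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]
    return fake_quantize(merged, bits)
```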
📝 Reinforcement Learning-Based Mixture of Vision Transformers for Video Violence Recognition 🔭 "The proposed transformer-based Mixture of Experts (MoE) video violence recognition system consists of two main modules: (1) a backbone network and (2) an intelligent router." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.03108v1 #arxiv https://creative.ai/system/media_attachments/files/111/187/992/126/163/637/original/916b2727e08f4532.jpg https://creative.ai/system/media_attachments/files/111/187/992/196/317/774/original/ad0df19c5f24a731.jpg https://creative.ai/system/media_attachments/files/111/187/992/251/302/816/original/d726792826f2e237.jpg https://creative.ai/system/media_attachments/files/111/187/992/323/054/661/original/1987f19a11b28f6a.jpg
📝 OMG-ATTACK: Self-Supervised on-Manifold Generation of Transferable Evasion Attacks 🧠🔭 "A self-supervised, computationally economical method for generating adversarial examples that are tied more closely to the data than to the model being attacked, making them more transferable." [gal30b+] 🤖 #LG #CV 🔗 https://arxiv.org/abs/2310.03707v1 #arxiv https://creative.ai/system/media_attachments/files/111/187/461/274/214/678/original/469c60333fcd701d.jpg https://creative.ai/system/media_attachments/files/111/187/461/325/394/900/original/95b84c3b45231949.jpg https://creative.ai/system/media_attachments/files/111/187/461/378/225/273/original/afb0f768ceef5d6d.jpg https://creative.ai/system/media_attachments/files/111/187/461/434/666/260/original/9ce78ff2ee9455bc.jpg
📝 Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising 🧠🔭 "Proposes a new method to efficiently learn a regularizer parametrized by a deep neural net (DNN) using stochastic gradient descent (SGD) on the compressed version of the training database." [gal30b+] 🤖 #LG #CV #IT 🔗 https://arxiv.org/abs/2310.03085v1 #arxiv https://creative.ai/system/media_attachments/files/111/186/989/139/851/828/original/c6e7f9975c7f01e4.jpg https://creative.ai/system/media_attachments/files/111/186/989/216/380/404/original/b8a3ae46827cc94e.jpg https://creative.ai/system/media_attachments/files/111/186/989/294/629/473/original/5cbcbf5fdcddd4aa.jpg https://creative.ai/system/media_attachments/files/111/186/989/343/770/673/original/5d63d7cfcb4c0e99.jpg
📝 Decoding Human Activities: Analyzing Wearable Accelerometer and Gyroscope Data for Activity Recognition 🔭🧠 "A stratified multi-structural approach based on a Residual network (ResNet) ensembled with Residual MobileNet, termed FusionActNet, is proposed for classifying different activities based on the unique features of the human body's static and dynamic movements." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2310.02011v1 #arxiv https://creative.ai/system/media_attachments/files/111/184/807/809/059/792/original/344f61c967c5ff90.jpg https://creative.ai/system/media_attachments/files/111/184/807/859/395/183/original/f0682d9feccb1188.jpg https://creative.ai/system/media_attachments/files/111/184/807/910/424/757/original/063e7209fa44e30e.jpg https://creative.ai/system/media_attachments/files/111/184/807/963/780/403/original/fabbd060d43287f8.jpg
📝 Development of Machine Vision Approach for Mechanical Component Identification Based on Its Dimension and Pitch 🔭 "Uses a Raspberry Pi Camera, Raspberry Pi 4, and some open-source computer vision libraries to calculate the required features of the bolts used in the assembly line." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01995v1 #arxiv https://creative.ai/system/media_attachments/files/111/184/395/047/370/875/original/aedc947bbdba4036.jpg https://creative.ai/system/media_attachments/files/111/184/395/100/192/184/original/7d5614edd7befd8b.jpg https://creative.ai/system/media_attachments/files/111/184/395/154/321/277/original/52ceab819d671050.jpg https://creative.ai/system/media_attachments/files/111/184/395/209/261/141/original/a247d4728aa097fa.jpg
📝 Understanding Masked Autoencoders From a Local Contrastive Perspective 🔭 "Explores a new perspective to explain what truly contributes to the 'rich hidden representations inside the MAE' and reformulates the reconstruction-based MAE into a local-contrastive version." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01994v1 #arxiv https://creative.ai/system/media_attachments/files/111/183/687/327/880/081/original/d83da1624e852b32.jpg https://creative.ai/system/media_attachments/files/111/183/687/385/427/934/original/05f83c3be3d4b93b.jpg https://creative.ai/system/media_attachments/files/111/183/687/436/681/313/original/7a9911234b981a23.jpg https://creative.ai/system/media_attachments/files/111/183/687/490/953/173/original/d794a5262111dd00.jpg
📝 CoralVOS: Dataset and Benchmark for Coral Video Segmentation 🔭 "Proposes a novel coral video segmentation dataset: CoralVOS, which contains 50 high-quality videos with dense pixel-level segmentation labels on corals and backgrounds." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01946v1 #arxiv https://creative.ai/system/media_attachments/files/111/183/333/332/427/555/original/14b753ee668e66f0.jpg https://creative.ai/system/media_attachments/files/111/183/333/382/504/690/original/bc3c6e3253522940.jpg https://creative.ai/system/media_attachments/files/111/183/333/447/662/365/original/669336ec1bf1647a.jpg https://creative.ai/system/media_attachments/files/111/183/333/498/675/545/original/578812e9175205e1.jpg
📝 Constructing Image-Text Pair Dataset From Books 🔭 "A dataset construction pipeline, comprising an optical character reader (OCR), an object detector, and a layout analyzer for the autonomous extraction of image-text pairs." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01936v1 #arxiv https://creative.ai/system/media_attachments/files/111/182/920/507/532/928/original/69c3aac2694dde2f.jpg https://creative.ai/system/media_attachments/files/111/182/920/565/555/051/original/0e21244eb612ce4d.jpg https://creative.ai/system/media_attachments/files/111/182/920/613/542/813/original/9c61b383f851c2ad.jpg https://creative.ai/system/media_attachments/files/111/182/920/666/222/615/original/8cd3e701fee3a9e3.jpg
📝 MarineDet: Towards Open-Marine Object Detection 🔭 "Formulates a joint visual-text semantic space through pre-training and then performs marine-specific training to achieve in-air-to-marine knowledge transfer by leveraging the pre-trained model." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01931v1 #arxiv https://creative.ai/system/media_attachments/files/111/182/625/531/740/825/original/08401462e2a662fc.jpg https://creative.ai/system/media_attachments/files/111/182/625/599/209/907/original/a2aeb5c7567084eb.jpg https://creative.ai/system/media_attachments/files/111/182/625/677/587/775/original/d388ac600c797469.jpg https://creative.ai/system/media_attachments/files/111/182/625/731/120/839/original/5a95db2e555f9f85.jpg
📝 Beyond the Benchmark: Detecting Diverse Anomalies in Videos 🔭🧠 "MFAD is built upon the AI-VAD framework, which utilizes single-frame features (such as pose estimation and deep image encoding) and two-frame features (such as object velocity)." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/yoavarad/MFAD 🔗 https://arxiv.org/abs/2310.01904v1 #arxiv https://creative.ai/system/media_attachments/files/111/181/858/832/081/790/original/2124b831b30cfa11.jpg https://creative.ai/system/media_attachments/files/111/181/858/892/694/063/original/4057a0d82e7eabdd.jpg https://creative.ai/system/media_attachments/files/111/181/858/960/015/755/original/a40708d383927cc2.jpg https://creative.ai/system/media_attachments/files/111/181/859/010/031/975/original/8b391293f3566302.jpg
📝 A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection 🔭 "Designs a multi-level feature extractor that uses a pre-trained model to extract multi-level features from bi-temporal images and introduces aggregate connections to fuse them." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01876v1 #arxiv https://creative.ai/system/media_attachments/files/111/181/622/982/401/125/original/905c460d4089b76f.jpg https://creative.ai/system/media_attachments/files/111/181/623/060/041/577/original/e2891f3d81cc6f51.jpg https://creative.ai/system/media_attachments/files/111/181/623/109/918/573/original/a1b6883e6ae51fcc.jpg https://creative.ai/system/media_attachments/files/111/181/623/166/271/893/original/e5138613fd3f33b1.jpg
📝 Selective Feature Adapter for Dense Vision Transformers 🔭 "Selective feature adapter (SFA), consisting of external adapters and internal adapters that operate sequentially over a transformer model, achieves SoTA performance under any given budget of trainable parameters." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01843v1 #arxiv https://creative.ai/system/media_attachments/files/111/180/915/121/545/638/original/133538217b46bd77.jpg https://creative.ai/system/media_attachments/files/111/180/915/177/115/975/original/37b96deddde5b252.jpg https://creative.ai/system/media_attachments/files/111/180/915/241/602/752/original/1d0f13f960507c11.jpg https://creative.ai/system/media_attachments/files/111/180/915/298/537/583/original/7315e68b612b7182.jpg
📝 SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-Based Question Answering 🔭 "Extracts a scene graph from an input image using a pre-trained scene graph generator and employs semantically preserving augmentation with self-supervised techniques to learn joint embeddings by optimizing the informational content in their representations using an un-normalized contrastive approach." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01842v1 #arxiv https://creative.ai/system/media_attachments/files/111/180/620/217/984/672/original/7f0291575de91f63.jpg https://creative.ai/system/media_attachments/files/111/180/620/267/978/206/original/115b99c9a9b59793.jpg https://creative.ai/system/media_attachments/files/111/180/620/321/579/143/original/8c0cf68cc1e716da.jpg https://creative.ai/system/media_attachments/files/111/180/620/373/730/057/original/927c7b1e58ea8b11.jpg
📝 Self-Supervised High Dynamic Range Imaging with Multi-Exposure Images in Dynamic Scenes 🔭 "Proposes to train a reconstruction network under the supervision of two complementary components, which are estimated using multi-exposure images and focus on HDR color as well as structure, respectively." [gal30b+] 🤖 #CV ⚙️ https://github.com/cszhilu1998/SelfHDR 🔗 https://arxiv.org/abs/2310.01840v1 #arxiv https://creative.ai/system/media_attachments/files/111/180/207/166/174/282/original/b9caa97ad6205be0.jpg https://creative.ai/system/media_attachments/files/111/180/207/237/100/816/original/7ce231b6e7cc6de0.jpg https://creative.ai/system/media_attachments/files/111/180/207/302/806/318/original/ba950c8a45b97132.jpg https://creative.ai/system/media_attachments/files/111/180/207/350/641/577/original/b92d58aad61358d2.jpg
📝 Skin the Sheep Not Only Once: Reusing Various Depth Datasets to Drive the Learning of Optical Flow 🔭 "Proposes to leverage the geometric connection between optical flow estimation and stereo matching (based on the similarity upon finding pixel correspondences across images) to unify various real-world depth estimation datasets for generating supervised training data for optical flow." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01833v1 #arxiv https://creative.ai/system/media_attachments/files/111/179/853/293/180/756/original/cf85f1a8b3f62b65.jpg https://creative.ai/system/media_attachments/files/111/179/853/362/561/242/original/f1fa4644dee3b3d4.jpg https://creative.ai/system/media_attachments/files/111/179/853/424/422/597/original/ad35be98161dbeda.jpg https://creative.ai/system/media_attachments/files/111/179/853/475/854/935/original/0de7c55a93b5a060.jpg
📝 AI-Generated Images as Data Source: The Dawn of Synthetic Era 🔭 "Explores the innovative concept of leveraging AI-generated images as a new data source, reshaping traditional model paradigms in visual intelligence, from training machine learning models to simulating scenarios for modeling, testing, and validation." [gal30b+] 🤖 #CV ⚙️ https://github.com/mwxely/AIGS 🔗 https://arxiv.org/abs/2310.01830v1 #arxiv https://creative.ai/system/media_attachments/files/111/179/676/521/141/076/original/167f46aa54004eab.jpg https://creative.ai/system/media_attachments/files/111/179/676/611/809/300/original/822ac1baaf7823a0.jpg https://creative.ai/system/media_attachments/files/111/179/676/694/029/675/original/7a9fce2e813a1400.jpg https://creative.ai/system/media_attachments/files/111/179/676/758/778/824/original/6b2753a7c4fd2bc5.jpg
📝 Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation 🔭 "Proposes a swapping mechanism and an acceptable region for sampling high-quality object images from a new image pool generated by randomly exchanging column vectors of two text embeddings." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01819v1 #arxiv https://creative.ai/system/media_attachments/files/111/179/086/691/077/277/original/bfeca70cb5210a35.jpg https://creative.ai/system/media_attachments/files/111/179/086/789/309/475/original/e64962e37c41e9b0.jpg https://creative.ai/system/media_attachments/files/111/179/086/878/891/776/original/0b24cb9f06ef9fa5.jpg https://creative.ai/system/media_attachments/files/111/179/086/946/958/459/original/6a59b1c5abd6a7af.jpg
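A minimal sketch of the column-swapping step described above, assuming each text embedding is a tokens-by-features matrix stored as a list of rows and that "columns" are feature dimensions (that orientation is an assumption). The acceptable-region filter is not modeled, and `swap_columns` is a hypothetical name.

```python
import random

def swap_columns(emb_a, emb_b, swap_prob=0.5, rng=None):
    """Copy emb_a, but for each column index take the whole column from emb_b
    with probability swap_prob. Returns the mixed matrix and the swap mask."""
    rng = rng or random.Random()
    n_rows, n_cols = len(emb_a), len(emb_a[0])
    assert len(emb_b) == n_rows and all(len(r) == n_cols for r in emb_b)
    swapped = [rng.random() < swap_prob for _ in range(n_cols)]
    mixed = [[emb_b[i][j] if swapped[j] else emb_a[i][j] for j in range(n_cols)]
             for i in range(n_rows)]
    return mixed, swapped
```

Each mixed embedding would then condition the text-to-image model to populate the candidate image pool; sampling many seeds gives many distinct combinations of the two concepts.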
📝 PPT: Token Pruning and Pooling for Efficient Vision Transformers 🔭 "By heuristically integrating both token pruning and token pooling techniques in ViTs without additional trainable parameters, PPT reduces the model complexity while maintaining its predictive accuracy on vision tasks." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01812v1 #arxiv https://creative.ai/system/media_attachments/files/111/178/673/754/572/961/original/bcab5ebce41584ae.jpg https://creative.ai/system/media_attachments/files/111/178/673/803/345/270/original/58d91cb6ec19e128.jpg https://creative.ai/system/media_attachments/files/111/178/673/875/923/314/original/2ee27515f00eb146.jpg https://creative.ai/system/media_attachments/files/111/178/673/947/666/063/original/ab1d69598b881459.jpg
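A rough sketch of combining the two token-reduction moves without trainable parameters: prune low-importance tokens when importance scores are spread out, otherwise pool (average) the most similar tokens. The variance-based switch, its threshold, and all function names are hypothetical; PPT's actual criterion and token bookkeeping may differ, and this pooling ignores how many tokens each merged token represents.

```python
def prune(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]

def pool(tokens, keep):
    """Repeatedly average the two closest tokens until `keep` remain."""
    toks = [list(t) for t in tokens]
    while len(toks) > keep:
        best = None
        for i in range(len(toks)):
            for j in range(i + 1, len(toks)):
                d = sum((a - b) ** 2 for a, b in zip(toks[i], toks[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        toks[i] = [(a + b) / 2 for a, b in zip(toks[i], toks[j])]
        del toks[j]  # j > i, so index i is unaffected
    return toks

def prune_or_pool(tokens, scores, keep, var_threshold=0.01):
    """Hypothetical selector: prune when scores are spread out (a few tokens
    clearly matter), pool when scores are flat (tokens are redundant)."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    return prune(tokens, scores, keep) if var > var_threshold else pool(tokens, keep)
```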
📝 HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption 🔭 "CCEval is a GPT-4-assisted method that can assess the detailed captioning capability of large vision-language models (LVLMs)." [gal30b+] 🤖 #CV ⚙️ https://github.com/haotian-liu/LLaVA 🔗 https://arxiv.org/abs/2310.01779v1 #arxiv https://creative.ai/system/media_attachments/files/111/178/437/816/986/283/original/21dc7386dc43cc55.jpg https://creative.ai/system/media_attachments/files/111/178/437/871/851/904/original/b3d8ec44a43d93f1.jpg
📝 ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms 🔭 "Presents ImageNet-OOD, a clean semantic shift dataset that minimizes the interference of covariate shift to enable in-depth analyses of out-of-distribution (OOD) detection algorithms." [gal30b+] 🤖 #CV ⚙️ https://github.com/princetonvisualai/imagenetood 🔗 https://arxiv.org/abs/2310.01755v1 #arxiv https://creative.ai/system/media_attachments/files/111/178/142/960/630/810/original/ddbb09053a079508.jpg https://creative.ai/system/media_attachments/files/111/178/143/029/671/738/original/b79adbf1580e9249.jpg https://creative.ai/system/media_attachments/files/111/178/143/111/762/360/original/5eb068a1a9b169ab.jpg https://creative.ai/system/media_attachments/files/111/178/143/179/556/311/original/aaa23c18d43355bd.jpg
📝 Adaptive Visual Scene Understanding: Incremental Scene Graph Generation 🔭 "Introduces Continual ScenE Graph Generation (CSEGG), a new benchmark for evaluating scene graph generation (SGG) in a continual learning setting." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.01636v1 #arxiv https://creative.ai/system/media_attachments/files/111/177/848/031/976/784/original/e635e1cb362889f4.jpg https://creative.ai/system/media_attachments/files/111/177/848/092/821/979/original/2d20d4ae3c5e13cd.jpg https://creative.ai/system/media_attachments/files/111/177/848/163/014/058/original/4b5fc8568bf48a4f.jpg https://creative.ai/system/media_attachments/files/111/177/848/225/150/059/original/6fe81ae7f2707e4f.jpg
📝 Direct Inversion: Boosting Diffusion-Based Editing with 3 Lines of Code 🔭 "By disentangling the source and target diffusion branches, direct inversion achieves optimal performance of both branches with just three lines of code, achieving state-of-the-art inversion quality and edit fidelity in real-time." [gal30b+] 🤖 #CV ⚙️ https://github.com/cure-lab/DirectInversion 🔗 https://arxiv.org/abs/2310.01506v1 #arxiv https://creative.ai/system/media_attachments/files/111/177/553/147/016/422/original/04ef3d2714a0a5c0.jpg https://creative.ai/system/media_attachments/files/111/177/553/243/127/251/original/d781ee52c2a6334b.jpg https://creative.ai/system/media_attachments/files/111/177/553/344/266/478/original/56aab49dd777ff44.jpg https://creative.ai/system/media_attachments/files/111/177/553/445/353/200/original/fc190051156b19f1.jpg
📝 Generative Autoencoding of Dropout Patterns 🧠🔭 "A unique dropout pattern is assigned to each data point in the training dataset, then an autoencoder is trained to reconstruct the corresponding data point using this pattern as information to be encoded." [gal30b+] 🤖 #LG #CV ⚙️ https://github.com/mseitzer/pytorch-fid 🔗 https://arxiv.org/abs/2310.01712v1 #arxiv https://creative.ai/system/media_attachments/files/111/176/373/173/131/341/original/20368636c07eafe2.jpg
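A minimal sketch of assigning each training example its own fixed dropout pattern, which would then act as that example's code for the decoder. The seeding scheme, `keep_prob`, inverted-dropout scaling, and function names are assumptions; the autoencoder itself is not shown.

```python
import random

def dropout_pattern(index, width, keep_prob=0.5, base_seed=1234):
    """Deterministic binary mask for data point `index`: the same index always
    yields the same pattern."""
    rng = random.Random(base_seed * 1_000_003 + index)
    return tuple(1 if rng.random() < keep_prob else 0 for _ in range(width))

def assign_patterns(n_points, width, keep_prob=0.5):
    """One distinct pattern per example, re-drawing on the rare collision."""
    assert n_points <= 2 ** width, "not enough distinct masks of this width"
    seen, patterns = set(), []
    for i in range(n_points):
        salt = 0
        p = dropout_pattern(i, width, keep_prob)
        while p in seen:
            salt += 1
            p = dropout_pattern(i + salt * n_points, width, keep_prob)
        seen.add(p)
        patterns.append(p)
    return patterns

def apply_pattern(hidden, pattern, keep_prob=0.5):
    """Inverted-dropout style masking of a hidden activation vector."""
    return [h * m / keep_prob for h, m in zip(hidden, pattern)]
```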
📝 PRIME: Prioritizing Interpretability in Failure Mode Extraction 🔭 "Starts by obtaining human-understandable concepts (tags) for the images in the dataset and then analyzes the model's behavior based on the presence or absence of combinations of these tags." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.00164v1 #arxiv https://creative.ai/system/media_attachments/files/111/176/020/733/219/036/original/a25cae823e09eb79.jpg https://creative.ai/system/media_attachments/files/111/176/020/808/897/393/original/02bf0e2fce6e54c7.jpg https://creative.ai/system/media_attachments/files/111/176/020/877/891/785/original/3813432f7ea576ee.jpg https://creative.ai/system/media_attachments/files/111/176/020/939/882/347/original/a926927b48852e90.jpg
📝 Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis 🔭 "PnP-ADMM works by integrating a data-fidelity term and an image prior in an iterative algorithm for solving imaging inverse problems, such as denoising, deconvolution and image super-resolution." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.00133v1 #arxiv https://creative.ai/system/media_attachments/files/111/175/726/025/200/525/original/81fe7f64213b28cd.jpg https://creative.ai/system/media_attachments/files/111/175/726/078/333/290/original/e2cfd286f6b73c3b.jpg https://creative.ai/system/media_attachments/files/111/175/726/150/491/355/original/78fd439828b3d3a4.jpg https://creative.ai/system/media_attachments/files/111/175/726/222/370/303/original/7f6a14283ab420c2.jpg
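A minimal sketch of the PnP-ADMM iteration for the simplest inverse problem (denoising, identity forward operator, data term f(x) = ½‖x − y‖²), where the proximal step of the prior is replaced by a denoiser. The 3-tap moving average is only a stand-in for a learned denoiser; the `denoiser` argument is where the image prior plugs in, the component whose mismatch the paper studies.

```python
def moving_average(signal):
    """Stand-in denoiser: 3-tap moving average with edge replication."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + signal[i] + right) / 3.0)
    return out

def pnp_admm_denoise(y, denoiser, rho=1.0, iters=30):
    """PnP-ADMM for f(x) = 0.5 * ||x - y||^2 with the prior's prox swapped for `denoiser`."""
    z = list(y)
    u = [0.0] * len(y)
    for _ in range(iters):
        # x-update: closed-form proximal step of the data-fidelity term
        x = [(yi + rho * (zi - ui)) / (1.0 + rho) for yi, zi, ui in zip(y, z, u)]
        # z-update: the denoiser takes the place of the prior's proximal operator
        z = denoiser([xi + ui for xi, ui in zip(x, u)])
        # u-update: scaled dual ascent on the consensus constraint x = z
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z
```

For deconvolution or super-resolution, only the x-update changes (it then involves the blur or downsampling operator); the denoiser step stays the same.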
📝 Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation 🔭 "Uses unlabeled images with multi-view augmentations to generate reliable target pseudo-heatmaps using a denoising scheme and a threshold-and-refine process, and selects reliable targets from this pool using cross-student uncertainty." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.00099v1 #arxiv https://creative.ai/system/media_attachments/files/111/175/312/955/002/964/original/06ce1b9887585040.jpg https://creative.ai/system/media_attachments/files/111/175/313/011/815/725/original/d6959500807fc2fb.jpg https://creative.ai/system/media_attachments/files/111/175/313/070/198/209/original/a9568609db1b3a4d.jpg https://creative.ai/system/media_attachments/files/111/175/313/122/696/868/original/d598dc7806aa754c.jpg
📝 Towards Few-Call Model Stealing via Active Self-Paced Knowledge Distillation and Diffusion-Based Image Generation 🔭🧠 "Proposes a framework that first creates a synthetic dataset (called a proxy dataset) by leveraging the ability of diffusion models to generate realistic and diverse images." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2310.00096v1 #arxiv https://creative.ai/system/media_attachments/files/111/175/018/033/153/247/original/c778aee133e51129.jpg https://creative.ai/system/media_attachments/files/111/175/018/128/612/344/original/3131f39a93f3e595.jpg https://creative.ai/system/media_attachments/files/111/175/018/186/390/083/original/f1f4eb231a037dd5.jpg https://creative.ai/system/media_attachments/files/111/175/018/239/466/513/original/43260547b93ad73e.jpg
📝 DataDAM: Efficient Dataset Distillation with Attention Matching 🔭🧠 "Trains a model that can generate images that match spatial attention maps of images from other datasets across various model architectures and layers in the network, achieving state-of-the-art performance." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2310.00093v1 #arxiv https://creative.ai/system/media_attachments/files/111/174/605/187/487/498/original/33358adb3b015b8d.jpg https://creative.ai/system/media_attachments/files/111/174/605/328/678/566/original/b0e5fcba4da00697.jpg https://creative.ai/system/media_attachments/files/111/174/605/440/487/506/original/303f3255cbe6d736.jpg https://creative.ai/system/media_attachments/files/111/174/605/540/644/757/original/a581616616604b8f.jpg
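A sketch of what "matching spatial attention maps" can look like, following the generic attention-transfer recipe: sum |activation|^p over channels, flatten, L2-normalize, and compare real against synthetic maps layer by layer. This is an assumption about the formulation; DataDAM's exact choice of layers, batching per class, and any additional loss terms are not in the post, and the function names are hypothetical.

```python
import math

def spatial_attention(feature_map, p=2):
    """feature_map: list of channels, each an H x W grid (nested lists).
    Returns the flattened, L2-normalized map sum_c |F_c|^p."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    att = [sum(abs(ch[i][j]) ** p for ch in feature_map) for i in range(h) for j in range(w)]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0
    return [a / norm for a in att]

def attention_matching_loss(real_maps, synth_maps, p=2):
    """Mean squared distance between normalized attention maps, summed over layers."""
    total = 0.0
    for real, synth in zip(real_maps, synth_maps):
        a, b = spatial_attention(real, p), spatial_attention(synth, p)
        total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return total
```

Because each map is normalized, the loss compares where a network "looks" rather than how strongly it activates, so synthetic images are pushed to reproduce the spatial focus of real ones.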
📝 Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks 🔭 "A deep fake is an image synthesized by a Generative Adversarial Network (GAN) that mimics the distribution of real image data and can fool the human visual system (HVS)." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.00076v1 #arxiv https://creative.ai/system/media_attachments/files/111/174/251/408/526/253/original/3ac04b06fef613b4.jpg https://creative.ai/system/media_attachments/files/111/174/251/482/283/951/original/4a7d873301ca7daa.jpg https://creative.ai/system/media_attachments/files/111/174/251/547/427/449/original/289e4dce00fa3c42.jpg https://creative.ai/system/media_attachments/files/111/174/251/609/937/312/original/89b592189e1e4d67.jpg
📝 Prompt-Enhanced Self-Supervised Representation Learning for Remote Sensing Image Understanding 🔭 "A reconstructive prompt that uses original image patches as a template and a prompt-enhanced generative branch that provides contextual information through semantic consistency constraints are used for self-supervised representation learning on remote sensing images." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.00022v1 #arxiv https://creative.ai/system/media_attachments/files/111/173/897/251/548/948/original/c03c33a06d0060ba.jpg https://creative.ai/system/media_attachments/files/111/173/897/313/358/109/original/7addc8f8f5d72f3f.jpg https://creative.ai/system/media_attachments/files/111/173/897/386/758/601/original/dc265a177978a8d8.jpg https://creative.ai/system/media_attachments/files/111/173/897/432/715/538/original/939fc5c57e6f622a.jpg
📝 Joint Self-Supervised Depth and Optical Flow Estimation Towards Dynamic Objects 🔭 "A joint depth and optical flow estimation framework, which predicts depths in various motions by minimizing pixel warp errors in bilateral photometric re-projections and optical flow, is proposed." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.00011v1 #arxiv https://creative.ai/system/media_attachments/files/111/173/543/459/139/348/original/350013500ce7a95c.jpg https://creative.ai/system/media_attachments/files/111/173/543/516/516/037/original/2f94d7ed5c9807ad.jpg https://creative.ai/system/media_attachments/files/111/173/543/579/086/831/original/422ff90114d8771e.jpg https://creative.ai/system/media_attachments/files/111/173/543/636/241/676/original/03c7d685c5c712ca.jpg
📝 Rethinking Audiovisual Segmentation with Semantic Quantization and Decomposition 🔭 "Decomposes the multi-source audio semantics into single-source semantics, allowing for more effective interaction with visual content, and proposes a semantic decomposition method based on product quantization, in which the multi-source semantics are decomposed and represented by several quantized single-source semantics." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2310.00132v1 #arxiv https://creative.ai/system/media_attachments/files/111/173/243/452/040/051/original/3fbffe68d7029205.jpg https://creative.ai/system/media_attachments/files/111/173/243/521/841/528/original/8253de4c189c2f14.jpg https://creative.ai/system/media_attachments/files/111/173/243/574/515/152/original/02ae92db34b93233.jpg https://creative.ai/system/media_attachments/files/111/173/243/646/882/862/original/8052ee811b5d50ab.jpg
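Product quantization, which the post names as the basis of the decomposition, can be illustrated generically. The sketch below assumes standard product quantization (split a vector into sub-vectors, snap each one to its nearest codeword in a separate sub-codebook); the function names and toy codebooks are hypothetical, and the paper's learned audio-semantic decomposition is not reproduced here.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x with product quantization (generic sketch).

    codebooks: list of M arrays, each of shape (K, d_sub); len(x) == M * d_sub.
    Returns one codeword index per sub-vector.
    """
    subvectors = np.split(np.asarray(x, dtype=float), len(codebooks))
    codes = []
    for sub, cb in zip(subvectors, codebooks):
        # nearest codeword by squared Euclidean distance within this sub-codebook
        dists = ((cb - sub) ** 2).sum(axis=1)
        codes.append(int(np.argmin(dists)))
    return codes

def pq_decode(codes, codebooks):
    # concatenate the selected codeword from each sub-codebook
    return np.concatenate([cb[c] for c, cb in zip(codes, codebooks)])
```

Each sub-codebook plays the role of one "single-source" vocabulary, and the full vector is represented by a tuple of quantized parts.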
📝 Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings 🔭 "JLCM allows a very efficient approximation of any DNN: as such, a Llama 7B model can be compressed down to 2 GB and loaded on 5-year-old smartphones." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17361v1 #arxiv https://creative.ai/system/media_attachments/files/111/172/994/825/239/480/original/d36454cc24b7440e.jpg https://creative.ai/system/media_attachments/files/111/172/994/880/648/031/original/1edb10a11e23e58d.jpg https://creative.ai/system/media_attachments/files/111/172/994/937/209/552/original/4e1615bad11a6c87.jpg https://creative.ai/system/media_attachments/files/111/172/994/990/796/073/original/3ae9fd35bf987914.jpg
📝 Towards Free Data Selection with General-Purpose Models 🔭🧠 "Defines semantic patterns extracted from intermediate features of the general-purpose model to capture subtle local information in each image, and enables the selection of all data samples in a single pass through distance-based sampling at the fine-grained semantic pattern level." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/yichen928/FreeSel 🔗 https://arxiv.org/abs/2309.17342v1 #arxiv https://creative.ai/system/media_attachments/files/111/172/758/940/976/946/original/c0216fdfd0182a29.jpg https://creative.ai/system/media_attachments/files/111/172/758/997/650/123/original/0697fd2be43f4686.jpg https://creative.ai/system/media_attachments/files/111/172/759/051/113/449/original/9cb432042688e25a.jpg https://creative.ai/system/media_attachments/files/111/172/759/103/174/600/original/79be92be988eedf7.jpg
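Distance-based sampling in a single pass can be sketched with greedy farthest-point (k-center) selection over precomputed feature vectors. This is an assumption for illustration: FreeSel operates on semantic patterns from intermediate features, and its exact distance and sampling rule may differ from the hypothetical `farthest_point_sampling` below.

```python
import numpy as np

def farthest_point_sampling(features, k, start=0):
    """Greedily pick k row indices of `features` that are far apart (k-center greedy)."""
    features = np.asarray(features, dtype=float)
    selected = [start]
    # distance from every sample to its nearest already-selected sample
    min_dist = np.linalg.norm(features - features[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # the sample worst covered so far
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```

No model is trained between selections, which is what makes a one-pass selection cheap.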
📝 Information Flow in Self-Supervised Learning 🔭 "Proposes a novel pre-training method, matrix variational masked auto-encoder (M-MAE), as an enhancement to masked image modeling (MIM)." [gal30b+] 🤖 #CV ⚙️ https://github.com/yifanzhang-pro/M-MAE 🔗 https://arxiv.org/abs/2309.17281v1 #arxiv https://creative.ai/system/media_attachments/files/111/172/169/073/810/072/original/153c2ef7046be13e.jpg https://creative.ai/system/media_attachments/files/111/172/169/124/730/148/original/ee246ed74f088ba0.jpg https://creative.ai/system/media_attachments/files/111/172/169/174/009/955/original/ca86cb9760b15a00.jpg
📝 EGVD: Event-Guided Video Deraining 🔭 "Proposes an end-to-end learning-based network, dubbed Event-based Guided Video Deraining (EGVD), to unlock the potential of the event camera for video deraining." [gal30b+] 🤖 #CV ⚙️ https://github.com/booker-max/EGVD 🔗 https://arxiv.org/abs/2309.17239v1 #arxiv https://creative.ai/system/media_attachments/files/111/171/815/201/935/046/original/3cb31ff0b3d22155.jpg https://creative.ai/system/media_attachments/files/111/171/815/265/251/689/original/857be269ae260230.jpg https://creative.ai/system/media_attachments/files/111/171/815/327/053/956/original/6d2ed6986567a27f.jpg https://creative.ai/system/media_attachments/files/111/171/815/381/219/330/original/6dff039bd0dcf0fa.jpg
📝 When Epipolar Constraint Meets Non-Local Operators in Multi-View Stereo 🔭 "An Epipolar Transformer performs non-local feature augmentation within a pair of lines: each point only attends to the corresponding pair of epipolar lines, reducing the 2D search space to the epipolar line." [gal30b+] 🤖 #CV ⚙️ https://github.com/TQTQliu/ET-MVSNet 🔗 https://arxiv.org/abs/2309.17218v1 #arxiv https://creative.ai/system/media_attachments/files/111/171/520/252/671/750/original/18c48d3792448204.jpg https://creative.ai/system/media_attachments/files/111/171/520/315/336/387/original/d1190c4a5d77bc63.jpg https://creative.ai/system/media_attachments/files/111/171/520/368/967/379/original/e4a47df90f4a9d97.jpg https://creative.ai/system/media_attachments/files/111/171/520/421/130/775/original/146934126d88d96b.jpg
📝 Instant Complexity Reduction in CNNs Using Locality-Sensitive Hashing 🔭🧠 "Proposes Hashing for Tractable Efficiency (HASTE), which is a parameter-free and data-free module that acts as a plug-and-play replacement for regular convolution modules." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.17211v1 #arxiv https://creative.ai/system/media_attachments/files/111/171/284/361/519/267/original/c6c6fc0da7726cd8.jpg https://creative.ai/system/media_attachments/files/111/171/284/416/043/662/original/a18f669aa42d80a6.jpg https://creative.ai/system/media_attachments/files/111/171/284/468/186/104/original/d8287dc15a29852b.jpg https://creative.ai/system/media_attachments/files/111/171/284/527/469/605/original/7d19d4f98112001a.jpg
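A simplified sketch of how locality-sensitive hashing can reduce convolution cost: channels whose activations hash to the same random-hyperplane (SimHash) code are averaged, and the matching filter slices are summed, so a convolution over fewer channels approximates the original. This is an assumed, simplified illustration (whole-map hashing, hypothetical function name), not HASTE's actual module, which may hash at a finer granularity.

```python
import numpy as np

def lsh_merge_channels(feature_map, weights, n_planes=4, seed=0):
    """Merge input channels that collide under random-hyperplane LSH.

    feature_map: (C, H, W) input activations.
    weights: (O, C, kh, kw) convolution filters acting on those channels.
    Returns the reduced feature map (C', H, W) and reduced weights (O, C', kh, kw).
    """
    c = feature_map.shape[0]
    flat = feature_map.reshape(c, -1)
    # center each channel so the hash reflects its spatial pattern, not its mean
    flat = flat - flat.mean(axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((flat.shape[1], n_planes))
    # hash code of a channel = sign pattern of its projections onto the hyperplanes
    codes = ((flat @ planes) > 0).astype(np.int8)
    _, bucket = np.unique(codes, axis=0, return_inverse=True)
    bucket = bucket.reshape(-1)
    n_buckets = int(bucket.max()) + 1
    # convolution is linear in channels: sum_c W_c * x_c ~= (sum_{c in b} W_c) * mean_b(x)
    merged_fm = np.stack([feature_map[bucket == b].mean(axis=0) for b in range(n_buckets)])
    merged_w = np.stack([weights[:, bucket == b].sum(axis=1) for b in range(n_buckets)], axis=1)
    return merged_fm, merged_w
```

The approximation is exact when merged channels are identical and degrades gracefully as they become merely similar; no parameters or training data are needed, matching the "parameter-free and data-free" description.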
📝 Towards Complex-Query Referring Image Segmentation: A Novel Benchmark 🔭 "A niche-targeting method to better tackle the RIS-CQ task, called dual-modality graph alignment model (DuMoGa)." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17205v1 #arxiv https://creative.ai/system/media_attachments/files/111/170/989/489/534/022/original/47f8d6866bf45159.jpg https://creative.ai/system/media_attachments/files/111/170/989/546/689/813/original/6d0edba5064464c9.jpg https://creative.ai/system/media_attachments/files/111/170/989/605/608/490/original/00df36e52593a47f.jpg https://creative.ai/system/media_attachments/files/111/170/989/664/377/012/original/e6ad3babec75acd8.jpg
📝 Domain-Adaptive Learning: Unsupervised Adaptation for Histology Images with Improved Loss Function Combination 🔭 "A novel loss function and carefully selected existing loss functions are tailored to address the challenges specific to histology images, such as tissue structure and cell morphology, to enhance adaptation performance in the histology domain." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17172v1 #arxiv https://creative.ai/system/media_attachments/files/111/170/635/542/425/599/original/2a14cd666c205aa9.jpg https://creative.ai/system/media_attachments/files/111/170/635/598/898/516/original/2dbe8193c55e9ef0.jpg https://creative.ai/system/media_attachments/files/111/170/635/656/342/256/original/0901f092e38d6781.jpg https://creative.ai/system/media_attachments/files/111/170/635/714/610/208/original/e40ee45828f2a4a1.jpg
📝 Continual Action Assessment via Task-Consistent Score-Discriminative Feature Distribution Modeling 🔭 "An Action General-Specific Graph is developed to learn and decouple the action-general and action-specific knowledge so that the task-consistent score-discriminative features can be better extracted across various tasks." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17105v1 #arxiv https://creative.ai/system/media_attachments/files/111/170/340/670/493/076/original/6281d8f9264c9f44.jpg https://creative.ai/system/media_attachments/files/111/170/340/739/030/777/original/b9c78e5a840dcd0d.jpg https://creative.ai/system/media_attachments/files/111/170/340/796/631/124/original/eff65da4b86ac450.jpg https://creative.ai/system/media_attachments/files/111/170/340/850/607/585/original/856b964a345c8051.jpg
📝 Guiding Instruction-Based Image Editing via Multimodal Large Language Models 🔭 "Derives expressive instructions and provides explicit guidance that helps editing models capture the visual imagination of natural instructions and manipulate images accordingly through end-to-end training." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17102v1 #arxiv https://creative.ai/system/media_attachments/files/111/170/104/606/630/314/original/63ee3dff42c1e5b1.jpg https://creative.ai/system/media_attachments/files/111/170/104/695/562/706/original/f7ebb56cd937ca98.jpg https://creative.ai/system/media_attachments/files/111/170/104/748/431/672/original/c761273927676c1e.jpg https://creative.ai/system/media_attachments/files/111/170/104/794/734/595/original/51058f54c19e2d72.jpg
📝 Prototype-Based Aleatoric Uncertainty Quantification for Cross-Modal Retrieval 🔭 "Proposes a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity." [gal30b+] 🤖 #CV ⚙️ https://github.com/leolee99/PAU 🔗 https://arxiv.org/abs/2309.17093v1 #arxiv https://creative.ai/system/media_attachments/files/111/169/750/782/934/366/original/7610d236613e48b2.jpg https://creative.ai/system/media_attachments/files/111/169/750/844/598/621/original/7502da3207060b58.jpg https://creative.ai/system/media_attachments/files/111/169/750/898/280/807/original/b4868bee2721db40.jpg https://creative.ai/system/media_attachments/files/111/169/750/954/374/091/original/00827dbf5f7c6c2a.jpg
📝 SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning 🔭 "SegRCDB is based on insights about what is important in pre-training for semantic segmentation and allows efficient pre-training, achieving higher mIoU than pre-training with COCO-Stuff when fine-tuning on ADE-20k and Cityscapes with the same number of training images." [gal30b+] 🤖 #CV ⚙️ https://github.com/dahlian00/SegRCDB 🔗 https://arxiv.org/abs/2309.17083v1 #arxiv https://creative.ai/system/media_attachments/files/111/169/514/874/264/581/original/f30004ea7d1cd269.jpg https://creative.ai/system/media_attachments/files/111/169/514/933/714/417/original/6225ac3ff0a0453a.jpg https://creative.ai/system/media_attachments/files/111/169/515/006/864/470/original/7a2a40311e43dab1.jpg https://creative.ai/system/media_attachments/files/111/169/515/062/131/451/original/aab5f2a098c6e83f.jpg
📝 DeeDiff: Dynamic Uncertainty-Aware Early Exiting for Accelerating Diffusion Model Generation 🔭 "Proposes DeeDiff, an early exiting framework for diffusion models by introducing an uncertainty estimation module (UEM), which is attached to each intermediate layer to estimate the prediction uncertainty of each layer." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17074v1 #arxiv https://creative.ai/system/media_attachments/files/111/169/278/848/744/715/original/f71cb6fd9ea2de97.jpg https://creative.ai/system/media_attachments/files/111/169/278/903/277/156/original/fe24a343f8f4f2a2.jpg https://creative.ai/system/media_attachments/files/111/169/278/960/200/737/original/9ebcb984e1c97383.jpg https://creative.ai/system/media_attachments/files/111/169/279/024/441/081/original/4f84c6114490d63f.jpg
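The early-exit decision described for DeeDiff can be sketched as a loop that stops at the first layer whose estimated uncertainty is low enough. This is a hypothetical, framework-free sketch: the per-layer uncertainty heads stand in for the UEMs, and exiting on a fixed `threshold` is an assumption about how the decision is made.

```python
from typing import Any, Callable, Sequence, Tuple

def early_exit_forward(
    x: Any,
    layers: Sequence[Callable[[Any], Any]],
    exit_heads: Sequence[Callable[[Any], Any]],
    uncertainty_heads: Sequence[Callable[[Any], float]],
    threshold: float,
) -> Tuple[Any, int]:
    """Run layers in order and return (prediction, exit_layer_index).

    Exits at the first layer whose estimated uncertainty is below `threshold`;
    the last layer always produces an output.
    """
    if not (len(layers) == len(exit_heads) == len(uncertainty_heads)):
        raise ValueError("need one exit head and one uncertainty head per layer")
    for i, layer in enumerate(layers):
        x = layer(x)
        is_last = i == len(layers) - 1
        if is_last or uncertainty_heads[i](x) < threshold:
            return exit_heads[i](x), i
    raise ValueError("at least one layer is required")
```

A higher threshold trades output quality for fewer evaluated layers; in a diffusion sampler this loop would run once per denoising step.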
📝 Imagery Dataset for Condition Monitoring of Synthetic Fibre Ropes 🔭 "A comprehensive dataset has been generated, comprising a total of 6,942 raw images representing both normal and defective synthetic fibre ropes (SFRs)." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17058v1 #arxiv https://creative.ai/system/media_attachments/files/111/169/160/964/617/938/original/9730eb93ef2ba35e.jpg https://creative.ai/system/media_attachments/files/111/169/161/016/271/314/original/688a74fc4241d8d3.jpg https://creative.ai/system/media_attachments/files/111/169/161/093/428/754/original/ea604495384d072a.jpg
📝 A 5-Point Minimal Solver for Event Camera Relative Motion Estimation 🔭 "Proposes eventails, which represent spatio-temporal structures generated from line features in a sequence of events, in order to derive a new minimal solver for event-based linear velocity estimation from a known rotation." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17054v1 #arxiv https://creative.ai/system/media_attachments/files/111/168/807/024/241/913/original/b0fd66056c49ec1a.jpg https://creative.ai/system/media_attachments/files/111/168/807/084/837/762/original/3b84a3139da052d2.jpg https://creative.ai/system/media_attachments/files/111/168/807/131/819/900/original/2ec0af6f4fa77fdf.jpg https://creative.ai/system/media_attachments/files/111/168/807/183/781/363/original/e7f522696941d15c.jpg
📝 On Uniform Scalar Quantization for Learned Image Compression 🔭 "Proposes a method based on stochastic uniform annealing for learned image compression, which has an adjustable temperature coefficient to control a tradeoff between the train-test mismatch and gradient estimation risk." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17051v1 #arxiv https://creative.ai/system/media_attachments/files/111/168/571/140/353/271/original/df391fe473436458.jpg https://creative.ai/system/media_attachments/files/111/168/571/217/234/274/original/987246cf2f27253f.jpg https://creative.ai/system/media_attachments/files/111/168/571/292/327/805/original/49349f89009165b8.jpg https://creative.ai/system/media_attachments/files/111/168/571/370/041/885/original/5c8d3551596d8156.jpg
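The train-test mismatch mentioned in the post can be made concrete with the two standard ingredients it refers to: hard rounding at test time and the common additive-uniform-noise proxy at training time. This sketch shows only that baseline gap, with hypothetical function names; the paper's stochastic uniform annealing and its temperature schedule are not reproduced here.

```python
import numpy as np

def round_quantize(y):
    # test-time quantization: hard rounding to integers, as needed for entropy coding
    return np.round(y)

def noisy_quantize(y, rng):
    # common training-time proxy: additive uniform noise in [-0.5, 0.5), differentiable in y
    return y + rng.uniform(-0.5, 0.5, size=np.shape(y))

def train_test_mismatch(y, rng):
    # mean absolute gap between what the decoder sees in training and at test time
    return float(np.mean(np.abs(noisy_quantize(y, rng) - round_quantize(y))))
```

For latents spread over many integer bins, the rounding residual is roughly uniform, so the gap is the mean absolute value of a sum of two independent U(-0.5, 0.5) variables, which is 1/3. Hard rounding removes this gap but has zero gradient almost everywhere, which is the tradeoff an adjustable temperature is meant to balance.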
📝 HoloAssist: An Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World 🔭 "HoloAssist is a large-scale egocentric human interaction dataset, where two people complete collaborative physical manipulation tasks by wearing a mixed-reality headset that captures seven synchronized data streams." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.17024v1 #arxiv https://creative.ai/system/media_attachments/files/111/168/335/214/914/185/original/8f1d8dffa592d011.jpg https://creative.ai/system/media_attachments/files/111/168/335/275/795/828/original/2b41b20888d18aa2.jpg https://creative.ai/system/media_attachments/files/111/168/335/340/719/708/original/139c72125738322f.jpg https://creative.ai/system/media_attachments/files/111/168/335/405/444/107/original/7938a20a9f44020d.jpg
📝 Segment Anything Model Is a Good Teacher for Local Feature Learning 🔭🧠 "SAMFeat consists of three modules: Pixel Semantic Relational Distillation (PSRD), Weakly Supervised Contrastive Learning Based on Semantic Grouping (WSC), and Edge Attention Guidance (EAG)." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/vignywang/SAMFeat 🔗 https://arxiv.org/abs/2309.16992v1 #arxiv https://creative.ai/system/media_attachments/files/111/167/981/286/351/887/original/e03d549f1e63f8e2.jpg https://creative.ai/system/media_attachments/files/111/167/981/347/788/754/original/a18174142f900ac6.jpg https://creative.ai/system/media_attachments/files/111/167/981/408/938/012/original/477c5e428bbee31d.jpg https://creative.ai/system/media_attachments/files/111/167/981/459/473/839/original/4a5ff57b8be51b2d.jpg
📝 SpikeMOT: Event-Based Multi-Object Tracking with Sparse Motion Features 🔭 "SpikeMOT leverages spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects and to track object movement at high frequency, while a simultaneous object detector provides updated spatial information of these objects at an equivalent frame rate." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16987v1 #arxiv https://creative.ai/system/media_attachments/files/111/167/568/438/459/184/original/c52a164aa2734ff4.jpg https://creative.ai/system/media_attachments/files/111/167/568/507/905/005/original/3c95557172ddba47.jpg https://creative.ai/system/media_attachments/files/111/167/568/567/703/387/original/b910010b041b6b65.jpg https://creative.ai/system/media_attachments/files/111/167/568/630/472/249/original/60ed644ff2f75daf.jpg
📝 COMNet: Co-Occurrent Matching for Weakly Supervised Semantic Segmentation 🔭 "A novel co-occurrent matching network is proposed which performs inter-matching on paired images and intra-matching on a single image to boost the class activation map." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16959v1 #arxiv https://creative.ai/system/media_attachments/files/111/167/096/537/162/601/original/a9d2fc3b360bbd0a.jpg https://creative.ai/system/media_attachments/files/111/167/096/602/381/959/original/071072863ee87007.jpg https://creative.ai/system/media_attachments/files/111/167/096/651/215/083/original/0e200faceb6e4a75.jpg https://creative.ai/system/media_attachments/files/111/167/096/696/444/258/original/9d3120cb34840cbc.jpg
📝 CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving 🔭 "A novel unified multi-scale blur-event fusion neural network (CZ-Net) is proposed to jointly recover sharp latent sequences in the exposure period of a blurry input and the corresponding High-Resolution (HR) events." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16949v1 #arxiv https://creative.ai/system/media_attachments/files/111/166/742/805/670/154/original/dc2c371caa3c2169.jpg https://creative.ai/system/media_attachments/files/111/166/742/888/309/276/original/c45ed630c5d21dd0.jpg https://creative.ai/system/media_attachments/files/111/166/742/956/595/481/original/4a402a20b2b26c74.jpg https://creative.ai/system/media_attachments/files/111/166/743/027/155/353/original/c8ab94020c3a957c.jpg
📝 Incremental Rotation Averaging Revisited and More: A New Rotation Averaging Benchmark 🔭 "Introduces a novel Incremental Rotation Averaging method (denoted as IRAv4), in which a task-specific connected dominating set is extracted to serve as a more reliable and accurate reference for rotation global alignment." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16924v1 #arxiv https://creative.ai/system/media_attachments/files/111/166/329/904/793/952/original/7f7ea01b6ac014d1.jpg https://creative.ai/system/media_attachments/files/111/166/329/955/174/369/original/9cac35367bebf5b9.jpg https://creative.ai/system/media_attachments/files/111/166/330/040/802/289/original/809606d714418bcd.jpg
📝 YOLOR-Based Multi-Task Learning 🔭 "Learns a single model that can perform object detection, instance segmentation, semantic segmentation, and image captioning tasks with competitive performance on all tasks while maintaining a low parameter count and without any pre-training." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16921v1 #arxiv https://creative.ai/system/media_attachments/files/111/166/093/809/899/869/original/5a22170b0cbf2b0d.jpg https://creative.ai/system/media_attachments/files/111/166/093/872/257/159/original/522b32619f79bd28.jpg https://creative.ai/system/media_attachments/files/111/166/093/938/301/053/original/b134b3f1f9e88f52.jpg https://creative.ai/system/media_attachments/files/111/166/093/996/125/014/original/2d22e273655ee31b.jpg
📝 Investigating Shift Equivalence of Convolutional Neural Networks in Industrial Defect Segmentation 🔭 "A novel pair of down/upsampling layers is proposed as a replacement for the conventional sampling layers in CNNs: component attention polyphase sampling (CAPS)." [gal30b+] 🤖 #CV ⚙️ https://github.com/xiaozhen228/CAPS 🔗 https://arxiv.org/abs/2309.16902v1 #arxiv https://creative.ai/system/media_attachments/files/111/165/798/971/534/419/original/8cd4a8fcbce7b439.jpg https://creative.ai/system/media_attachments/files/111/165/799/025/905/945/original/4b49f2fa6fb6f123.jpg https://creative.ai/system/media_attachments/files/111/165/799/102/599/283/original/5b8b7a2e8f97b575.jpg https://creative.ai/system/media_attachments/files/111/165/799/191/951/888/original/0ec7bfb8a8569da8.jpg
📝 On the Contractivity of Plug-and-Play Operators 🔭 "The proximal operator in algorithms like ISTA and ADMM is replaced by a powerful denoiser, such as BM3D or NLM, and this substitution is surprisingly effective in practice." [gal30b+] 🤖 #CV ⚙️ https://github.com/Bhartendu-Kumar/PnP-Conv 🔗 https://arxiv.org/abs/2309.16899v1 #arxiv https://creative.ai/system/media_attachments/files/111/165/504/024/929/310/original/9a44f8dc39203591.jpg https://creative.ai/system/media_attachments/files/111/165/504/080/521/186/original/4e4b976f260bdbca.jpg https://creative.ai/system/media_attachments/files/111/165/504/132/831/320/original/a733e13878d8079b.jpg https://creative.ai/system/media_attachments/files/111/165/504/195/889/615/original/3f5a001836ca82ec.jpg
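The substitution the post describes can be sketched as plug-and-play ISTA: a gradient step on the data-fidelity term followed by a denoiser where ISTA would apply a proximal operator. The moving-average denoiser below is a hypothetical stand-in for BM3D or NLM, and the function names are illustrative; whether the iteration converges depends on properties of the denoiser such as contractivity, which is the question the paper studies.

```python
import numpy as np

def pnp_ista(y, A, denoiser, step, n_iter=50):
    """Plug-and-play ISTA for y ~= A x.

    Each iteration takes a gradient step on 0.5 * ||A x - y||^2, then applies
    `denoiser` in place of the proximal operator of an explicit regularizer.
    """
    x = A.T @ y  # simple initialization
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)       # gradient of the data-fidelity term
        x = denoiser(x - step * grad)  # denoiser replaces the proximal step
    return x

def moving_average(x, width=5):
    # stand-in denoiser (BM3D / NLM are not implemented here)
    return np.convolve(x, np.ones(width) / width, mode="same")
```

Usage: with `A = np.eye(n)` the problem is pure denoising, and with an identity "denoiser" the iteration reduces to plain gradient descent on the least-squares objective.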
📝 Space-Time Attention with Shifted Non-Local Search 🔭🧠 "The method, named Shifted Non-Local Search, executes a small grid search surrounding the predicted offsets to correct small spatial errors and achieves state-of-the-art results on video denoising." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.16849v1 #arxiv https://creative.ai/system/media_attachments/files/111/165/032/191/545/077/original/830843159d9a2052.jpg https://creative.ai/system/media_attachments/files/111/165/032/244/403/400/original/0109f3c6451f9e23.jpg https://creative.ai/system/media_attachments/files/111/165/032/298/280/140/original/5ddc9c142b715e03.jpg https://creative.ai/system/media_attachments/files/111/165/032/359/074/490/original/4e2e74918f66cc4a.jpg
📝 ELIP: Efficient Language-Image Pre-Training with Fewer Vision Tokens 🔭 "ELIP is a computation-efficient, memory-efficient and trainable-parameter-free pruning method with the supervision of language outputs for the pre-training of VL models." [gal30b+] 🤖 #CV ⚙️ https://github.com/guoyang9/ELIP 🔗 https://arxiv.org/abs/2309.16738v1 #arxiv https://creative.ai/system/media_attachments/files/111/164/796/242/176/258/original/a60498c0379237b1.jpg https://creative.ai/system/media_attachments/files/111/164/796/294/489/888/original/0e8730cdc5bcff74.jpg https://creative.ai/system/media_attachments/files/111/164/796/353/761/485/original/02dd943fdb56e718.jpg https://creative.ai/system/media_attachments/files/111/164/796/415/575/802/original/6707ba135a872a8a.jpg
📝 Automatic Cadastral Boundary Detection of Very High Resolution Images Using Mask R-CNN 🔭🧠 "The problem is solved with instance segmentation using Mask R-CNN with a ResNet-50 backbone pre-trained on the ImageNet dataset, followed by geometric post-processing of its output to improve the detection and segmentation of buildings." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.16708v1 #arxiv https://creative.ai/system/media_attachments/files/111/164/442/485/280/383/original/a35b17ae13d88aad.jpg https://creative.ai/system/media_attachments/files/111/164/442/572/290/606/original/e91d1fac808fed58.jpg https://creative.ai/system/media_attachments/files/111/164/442/651/268/417/original/202aff0cfc2b0b4f.jpg https://creative.ai/system/media_attachments/files/111/164/442/739/692/061/original/566e9bcc1666ea06.jpg
📝 Framework and Model Analysis on Bengali Document Layout Analysis Dataset: BaDLAD 🔭🧠 "Focuses on understanding Bengali document layouts using Detectron2, YOLOv8, and SAM (Segment Anything Model)." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.16700v1 #arxiv https://creative.ai/system/media_attachments/files/111/164/147/158/842/926/original/e7bf84ba483cb6a8.jpg https://creative.ai/system/media_attachments/files/111/164/147/212/927/399/original/76a151451f539548.jpg https://creative.ai/system/media_attachments/files/111/164/147/251/249/556/original/3fe9f5d1d1ec2f65.jpg https://creative.ai/system/media_attachments/files/111/164/147/296/641/529/original/15f12185fdac3976.jpg
📝 Learning to Transform for Generalizable Instance-Wise Invariance 🔭🧠 "The normalizing flow predicts a distribution over transformations that maximizes the likelihood of the transformed image under the distribution of training instances, conditioned on the class label of the original image." [gal30b+] 🤖 #CV #LG ⚙️ https://github.com/sutkarsh/flow_inv/ 🔗 https://arxiv.org/abs/2309.16672v1 #arxiv https://creative.ai/system/media_attachments/files/111/161/492/670/382/254/original/8a479f558780392c.jpg https://creative.ai/system/media_attachments/files/111/161/492/836/636/371/original/e1defcae4b76e24c.jpg https://creative.ai/system/media_attachments/files/111/161/492/944/138/219/original/f7857c4497e07298.jpg https://creative.ai/system/media_attachments/files/111/161/493/153/807/211/original/e0b4a285528cf05c.jpg
📝 Training a Large Video Model on a Single Machine in a Day 🔭 "Uses a combination of state-of-the-art techniques including data pre-fetching, data pipelining, mixed precision, gradient checkpointing, and tensor-core offloading to optimize each component of the pipeline (IO, CPU, and GPU computation)." [gal30b+] 🤖 #CV ⚙️ https://github.com/zhaoyue-zephyrus/AVION 🔗 https://arxiv.org/abs/2309.16669v1 #arxiv https://creative.ai/system/media_attachments/files/111/161/315/589/280/671/original/b975c4b69b2bf082.jpg https://creative.ai/system/media_attachments/files/111/161/315/690/930/304/original/1d4185fdec238feb.jpg https://creative.ai/system/media_attachments/files/111/161/315/751/932/320/original/e50d1064608eda54.jpg https://creative.ai/system/media_attachments/files/111/161/315/858/378/981/original/0532b988b41cd58d.jpg
📝 Visual in-Context Learning for Few-Shot Eczema Segmentation 🔭🧠 "Visual in-context learning with a generalist vision model called SegGPT is proposed for eczema segmentation from a skin image dataset of annotated eczema images." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.16656v1 #arxiv https://creative.ai/system/media_attachments/files/111/161/138/903/474/082/original/bd9aed6963442edb.jpg https://creative.ai/system/media_attachments/files/111/161/138/980/765/559/original/1548fcbd428408f8.jpg https://creative.ai/system/media_attachments/files/111/161/139/056/143/096/original/93cb5aab23064bc7.jpg https://creative.ai/system/media_attachments/files/111/161/139/149/022/203/original/959dff3e7624a9cd.jpg
📝 Novel Deep Learning Pipeline for Automatic Weapon Detection 🔭 "An ensemble of convolutional neural networks is used to detect the presence of weapons in surveillance footage in real time, with high accuracy and recall while maintaining high specificity." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16654v1 #arxiv https://creative.ai/system/media_attachments/files/111/160/784/755/033/199/original/fa3650bdcf422318.jpg https://creative.ai/system/media_attachments/files/111/160/784/825/984/648/original/c4ede7e1a99fbe4c.jpg https://creative.ai/system/media_attachments/files/111/160/784/905/425/486/original/c81eaf10c68fb78a.jpg https://creative.ai/system/media_attachments/files/111/160/784/975/397/020/original/6f6c2accbb55e187.jpg
📝 Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors 🔭 "Proposes an equivariant regularization technique, consisting of a spatial averaging procedure and a self-consistency loss, to explicitly promote cropping-and-resizing equivariance in depth and normal networks." [gal30b+] 🤖 #CV ⚙️ https://github.com/mikuhatsune/equivariance 🔗 https://arxiv.org/abs/2309.16646v1 #arxiv https://creative.ai/system/media_attachments/files/111/160/549/155/972/818/original/8f21dbb0759f9e21.jpg https://creative.ai/system/media_attachments/files/111/160/549/223/797/067/original/58bf4d225a0c0fb3.jpg https://creative.ai/system/media_attachments/files/111/160/549/290/799/136/original/8a17bd06ef4cd6aa.jpg https://creative.ai/system/media_attachments/files/111/160/549/360/559/201/original/b9d4bdde1dfb8716.jpg
📝 Deep Geometrized Cartoon Line Inbetweening 🔭 "Geometrizes anime line drawings into graphs and reframes the inbetweening task as a graph fusion problem with vertex repositioning, which is achieved via a vertex geometric embedding module, a vertex correspondence Transformer, an effective mechanism for vertex repositioning and a visibility predictor." [gal30b+] 🤖 #CV ⚙️ https://github.com/lisiyao21/AnimeInbet 🔗 https://arxiv.org/abs/2309.16643v1 #arxiv https://creative.ai/system/media_attachments/files/111/160/312/939/998/801/original/c343b9da0c5cd0d7.jpg https://creative.ai/system/media_attachments/files/111/160/313/010/429/884/original/5fb16f5f1065043a.jpg https://creative.ai/system/media_attachments/files/111/160/313/137/110/465/original/0db74ba3d007a1e5.jpg https://creative.ai/system/media_attachments/files/111/160/313/210/381/737/original/df9445a61c68235b.jpg
📝 End-to-End (Instance)-Image Goal Navigation Through Correspondence as an Emergent Phenomenon 🔭 "The first stage pretext task is cross-view completion, where the model is trained to take an RGB-D observation from view A (source view) and complete the depth from a different view B (target view)." [gal30b+] 🤖 #CV ⚙️ https://github.com/naver/croco 🔗 https://arxiv.org/abs/2309.16634v1 #arxiv https://creative.ai/system/media_attachments/files/111/160/135/958/555/129/original/99e5ff621492e9e4.jpg https://creative.ai/system/media_attachments/files/111/160/136/013/789/567/original/94e1d8e61b84df21.jpg https://creative.ai/system/media_attachments/files/111/160/136/075/645/168/original/0ca32a39ee60155b.jpg https://creative.ai/system/media_attachments/files/111/160/136/124/352/752/original/052307eba1d1122e.jpg
📝 KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing 🔭 "The key to KV Inversion is the design of the Key-Value (KV) structure, which makes the editing result follow the action semantics while retaining the original object texture." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16608v1 #arxiv https://creative.ai/system/media_attachments/files/111/159/900/030/464/045/original/ad3491434e9596be.jpg https://creative.ai/system/media_attachments/files/111/159/900/088/128/945/original/eedc56e346b87cbd.jpg https://creative.ai/system/media_attachments/files/111/159/900/150/912/017/original/648ababa6c8737b9.jpg https://creative.ai/system/media_attachments/files/111/159/900/207/834/911/original/8e8b13e93f5a98d5.jpg
📝 Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection 🔭🧠 "TensorFact decomposes each layer of a deep network (ResNet) into 3 factor matrices, one each for the spatial, filter and channel dimensions, thus reducing the total number of parameters." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.16592v1 #arxiv https://creative.ai/system/media_attachments/files/111/159/605/165/629/581/original/b4202e3ba35a2594.jpg https://creative.ai/system/media_attachments/files/111/159/605/217/409/963/original/66780ea40819109b.jpg https://creative.ai/system/media_attachments/files/111/159/605/270/865/456/original/98bc9047f3e24faf.jpg https://creative.ai/system/media_attachments/files/111/159/605/322/441/680/original/d13308cf57d92c11.jpg
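Factorizing a convolution weight into spatial, filter, and channel factor matrices can be sketched as a CP (canonical polyadic) decomposition of rank R, which also shows where the parameter savings come from. The CP form, the rank, and the function names are assumptions for illustration; TensorFact's exact factorization may differ.

```python
import numpy as np

def cp_conv_weight(filter_f, channel_f, spatial_f):
    """Rebuild a conv weight from three rank-R factor matrices (CP-style sketch).

    filter_f: (O, R), channel_f: (C, R), spatial_f: (kh*kw, R).
    W[o, c, s] = sum_r filter_f[o, r] * channel_f[c, r] * spatial_f[s, r]
    Reshape the result to (O, C, kh, kw) to use it as a conv kernel.
    """
    return np.einsum("or,cr,sr->ocs", filter_f, channel_f, spatial_f)

def param_counts(out_ch, in_ch, k, rank):
    # dense k x k kernel vs. three rank-R factor matrices
    dense = out_ch * in_ch * k * k
    factored = rank * (out_ch + in_ch + k * k)
    return dense, factored
```

For example, a 256-to-256 3x3 layer has 589,824 dense weights, while rank-16 factors need 16 * (256 + 256 + 9) = 8,336 parameters; fewer parameters is what makes training on scarce infrared data more tractable.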
📝 Vision Transformers Need Registers 🔭 "Adds extra tokens to the input sequence of the vision transformer that take over the role previously played by low-informative background tokens, making the feature maps and attention maps smoother." [gal30b+] 🤖 #CV ⚙️ https://github.com/facebookresearch/deit 🔗 https://arxiv.org/abs/2309.16588v1 #arxiv https://creative.ai/system/media_attachments/files/111/159/487/324/163/321/original/71b7e5ea3a9fbb6c.jpg https://creative.ai/system/media_attachments/files/111/159/487/415/279/611/original/2882cbe22a00ec33.jpg https://creative.ai/system/media_attachments/files/111/159/487/477/678/088/original/7ce633fe7ac9acec.jpg https://creative.ai/system/media_attachments/files/111/159/487/537/859/843/original/842774180aaab9fa.jpg
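The register mechanism amounts to sequence bookkeeping: extra tokens are appended to the ViT input, processed like any other token, and their outputs are discarded. The sketch below shows only that bookkeeping with NumPy arrays; the function names are hypothetical, placing registers right after [CLS] is an assumption, and in a real model the register vectors are learned parameters.

```python
import numpy as np

def add_registers(patch_tokens, cls_token, registers):
    """Build the transformer input: [CLS], then register tokens, then patch tokens.

    patch_tokens: (N, D), cls_token: (D,), registers: (R, D).
    """
    return np.concatenate([cls_token[None, :], registers, patch_tokens], axis=0)

def drop_registers(output_tokens, n_registers):
    """Discard register outputs; keep the [CLS] output and the patch outputs."""
    cls_out = output_tokens[0]
    patch_out = output_tokens[1 + n_registers:]
    return cls_out, patch_out
```

Because register outputs are thrown away, they can absorb global computation without leaving high-norm artifacts in the patch feature maps.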
📝 Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping 🔭 "Neural noise can be used to separate objects from each other in the presence of background clutter, without any additional supervision, even in the presence of illusory contours, occlusion, and continuity." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16515v1 #arxiv https://creative.ai/system/media_attachments/files/111/159/251/256/165/954/original/c0e0919220a669c4.jpg https://creative.ai/system/media_attachments/files/111/159/251/319/576/505/original/57256cea6fb19969.jpg https://creative.ai/system/media_attachments/files/111/159/251/386/004/005/original/3275adf610a5cdbe.jpg https://creative.ai/system/media_attachments/files/111/159/251/464/338/518/original/1c09d7c7b8c1aa4e.jpg
📝 CCEdit: Creative and Controllable Video Editing via Diffusion Models 🔭 "A ControlNet-based architecture that decouples structural and appearance aspects of a video, while maintaining consistency between them, to accommodate a wide spectrum of user editing requirements." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16496v1 #arxiv https://creative.ai/system/media_attachments/files/111/158/897/239/141/087/original/b3cac78eace6c7b0.jpg https://creative.ai/system/media_attachments/files/111/158/897/321/175/207/original/c633bd74ae26b9cb.jpg https://creative.ai/system/media_attachments/files/111/158/897/376/640/620/original/d3daddf977dc0db5.jpg https://creative.ai/system/media_attachments/files/111/158/897/467/713/539/original/061a3f716402b0b3.jpg
📝 Deep Single Models vs. Ensembles: Insights for a Fast Deployment of Parking Monitoring Systems 🔭🧠 "Uses a deep learning model trained on publicly available data, which is fine-tuned on a small set of labeled target parking lot images (few-shot learning)." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.16495v1 #arxiv https://creative.ai/system/media_attachments/files/111/158/661/461/013/402/original/620b71484154b67c.jpg https://creative.ai/system/media_attachments/files/111/158/661/507/591/140/original/2988f259f8abb6aa.jpg https://creative.ai/system/media_attachments/files/111/158/661/552/808/098/original/8e817b4f59703658.jpg https://creative.ai/system/media_attachments/files/111/158/661/601/571/649/original/30b810f835b913a1.jpg
📝 Accurate and Lightweight Dehazing via Multi-Receptive-Field Non-Local Network and Novel Contrastive Regularization 🔭 "A multi-receptive-field non-local network (MRFNLN) is presented, which contains a multi-stream feature attention block (MSFAB) and a cross non-local block (CNLB)." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16494v1 #arxiv https://creative.ai/system/media_attachments/files/111/158/484/420/281/759/original/cb0393522c76cbe9.jpg https://creative.ai/system/media_attachments/files/111/158/484/491/069/512/original/610cbf2f7c9b0e7a.jpg https://creative.ai/system/media_attachments/files/111/158/484/549/238/784/original/4bb92e2994cede5c.jpg https://creative.ai/system/media_attachments/files/111/158/484/603/635/220/original/5bc7ddd64382316e.jpg
📝 Rethinking Domain Generalization: Discriminability and Generalizability 🔭 "A novel framework called Discriminative Microscopic Distribution Alignment is presented which concurrently imbues features with formidable discriminability and robust generalizability, consisting of two core components: Selective Channel Pruning and Micro-level Distribution Alignment." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16483v1 #arxiv https://creative.ai/system/media_attachments/files/111/158/248/517/409/268/original/9fc798ea45ea6276.jpg https://creative.ai/system/media_attachments/files/111/158/248/582/049/122/original/ad12099a690df993.jpg https://creative.ai/system/media_attachments/files/111/158/248/646/805/213/original/03c39eae4e48c41e.jpg https://creative.ai/system/media_attachments/files/111/158/248/698/535/820/original/1eedcdb80b9e53bb.jpg
📝 Diverse Target and Contribution Scheduling for Domain Generalization 🔭 "Consists of Diverse Target Supervision (DTS) and Diverse Contribution Balance (DCB), with the aim of addressing the limitations associated with the common utilization of one-hot labels and equal contributions for source domains in Domain Generalization." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16460v1 #arxiv https://creative.ai/system/media_attachments/files/111/158/071/571/406/496/original/ccbd8f20086db09f.jpg https://creative.ai/system/media_attachments/files/111/158/071/643/723/537/original/ae19ac05a5442a2d.jpg
📝 Towards Novel Class Discovery: A Study in Novel Skin Lesions Clustering 🔭 "Proposes a new framework for automatically discovering new semantic classes in a skin lesion dataset based on knowledge of the known classes, leveraging contrastive learning, multi-view cross pseudo-supervision, and neighborhood consensus." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16451v1 #arxiv https://creative.ai/system/media_attachments/files/111/157/717/711/212/766/original/37939d22ac8cf9f0.jpg
📝 Distilling ODE Solvers of Diffusion Models Into Smaller Steps 🔭 "Proposes a straightforward distillation approach that optimizes the ODE solver rather than training the denoising network, resulting in a new ODE solver with improved speed-quality tradeoffs and preserving the sampling trajectory from the ODE solver." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16421v1 #arxiv https://creative.ai/system/media_attachments/files/111/157/422/823/289/120/original/be857b7ef3816b27.jpg https://creative.ai/system/media_attachments/files/111/157/422/914/430/625/original/9917f6ff420e3538.jpg https://creative.ai/system/media_attachments/files/111/157/422/978/490/917/original/b0b89b48967f2c21.jpg https://creative.ai/system/media_attachments/files/111/157/423/079/465/247/original/f84166a05b90aba0.jpg
📝 HIC-YOLOv5: Improved YOLOv5 for Small Object Detection 🔭 "Adds a prediction head dedicated to small object detection to the original YOLOv5, inserts an involution block between the backbone and neck to increase the channel information of the feature map, and applies an attention mechanism named CBAM at the end of the backbone." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16393v1 #arxiv https://creative.ai/system/media_attachments/files/111/157/127/859/413/736/original/d2fd625f286025e1.jpg https://creative.ai/system/media_attachments/files/111/157/127/910/096/735/original/fbf9d58115e0af29.jpg https://creative.ai/system/media_attachments/files/111/157/127/962/300/587/original/c3eeb62306dd4da2.jpg https://creative.ai/system/media_attachments/files/111/157/128/013/863/562/original/91ba5f6738cf5ebd.jpg
📝 Aperture Diffraction for Compact Snapshot Spectral Imaging 🔭 "Designs Aperture Diffraction Imaging Spectrometer (ADIS) which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, which realizes sub-super-pixel spatial resolution and high spectral resolution imaging by elaborating the imaging optical theory and reconstruction algorithm." [gal30b+] 🤖 #CV ⚙️ https://github.com/Krito-ex/CSST 🔗 https://arxiv.org/abs/2309.16372v1 #arxiv https://creative.ai/system/media_attachments/files/111/156/832/982/712/560/original/c58f6b682e953f24.jpg https://creative.ai/system/media_attachments/files/111/156/833/046/623/138/original/e84a6fd62366f06d.jpg https://creative.ai/system/media_attachments/files/111/156/833/105/013/561/original/21cdde2331715688.jpg https://creative.ai/system/media_attachments/files/111/156/833/162/295/006/original/9a73766db9a38b39.jpg
📝 Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning 🔭 "Trains a GAN-based synthetic-image generator that translates available daytime image examples into night images, and uses it in metric learning as a form of augmentation that supplies training data for the scarce domain." [gal30b+] 🤖 #CV ⚙️ https://github.com/mohwald/gandtr 🔗 https://arxiv.org/abs/2309.16351v1 #arxiv https://creative.ai/system/media_attachments/files/111/156/597/004/754/981/original/a4d360fc7f27a7ab.jpg https://creative.ai/system/media_attachments/files/111/156/597/056/547/925/original/db1bddfbf824cb41.jpg https://creative.ai/system/media_attachments/files/111/156/597/109/489/416/original/a2af030d40e4346b.jpg https://creative.ai/system/media_attachments/files/111/156/597/168/236/470/original/7c09b815ee815422.jpg
📝 Logarithm-Transform Aided Gaussian Sampling for Few-Shot Learning 🔭 "Proposes a novel Gaussian transform that utilises Gaussianisation to transform experimental data into Gaussian-like distributions, which are then used to train classifiers for few-shot classification tasks on the Omniglot dataset." [gal30b+] 🤖 #CV ⚙️ https://github.com/ganatra-v/gaussian-sampling-fsl 🔗 https://arxiv.org/abs/2309.16337v1 #arxiv https://creative.ai/system/media_attachments/files/111/156/302/096/263/333/original/21737f0c3730542b.jpg https://creative.ai/system/media_attachments/files/111/156/302/149/233/344/original/bafa86cb634e48d3.jpg https://creative.ai/system/media_attachments/files/111/156/302/202/350/549/original/67dcd466b1a7d4e9.jpg https://creative.ai/system/media_attachments/files/111/156/302/259/410/957/original/29bac87f4127ae28.jpg
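"Gaussianise, then sample" for few-shot learning can be pictured roughly as follows. An elementwise log transform is applied to non-negative features (for example post-ReLU features), a Gaussian is fitted to the few support samples of one class, and extra synthetic features are drawn to enlarge the classifier's training set. This is a generic sketch, not the repository's code. The offset `eps`, the covariance shrinkage and the sample count are hypothetical choices.

```python
import numpy as np


def log_transform(features, eps=1.0):
    # Compress the heavy right tail of non-negative features toward a Gaussian shape.
    return np.log(features + eps)


def sample_class_features(support, n_samples, eps=1.0, shrinkage=0.1, seed=0):
    """Fit a Gaussian to log-transformed support features and sample from it."""
    z = log_transform(support, eps)
    mean = z.mean(axis=0)
    # With only a few shots the covariance is singular; shrink toward the identity.
    cov = np.cov(z, rowvar=False) + shrinkage * np.eye(z.shape[1])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)


rng = np.random.default_rng(1)
support = rng.exponential(scale=2.0, size=(5, 8))  # 5 shots, 8-dim non-negative features
synthetic = sample_class_features(support, n_samples=100)
train_x = np.concatenate([log_transform(support), synthetic], axis=0)
print(train_x.shape)  # (105, 8)
```

The real support samples and the synthetic ones live in the same log-transformed space, so a simple classifier (for example logistic regression) can be trained on the combined set.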
📝 Can the Query-Based Object Detector Be Designed with Fewer Stages? 🔭 "GOLO is a query-based object detector with a two-stage decoder, consisting of a single query-to-query interaction in the global decoder and a single query-to-image interaction in the local decoder." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16306v1 #arxiv https://creative.ai/system/media_attachments/files/111/156/066/248/912/187/original/09574ef17568f130.jpg https://creative.ai/system/media_attachments/files/111/156/066/310/302/699/original/5430049bdb1097e6.jpg https://creative.ai/system/media_attachments/files/111/156/066/358/296/946/original/accdf3a93bba983b.jpg https://creative.ai/system/media_attachments/files/111/156/066/411/616/297/original/3c00b3d722921498.jpg
📝 GAFlow: Incorporating Gaussian Attention Into Optical Flow 🔭 "A Gaussian-Constrained Layer (GCL) is proposed to highlight the local neighborhood during feature extraction, while the proposed Gaussian-Guided Attention Module (G-GAM) enforces motion affinity during matching." [gal30b+] 🤖 #CV ⚙️ https://github.com/LA30/GAFlow 🔗 https://arxiv.org/abs/2309.16217v1 #arxiv https://creative.ai/system/media_attachments/files/111/155/889/149/508/226/original/1800867cb22dc937.jpg https://creative.ai/system/media_attachments/files/111/155/889/213/116/632/original/440857e316539ad3.jpg https://creative.ai/system/media_attachments/files/111/155/889/263/698/893/original/0a814baf33e2398f.jpg https://creative.ai/system/media_attachments/files/111/155/889/319/535/420/original/5d31b8e41b7a1885.jpg
📝 Nonconvex Third-Order Tensor Recovery Based on Logarithmic Minimax Function 🔭 "Can protect large singular values while imposing stronger penalization on small singular values, thus leading to an improved low-rank tensor recovery performance compared with other state-of-the-art methods." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16208v1 #arxiv https://creative.ai/system/media_attachments/files/111/155/653/130/438/723/original/552dad1ccf8c67d0.jpg https://creative.ai/system/media_attachments/files/111/155/653/215/082/269/original/2dd0e8cf0afe0191.jpg https://creative.ai/system/media_attachments/files/111/155/653/297/178/457/original/05b4025559d394b0.jpg https://creative.ai/system/media_attachments/files/111/155/653/353/929/424/original/a7844fdff956e40f.jpg
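A generic reweighted singular value thresholding step on a matrix can check the claim that a logarithmic penalty protects large singular values while shrinking small ones harder. This is a stand-in, not the paper's method. It uses the plain log penalty lam·log(σ + eps), whose derivative lam/(σ + eps) gives the per-value shrinkage, and it is not the paper's logarithmic minimax function. It also ignores the third-order tensor (t-SVD) setting. The values of `lam` and `eps` are hypothetical.

```python
import numpy as np


def log_weighted_svt(x, lam, eps=1e-2):
    """One thresholding step for a generic log penalty lam * sum(log(s + eps))."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Derivative of the log penalty: large singular values get small weights.
    weights = lam / (s + eps)
    s_new = np.maximum(s - weights, 0.0)
    return u @ np.diag(s_new) @ vt, s_new


def nuclear_svt(x, lam):
    """Baseline: nuclear-norm soft thresholding shrinks every value by lam."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    s_new = np.maximum(s - lam, 0.0)
    return u @ np.diag(s_new) @ vt, s_new


x = np.diag([10.0, 5.0, 0.5, 0.1])  # toy matrix with known singular values
_, s_log = log_weighted_svt(x, lam=1.0)
_, s_nuc = nuclear_svt(x, lam=0.5)
print(np.round(s_log, 2))
print(np.round(s_nuc, 2))
```

With these settings, both steps zero out the two small singular values. The log-weighted step shrinks the value 10 by only about 0.1, while the nuclear-norm step shrinks it by 0.5. This is the "protect large, penalize small" behavior described in the post.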
📝 Parameter-Saving Adversarial Training: Reinforcing Multi-Perturbation Robustness via Hypernetworks 🔭 "PSAT uses hypernetworks to train specialized models, each against a single perturbation, and aggregates these specialized models to defend against multiple perturbations, achieving multi-perturbation robustness and parameter efficiency." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16207v1 #arxiv https://creative.ai/system/media_attachments/files/111/155/476/230/035/664/original/329ff341e9187d3f.jpg https://creative.ai/system/media_attachments/files/111/155/476/302/252/130/original/ffe0b29b3d1a467e.jpg
📝 Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling 🔭🧠 "Uncertainty-based sampling and diversity-based sampling are used to select the most informative images for labeling in a post-hoc setup where the segmentation model has already been trained." [gal30b+] 🤖 #CV #LG 🔗 https://arxiv.org/abs/2309.16139v1 #arxiv https://creative.ai/system/media_attachments/files/111/155/240/478/424/709/original/67e91a9c9e7d351b.jpg https://creative.ai/system/media_attachments/files/111/155/240/541/462/960/original/b961e10f9315abd2.jpg https://creative.ai/system/media_attachments/files/111/155/240/600/361/566/original/e7871896e1efc1a0.jpg
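Uncertainty-then-diversity selection can be sketched in two steps. First, keep the unlabeled images with the highest per-image uncertainty score. Then pick a spread-out subset of those with greedy k-center selection in an embedding space. The scores, embeddings, pool sizes and the k-center criterion are assumptions for illustration. The paper's uncertainty measure for instance masks and its diversity criterion may differ.

```python
import numpy as np


def select_for_labeling(uncertainty, embeddings, n_candidates, budget):
    """Two-step selection: top-uncertainty pool, then greedy k-center for diversity."""
    # Step 1 (uncertainty): keep the n_candidates most uncertain images.
    candidates = np.argsort(-uncertainty)[:n_candidates]
    emb = embeddings[candidates]

    # Step 2 (diversity): greedy k-center, seeded with the most uncertain image.
    chosen = [0]
    dist = np.linalg.norm(emb - emb[0], axis=1)  # distance to nearest chosen item
    dist[0] = -np.inf                            # never re-pick a chosen item
    while len(chosen) < min(budget, len(candidates)):
        nxt = int(np.argmax(dist))               # farthest from the current set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(emb - emb[nxt], axis=1))
        dist[nxt] = -np.inf
    return candidates[chosen]


rng = np.random.default_rng(0)
uncertainty = rng.random(200)                # hypothetical per-image uncertainty scores
embeddings = rng.standard_normal((200, 16))  # hypothetical per-image feature vectors
picked = select_for_labeling(uncertainty, embeddings, n_candidates=50, budget=10)
print(picked)
```

Ranking by uncertainty first keeps the k-center step cheap, because it runs only on the candidate pool. The diversity step then avoids spending the labeling budget on near-duplicate uncertain images.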
📝 Context-I2w: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval 🔭 "An Intent View Selector first dynamically learns a rotation rule to map the identical image to a task-specific manipulation view, and a Visual Target Extractor further captures local information covering the main targets in ZS-CIR tasks." [gal30b+] 🤖 #CV ⚙️ https://github.com/Pter61/context_i2w 🔗 https://arxiv.org/abs/2309.16137v1 #arxiv https://creative.ai/system/media_attachments/files/111/154/945/345/167/916/original/03df4b97a0bbe400.jpg https://creative.ai/system/media_attachments/files/111/154/945/395/137/752/original/da7fbf6a887cb408.jpg https://creative.ai/system/media_attachments/files/111/154/945/448/085/893/original/da8bc1ef1da00343.jpg https://creative.ai/system/media_attachments/files/111/154/945/511/610/530/original/bee4c8b80cddc2cc.jpg
📝 Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation 🔭 "Enables a more accurate computation of the pseudo-annotations for the target domain's images, thus yielding state-of-the-art results on different datasets." [gal30b+] 🤖 #CV 🔗 https://arxiv.org/abs/2309.16127v1 #arxiv https://creative.ai/system/media_attachments/files/111/154/709/587/924/412/original/67200e768db33672.jpg https://creative.ai/system/media_attachments/files/111/154/709/650/982/145/original/c5305db2b303149e.jpg https://creative.ai/system/media_attachments/files/111/154/709/702/853/358/original/89b9ecf964a31cd3.jpg https://creative.ai/system/media_attachments/files/111/154/709/762/312/368/original/861be320113c4ad9.jpg
Notes by 9a622e93