answer (string) | choice_a (string) | choice_b (string) | choice_c (string) | choice_d (string) | data_id (string) | data_type (string) | question (string) | question_id (string) | question_type_id (string) | segment (string) |
---|---|---|---|---|---|---|---|---|---|---|
"A" | "One" | "Two" | "Three" | "Four" | "1454426_2591111986" | "image" | "How many towels are in the image?" | "101669" | "5" | null |
"C" | "A hotel" | "A house" | "A cabin" | "A shed" | "1307737_3736205576" | "image" | "What type of building is in the image?" | "104933" | "3" | null |
"C" | "Standing with his arms crossed" | "Holding a cell phone" | "Taking a picture" | "Talking to someone" | "2809357_337019870" | "image" | "What is the man in the suit doing in the image?" | "89257" | "1" | null |
"A" | "White" | "Black" | "Gray" | "Brown" | "124217_564854171" | "image" | "What is the color of the chair seen on the right side of the image?" | "75209" | "3" | null |
"D" | "One" | "Two" | "Three" | "Four" | "175998_3410025962" | "image" | "How many people are present in the image?" | "34143" | "5" | null |
"B" | "One" | "Two" | "Three" | "Four" | "353719_422296094" | "image" | "How many men are on the airplane in the image?" | "39514" | "5" | null |
"D" | "Full" | "Stubble" | "Goatee" | "None" | "10350_546800778" | "image" | "What type of beard does one of the men in the car have?" | "8278" | "2" | null |
"B" | "Mouth" | "Floor" | "Ear" | "Toy" | "330536_2064876201" | "image" | "Which of the following objects are not detected in the image?" | "17213" | "2" | null |
"D" | "They are teammates" | "Cannot be determined" | "They are not interacting" | "They are opponents" | "653976_583609630" | "image" | "What is the relation between the soccer player and the football player?" | "62193" | "7" | null |
"B" | "White" | "Brown" | "Black" | "Gray" | "1542755_2483061906" | "image" | "What is the predominant color of the sand on the beach?" | "54363" | "3" | null |
"D" | "A cloudy sky" | "A clear blue sky" | "Trees and mountains" | "Buildings and houses" | "245140_725561305" | "image" | "What is in the background of the image?" | "105495" | "1" | null |
"C" | "Three" | "Two" | "One" | "Four" | "194865_1732781425" | "image" | "How many people are in the image?" | "77550" | "5" | null |
"B" | "A group of people playing soccer in a park" | "A crowd of people watching a live music performance" | "A group of people attending a political rally" | "A street performer playing music while people pass by" | "1488424_229203411" | "image" | "What is the main event taking place in the image?" | "26302" | "1" | null |
"A" | "A suit and tie" | "A casual t-shirt" | "A Hawaiian shirt" | "A sweater and jeans" | "2955271_3530307469" | "image" | "What is the man wearing in the image?" | "78896" | "3" | null |
"D" | "Guitar" | "Microphone" | "Arm" | "Shoe" | "2871019_2604587489" | "image" | "Which object in the image has an attribute of "leather"?" | "16402" | "8" | null |
"C" | "Sparrows" | "Hummingbirds" | "None of the above" | "White doves" | "2924904_2953932263" | "image" | "Which bird species is not present in the image?" | "59136" | "2" | null |
"D" | "A bedroom" | "A kitchen" | "A living room" | "A dining room" | "653182_1768283755" | "image" | "What is the main focus of the image?" | "47628" | "1" | null |
"C" | "The gift bag and candy canes in the foreground." | "The full moon in the background." | "The character on a plane with a bag of presents." | "The red heart with white letters." | "3201538_4179577514" | "image" | "What is the focus of the image?" | "47496" | "1" | null |
"D" | "Black" | "White" | "Gray" | "Beige" | "943658_438770307" | "image" | "What is the color of the man's suit?" | "49203" | "3" | null |
"B" | "Coffee table" | "Desk" | "Dining table" | "Side table" | "3053315_1422166703" | "image" | "What type of furniture is located in the center of the room in the image?" | "84847" | "4" | null |
"C" | "Bicycle" | "Television" | "All of the above" | "Window" | "1247129_3859787790" | "image" | "What objects are in the room besides the people and the guitar?" | "16288" | "2" | null |
"A" | "Crystal clear" | "Muddy and murky" | "Blue and calm" | "Rough and wavy" | "1608383_2048478380" | "image" | "How does the water appear in the image?" | "31218" | "3" | null |
"C" | "The man's hair and beard are the same length" | "The man's beard is longer than his hair" | "The man's hair is longer than his beard" | "The man has no beard" | "2536164_2771794248" | "image" | "What is the spatial relation between the man's hair and his beard?" | "18949" | "7" | null |
"D" | "Jumping" | "Sitting" | "Lying down" | "Standing" | "2204370_2732528001" | "image" | "How is the woman with blonde hair positioned in the image?" | "77411" | "3" | null |
"C" | "A mountain" | "The sky" | "A river" | "A tree" | "988218_3421889857" | "image" | "What is the primary natural element in the image?" | "55162" | "1" | null |
"A" | "A couple in the center" | "A bench in the foreground" | "A tree in the background" | "The sky in the distance" | "243640_4148469903" | "image" | "What is the main focus of the image?" | "11600" | "1" | null |
"A" | "Centered in the middle of the image" | "In the foreground, left side of the image" | "In the background, right side of the image" | "Cannot be determined" | "2287922_3072284593" | "image" | "What is the position of the red car in the image?" | "42998" | "4" | null |
"A" | "The tall skyscrapers in the city skyline" | "The boats and ships in the dock" | "The clock tower" | "The large body of water in front of the city" | "1091756_1159571361" | "image" | "What is the prominent feature of the image?" | "73159" | "1" | null |
"D" | "Rock" | "Window" | "Grass" | "Bench" | "89887_623811299" | "image" | "Which of the following objects is not detected in the image?" | "47557" | "2" | null |
"C" | "Yes" | "Cannot determine from the given information" | "No" | "None of the above" | "1224262_1570424753" | "image" | "Are there any animals visible in the image?" | "62290" | "2" | null |
"D" | "Living room" | "Bathroom" | "Kitchen" | "Classroom" | "232342_3583117003" | "image" | "What type of room is the image taken in?" | "7931" | "3" | null |
"A" | "A river" | "A pine tree" | "A wooden bench" | "A dirt road" | "660738_3936044231" | "image" | "Which of the following objects is NOT present in the image?" | "100166" | "2" | null |
"A" | "7" | "Can't be determined from the given information." | "15" | "10" | "2015086_3569149329" | "image" | "How many football players are present on the field in the image?" | "19816" | "5" | null |
"B" | "One" | "Four" | "Three" | "Two" | "1892671_1198054679" | "image" | "How many men are in the image?" | "50436" | "5" | null |
"C" | "In front of the camera" | "Beside the camera" | "Behind the camera" | "On top of the camera" | "2075762_1276402569" | "image" | "What is the position of the man relative to the camera in the photo?" | "7369" | "6" | null |
"B" | "Nature" | "Sports" | "Transportation" | "Music" | "1508859_2047500887" | "image" | "What is the overall theme of the image?" | "75111" | "1" | null |
"A" | "Black" | "Brown" | "Blond" | "Red" | "3217634_4249119237" | "image" | "What color is the woman's hair?" | "23878" | "3" | null |
"D" | "A view of the sky and some fluffy clouds" | "A tranquil forest with tall trees and a babbling brook" | "A large outdoor swimming pool surrounded by palm trees" | "A busy city complete with skyscrapers and buildings" | "518905_2696169184" | "image" | "What is on the other side of the large windows in the image?" | "4646" | "1" | null |
"B" | "1" | "3" | "2" | "4" | "2860282_964608455" | "image" | "How many hot air balloons are in the image?" | "47402" | "5" | null |
"C" | "Two" | "One" | "Three" | "Four" | "1314641_1837183097" | "image" | "How many people are in the car?" | "40775" | "5" | null |
"D" | "A bustling city with many landmarks visible" | "A snowy landscape with people snowboarding and skiing" | "A peaceful countryside scene with grazing animals" | "People enjoying leisure time at a sunny beach" | "2516531_637298721" | "image" | "What is the scene in the image like?" | "97708" | "1" | null |
"D" | "Next to the dining room table" | "It cannot be determined from the given information" | "There is no refrigerator in the image." | "In the kitchen area" | "825491_486878907" | "image" | "Where is the refrigerator located in the image?" | "3772" | "4" | null |
"A" | "A blue boat" | "A white sailboat" | "A beach" | "The sunset" | "2376839_3001418676" | "image" | "What is the main object in the image?" | "6898" | "1" | null |
"B" | "None" | "Two" | "One" | "Three" | "44538_2484200017" | "image" | "How many brown leaves can be seen in the image?" | "76212" | "5" | null |
"A" | "Circular" | "Rectangular" | "Triangular" | "Hexagonal" | "2354388_3479851363" | "image" | "What is the shape of the drum in the image?" | "78130" | "3" | null |
"A" | "Inside the tent" | "In front of the tent on the left side" | "In front of the tent on the right side" | "Behind the tent" | "256106_1396163876" | "image" | "Where is the woman standing in relation to the tent?" | "11857" | "6" | null |
"C" | "In the center" | "On the left" | "On the right" | "In the background" | "1475510_3914256759" | "image" | "Where is the guitar positioned in the image?" | "2811" | "4" | null |
"A" | "A fighter jet" | "A cargo plane" | "A commercial passenger jet" | "A helicopter" | "189886_1502552897" | "image" | "What type of plane is visible in the image?" | "41225" | "3" | null |
"B" | "White" | "Pink" | "Blue" | "Brown" | "2047123_319199038" | "image" | "What is the color of the cat's nose?" | "104561" | "3" | null |
"A" | "People are playing soccer." | "People are playing basketball." | "People are playing volleyball." | "People are playing baseball." | "311166_4276059141" | "image" | "What is happening in this image?" | "48323" | "7" | null |
"B" | "A person wearing goggles" | "A dog wearing an orange life vest" | "A small cat wearing a purple collar" | "A seagull with orange feet" | "1625886_961534325" | "image" | "What is riding on the surfboard in the ocean?" | "76703" | "2" | null |
"A" | "Gray" | "White" | "Black" | "Brown" | "87179_4272019300" | "image" | "What is the color of the bird in the image?" | "54128" | "3" | null |
"A" | "Playground" | "Roof" | "Bench" | "House" | "106803_661231283" | "image" | "Which of these elements is not visible in the image?" | "30312" | "2" | null |
"A" | "Yellow" | "Black" | "Green" | "Red" | "1132177_1759824972" | "image" | "What color is the picture frame detected in the attribute detection?" | "35121" | "3" | null |
"B" | "Elegant and minimalistic" | "Traditional and classic" | "Bright and colorful" | "Rustic and farmhouse" | "132352_1612169772" | "image" | "Which of these phrases describes the room's decor?" | "91233" | "1" | null |
"A" | "Brown" | "Black" | "Gray" | "Blue" | "3062189_535139107" | "image" | "What color are the shoes worn by the man in the image?" | "57917" | "3" | null |
"B" | "Rural" | "Suburban" | "Urban" | "Industrial" | "696941_655765094" | "image" | "What type of environment is the image captured in?" | "35282" | "1" | null |
"D" | "Not enough information" | "No" | "Cannot be determined" | "Yes" | "2842927_2452501499" | "image" | "Is there a person sitting in the stands?" | "71157" | "2" | null |
"D" | "Red and black" | "Blue and white" | "Yellow and green" | "Gray and brown" | "101577_3283049165" | "image" | "What is the main color scheme of the image?" | "77984" | "1" | null |
"C" | "Piano" | "Drums" | "Guitar" | "Violin" | "3180956_3397523392" | "image" | "What is the primary instrument being played in the image?" | "20102" | "2" | null |
"A" | "1" | "0" | "2" | "3" | "861881_3593300004" | "image" | "How many people are in the image?" | "77811" | "5" | null |
"B" | "Coral reef" | "Palm tree" | "Seashell" | "Fish" | "173943_1244813466" | "image" | "Which of the following is not present in the image?" | "88157" | "2" | null |
"D" | "Round" | "Rectangular" | "Triangular" | "It is impossible to determine from the given information" | "129664_2832120937" | "image" | "What is the shape of the desert on the table in the image?" | "57869" | "1" | null |
"B" | "Pink" | "Green" | "Blue" | "Red" | "1298043_1422062697" | "image" | "What kind of dress is the girl wearing in the image?" | "101936" | "3" | null |
"C" | "Left side of the stage" | "Right side of the stage" | "Center of the stage" | "They are sitting in front of the stage" | "2639615_1938166777" | "image" | "Where is the group of people playing music positioned on the stage?" | "1817" | "1" | null |
"B" | "Black and white" | "Red and white" | "Blue and white" | "Green and white" | "380790_840433008" | "image" | "What is the color theme of the clothing in the image?" | "89612" | "3" | null |
"C" | "Black" | "Blue" | "Pink" | "Red" | "467207_952932770" | "image" | "What color is the basket on the pink bike?" | "13046" | "3" | null |
"B" | "No" | "Yes" | "Cannot be determined with certainty" | "The answer is in the attribute detections" | "2167615_1061051137" | "image" | "Is there any indication of a drum or drums present in the image?" | "55172" | "2" | null |
"B" | "Dancing and cheering" | "Singing and playing instruments" | "Standing and watching the crowd" | "Resting and taking a break" | "1273098_149139029" | "image" | "What are the performers doing in the image?" | "42399" | "1" | null |
"B" | "The blue team" | "The black team" | "The red team" | "The orange and white team" | "2580537_1156369180" | "image" | "Which team is celebrating after the game?" | "89024" | "1" | null |
"C" | "Reflective and tranquil with a single figure and a dog." | "Busy with people and dogs." | "Exciting with surfers catching waves." | "Dreary and dull with no activity." | "2172010_3598729033" | "image" | "What is the general tone of the scene?" | "104474" | "1" | null |
"C" | "2" | "3" | "4" | "5" | "2948438_953295019" | "image" | "How many hockey equipment items can be seen in the image?" | "35433" | "5" | null |
"D" | "Mountains" | "Animals" | "Trees" | "Building" | "127755_397753326" | "image" | "What is visible in the image besides the sea?" | "20128" | "2" | null |
"C" | "Brown" | "Grey" | "Beige" | "Green" | "357657_3661797802" | "image" | "What color is the couch in the living room?" | "95039" | "3" | null |
"A" | "Trees" | "People" | "Fog" | "Dogs" | "374564_3761272675" | "image" | "What is the primary element in the foreground of the image?" | "10269" | "1" | null |
"B" | "Oak tree" | "Palm tree" | "Maple tree" | "Pine tree" | "1019827_1353321998" | "image" | "What type of tree can be seen multiple times in the image?" | "29425" | "3" | null |
"D" | "A bunch of flowers" | "A bunch of popsicles" | "A colorful hand" | "A bunch of balloons" | "861620_2812087232" | "image" | "What is the main object in the image?" | "79263" | "2" | null |
"D" | "They are homeless" | "They are street performers" | "They are waiting for a parade" | "They are tourists" | "1817914_1529607004" | "image" | "What can be inferred about the group of people sitting on the street?" | "908" | "8" | null |
"D" | "In the center of the image" | "In the top left corner of the image" | "In the bottom right corner of the image" | "Spread out across the image" | "380950_539887679" | "image" | "Where is the group of cookies with bears and trees located in the image?" | "15481" | "4" | null |
"C" | "Gray" | "Brown" | "Green" | "Blue" | "558919_4239930259" | "image" | "What is the predominant color of the field in the image?" | "70142" | "3" | null |
"C" | "Young and beardless" | "Bald with glasses" | "Old and bearded" | "Wearing a hat and a suit" | "79487_859112306" | "image" | "What is the man's appearance?" | "24097" | "3" | null |
"C" | "A red coat" | "A white shirt" | "A black sweater" | "A green dress" | "1683041_1696614759" | "image" | "What is the woman in the image wearing?" | "23211" | "3" | null |
"D" | "Natural landscape" | "Cultural festival" | "Urban architecture" | "Sports competition" | "216265_1786863825" | "image" | "What is the main theme of the image?" | "15899" | "1" | null |
"D" | "A cowboy hat" | "A beanie" | "A bike helmet" | "A baseball cap" | "675_1302587992" | "image" | "What type of equipment is the person standing closest to the base wearing on their head?" | "25989" | "3" | null |
"D" | "Green" | "Black" | "Orange" | "Brown" | "380950_539887679" | "image" | "What is the dominant color in the image?" | "15507" | "1" | null |
"B" | "White" | "Blue" | "Green" | "Gray" | "2872365_816268138" | "image" | "What color is prominent in the image?" | "9865" | "1" | null |
"B" | "Cannot determine from the information provided" | "Close to each other" | "Far from each other" | "Touching" | "8362_2563544936" | "image" | "What is the relative position of the woman's lips and her angry face?" | "5378" | "6" | null |
"B" | "Walking" | "Standing" | "Sitting" | "Running" | "2661794_4181374972" | "image" | "What are the people in the image doing?" | "102121" | "3" | null |
"D" | "White" | "Grey" | "Green" | "Blue" | "3185112_3608486333" | "image" | "What is the overall color of the scene in the image?" | "102506" | "1" | null |
"C" | "Cannot be determined from the information given" | "The hand is on the face" | "The hand and face are far from each other" | "The hand and face are close to each other" | "901352_2618970907" | "image" | "What is the relative position of the woman's hand and face in the image?" | "42462" | "6" | null |
"A" | "In front of the house" | "Behind the house" | "Next to the house" | "Inside the house" | "277086_1754448691" | "image" | "Where is the small pond located in the image?" | "65681" | "4" | null |
"A" | "Two" | "One" | "Three" | "Four" | "2166566_1513871626" | "image" | "How many women are in the image?" | "87148" | "5" | null |
"A" | "Orange" | "Black" | "Gray" | "White" | "690892_1610366862" | "image" | "What is the dominant color of the sky in the image?" | "59207" | "1" | null |
"C" | "On the left side" | "In the middle" | "Cannot be determined" | "On the right side" | "3127247_745447613" | "image" | "What is the relative position of the person sitting on the sidelines compared to the playing field?" | "99150" | "6" | null |
"D" | "Jumping in the air" | "Sitting down" | "Lying on the ground" | "Standing up" | "3235530_2174531512" | "image" | "What is the position of the man with the red hair?" | "42355" | "4" | null |
"C" | "On the left side of the building" | "On the right side of the building" | "In the middle of the building" | "Can't be determined from the given information" | "490524_2218221104" | "image" | "Where is the entrance located in relation to the building?" | "14531" | "6" | null |
"A" | "A basketball court" | "A white bench" | "A metal structure" | "Green trees" | "3129229_3459923882" | "image" | "Which of the following is a dominant feature in the scene?" | "103003" | "3" | null |
"B" | "One" | "Three" | "Two" | "Four" | "254353_1159215928" | "image" | "How many rooms are visible in the image?" | "53008" | "5" | null |
"A" | "Standing" | "Sitting" | "Bending" | "Jumping" | "555270_3546494036" | "image" | "What is the posture of the person in the bottom left corner of the image?" | "5950" | "3" | null |
"D" | "Singing" | "Playing the guitar" | "Dancing" | "Jumping" | "267199_348644650" | "image" | "What is the person doing who is the main focus of the image?" | "53376" | "3" | null |
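As a rough illustration of how a row in the table above might be consumed, the sketch below turns a record into a text prompt and checks a predicted letter against the `answer` field. The field names come from the table header; the prompt wording and the helper names `build_prompt` and `is_correct` are assumptions made for this example and are not necessarily how the official SEED-Bench evaluation queries a model. The image referenced by `data_id` is not handled here.

```python
from typing import Dict

# Two records copied from the preview table above (text fields only).
RECORDS = [
    {
        "answer": "A",
        "choice_a": "One", "choice_b": "Two",
        "choice_c": "Three", "choice_d": "Four",
        "question": "How many towels are in the image?",
        "question_id": "101669",
        "question_type_id": "5",
    },
    {
        "answer": "C",
        "choice_a": "A hotel", "choice_b": "A house",
        "choice_c": "A cabin", "choice_d": "A shed",
        "question": "What type of building is in the image?",
        "question_id": "104933",
        "question_type_id": "3",
    },
]


def build_prompt(record: Dict[str, str]) -> str:
    """Format one record as a multiple-choice prompt (hypothetical wording)."""
    lines = [record["question"]]
    for letter in "abcd":
        lines.append(f"{letter.upper()}. {record['choice_' + letter]}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def is_correct(record: Dict[str, str], predicted_letter: str) -> bool:
    """Compare a predicted letter with the ground-truth `answer` field."""
    return predicted_letter.strip().upper() == record["answer"]


if __name__ == "__main__":
    print(build_prompt(RECORDS[0]))
    print(is_correct(RECORDS[0], "a"))
```

The comparison normalizes whitespace and case, so a model reply such as `" a "` would still count as choosing option A.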
SEED-Bench Card
Benchmark details
Benchmark type: SEED-Bench is a large-scale benchmark for evaluating Multimodal Large Language Models (MLLMs). It consists of 19K multiple-choice questions with accurate human annotations, covering 12 evaluation dimensions that include comprehension of both the image and video modalities.
Benchmark date: SEED-Bench was collected in July 2023.
Paper or resources for more information: https://github.com/AILab-CVC/SEED-Bench
License: Attribution-NonCommercial 4.0 International. Use of the benchmark should also abide by the policy of OpenAI: https://openai.com/policies/terms-of-use.
For the images of SEED-Bench, we use the data from Conceptual Captions Dataset (https://ai.google.com/research/ConceptualCaptions/) following its license (https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE). Tencent does not hold the copyright for these images and the copyright belongs to the original owner of Conceptual Captions Dataset.
For the videos of SEED-Bench, we use the data from Something-Something v2 (https://developer.qualcomm.com/software/ai-datasets/something-something), Epic-kitchen 100 (https://epic-kitchens.github.io/2023) and Breakfast (https://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/). We only provide the video names. Please download the videos from their official websites.
Where to send questions or comments about the benchmark: https://github.com/AILab-CVC/SEED-Bench/issues
Intended use
Primary intended uses: The primary use of SEED-Bench is to evaluate Multimodal Large Language Models on spatial and temporal understanding.
Primary intended users: The primary intended users of the Benchmark are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
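Since the benchmark is meant for evaluating models across several dimensions, one plausible way to summarize results is accuracy grouped by `question_type_id`. The sketch below assumes predictions are available as a mapping from `question_id` to a predicted letter; the function name `accuracy_by_type` and that prediction format are assumptions for illustration, not part of the SEED-Bench release.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_type(
    records: List[Dict[str, str]],
    predictions: Dict[str, str],
) -> Dict[str, float]:
    """Return accuracy per question_type_id.

    `predictions` maps question_id -> predicted letter (hypothetical format).
    A question with no prediction is counted as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        type_id = record["question_type_id"]
        total[type_id] += 1
        if predictions.get(record["question_id"]) == record["answer"]:
            correct[type_id] += 1
    return {type_id: correct[type_id] / total[type_id] for type_id in total}


if __name__ == "__main__":
    # Ground-truth fields taken from three rows of the preview table.
    records = [
        {"question_id": "101669", "question_type_id": "5", "answer": "A"},
        {"question_id": "34143", "question_type_id": "5", "answer": "D"},
        {"question_id": "104933", "question_type_id": "3", "answer": "C"},
    ]
    # Made-up predictions, for demonstration only.
    predictions = {"101669": "A", "34143": "B", "104933": "C"}
    print(accuracy_by_type(records, predictions))
```

Reporting per-type accuracy alongside the overall score keeps a weak dimension from being hidden by strong ones.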