BABEL data
We split all sequences in BABEL into training (60%), validation (20%), and test (20%) sets.
Download BABEL here.
The folder contains 5 files: train.json, val.json, test.json, extra_train.json, and extra_val.json.
Dense annotations
Files: train.json, val.json, and test.json.
In these files, all sequences containing multiple actions are annotated with one set of frame labels. For sequences that only contain a sequence label, we assume that the action spans the entire duration of the sequence.
Each motion sequence in these files contains:
- One sequence label.
- Zero or one set of frame labels. Frame labels are usually only provided if the sequence contains more than 1 action.
The results in the current version of the paper and the released checkpoints correspond to these dense labels -- we train on train.json, validate on val.json, and test on test.json.
This set provides dense labels, i.e., all actions in each sequence are labeled, covering roughly 37.5 hours of mocap.
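For illustration, the following minimal Python sketch loads one dense split and applies the convention above: when a sequence has no frame labels, its sequence label is assumed to span the full duration. The file path is an assumption (point it at your extracted download), and the field names used here (seq_ann, frame_ann, proc_label, dur) are described under "Data format" below.

import json

# Load one dense annotation split (path is an assumption; adjust to your download).
with open("train.json") as f:
    train = json.load(f)

def frame_segments(seq):
    """Return (proc_label, start_t, end_t) tuples for one sequence.

    If only a sequence label exists (frame_ann is None), the labeled
    action(s) are assumed to span the entire duration of the sequence.
    """
    if seq["frame_ann"] is not None:
        return [(l["proc_label"], l["start_t"], l["end_t"])
                for l in seq["frame_ann"]["labels"]]
    return [(l["proc_label"], 0.0, seq["dur"]) for l in seq["seq_ann"]["labels"]]

# Print the segments of an arbitrary sequence.
some_sid = next(iter(train))
print(frame_segments(train[some_sid]))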
Extra annotations
Files: extra_train.json and extra_val.json.
Apart from the labels described above, we collected additional annotations over the course of our project. We also make this data (corresponding to training and validation sequences) publicly available, in the hope that it will be useful.
Further, for some sequences a few annotators consider there to be a single action while the majority disagree. If these sequences have no frame label annotations, i.e., not all of their actions are (densely) labeled, we include them here separately as "extra" annotations.
Each motion sequence in these files contains:
- Zero or more sequence labels.
- Zero or more sets of frame labels.
Overall, the sequences with dense annotations and extra annotations correspond to annotations for about 42 hours of mocap.
We do not yet publicly release extra_test.json (~1.5 hours of mocap), as we intend to use it as a held-out test set in the future.
NOTE: If you use the annotations from extra_train.json or extra_val.json for training or validation, please mention this clearly in your report or paper.
As an added measure to prevent unintended mix-ups of data, the data dictionary keys for the extra annotations are named seq_anns and frame_anns (as opposed to seq_ann and frame_ann in the dense annotations).
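The sketch below shows one way to read both formats without mixing them up. The helper name is ours, and it assumes that the extra files store lists of annotation dictionaries under the plural keys; adapt it to what you find in your download.

import json

def get_annotations(seq):
    """Return (sequence annotations, frame annotations) as two lists.

    Dense files use the singular keys seq_ann/frame_ann (a single dict, or
    None for frame_ann), while the extra files use the plural keys
    seq_anns/frame_anns. Normalizing both to lists keeps downstream code
    from silently confusing the two sources.
    """
    if "seq_ann" in seq:  # dense annotations
        seq_anns = [seq["seq_ann"]] if seq["seq_ann"] else []
        frame_anns = [seq["frame_ann"]] if seq["frame_ann"] else []
    else:  # extra annotations (assumed to be lists of dicts, possibly None)
        seq_anns = seq.get("seq_anns") or []
        frame_anns = seq.get("frame_anns") or []
    return seq_anns, frame_anns

# Example usage (path is an assumption; point it at your download).
with open("extra_train.json") as f:
    extra_train = json.load(f)
for sid, seq in list(extra_train.items())[:3]:
    s_anns, f_anns = get_annotations(seq)
    print(sid, len(s_anns), "sequence annotation(s),", len(f_anns), "frame annotation(s)")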
Starter Code
We provide some starter code in our GitHub repo to: load the BABEL dataset, visualize rendered videos of mocap sequences, visualize action labels, compute simple stats from BABEL, and retrieve mocap sequences that contain a specific action.
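As a flavor of the stats utilities, the snippet below (our own sketch, not the repo's code) counts the sequences and total mocap hours in each dense split; the file paths are assumptions.

import json

# Count sequences and annotated mocap hours in each dense split.
for split in ["train.json", "val.json", "test.json"]:
    with open(split) as f:
        data = json.load(f)
    total_hours = sum(seq["dur"] for seq in data.values()) / 3600.0
    print(f"{split}: {len(data)} sequences, {total_hours:.1f} hours of mocap")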
Data format:
Example annotation:
{"1833": {
"babel_sid": 1833,
"url": "https://babel-renders.s3.eu-central-1.amazonaws.com/001833.mp4",
"feat_p": "BMLmovi/BMLmovi/Subject_55_F_MoSh/Subject_55_F_21_poses.npz",
"dur": 8.3,
"seq_ann": {
"babel_lid": "8b112ce1-2546-4675-9efa-27db9fcfee29",
"anntr_id": "c24c8ba1-f43d-48db-9f76-165d8264bda1",
"mul_act": true,
"labels": [
{
"raw_label": "walk back and forth",
"proc_label": "walk back and forth",
"seg_id": "3ca74412-f765-445b-b51b-6c2e72316010",
"act_cat": [
"walk",
"forward movement",
"backwards movement"
]
}
]
},
"frame_ann": {
"babel_lid": "aabf3b2d-c8c7-47ac-aa75-4621d262434e",
"anntr_id": "c0974846-d198-425d-846a-ce6808343841",
"mul_act": true,
"labels": [
{
"raw_label": "standing",
"proc_label": "stand",
"seg_id": "841150c3-030d-4369-a2e1-cf460f4a988a",
"act_cat": [
"stand"
],
"start_t": 0,
"end_t": 0.341
},
{
"raw_label": "standing",
"proc_label": "stand",
"seg_id": "947c777d-ac8f-43f1-b2bf-08578293832e",
"act_cat": [
"stand"
],
"start_t": 7.862,
"end_t": 8.3
},
{
"raw_label": "pacing",
"proc_label": "pace",
"seg_id": "59af352a-0f5e-446e-9690-18e5d1d9ebaf",
"act_cat": [
"walk"
],
"start_t": 0.341,
"end_t": 7.862
}
]
}
},
"9495": { ... }
}
Key | Description
babel_sid | Uniquely identifies an AMASS mocap sequence.
feat_p | Path to motion feature (from AMASS dataset).
url | URL to a 2D rendering of the mocap sequence.
dur | Duration of the mocap sequence in seconds.
seq_ann | Sequence label that describes the action(s) in the overall sequence.
frame_ann | Frame labels that precisely mark the start and end of all actions in the sequence.
babel_lid | Uniquely identifies every sequence label or frame label.
anntr_id | Uniquely identifies every annotator in BABEL.
labels | List of all action "segments" in the sequence.
seg_id | Uniquely identifies each action segment annotation (action label, start, end) in BABEL.
raw_label | Action label string from the annotator (without any pre-processing).
proc_label | Processed action label (after stemming).
act_cat | List of action categories that are associated with the processed action label. Note that this may be None for some annotations for which the manual assignment of raw_label --> act_cat does not exist.
start_t | Time-stamp denoting the start of the action (in seconds). For a sequence label, start_t = 0 (start of sequence).
end_t | Time-stamp denoting the end of the action (in seconds). For a sequence label, end_t is the duration of the sequence.
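As a usage example of the fields above, here is a small sketch (our own, mirroring the retrieval utility mentioned under Starter Code) that finds all sequences whose labels include a given action category; the file path is an assumption.

import json

def sequences_with_action(data, category):
    """Return babel_sids of sequences whose labels include `category`.

    Uses the frame labels when available and falls back to the sequence
    label otherwise. `category` is an act_cat string, e.g. "walk"; note
    that act_cat may be None for some annotations.
    """
    sids = []
    for sid, seq in data.items():
        ann = seq["frame_ann"] or seq["seq_ann"]
        for label in ann["labels"]:
            if label["act_cat"] and category in label["act_cat"]:
                sids.append(sid)
                break
    return sids

with open("train.json") as f:  # path is an assumption
    train = json.load(f)
print(len(sequences_with_action(train, "walk")), "training sequences contain 'walk'")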
Selected Qualitative Examples
Dense labeling of the actions in sequences at the frame level is important. The sequence below is labeled "Picking up object". However, note that only 20% of the duration of the movement corresponds to the labeled action.
BABEL contains diverse movements from AMASS. This challenging motion sequence contains many diverse and ambiguous actions.