Process

This page explains how to use the data processing functionalities in the usv-playpen GUI.

In order to run any of the functions detailed below, select an experimenter name from the dropdown menu and click the Process button on the GUI main display:

Processing Step 0

Clicking the Process button will open a new window with all the processing functionalities (see below):

Processing Step 0b

All the main functions are outlined in orange, and black fields are function-specific options tunable by the user in the GUI. It is important to note that these are not necessarily all the options the user can set, and the full list of options can be found under each function in the /usv-playpen/_parameter_settings/processing_settings.json file. Each time the user clicks the Next button in the window above, processing_settings.json is modified to the newest input configuration.

It is relevant to note here, that just like in the Record section, you have the capability to Notify e-mail(s) of PC usage. This is useful if you are running a long processing job and want to be notified when it is finished. The e-mails about start and end of jobs will be sent to the addresses listed in the Notify e-mail(s) of PC usage field (no space after comma for multiple e-mails), and it requires you to choose what particular PC you are using for this job. Since the e-mails are sent from a Google account, the first e-mail you receive may end up in the Spam folder, so make sure to check that:

Processing Step e-mail

The Root directories field enables you to list the directories containing the data you want to process. Each root directory should be in its own row; for example, three sessions should be listed as follows:

/mnt/falkner/Bartul/Data/20250430_145017
/mnt/falkner/Bartul/Data/20250430_165730
/mnt/falkner/Bartul/Data/20250430_182145

Certain processing functions take all root sessions together when operating on data, and others process each session separately. Additionally, in both of these categories, there is a specific order in executing individual functions.

For combined processing, the order of processing steps is as follows:

  1. Concatenate e-phys files

  2. Split clusters to sessions

  3. Prepare SLEAP cluster job

On the other hand, for processing sessions separately, the order of processing steps is as follows:

  1. Run video concatenation

  2. Run video re-encoding

  3. Convert to single-ch files

  4. Crop AUDIO (to VIDEO)

  5. Run A/V sync check

  6. Run E/V sync check

  7. Run HPSS

  8. Filter audio files

  9. Concatenate to MEMMAP

  10. Run SLP-H5 conversion

  11. Run AP calibration

  12. Run AP triangulation

  13. Re-coordinate

  14. Run DAS inference

  15. Curate DAS outputs

  16. Prepare USV assignment

  17. Run USV assignment

If you recorded a session with audio, e-phys and video data (imaginary example: 20250430_145017) and a calibration session (20250430_141750), the directory and file structure should look as follows:

/mnt/falkner/Bartul/Data/:
├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ├── original (empty)
│   │   ├── original_mc
│   │       ├── m_250430145009.wav
│   │           ...
│   │   ├── audio_triggerbox_sync_info.json
│   ├── ephys
│   │   ├── imec0
│   │   │   ├── 20250430_145017.imec0.ap.bin
│   │   │   ├── 20250430_145017.imec0.ap.meta
│   │   ├── imec1
│   │       ├── 20250430_145017.imec1.ap.bin
│   │       ├── 20250430_145017.imec1.ap.meta
│   ├── sync
│   │   ├── CoolTerm Capture (coolterm_config.stc) 2024-04-30-14-50-14-236.txt
│   │   ├── 20250430_rec4_g0_t0.nidq.bin
│   │   ├── 20250430_rec4_g0_t0.nidq.meta
│   │
│   └── video
│       ├── 20250430_145027.21241563
│           ├── 000000.mp4
│           ├── 000000.npz
│           ├── 000001.mp4
│           ├── 000001.npz
│           ├── metadata.yaml
│       ...
│
├── 20250430_141750
│    ├── sync
│    │   ...
│    ├── video
│        ├── 20250430_141750.21241563
│        │   ...
│        ├── 20250430141750
│        │   ├── video
│        │   │   ├── 21241563
│        │   │   ...
│        │   │   ├── 20250430141750_calibration.metadata.h5
│        │   │   ├── 20250430141750_calibration.toml
│        │   │   ├── 20250430141750_reprojection_histogram.png
│        │   │   ...
│        ├── calibration_20250430_141321.21241563
│        │   ...

E-PHYS Processing

The processing of e-phys data passes several stages:

  1. Check e-phys data is synchronized with video

  2. Concatenate e-phys files of individual sessions for joint spike sorting

  3. Conduct spike sorting with Kilosort4 (not implemented in usv-playpen)

  4. Manually curate sorting outputs in Phy (not implemented in usv-playpen)

  5. Split cluster spikes back to individual sessions

  6. Trace probe tracks in Allen atlas coordinates with brainreg and brainglobe-segmentation to determine what brain regions individual channels were recorded from using iblapps (not implemented in usv-playpen)

  7. Compute unit quality metrics and categorize units with SpikeInterface (not implemented in usv-playpen)

Run E/V sync check

To run the e-phys/video synchronization check, you need to list the root directories of interest, select Run E/V sync check, click Next and then Process:

Processing Step 1

Neural recording data is aligned to the start of video recording, which is identifiable by searching for a ~2.3 s break in Loopbio Triggerbox pulses, which are constantly being transmitted to the Neuropixels digital input channel. The code recursively finds all the ap.bin files in the root directory and saves the digital input channel data (385th or last channel) to a separate Numpy file (which ends with _sync_ch_data.npy), if it hasn’t been saved already. After finding the tracking start and end (based on the largest Triggerbox break duration and total number of recording frames) in this Numpy file. The total video duration will then be compared to the total video-aligned neural recording, and you will get a report back whether that discrepancy is below 12 ms (in other words, less than 2 video frames, which is an acceptable level of distortion). Information at what Neuropixels sample the first and last video recording frame were detected will be saved to, for instance. /mnt/falkner/Bartul/EPHYS/20250430_imec0/changepoints_info_20250430_imec0.json, as exemplified below:

/mnt/falkner/Bartul/Data/:
├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ...
│   ├── ephys
│   │   ├── imec0
│   │   │   ├── 20250430_145017.imec0.ap.bin
│   │   │   ├── 20250430_145017.imec0.ap.meta
│   │   │   ├── 20250430_145017_imec0_sync_ch_data.npy
│   │   ├── imec1
│   │       ├── 20250430_145017.imec1.ap.bin
│   │       ├── 20250430_145017.imec1.ap.meta
│   │       ├── 20250430_145017_imec1_sync_ch_data.npy
│   ├── sync
│   │   ...
│   │
│   └── video
│       ...
/mnt/falkner/Bartul/EPHYS:
├── 20250430_imec0
│   ├── changepoints_info_20250430_imec0.json
├── 20250430_imec1
│   ├── changepoints_info_20250430_imec1.json

In the changepoints JSON file, the E/V sync check process will save the tracking_start_end and largest_camera_break_duration values, and the latter, when divided with the Neuropixels sampling rate (should be ~30 kHz), should not be smaller than ~2.3 s.

"20250430_145017.imec0": {
    "session_start_end": [
        0,
        37825731
    ],
    "tracking_start_end": [
        850469,
        36867993
    ],
    "largest_camera_break_duration": 69341,
    "file_duration_samples": 37825731,
    "root_directory": "F:\Bartul\Data\20250430_145017",
    "total_num_channels": 385,
    "headstage_sn": "23280196",
    "imec_probe_sn": "22420015064"
}

The /usv-playpen/_parameter_settings/process_settings.json file also contains a section not modifiable in the GUI itself, but it can be modified manually:

  • npx_file_type : Neuropixels 1.0 had “lf” and “ap” files, this field allows you to switch between them

  • npx_ms_divergence_tolerance : the maximum allowed difference between the video and e-phys recording duration in milliseconds; the default value is 12 ms but it can be tuned to whatever the user thinks is appropriate.

"validate_ephys_video_sync": {
        "npx_file_type": "ap",
        "npx_ms_divergence_tolerance": 12.0
}

Concatenate e-phys files

To run the concatenation of e-phys files (ap.bin), you need to list all the root directories of interest in order you want them to be concatenated, select Concatenate e-phys files, click Next and then Process:

Processing Step 2

The code will find all the ap.bin files for each probe and conduct the concatenation to save the files in the EPHYS directory with the concatenated_ prefix:

/mnt/falkner/Bartul/Data/:
├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ...
│   ├── ephys
│   │   ├── imec0
│   │   │   ├── 20250430_145017.imec0.ap.bin
│   │   │   ├── 20250430_145017.imec0.ap.meta
│   │   │   ├── 20250430_145017_imec0_sync_ch_data.npy
│   │   ├── imec1
│   │       ├── 20250430_145017.imec1.ap.bin
│   │       ├── 20250430_145017.imec1.ap.meta
│   │       ├── 20250430_145017_imec1_sync_ch_data.npy
│   ├── sync
│   │   ...
│   │
│   └── video
│       ...
/mnt/falkner/Bartul/EPHYS:
├── 20250430_imec0
│   ├── changepoints_info_20250430_imec0.json
│   ├── concatenated_20250430_imec0.ap.bin
├── 20250430_imec1
│   ├── changepoints_info_20250430_imec1.json
│   ├── concatenated_20250430_imec1.ap.bin

In the changepoints JSON file, the concatenation process will modify all other lines than the ones described above for E/V sync.

"20250430_145017.imec0": {
    "session_start_end": [
        0,
        37825731
    ],
    "tracking_start_end": [
        850469,
        36867993
    ],
    "largest_camera_break_duration": 69341,
    "file_duration_samples": 37825731,
    "root_directory": "F:\Bartul\Data\20250430_145017",
    "total_num_channels": 385,
    "headstage_sn": "23280196",
    "imec_probe_sn": "22420015064"
}

Split clusters to sessions

After spike sorting and post-sorting curation are complete, you can split the spikes of individual clusters back to the original sessions. To do this, even if you recorded multiple sessions in one day, it is sufficient to put only one root directory for that day, e.g., the first one. The script will find EPHYS root directory, and split spikes from all probes into sessions based on the inputs in the changepoints JSON file. Select Split clusters to sessions, click Next and then Process:

Processing Step 3

The code will create a cluster_data subdirectory in each session’s ephys/imec directory and populate it with Numpy files containing spike times in the shape of (2, number_of_spikes), where the first row contains spike times in seconds relative to start of tracking and the second row spike times according to what tracking frame they occurred in. Each cluster is named in the following format: probeID_clusterNumber_channelID_clusterType.npy.

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ...
│   ├── ephys
│   │   ├── imec0
│   │   │   ├── 20250430_145017.imec0.ap.bin
│   │   │   ├── 20250430_145017.imec0.ap.meta
│   │   │   ├── 20250430_145017_imec0_sync_ch_data.npy
│   │   │   ├── cluster_data
│   │   │   │   ├── imec0_cl0000_ch361_good.npy
│   │   │   │       ...
│   │   ├── imec1
│   │       ├── 20250430_145017.imec1.ap.bin
│   │       ├── 20250430_145017.imec1.ap.meta
│   │       ├── 20250430_145017_imec1_sync_ch_data.npy
│   │       ├── cluster_data
│   │       │   ├── imec1_cl0000_ch361_good.npy
│   │       │       ...
│   ├── sync
│   │   ...
│   │
│   └── video
│       ...

The /usv-playpen/_parameter_settings/process_settings.json file also contains a section partially modifiable in the GUI, but it can entirely be modified manually:

  • min_spike_num : eliminate clusters with fewer spikes than this (set 0 if you want to keep all)

  • kilosort_version : Kilosort version in use

"get_spike_times": {
    "min_spike_num": 100,
    "kilosort_version": "4"
  },

Video Processing

The processing of video data passes multiple stages:

  1. Video concatenation and re-encoding (runs locally <20 min)

  2. SLEAP inference (runs on cluster)

  3. SLEAP proofreading (bottleneck step, requires extensive human curation)

  4. SLP-H5 conversion (runs locally <1 min)

  5. SLEAP-Anipose triangulation (runs locally <40 min)

  6. Translate, rotate and scale SLEAP coordinates to metric units (runs locally <1 min)

Video concatenation and re-encoding

Before running this section, it is always a good idea to check that video files were copied to the file server correctly. These steps can be run separately (still in sequence, though), but for the sake of simplicity, they will be described jointly. To run video concatenation and re-encoding, you need to list the root directories of interest, select Run video concatenation and Run video re-encoding, click Next and then Process:

Processing Step 4

The re-encoding step will also result in the creation of the camera_frame_count_dict.json file, which contains numbers of frames for each camera in the session, as well as the total number of frames and video time for the camera with the least number of frames. The file will be saved in the video subdirectory of each session, and it will look like this:

{
"21241563": [
    180002,
    150.057
],
"21369048": [
    180000,
    150.057
],
"21372315": [
    180001,
    150.057
],
"21372316": [
    180001,
    150.056
],
"22085397": [
    180002,
    150.057
],
"total_frame_number_least": 180000,
"total_video_time_least": 1199.5477764606476,
"median_empirical_camera_sr": 150.057
}

These steps change videos and video directory structure from the native Loopbio format to one that is compatible with SLEAP-Anipose. Both rely on the usage of ffmpeg . After the steps are complete, the directory structure and file names should look as follows (displaying only one camera directory for brevity):

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ...
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   │
│   └── video
│       ├── 20250430_145027.21241563
│       ...
│       ├── 20250430145035_camera_frame_count_dict.json
│       ├── 20250430145035
│       │    ├── 21241563
│       │    │   ├── calibration_images
│       │    │   ├── 21241563-20250430145035.mp4
│       ...

The /usv-playpen/_parameter_settings/process_settings.json file also contains a section partially modifiable in the GUI, but it can entirely be modified manually:

  • camera_serial_num : serial numbers of cameras used in the recording

  • video_extension : video type (usually “mp4”)

  • concatenated_video_name : name of the concatenated video file

  • conversion_target_file : name of the concatenated video file as target for re-encoding

  • constant_rate_factor : FFMPEG constant rate factor for re-encoding

  • encoding_preset : FFMPEG encoding preset for re-encoding

  • delete_old_file : whether to delete the concatenated file after re-encoding

"concatenate_video_files": {
    "camera_serial_num": [
      "21372315",
      "21372316",
      "21369048",
      "22085397",
      "21241563"
    ],
    "video_extension": "mp4",
    "concatenated_video_name": "concatenated_temp"
  },
  "rectify_video_fps": {
    "camera_serial_num": [
      "21372315",
      "21372316",
      "21369048",
      "22085397",
      "21241563"
    ],
    "conversion_target_file": "concatenated_temp",
    "video_extension": "mp4",
    "constant_rate_factor": 16,
    "encoding_preset": "veryfast",
    "delete_old_file": true
  }

Prepare SLEAP cluster job

The usv-playpen GUI assumes usage of SLEAP for animal pose tracking. To do this, one first needs to train one or multiple models on the data of interest (i.e., social interactions). Explaining how to do this is beyond the scope of this text, so we will assume you already have a top-down centroid and centered instance model ready for running inference.

Since the average office PC does not necessarily have GPU-capabilities, it is advised to run SLEAP inference on a high-performance computing cluster, as these usually have GPU-capabilities and allow for the parallelization of the inference process. The usv-playpen GUI helps you prepare the SLEAP cluster job, but you will need to run the job on the cluster yourself.

The preparation consists of creating a job_list.txt file which contains the paths to the video files and the model(s) to be used for inference. The job list can then be used by a shell script, such as the one in /usv-playpen/other/cluster/SLEAP/sleap.inference_global.sh to execute inference on all video files of interest.

To run the SLEAP cluster job preparation, you need to list the root directories of interest (which will search for all videos recorded in those sessions), select the SLEAP conda environment name used on the cluster, select directories of centroid and centered instance models, select the output inference directory, select Prepare SLEAP cluster job, click Next and finally Process:

Processing Step 5

This shouldn’t take longer than several seconds - it will create/update the job_list.txt file in, for example, /mnt/falkner/Bartul/SLEAP/inference directory:

/mnt/falkner/Bartul/SLEAP/inference:
├── job_list.txt
│   ...

The /usv-playpen/_parameter_settings/process_settings.json file contains a section partially modifiable in the GUI, but it can entirely be modified manually:

  • camera_names : camera serial numbers used in the recording

  • inference_root_dir : directory where the inference job list will be saved

  • centroid_model_path : path to the SLEAP centroid model

  • centered_instance_model_path : path to the SLEAP centered instance model

 "prepare_cluster_job": {
  "camera_names": [
    "21372315",
    "21372316",
    "21369048",
    "22085397",
    "21241563"
  ],
  "inference_root_dir": "/mnt/falkner/Bartul/SLEAP/inference",
  "centroid_model_path": "",
  "centered_instance_model_path": ""
}

SLEAP inference and proofreading

The SLEAP inference and proofreading steps are not implemented in the usv-playpen GUI. However, you can run the inference job on the cluster using the shell script mentioned above. The proofreading step is done in the SLEAP GUI, where it is crucial to correct identity switches and to keep the same animal identities across different video views. By current convention, that means the male mouse is always assigned identity 0, and the female mouse is always assigned identity 1.

Run SLP-H5 conversion

After proofreading, you convert SLP to H5 files, which is the format SLEAP-Anipose operates on (usv-playpen runs this in parallel for all views). To do this, you need to list the root directories of interest, select Run SLP-H5 conversion, click Next and then Process (NB: using the SLEAP uvx functionality, it is no longer necessary to install SLEAP to run this step):

Processing Step 6

This step shouldn’t take longer than two minutes to run; the directory structure and file names should look as follows (displaying only one camera directory for brevity):

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ...
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   │
│   └── video
│       ├── 20250430_145027.21241563
│       ...
│       ├── 20250430145035_camera_frame_count_dict.json
│       ├── 20250430145035
│       │    ├── 21241563
│       │    │   ├── calibration_images
│       │    │   ├── 21241563-20250430145035.h5
│       │    │   ├── 21241563-20250430145035.mp4
│       │    │   ├── 21241563-20250430145035.slp
│       ...

Run AP triangulation & Re-coordinate

Once SLP files are converted to H5, you are ready to run triangulation. Triangulation is the process of estimating the 3D coordinates of the tracked items based on the 2D coordinates from multiple camera views.

SLEAP-Anipose triangulation can be run to obtain 3D arena points, or 3D animal points.

3D arena points It was previously explained how to record a calibration session, and in that session you recorded a 1-minute video of the arena with visible microphones and IR-reflective markers in its corners. All the video views of this recording can be loaded into the SLEAP GUI, and only on the first frame of each view, you label the 24 microphones and 4 corners with a 28-node skeleton that can be found in /usv-playpen/_config/playpen_skeleton.json. You label the microphones with the corresponding channel number, and corners with N, E, S and W, according to the following schematic:

Arena and microphones

After labeling the first frame on each view, you can export the data as H5 files going to File > Export Analysis HDF5. You are now ready to run arena triangulation.

To do this, you need to list the root directories of interest, select the same root directory under Tracking calibration / arena root directory, select Run AP triangulation and Re-coordinate, select Triangulate arena nodes, select “arena” for Save transformation type and choose “No” for Delete original .h5. Finally, click Next and then Process:

Processing Step 7

This shouldn’t take longer than one minute; the directory structure and file names should look as follows (note that you keep both the original and translated_rotated_metric H5 files!):

├── 20250430_145017
│   ...
│
├── 20250430_141750
│    ├── sync
│    │   ...
│    ├── video
│        ├── 20250430_141750.21241563
│        │   ...
│        ├── 20250430141750
│        │   ├── 20250430141750_points3d.h5
│        │   ├── 20250430141750_points3d_translated_rotated_metric.h5
│        │   ...
│        ├── calibration_20250430_141321.21241563
│        │   ...

3D animal points To triangulate animal points, you need to list the root directories of interest, list their respective experimental codes, select the directory with the triangulated arena file, select Run AP triangulation and Re-coordinate, select “animal” for Save transformation type and choose “Yes” for Delete original .h5. Finally, click Next and then Process (a progress bar in the terminal will update you on the status of the process):

Processing Step 8

The process results in the creation of an H5 file which ends in _points3d_translated_rotated_metric.h5, and can be found as shown below:

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ...
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   │
│   └── video
│       ├── 20250430_145027.21241563
│       ...
│       ├── 20250430145035_camera_frame_count_dict.json
│       ├── 20250430145035
│       │    ├── 20250430145035_points3d_translated_rotated_metric.h5
│       ...

The /usv-playpen/_parameter_settings/process_settings.json file also contains a section partially modifiable in the GUI, but it can entirely be modified manually:

  • calibration_file_loc : directory containing the _calibration.toml file relevant for the session

  • triangulate_arena_points_bool : whether to triangulate arena or animal tracked nodes

  • frame_restriction : range of frames to be triangulated; empty finds the least number of frames across all cameras and triangulates those

  • excluded_views : list of camera serial numbers to be excluded from triangulation

  • display_progress_bool : whether to display the progress bar in the terminal during execution

  • ransac_bool : whether to use RANSAC for triangulation

  • rigid_body_constraints : list of rigid body constraints to be used for triangulation

  • weak_body_constraints : list of weak body constraints to be used for triangulation

  • smooth_scale : scale of the smoothing kernel

  • weight_weak : weight of the weak body constraints

  • weight_rigid : weight of the rigid body constraints

  • reprojection_error_threshold : threshold for reprojection error in pixels

  • regularization_function : regularization function to be used for triangulation

  • n_deriv_smooth : number of derivatives to be used for smoothing

  • original_arena_file_loc : directory containing the original arena 3D file

  • save_transformed_data : whether to save the transformed data as “animal” or “arena”

  • delete_original_h5 : whether to delete the original H5 file

  • static_reference_len : length of the static reference in meters, defaults to distance between two outer rail edges of two arena corners

  • experimental_codes : list of experimental codes associated with each session

"conduct_anipose_triangulation": {
  "calibration_file_loc": "",
  "triangulate_arena_points_bool": false,
  "frame_restriction": null,
  "excluded_views": [],
  "display_progress_bool": true,
  "ransac_bool": false,
  "rigid_body_constraints": [],
  "weak_body_constraints": [],
  "smooth_scale": 4,
  "weight_weak": 4,
  "weight_rigid": 1,
  "reprojection_error_threshold": 5,
  "regularization_function": "l2",
  "n_deriv_smooth": 1
},
"translate_rotate_metric": {
  "original_arena_file_loc": "",
  "save_transformed_data": "animal",
  "delete_original_h5": true,
  "static_reference_len": 0.615,
  "experimental_codes": []
}

The experimental codes are used to identify the session and the type of experiment conducted. The decoding sheet can be found below:

A - ablation
E - ephys
H - chemogenetics
O - optogenetics
P - playback
B - behavior
V - devocalization
U - urine/bedding

Q - alone
C - courtship
X - females
Y - males

L - light
D - dark

1,2,3 ... - number of animals

F - female
M - male

S - single
G - group

p - proestrus
e - estrus
m - matestrus
d - diestrus

Audio Processing

The processing of audio data passes multiple stages:

  1. Split audio to single files and crop to video duration (runs locally <15 min)

  2. De-noise audio data with harmonic-percussive source separation (runs locally or on cluster)

  3. Band-pass filter audio files (runs locally <15 min)

  4. Concatenate all audio files to single MEMMAP file (runs locally <15 min)

  5. Run DAS inference (runs on cluster)

  6. Curate DAS outputs (runs locally <2 min)

  7. Prepare USV assignment (runs locally <1 min)

  8. Run USV assignment (runs locally <5 min)

Convert to single-channel and crop to video

Before running this section, it is always a good idea to check that audio files were copied to the file server corr1erectly. These steps can be run separately (still in sequence, though), but for the sake of simplicity, they will be described jointly. To run these steps together, you need to list the root directories of interest, select Convert to single-ch files and Crop AUDIO (to VIDEO), click Next and then Process:

Processing Step 9

If you used used the SYNC recording mode (usghflags: 1574), the Trgbox-USGH device(s) needs to be set to m. If you, however, used the NO SYNC recording mode (usghflags: 1862), the Trgbox-USGH device(s) needs to be set to both:

Processing Step 9b

The Convert to single-ch files step populates the original directory with single channel files of the entire recording. The Crop AUDIO (to VIDEO) step will crop the audio files to the video duration, and save them in the cropped_to_video subdirectory. Both steps require the usage of sox. In the last step, the original directory will be deleted; reduced to one channel below for brevity:

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ├── cropped_to_video
│   │   │   ├── m_250430145009_ch01_cropped_to_video.wav
│   │   │       ...
│   │   ├── original_mc
│   │   │   ├── m_250430145009.wav
│   │   │       ...
│   │   ├── audio_triggerbox_sync_info.json
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ├── m_video_frames_in_audio_samples
│   │   ├── s_video_frames_in_audio_samples
│   │   ├── nidq_ipi_data.npy
│   └── video
│       ...

The Crop AUDIO (to VIDEO) step will also result in the creation of a audio_triggerbox_sync_info.json file, which contains the sample number of first and last recorded video frame and the break duration detected prior to recording. It will also contain information about the total duration of the audio recording and its discrepancy with the duration of the video recording. In the sync subdirectory, the m_video_frames_in_audio_samples and s_video_frames_in_audio_samples files will be created, which contain the sample numbers of video frame starts in the audio recording. These files are useful should troubleshooting sync issues arise. In case, NIDQ data was recorded, the nidq_ipi_data.npy file will be created, which contains the IPI durations (first row) and IPI start samples (second row).

{
    "m": {
        "start_first_recorded_frame": 2654037,
        "end_last_recorded_frame": 302539204,
        "largest_break_duration": 578805,
        "duration_samples": 299885168,
        "duration_seconds": 1199.5407,
        "audio_tracking_diff_seconds": -0.0071,
        "num_dropouts": 0
    }
}

The /usv-playpen/_parameter_settings/process_settings.json file contains a section fully modifiable in the GUI, with the following parameters:

  • device_receiving_input : USGH device receiving Loopbio Triggerbox input (if using SYNC mode, this should be “m”)

  • ch_receiving_input : microphone channel receiving Loopbio Triggerbox input

"crop_wav_files_to_video": {
    "device_receiving_input": "m",
    "ch_receiving_input": 4
  }

Run HPSS

You have the option to denoise audio data using harmonic-percussive source separation (implemented with librosa). You can find materials that allow you to run this analysis on the cluster in: /usv-playpen/other/cluster/HPSS. Alternatively, to run HPSS locally, you need to list the root directories of interest, select Run HPSS, click Next and then Process:

Processing Step 10

Below, you can see an example of an audio segment with mouse vocalizations before and after such denoising.

HPSS example

The Run HPSS step populates the hpss directory with de-noised single channel files of the entire recording (reduced to one channel for brevity):

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ├── cropped_to_video
│   │   │   ...
│   │   ├── hpss
│   │   │   ├── m_250430145009_ch01_cropped_to_video_hpss.wav
│   │   ├── original_mc
│   │   │   ...
│   │   ├── audio_triggerbox_sync_info.json
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   └── video
│       ...

The /usv-playpen/_parameter_settings/process_settings.json file contains a section fully modifiable in the GUI, with the following parameters:

  • stft_window_length_hop_size : STFT window length and hop size

  • kernel_size : harmonic-percussive source separation kernel size

  • hpss_power : harmonic-percussive source separation power

  • margin : margin for harmonic-percussive source separation

"hpss_audio": {
    "stft_window_length_hop_size": [
      512,
      128
    ],
    "kernel_size": [
      5,
      60
    ],
    "hpss_power": 4.0,
    "margin": [
      4,
      1
    ]
}

Filter and concatenate to MEMMAP

These steps can be run separately (still in sequence, though), but for the sake of simplicity, they will be described jointly. To run these steps together, you need to list the root directories of interest, select Filter audio files and Concatenate to MEMMAP, click Next and then Process:

Processing Step 11

The purpose of these two functions is to first high-pass filter each audio file (removing all lower frequencies) and then concatenate all channels into one memory-mapped file. The first step requires the usage of sox. These processing steps populate the hpss_filtered directory with de-noised, high-pass filtered single channel files of the entire recording (reduced to one channel for brevity):

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ├── cropped_to_video
│   │   │   ...
│   │   ├── hpss
│   │   │   ...
│   │   ├── hpss_filtered
│   │   │   ├── 250430145009_concatenated_audio_hpss_filtered_250000_299885168_24_int16.mmap
│   │   │   ├── m_250430145009_ch01_cropped_to_video_hpss_filtered.wav
│   │   │   ...
│   │   ├── original_mc
│   │   │   ...
│   │   ├── audio_triggerbox_sync_info.json
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   └── video
│       ...

The /usv-playpen/_parameter_settings/process_settings.json file contains a section fully modifiable in the GUI, with the following parameters:

  • audio_format : audio file format (usually “wav”)

  • filter_dirs : list of directories to be filtered (usually “hpss”)

  • filter_freq_bounds : frequency bounds for filtering (usually [0, 30000])

"filter_audio_files": {
    "audio_format": "wav",
    "filter_dirs": [
      "hpss"
    ],
    "filter_freq_bounds": [
      0,
      30000
    ]
}

Run DAS inference

The usv-playpen GUI assumes usage of DAS for identifying vocalizations in audio recordings. To do this, one first needs to train a model on the data of interest (i.e., social interactions with vocal output). Explaining how to do this is beyond the scope of this text, so we will assume you already have a model ready for running inference.

Since the average office PC does not necessarily have GPU-capabilities, it is advised to run DAS inference on a high-performance computing cluster, as these usually have GPU-capabilities and allow for the parallelization of the inference process. The usv-playpen GUI allows you to run the process locally (which can be time consuming), and it provides you with a shell script you can modify for cluster usage (/usv-playpen/other/cluster/DAS/das_inference_global.sh).

To run DAS inference, you need to list the root directories of interest, select the directory and base name of your DAS model, select Run DAS inference, click Next and finally Process:

Processing Step 12

This will create a das_annotations subdirectory which will contain a CSV file for each recorded channel, denoting the start and end of each detected vocalization.

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ├── cropped_to_video
│   │   │   ...
│   │   ├── das_annotations
│   │   │   ├── m_250430145009_ch01_cropped_to_video_hpss_filtered_annotations.csv
│   │   │   ...
│   │   ├── hpss
│   │   │   ...
│   │   ├── hpss_filtered
│   │   │   ...
│   │   ├── original_mc
│   │   │   ...
│   │   ├── audio_triggerbox_sync_info.json
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   └── video
│       ...

The /usv-playpen/_parameter_settings/process_settings.json file contains a section fully modifiable in the GUI, with the following parameters:

  • das_conda_env_name : name of the local conda environment used for running DAS inference

  • model_directory : directory containing the trained DAS model

  • model_name_base : base name (date) of the trained DAS model

  • output_file_type : output file type (“csv” or “h5”)

  • segment_confidence_threshold : confidence threshold for segmenting vocalizations

  • segment_minlen : minimum length of segments to be considered vocalizations

  • segment_fillgap : maximum gap between segments to be joined into a single vocalization

"das_command_line_inference": {
    "das_conda_env_name": "das",
    "model_directory": "/mnt/falkner/Bartul/DAS/model_2024-03-25",
    "model_name_base": "20240325_073951",
    "output_file_type": "csv",
    "segment_confidence_threshold": 0.5,
    "segment_minlen": 0.015,
    "segment_fillgap": 0.015
  },

Curate DAS outputs

As explained above, DAS is run on every channel separately, such that a need arises to systematize different channel detections in one singular table. This code identifies the same detections across different channels and creates a single CSV file with the start and end times of each detected vocalization.

To run, you need to list the root directories of interest, select Curate DAS outputs, click Next and then Process:

Processing Step 13

This process will create [1] a 20250430_145017_usv_summary.csv file, and [2] a 20250430_145017_usv_signal_correlation_histogram.svg file, as shown below:

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ├── cropped_to_video
│   │   │   ...
│   │   ├── das_annotations
│   │   │   ...
│   │   ├── hpss
│   │   │   ...
│   │   ├── hpss_filtered
│   │   │   ...
│   │   ├── original_mc
│   │   │   ...
│   │   ├── 20250430_145017_usv_summary.csv
│   │   ├── 20250430_145017_usv_signal_correlation_histogram.svg
│   │   ├── audio_triggerbox_sync_info.json
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   └── video
│       ...

The usv_summary.csv file should look similar to an example table below:

┌────────┬─────────────┬─────────────┬──────────┬───┬─────────────┬───────────┬─────────────────────────────────┬──────────┐
│ usv_id ┆ start       ┆ stop        ┆ duration ┆ … ┆ mean_amp_ch ┆ chs_count ┆ chs_detected                    ┆ emitter  │
│ ---    ┆ ---         ┆ ---         ┆ ---      ┆   ┆ ---         ┆ ---       ┆ ---                             ┆ ---      │
╞════════╪═════════════╪═════════════╪══════════╪═══╪═════════════╪═══════════╪═════════════════════════════════╪══════════╡
│ 0      ┆ 0.23296     ┆ 0.299388    ┆ 0.066428 ┆ … ┆ 17.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 1      ┆ 0.36064     ┆ 0.42278     ┆ 0.06214  ┆ … ┆ 17.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 2      ┆ 0.488896    ┆ 0.58534     ┆ 0.096444 ┆ … ┆ 2.0         ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 3      ┆ 0.643392    ┆ 0.734588    ┆ 0.091196 ┆ … ┆ 2.0         ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 4      ┆ 0.800192    ┆ 0.942972    ┆ 0.14278  ┆ … ┆ 11.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ …      ┆ …           ┆ …           ┆ …        ┆ … ┆ …           ┆ …         ┆ …                               ┆ …        │
│ 2561   ┆ 1193.784896 ┆ 1193.828988 ┆ 0.044092 ┆ … ┆ 23.0        ┆ 20.0      ┆ [0, 1, 2, 3, 5, 6, 7, 8, 9, 11… ┆ null     │
│ 2562   ┆ 1195.412544 ┆ 1195.433852 ┆ 0.021308 ┆ … ┆ 23.0        ┆ 1.0       ┆ [23]                            ┆ null     │
│ 2563   ┆ 1195.531392 ┆ 1195.5639   ┆ 0.032508 ┆ … ┆ 23.0        ┆ 4.0       ┆ [0, 17, 21, 23]                 ┆ null     │
│ 2564   ┆ 1195.775552 ┆ 1195.81926  ┆ 0.043708 ┆ … ┆ 23.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 2565   ┆ 1197.163712 ┆ 1197.196348 ┆ 0.032636 ┆ … ┆ 6.0         ┆ 2.0       ┆ [4, 6]                          ┆ null     │
└────────┴─────────────┴─────────────┴──────────┴───┴─────────────┴───────────┴─────────────────────────────────┴──────────┘

The usv_signal_correlation_histogram.svg file contains a histogram of [1] mean spectrogram correlations between channels and its noise/signal cutoff, and [2] the histogram of normalized spectral variance for single channel detections and its noise/signal cutoff (an example of which is shown below). The assumption is that noise correlates poorly across channels and has a smaller variance (as it is largely low volume).

Correlation and variance of signal summary

The /usv-playpen/_parameter_settings/process_settings.json file contains a section not modifiable in the GUI, but it can be modified manually:

  • len_win_signal : STFT window length

  • low_freq_cutoff : frequency cutoff for filtering (in kHz)

  • noise_corr_cutoff_min : minimum correlation coefficient for noise

  • noise_var_cutoff_max : maximum variance for noise

"summarize_das_findings": {
   "len_win_signal": 512,
   "low_freq_cutoff": 30000,
   "noise_corr_cutoff_min": 0.15,
   "noise_var_cutoff_max": 0.001
}

Prepare and run USV assignment

You might also want to know which animal emitted which vocalization. To do this, usv-playpen relies on vocalocator, a tool for localizing animal vocalizations in 3D space, and it assumes you already have a trained model. These steps can be run separately (still in sequence, though), but for the sake of simplicity, they will be described jointly. To run these steps together, you need to list the root directories of interest, select the arena directory, select the conda environment name for vocalocator, select the directory of the vocalocator model, select Prepare USV assignment and Run USV assignment, select the Assignment type (vocalocator or click Next and then Process:

Processing Step 14

This will create a sound_localization subdirectory which will contain several files: [1] dse.h5 file which contains all data relevant for sound localization, [2] assessment.h5 file which contains 2D assessment data, and [3] assessment_assn.npy which contains 6D assessment output - the output of this file is then transferred to the “emitter” column of the 20250430_145017_usv_summary.csv file.

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ├── cropped_to_video
│   │   │   ...
│   │   ├── das_annotations
│   │   │   ...
│   │   ├── hpss
│   │   │   ...
│   │   ├── hpss_filtered
│   │   │   ...
│   │   ├── original_mc
│   │   │   ...
│   │   ├── sound_localization
│   │   │   ├── model_predictions.npz
│   │   │   ├── dset.h5
│   │   ├── 20250430_145017_usv_summary.csv
│   │   ├── 20250430_145017_usv_signal_correlation_histogram.svg
│   │   ├── audio_triggerbox_sync_info.json
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   └── video
│       ...

The modified usv_summary.csv file now contains information in the last column for those vocalizations that have been attributed to specific animals:

┌────────┬─────────────┬─────────────┬──────────┬───┬─────────────┬───────────┬─────────────────────────────────┬──────────┐
│ usv_id ┆ start       ┆ stop        ┆ duration ┆ … ┆ mean_amp_ch ┆ chs_count ┆ chs_detected                    ┆ emitter  │
│ ---    ┆ ---         ┆ ---         ┆ ---      ┆   ┆ ---         ┆ ---       ┆ ---                             ┆ ---      │
│ i64    ┆ f64         ┆ f64         ┆ f64      ┆   ┆ f64         ┆ f64       ┆ str                             ┆ str      │
╞════════╪═════════════╪═════════════╪══════════╪═══╪═════════════╪═══════════╪═════════════════════════════════╪══════════╡
│ 0      ┆ 0.23296     ┆ 0.299388    ┆ 0.066428 ┆ … ┆ 17.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 1      ┆ 0.36064     ┆ 0.42278     ┆ 0.06214  ┆ … ┆ 17.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 2      ┆ 0.488896    ┆ 0.58534     ┆ 0.096444 ┆ … ┆ 2.0         ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ 158114_2 │
│ 3      ┆ 0.643392    ┆ 0.734588    ┆ 0.091196 ┆ … ┆ 2.0         ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ 158114_2 │
│ 4      ┆ 0.800192    ┆ 0.942972    ┆ 0.14278  ┆ … ┆ 11.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ 158114_2 │
│ …      ┆ …           ┆ …           ┆ …        ┆ … ┆ …           ┆ …         ┆ …                               ┆ …        │
│ 2561   ┆ 1193.784896 ┆ 1193.828988 ┆ 0.044092 ┆ … ┆ 23.0        ┆ 20.0      ┆ [0, 1, 2, 3, 5, 6, 7, 8, 9, 11… ┆ null     │
│ 2562   ┆ 1195.412544 ┆ 1195.433852 ┆ 0.021308 ┆ … ┆ 23.0        ┆ 1.0       ┆ [23]                            ┆ 156693_3 │
│ 2563   ┆ 1195.531392 ┆ 1195.5639   ┆ 0.032508 ┆ … ┆ 23.0        ┆ 4.0       ┆ [0, 17, 21, 23]                 ┆ null     │
│ 2564   ┆ 1195.775552 ┆ 1195.81926  ┆ 0.043708 ┆ … ┆ 23.0        ┆ 24.0      ┆ [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,… ┆ null     │
│ 2565   ┆ 1197.163712 ┆ 1197.196348 ┆ 0.032636 ┆ … ┆ 6.0         ┆ 2.0       ┆ [4, 6]                          ┆ 156693_3 │
└────────┴─────────────┴─────────────┴──────────┴───┴─────────────┴───────────┴─────────────────────────────────┴──────────┘

The /usv-playpen/_parameter_settings/process_settings.json file contains a section fully modifiable in the GUI, with the following parameters:

  • vcl_conda_env_name : name of the local conda environment used for running Vocalocator

  • vcl_model_directory : directory containing the trained Vocalocator model

  • vcl_version : version of the Vocalocator model (e.g., “vcl-ssl” for the SSL model)

"vocalocator": {
 "vcl_conda_env_name": "vcl",
 "vcl_model_directory": "",
 "vcl_version": "vcl-ssl",
}

A/V Synchronization

To run A/V synchronization, you need to list the root directories of interest, select A/V Synchronization, click Next and then Process:

Processing Step 15

The A/V synchronization procedure will first crate a sync_px file for each input camera, recording pixel intensities of each LED position. The objective is to identify the start of each IPI event in camera time and on both audio devices. One can then compare, for each individual IPI event, what the discrepancy is between the clocks of both devices and that is captured in the summary.svg histograms.

├── 20250430_145017
│   ├── 20250430_145017_metadata.yaml
│   ├── audio
│   │   ...
│   ├── ephys
│   │   ...
│   ├── sync
│   │   ...
│   │   ├── nidq_ipi_data.npy
│   │   ├── sync_px_21372315-250430145009.mmap
│   │   ├── 20250430_145017_summary.svg
│   └── video
│       ...

An example output of the A/V synchronization procedure is shown below.

Notice that the plot contains two columns, one for each USGH device (which can operate in NO SYNC mode). In the first row, you can observe the distribution of A-V IPI discrepancies, which is the difference between the IPI onsets detected in the video and audio data. In the example, you can see the discrepancy goes rarely beyond one camera frame, which is ~6 ms, an acceptable amount of jitter. One might also be interested in viewing how this discrepancy evolves over time. One thing we would want to avoid are drastic changes in sampling rates on any of the devices over time. In the second row, you can see the relationship between IPI onsets time (earlier-later in the session) and the A-V IPI discrepancy. Ideally, we would want to observe a flat cloud of points, which would indicate that the A/V IPI discrepancy is stable over time. If you observe a trend that goes beyond 2 tracking frames, it might be worth investigating further.

Sync summary

In case NIDQ was also used in the recording, the first of the device plots will have a subplot detailing the temporal relationship between the NIDQ IPI onsets and the video IPI onsets (in ms). This plot is informative in case there is a large A-V discrepancy, as it allows you to determine which device (A or V) is having issues. If the NIDQ-V discrepancy is small, the sync issue is likely related to the audio device. On the contrary, if the NIDQ-V discrepancy is large, the sync issue is likely related to the video device. Either way, this is a first step in investigating this further, which is highly recommended.

Sync summary NIDQ

The /usv-playpen/_parameter_settings/process_settings.json file contains a section fully modifiable in the GUI, with the following parameters:

  • extra_data_camera : serial number of the camera used to store phidget data

  • ch_receiving_input : microphone channel receiving Arduino digital input

  • extract_exact_video_frame_times_bool : instead of using frame indices multiplied by empirical frame rate, use Loopbio times directly (which is less precise!)

  • nidq_sr : sampling rate of the NIDQ device (in Hz)

  • nidq_num_channels : number of channels on the NIDQ device (9 on BNC-2110)

  • nidq_bool : whether NIDQ device received Triggerbox AND sync input

  • nidq_triggerbox_input_bit_position : triggerbox input bit position on the NIDQ device digital channel (assumes last channel is digital!)

  • nidq_sync_input_bit_position : sync input bit position on the NIDQ device digital channel (assumes last channel is digital!)

  • camera_serial_num : serial numbers of cameras that can detect flashing LEDs

  • led_px_version : version of the LED pixel positions

  • led_px_dev : maximal deviation (in px) of observed LED flashes relative to expected positions

  • relative_intensity_threshold : top threshold (on 0-1 scale) for relative temporal change in pixel intensity

  • millisecond_divergence_tolerance : maximal deviation of IPI onsets (in ms) between video detections and ground truth

"extract_phidget_data": {
 "Gatherer": {
   "prepare_data_for_analyses": {
     "extra_data_camera": "22085397"
   }
 }
},
"find_audio_sync_trains": {
     "ch_receiving_input": 2,
     "extract_exact_video_frame_times_bool": false,
     "nidq_sr": 62500.72887,
     "nidq_num_channels": 9,
     "nidq_triggerbox_input_bit_position": 5,
     "nidq_sync_input_bit_position": 7}
 },
"find_video_sync_trains": {
     "camera_serial_num": [
         "21372315"
     ],
     "led_px_version": "current",
     "led_px_dev": 10,
     "video_extension": "mp4",
     "relative_intensity_threshold": 1.0,
     "millisecond_divergence_tolerance": 12
}