The HDF Group's Call the Doctor

The HDF Group

Find the latest information about the HDF file formats and new developments from The HDF Group and the HDF community.
News

Episodes

HDF 4.3.0 Release - Dana Robinson on Call the Doctor 2/27/24
Feb 27 2024
In this episode of "Call the Doctor," Director of Software Engineering Dana Robinson covers updates and changes in the HDF4 library. Dana discusses HDF 4.3.0 (released 2/29/24) and version 4.4, with plans to maintain HDF4 indefinitely. The focus is on improving HDF4's compatibility with modern compilers and platforms, addressing issues such as compatibility problems with the XDR library and the deployment of internal header files. Significant changes include removing XDR from the configuration options, improving compiler support, and cleaning up memory sanitizer issues. The episode also touches on restructuring the Fortran code, removing outdated Fortran 77 support, and potentially merging the libraries into a single HDF4 library. Additionally, the plans involve phasing out the old netCDF 2.3.2 API and its associated tools in favor of newer alternatives. The HDF Group will continue modernizing HDF4 to ensure its long-term maintainability, despite potential disruptions caused by these changes. We do need to hear from users about these plans and how they might work with or conflict with your code, so please let us know your concerns on the HDF4 Forum or by emailing help@hdfgroup.org. This session happened on February 27, 2024. You can also watch this episode online.
Call the Doctor is a series of weekly, unscripted, live events! The HDF Group’s staff members will answer attendee questions and, for example, go over the previous week’s HDF Forum posts. The HDF Clinics are free sessions intended to help users tackle real-world HDF problems, from a common cold to severe headaches, and offer relief where that’s possible. As time permits, we will include how-tos, offer advice on tool usage, review your code samples, teach you survival in the documentation jungle, and discuss what’s new or just around the corner in the land of HDF. Join us every Tuesday at 12:20 p.m. Central (US/Canada) on Zoom!
Documenting how to save data to an HDF5 file - Aleksandar Jelenak on Call the Doctor 2/20/24
Feb 20 2024
In this episode of "Call the Doctor," Aleksandar Jelenak discusses ways to describe the content of HDF5 files in order to generate documentation about that content, as inspired by this forum post from Mike Jackson of Blue Quartz. The session emphasizes the need for a format that is approachable, shareable, and easily parsed by multiple programming languages. Jelenak reviews options that have been used, including Excel spreadsheets, JSON, and text formats like YAML, and then presents his own idea of using YAML documents to describe the content of HDF5 files in a straightforward, hierarchical manner. The session concludes with a discussion of the importance of bidirectionality in the toolchain and the potential for future developments in this area. This session happened on February 20, 2024. You can also watch this episode online.
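To make the YAML idea concrete, here is a minimal Python sketch, not Jelenak's actual proposal: it walks a hypothetical nested dictionary standing in for an HDF5 file's groups and datasets and emits an indented, YAML-style description. The layout dictionary, the describe function, and the kind/shape/dtype/members keys are all assumptions made for illustration; with a real file, a library such as h5py could supply the same information by traversing groups (for example with Group.visititems).

```python
# Hypothetical stand-in for an HDF5 file's layout (not read from a real file):
# groups are dicts, datasets are dicts carrying "shape" and "dtype" keys.
layout = {
    "entry": {
        "data": {"shape": [100, 3], "dtype": "float64"},
        "time": {"shape": [100], "dtype": "int64"},
    },
}


def is_dataset(node):
    """Treat a node as a dataset if it carries both shape and dtype keys."""
    return isinstance(node, dict) and "shape" in node and "dtype" in node


def describe(node, indent=0):
    """Return YAML-style lines describing groups and datasets, depth first."""
    lines = []
    pad = "  " * indent
    for name in sorted(node):
        child = node[name]
        lines.append(f"{pad}{name}:")
        if is_dataset(child):
            lines.append(f"{pad}  kind: dataset")
            dims = ", ".join(str(d) for d in child["shape"])
            lines.append(f"{pad}  shape: [{dims}]")
            lines.append(f"{pad}  dtype: {child['dtype']}")
        else:
            lines.append(f"{pad}  kind: group")
            lines.append(f"{pad}  members:")
            # Children of a group are nested two levels deeper than the group.
            lines.extend(describe(child, indent + 2))
    return lines


print("\n".join(describe(layout)))
```

The output is plain text with a regular structure, so it stays readable for people while remaining parseable by any YAML library. Names containing YAML special characters would need quoting, which this sketch does not handle.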
An HDF5 Tutorial Developed by the Community for the Community - Call the Doctor hosted by Gerd Heber
Jan 31 2024
For an individual, creating an HDF5 tutorial is hard; making a good HDF5 tutorial is nearly impossible. There are many technical facets to cover; the global community it serves is phenomenally diverse, and so is its rich ecosystem. Wouldn’t it be great to have a tutorial that anyone could enjoy without pesky installation details, and that everyone with an idea for improving it could easily contribute to? Introducing the HDF5 Tutorial, developed by the community for the community. Fork it on GitHub at https://github.com/HDFGroup/hdf5-tutorial. The HDF Group's Executive Director Gerd Heber uses this session of Call the Doctor to give an overview of the tutorial and introduce its underpinnings so that everyone so inclined can contribute and help create the best possible HDF5 tutorial ever. Would you like to discuss this tutorial with Gerd and others? Come to the forum and let us know your thoughts. This session was recorded on January 30, 2024. You can also watch this episode online.
Issues writing results in the CGNS (CFD General Notation System) format - Scot Breitenfeld
Jan 24 2024
In this episode of "Call the Doctor," Scot Breitenfeld of The HDF Group hosted an open help session for your HPC HDF5 questions. (Questions are welcome at any session, but we knew some community members had specific needs for this session.) A research engineer working on open-source thermohydraulics code discusses issues with writing results in the CGNS (CFD General Notation System) format. The engineer explores different approaches, including writing the CGNS skeleton on the master process and distributing the data writing across multiple nodes. However, there are concerns about a slowdown as the number of time steps increases, possibly due to opening and closing the HDF5 file at each time step. The discussion also touches on options such as subfiling and using the core file driver to take the file system out of the picture. Additionally, there's a brief question about compressing large strings in HDF5; see this forum post for details. Overall, the episode addresses technical challenges in parallel file writing and strategies for optimizing it. This session happened on January 23, 2024. You can also watch this episode online.
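On the question of compressing large strings: HDF5's gzip filter is based on the deflate algorithm from zlib, so Python's standard zlib module offers a quick, library-independent way to gauge how well a given string payload might compress before storing it. The repetitive sample record below is made up for illustration; ratios for real data will differ, and this sketch says nothing about how HDF5 itself stores string datatypes.

```python
import zlib

# Hypothetical payload: a large, repetitive text record (e.g. a solver log),
# encoded to bytes because compression works on bytes, not str.
text = ("time_step=0001 solver=converged residual=1.0e-06\n" * 2000).encode("utf-8")

# zlib levels run from 0 (none) to 9 (smallest output); HDF5's gzip filter
# likewise takes a deflate level in the 0-9 range.
compressed = zlib.compress(text, 6)
ratio = len(text) / len(compressed)
print(f"{len(text)} bytes -> {len(compressed)} bytes (ratio {ratio:.1f}x)")

# Round trip to confirm the compression is lossless.
restored = zlib.decompress(compressed)
print("lossless:", restored == text)
```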
Exploring and comparing performance between original and cloud-optimized HDF5 files
Jan 16 2024
In this episode of "Call the Doctor," Aleksandar Jelenak delves into recent work analyzing Earth science HDF5 files, comparing their original versions with cloud-optimized versions. Using data from the Global Ecosystem Dynamics Investigation (GEDI) instrument on the International Space Station, he shows how the instrument's laser beams map Earth's vegetation. Aleksandar discusses the challenges of optimizing HDF5 files for cloud use, emphasizing the need for data that scientists worldwide can use easily. He presents a detailed analysis of chunked datasets, storage settings, and statistics, highlighting the potential benefits of cloud-optimized files. The episode concludes with performance comparisons between original and cloud-optimized files, shedding light on the advantages of efficient data storage and access. This session was recorded on January 16, 2024. You can also watch it online.
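Much of the cloud-optimization discussion comes down to chunk counts and chunk sizes, since each chunk read from object storage typically costs at least one separate request. The small Python helper below is a sketch with hypothetical shapes rather than the GEDI files from the episode: it computes how many chunks a dataset layout produces and how many uncompressed bytes a full chunk holds.

```python
import math


def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, uncompressed bytes per full chunk).

    Partial chunks at the dataset edges still count as chunks, so each
    dimension contributes ceil(dim / chunk_dim) chunks.
    """
    n_chunks = math.prod((s + c - 1) // c for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


# Hypothetical 1-D float32 dataset (4 bytes per element) with two chunkings.
small = chunk_stats((1_000_000,), (1_000,), 4)
large = chunk_stats((1_000_000,), (250_000,), 4)
print("small chunks:", small)  # (1000, 4000)
print("large chunks:", large)  # (4, 1000000)
```

Fewer, larger chunks generally mean fewer round trips to object storage, at the cost of reading more data than a small selection actually needs.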
New HSDS Features Coming Soon and a Design Proposal for Long Running Tasks - John Readey
Jan 9 2024
In this episode of "Call the Doctor," The HDF Group's John Readey discusses upcoming features in the next HSDS release. Since John is in China, the session is pre-recorded, with software engineer Matt Larson available to answer questions. The features discussed include shape reduction, broadcasting, fixed-length UTF-8 strings, quick scan, N-bit and scale-offset filters, enhanced array type support, field operations for compound types, support for long attribute names, non-UTF-8-encodable attributes, multi-op attributes, long link names, and hyper chunking. John also introduces the concepts of async tasks for long-running operations and of using Parquet to encode chunks with variable-length types. The design doc for async tasks can be viewed on GitHub. The episode includes a demonstration of the attribute multi-op feature, showcasing its efficiency compared to a serial approach. You can participate in the discussion of features for HSDS 0.9 on the forum. This session happened on January 9, 2024. You can also watch this episode online.
BONUS: Zarr as HDF5 Cloud Format? – Aleksandar Jelenak, The HDF Group
Jan 2 2024
While we're off this week, please enjoy this audio recording from the 2023 HDF5 Users Group meeting held in Ohio. This session was presented by Aleksandar Jelenak, The HDF Group. Zarr is a fairly recent format for multidimensional data arrays, specifically targeting storage systems with a key-value interface. Some scientific communities interested in implementing scalable, cloud-native data analysis are considering Zarr as their data format because of its straightforward implementation in cloud object stores. The HDF Group developed its own cloud-native HDF5 format, called the HSDS schema, at about the same time as Zarr. Currently, only The HDF Group's own software, HSDS, creates data in the HSDS schema. Since Zarr and the HSDS schema share the same design approach, it is worth considering whether Zarr could serve as the cloud-native HDF5 format. The Zarr version 3 specification, currently under development, introduces the concept of extensions as a way to add more storage features. The goal of this session is to discuss the pros and cons of using Zarr v3 to formulate a new cloud-native HDF5 format. Some technical information is provided with the aim of opening up discussion among all attendees. If you'd like, you can watch this session online.
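To illustrate the key-value design that Zarr and the HSDS schema share, the sketch below maps chunk grid coordinates to object-store keys using what we understand to be the Zarr v3 "default" chunk key encoding: a c prefix followed by the chunk coordinates joined by a separator, "/" by default. A plain Python dictionary stands in for the object store, and the array name temperature is hypothetical; consult the Zarr v3 specification for the authoritative rules, including the alternative "." separator and the older v2 encoding.

```python
import itertools


def chunk_key(coords, separator="/"):
    """Zarr v3 'default' chunk key encoding (as we understand it)."""
    if not coords:
        return "c"  # zero-dimensional arrays have a single chunk
    return "c" + separator + separator.join(str(i) for i in coords)


def grid_keys(shape, chunks):
    """Yield the key of every chunk in a regular chunk grid, row-major."""
    grid = [(s + c - 1) // c for s, c in zip(shape, chunks)]
    for coords in itertools.product(*(range(n) for n in grid)):
        yield chunk_key(coords)


# A plain dict stands in for a key-value object store (key -> bytes).
store = {}
for key in grid_keys((4, 6), (2, 3)):
    store["temperature/" + key] = b""  # chunk payload omitted in this sketch

print(sorted(store))
```

Because every chunk is addressed by a predictable key, a reader can fetch any chunk directly from the store without first consulting an index stored inside a file.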
BONUS: State of HDF5 and New Features - Dana Robinson and Neil Fortner, The HDF Group
Dec 26 2023
While we're off for the holidays, enjoy this audio recording of the session "State of HDF5 and New Features" from Dana Robinson (Director of Software Engineering) and Neil Fortner (Chief HDF5 Software Architect), presented at the August 2023 HDF5 User Group meeting held in Ohio. Dana talked about some changes being made: the HDF5 Working Group meetings, the Sustaining Engineer of the Week, centering development on GitHub and the need for external Codeowners, a new change management process, and some of the HDF5 issues and development work we plan to focus on. Neil talked about new features being added to HDF5: Multi-Dataset I/O, Selection and Vector I/O, and the Subfiling VFD. Note: the audio during the audience Q&A at the end is not very good, and we apologize for that. Sessions recorded later during this event have better audio. You can watch the recording of this session on YouTube, and you can also access Dana's slide deck and Neil's slide deck if the audio experience isn't doing it for you.
Linked chunks in HSDS and a new HSLS feature - John Readey
Dec 12 2023
In this episode of "Call the Doctor," The HDF Group's John Readey explores the functionality of linked datasets in HSDS (Highly Scalable Data Service). Using a Python notebook running on AWS, he walks through examples using data from the National Renewable Energy Laboratory, which makes substantial HDF5 and HSDS data freely accessible. John covers various aspects, including domain information, dataset details, and how to read and analyze chunks. He delves into the specifics of the chunk layout, discussing file URIs, offsets, and sizes. Comparisons between HSDS and direct S3 access using the HDF5 library reveal performance differences caused by the sequential nature of the HDF5 library's requests. John concludes by demonstrating a new hsls feature for querying specific datasets. You can also watch this session online.
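The chunk layout John discusses, where each chunk is located by a file URI, a byte offset, and a size, can be pictured as a lookup table of byte ranges. The Python sketch below is a generic illustration, not HSDS code: a local temporary file stands in for an object in S3, the chunk table and its contents are made up, and a thread pool fetches all ranges concurrently rather than one after another.

```python
import concurrent.futures
import os
import tempfile


def read_range(path, offset, size):
    """Read one chunk's bytes from its offset (like an HTTP range request)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def read_chunks(table, max_workers=4):
    """Fetch every chunk in the table concurrently and return index -> bytes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {idx: pool.submit(read_range, *loc) for idx, loc in table.items()}
        return {idx: fut.result() for idx, fut in futures.items()}


# Build a small stand-in file: a 6-byte header followed by three "chunks".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + b"aaaa" + b"bbbbbb" + b"cc")
    path = tmp.name

# Hypothetical chunk table: chunk index -> (location, byte offset, byte size).
table = {(0,): (path, 6, 4), (1,): (path, 10, 6), (2,): (path, 16, 2)}
chunks = read_chunks(table)
os.remove(path)
print(chunks)
```

With a real object store, each read_range call would become a network request, which is where issuing requests concurrently instead of sequentially tends to pay off.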
LLM experiments with documentation - Gerd Heber on Call the Doctor 11/21/23
Nov 21 2023
In this episode of "Call the Doctor," hosted by Gerd Heber, the theme revolves around documentation, particularly in the context of large language models. The discussion begins with recent technical difficulties at https://portal.hdfgroup.org caused by a cyber attack, which left certain documentation unavailable. The team is working on recovery, using GitHub Pages and seeking community involvement. Gerd Heber introduces a project using ui.chat, demonstrating its capabilities by creating a custom chatbot trained on the HDF5 GitHub repository and RFC documents. While showcasing the chatbot's potential, the discussion touches on challenges such as outdated documentation and the need for ongoing updates. The episode highlights the intersection of technology, documentation, and community engagement. You can watch this video online.
Recent updates to the REST VOL by Matt Larson, Call the Doctor 11-14-23
Nov 14 2023
In this session of "Call the Doctor," The HDF Group's Matt Larson introduces the REST VOL (Virtual Object Layer) connector for the HDF5 library, focusing on its role as a VOL connector that maps HDF5 API calls to REST requests sent to external servers, particularly HSDS instances. He emphasizes the flexibility of the VOL layer, which allows users to develop their own connectors, and compares the REST VOL to alternative options for cloud-based HDF5 work. The talk covers the REST VOL's history, recent updates, and performance advantages, showcasing its support for multi-read and multi-write operations. Matt also highlights new features, including support for fill values and enhanced symbolic link traversal, and discusses ongoing developments such as automatic numeric type conversion and connecting to multiple servers from a single REST VOL instance. The presentation concludes with audience questions, addressing topics such as dynamically loading multiple distinct REST VOL instances concurrently. You can also watch this episode online.