AI-Powered Scientific Discovery: Transforming Data into Knowledge
|
Our latest collaborative research efforts in AI-enabled data analysis exemplify how we hope to leverage agentic systems to facilitate scientific discovery. By pioneering specialized AI-driven agentic workflows, we aim to streamline knowledge extraction and pave the way for a new era of faster, deeper scientific discovery.
As shown in the accompanying diagram, we are developing an orchestrated AI ecosystem that serves as a force multiplier for the Notre Dame scientific community. This envisioned system begins by ingesting and classifying diverse documents—structured and unstructured—into machine-readable formats. From there, linked data agents apply knowledge engineering techniques to identify key terms (e.g., “molecules,” “genes,” “instruments”) and categorize relationships among scientific concepts, preserving critical contextual details. The data is then woven into an interconnected graph, enabling dynamic, non-linear exploration across multiple studies.
|
AI-driven summarization offers concise overviews of key findings, flags emerging trends, and addresses data gaps for a more complete perspective. Finally, query interpretation translates user questions into structured searches, with human-in-the-loop oversight ensuring expert guidance and validation. Model updates and enrichment processes further refine the system based on real-world feedback, while cloud-based data fabrics allow for scalable deployment and seamless integration with other resources.
We are looking to partner with campus researchers to build aligned proposals and projects to support your specific scientific domains. If you have data sources—whether published literature or other research outputs—that could benefit from advanced agentic processing, or if you have compelling research questions that can drive the discovery of new insights, please contact us. We look forward to partnering with faculty to turn complex scientific data into actionable knowledge.
|
New Storage Solution Entering Phase 2
|
As we discussed in the November 2024 CRC Connections newsletter, we have been diligently working behind the scenes to bring our new NetApp storage appliance online and integrated within the CRC. We are now ready to invite a few external beta testers to try out the system and provide feedback before a wider release. If interested in learning more about the new system and the testing program, please send an email to crcsupport@nd.edu.
|
On-Premises LLM Hosting Pilot
|
The CRC has been running an ongoing pilot of on-premises, open-weight large language model (LLM) hosting since late last year. The service is provided through a web interface available at openwebui.crc.nd.edu and hosted in the CRC Datacenter. The models run on the processing power of the experimental NVIDIA Grace Hopper Superchip architecture. This pilot is open to any researchers at Notre Dame who want to experiment with resource-intensive LLMs while ensuring that all data processing remains on CRC infrastructure.
January highlights include:
|
- DeepSeek R1 Model Availability. For the time being, this is installed as a private model. Due to some controversy around this model’s origins and outputs, we have made it available by request only. Please send an email to crcsupport@nd.edu if you would like to request access to this model.
- Updating Open WebUI to v0.5.14. This update brings several improvements to the user experience as well as many new extensibility options, including external tool calling, custom function imports, and code interpreters.
|
If you are interested in specific system capabilities of Open WebUI or have research applications that could utilize this CRC hosted service, email crcsupport@nd.edu for a consultation with one of our staff members.
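For researchers who prefer scripted access, Open WebUI instances also expose an OpenAI-compatible REST API. The sketch below builds a minimal chat-completion request; the model name, endpoint path, and API-key workflow are assumptions to verify against the CRC-hosted instance before use:

```shell
# Build a chat request payload for an OpenAI-compatible endpoint (illustrative).
# The model name "llama3" is a placeholder; actual model names depend on what
# is deployed on the CRC instance.
PAYLOAD=$(cat <<'EOF'
{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Summarize the key findings of this abstract: ..."}
  ]
}
EOF
)
echo "$PAYLOAD"
# To send the request (requires an API key generated in your account settings):
# curl -s https://openwebui.crc.nd.edu/api/chat/completions \
#      -H "Authorization: Bearer $API_KEY" \
#      -H "Content-Type: application/json" \
#      -d "$PAYLOAD"
```

Because all processing stays on CRC infrastructure, this style of scripted access can be embedded in research pipelines without sending data to external providers.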
|
Tech Tip: Job Arrays for Efficient Workflows
|
Leveraging Job Arrays for Efficient Task Management
Job arrays are a powerful feature in job scheduling systems that allow users to efficiently execute and manage multiple tasks within a single job submission. By utilizing job arrays, users can streamline workflows, minimize administrative overhead, and optimize resource utilization. This article explores job arrays in two scheduling systems used by CRC—Altair Grid Engine (GE) and HTCondor—offering job script examples to demonstrate their implementation.
|
Job Array Example in Grid Engine
|
#!/bin/bash
#$ -t 1-10
# Define the task array
PARAMS=(0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0)
# Access the specific parameter value for each task
PARAM=${PARAMS[$SGE_TASK_ID - 1]}
# Execute the task
./mytask --param $PARAM
|
In this GE job script, we:
|
- Specify a task array using the -t flag, setting the range of tasks (1–10).
- Store parameter values in the PARAMS array.
- Use the $SGE_TASK_ID environment variable to access the appropriate parameter for each task.
- Execute mytask with the assigned parameter.
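Before submitting, the indexing logic can be checked locally by looping over the task IDs the scheduler would assign; this sketch uses the same expression as the job script:

```shell
# Simulate the scheduler's task IDs locally to verify the PARAMS indexing.
PARAMS=(0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0)
for SGE_TASK_ID in $(seq 1 10); do
    # Same expression as the job script: task IDs are 1-based, bash arrays 0-based
    PARAM=${PARAMS[$SGE_TASK_ID - 1]}
    echo "task $SGE_TASK_ID -> param $PARAM"
done
```

Note the off-by-one adjustment: GE task IDs start at 1, while bash array indices start at 0, so task N reads PARAMS[N-1].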
|
Job Array Example in HTCondor
|
Executable = mytask
Arguments = --param $(Process)
# Define the task array
Queue 10
|
In this HTCondor job script, we:
|
- Set mytask as the executable.
- Pass the task-specific parameter using the $(Process) macro, which HTCondor sets to the zero-based task ID (0–9 here).
- Use the Queue command to submit 10 tasks, with HTCondor automatically assigning a unique $(Process) value to each.
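If each task needs an actual parameter value rather than the raw process number, HTCondor's queue ... in syntax can enumerate the values directly. A sketch mirroring the Grid Engine example above:

```
Executable = mytask
Arguments  = --param $(param)
# One task per listed value; $(param) takes each value in turn
queue param in (0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0)
```

This keeps the parameter list in the submit description itself, so the executable receives the same values in both schedulers.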
|
Enhancing Computational Efficiency with Job Arrays
|
Job arrays provide a structured and efficient approach to managing multiple tasks within a single job submission. By leveraging this feature in GE and HTCondor, users can optimize computational workflows, improve scalability, and enhance productivity in high-performance computing environments.
|
Top 10 Computation Users (January 2025)
|
1. Chemical & Biomolecular Engineering: 327,631 CPU hours
2. Chemical & Biomolecular Engineering: 293,591 CPU hours
3. Physics: 233,001 CPU hours
4. Biological Sciences: 194,313 CPU hours
5. Aerospace & Mechanical Engineering: 177,079 CPU hours
6. Aerospace & Mechanical Engineering: 175,034 CPU hours
7. Chemistry & Biochemistry: 158,110 CPU hours
8. Chemical & Biomolecular Engineering: 138,910 CPU hours
9. Chemical & Biomolecular Engineering: 97,818 CPU hours
10. Chemical & Biomolecular Engineering: 95,235 CPU hours
Top 10 Graphics Processing Unit (GPU) Users (January 2025)
|
1. Computer Science & Engineering: 5,745 GPU hours
2. Chemical & Biomolecular Engineering: 5,055 GPU hours
3. Computer Science & Engineering: 4,722 GPU hours
4. Computer Science & Engineering: 3,670 GPU hours
5. Chemistry & Biochemistry: 3,512 GPU hours
6. Computer Science & Engineering: 3,427 GPU hours
7. Computer Science & Engineering: 3,253 GPU hours
8. Chemistry & Biochemistry: 2,925 GPU hours
9. Applied and Computational Mathematics and Statistics: 2,510 GPU hours
10. Computer Science & Engineering: 2,242 GPU hours
|
| User Training Office Hours |
Every Wednesday and Thursday
2:00 – 3:00 p.m.
812 Flanner Hall
|
The CRC offers multiple training opportunities for both new and existing users. We periodically provide short courses and other learning opportunities, which are advertised on our website and through email lists. In-person office hours are held every Wednesday and Thursday from 2:00-3:30 p.m. in Flanner Hall, room 812, on a first-come, first-served basis. You can also arrange a Zoom meeting at your convenience by emailing CRCsupport@nd.edu with your availability. We recommend bringing a laptop to in-person sessions.
|
- A CRC User Account is required to participate. If you need an account, please fill out and submit the CRC Account Request Form.
- Office hours will be held in 812 Flanner Hall. Click here to register.
|
940 Grace Hall University of Notre Dame | Notre Dame, IN 46556 US