init6


Ways we make your infrastructure actually work.

From initial deployment to ongoing managed operations. Every engagement is vendor-neutral: no hardware margin, no supplier bias.

Cluster Design & Deployment

From first conversation to accepted, benchmarked, production-ready cluster. 

DESIGN

Managed HPC Operations

Monthly retainer. Your cluster stays healthy without a full-time hire. 

OPERATE

Procurement & RFP Advisory

Review and rewrite your HPC RFP before it goes to tender. 

ADVISE

Chaos Engineering

We break your cluster first. Deliberately. In a controlled maintenance window. 

PROACTIVE

Training & Workshops

Practical training for researchers, administrators, and IT teams. 

TRAIN

01 HPC Cluster Design & Deployment

DESIGN

What We Deliver

 

  • Workload analysis and right-sizing — the cluster you need, not the largest you can afford
  • Hardware specification and vendor-neutral procurement advisory
  • OS provisioning with Warewulf 4 or xCAT on Rocky Linux 9 / RHEL 9
  • Slurm configuration: partitions, fairshare, GPU scheduling, memory limits, cgroup enforcement
  • InfiniBand bring-up: MOFED installation, fabric validation with ibdiagnet, bandwidth testing
  • Parallel storage: Lustre or BeeGFS, stripe configuration, IOR-validated throughput
  • HPL, STREAM, and IOR benchmarking with written acceptance report
  • User training and documentation your team will actually use

Who It's For

 Research labs, AI startups, pharma R&D teams, engineering simulation groups, and IT resellers delivering HPC projects. 

Typical Timeline

 6–11 weeks from purchase order to acceptance, depending on cluster size and site readiness. 

02 Managed HPC Operations

OPERATE

What's Included

 

  • Monthly health checks: node status, scheduler queue, storage usage, network fabric
  • Slurm tuning: memory limits, fairshare weights, partition configuration as workloads evolve
  • OS and software updates: security patches, OpenHPC updates, module environment maintenance
  • Performance diagnosis: when jobs are slower than expected, we find out why
  • On-call support for critical issues, with a defined SLA per tier
  • Monthly utilisation report with recommendations

Why a Retainer Works

 A full-time HPC administrator costs ₹8–15L/year in salary and is a difficult hire to make and retain. A monthly retainer of ₹30K–60K provides specialist expertise on demand, without the hiring risk.
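As a rough check on the figures above, the sketch below annualises the retainer range and sets it against the quoted salary range. It assumes 1L = ₹100,000 and a 12-month engagement, and it counts salary only (no recruitment, benefits, or attrition costs). The function and variable names are illustrative.

```python
LAKH = 100_000  # assumption: 1 lakh (L) = 100,000 rupees


def annual_retainer(monthly_rupees: int, months: int = 12) -> int:
    """Annualise a monthly retainer fee (assumes a flat fee every month)."""
    return monthly_rupees * months


# Salary range quoted above: 8-15L per year.
salary_low, salary_high = 8 * LAKH, 15 * LAKH

# Retainer range quoted above: 30K-60K per month.
retainer_low = annual_retainer(30_000)
retainer_high = annual_retainer(60_000)

print(f"Retainer: ₹{retainer_low / LAKH:.1f}L–₹{retainer_high / LAKH:.1f}L per year")
print(f"Salary:   ₹{salary_low / LAKH:.0f}L–₹{salary_high / LAKH:.0f}L per year")
```

Under these assumptions the top of the retainer range (₹7.2L/year) still sits below the bottom of the salary range (₹8L/year).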

Who It's For

 University labs, smaller research institutions, and startups that have deployed HPC but don't have dedicated operations expertise in-house. 

03 HPC Procurement & RFP Advisory

ADVISE

The Problem We Solve

 Government and academic HPC RFPs in India consistently produce underperforming systems. The root cause is almost never hardware quality — it's specification quality. Specs written by procurement teams without HPC domain expertise create perverse vendor incentives. 

The Six Failures We Fix

 

  • Peak FLOPS as the primary metric: a number that real workloads never achieve
  • Interconnect under-specified: "high-speed networking" without technology, bandwidth, or latency requirements
  • Storage specified in capacity only: nothing about bandwidth, IOPS, or filesystem type
  • Software stack absent: hardware is 40% of an HPC deployment
  • No benchmark acceptance criteria: no contractual basis to reject underperformance
  • TCO not modelled: lowest capex wins, operational costs ignored

What We Deliver

A reviewed and rewritten specification with measurable acceptance criteria. Vendors can't win on paper and underdeliver in production. 
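To illustrate what "measurable acceptance criteria" could look like in practice, here is a minimal sketch that compares measured benchmark results against contractual thresholds. Every number in it (node count, clock speed, FLOPs per cycle, thresholds, measured values) is a made-up assumption for illustration, not a recommendation or a real result. The only fixed relationship is the standard theoretical-peak formula: nodes × cores per node × clock × FLOPs per cycle per core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One acceptance criterion: the measured value must meet the requirement."""
    name: str
    measured: float
    required: float
    unit: str

    @property
    def passed(self) -> bool:
        return self.measured >= self.required


def theoretical_peak_tflops(nodes: int, cores_per_node: int,
                            clock_ghz: float, flops_per_cycle: int) -> float:
    """Rpeak in TFLOPS. GHz x FLOPs/cycle gives GFLOPS per core; /1000 -> TFLOPS."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000.0


def failed_criteria(criteria: list[Criterion]) -> list[str]:
    """Names of the criteria the delivered system does not meet."""
    return [c.name for c in criteria if not c.passed]


# Hypothetical cluster: 16 nodes, 64 cores each, 2.0 GHz, 32 FP64 FLOPs/cycle/core.
rpeak = theoretical_peak_tflops(16, 64, 2.0, 32)
hpl_rmax = 45.0  # hypothetical measured HPL result, TFLOPS

criteria = [
    Criterion("HPL efficiency", hpl_rmax / rpeak, 0.70, "Rmax/Rpeak"),
    Criterion("STREAM Triad per node", 310.0, 300.0, "GB/s"),
    Criterion("IOR aggregate write", 42.0, 40.0, "GB/s"),
]

for c in criteria:
    status = "PASS" if c.passed else "FAIL"
    print(f"{status}  {c.name}: {c.measured:.2f} (required {c.required:.2f} {c.unit})")
```

In this made-up case the system would be rejected on HPL efficiency even though its peak FLOPS figure looks fine on paper, which is the kind of contractual lever a capacity-and-peak-only specification lacks.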

04 Chaos Engineering

PROACTIVE

HPC Resilience Audit & Chaos Engineering

 

Most HPC clusters are accepted on HPL benchmarks tested under ideal conditions, then handed over. They are never deliberately broken. The first real failure happens in production, at the worst possible moment.


We break your cluster first. Deliberately. In a controlled maintenance window. Across 8–12 specific failure scenarios: node dropout, InfiniBand degradation, storage failure, scheduler stress, memory overcommit, login node loss. We document how your system behaves under each. We fix what's fixable. You get a resilience report before production does.

Node Failure

Pull a compute node mid-job. Does the job fail cleanly, hang silently, or reschedule?

Storage Failure

Take a Lustre OST or BeeGFS chunk offline. Does the application crash, hang, or corrupt output silently?

Memory Overcommit

Submit jobs that exceed their requested memory. Does cgroup enforcement kill the job cleanly, or does the node start swapping?
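A test payload for this scenario can be very small. The sketch below is one possible version, not the tool used in the audit: it allocates memory in steps and writes to every page so the memory is actually resident rather than just reserved. The function name, step size, and 4 KiB page size are assumptions.

```python
import time

MIB = 1024 * 1024
PAGE = 4096  # assumed page size; touching one byte per page makes it resident


def allocate_in_steps(total_mib: int, step_mib: int = 256,
                      pause_s: float = 1.0) -> list[bytearray]:
    """Allocate total_mib of memory in step_mib chunks and keep it all alive."""
    chunks: list[bytearray] = []
    allocated = 0
    while allocated < total_mib:
        size = min(step_mib, total_mib - allocated)
        buf = bytearray(size * MIB)
        # Write to each page so the kernel has to back it with real memory.
        for offset in range(0, len(buf), PAGE):
            buf[offset] = 1
        chunks.append(buf)
        allocated += size
        # Flush so the last line in the job log shows how far the job got.
        print(f"allocated {allocated} MiB", flush=True)
        time.sleep(pause_s)
    return chunks
```

Run inside a job that requests less memory than `total_mib` (for example via the `--mem` option of `sbatch`), the last "allocated" line in the job output shows roughly where the job was stopped. What actually happens at that point depends on the site's cgroup and swap configuration, which is exactly what the scenario is meant to reveal.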

InfiniBand Degradation

Force a port to fail under thermal load. Does MPI degrade gracefully or catastrophically?

Scheduler Stress

Submit 500 jobs simultaneously. Kill and restart slurmctld. Does the queue recover cleanly? 

Login Node Loss

Take the primary login node offline. Can users reach the cluster? Is there a secondary path? 

 Every scenario classified: recovers automatically · requires manual intervention · causes data loss · fails silently. Delivered as a written resilience report with remediation steps. The test your vendor never ran. 
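The four-way classification above maps naturally onto a small data structure. The following sketch shows one way such results could be recorded and summarised. The scenario outcomes in it are invented placeholders, not findings from any real audit, and the names are illustrative.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """The four outcome classes named in the resilience report."""
    RECOVERS_AUTOMATICALLY = "recovers automatically"
    MANUAL_INTERVENTION = "requires manual intervention"
    DATA_LOSS = "causes data loss"
    FAILS_SILENTLY = "fails silently"


def summarise(results: dict[str, Outcome]) -> Counter:
    """Count how many scenarios fell into each outcome class."""
    return Counter(results.values())


def needs_remediation(results: dict[str, Outcome]) -> list[str]:
    """Scenarios that did anything other than recover on their own."""
    return sorted(name for name, outcome in results.items()
                  if outcome is not Outcome.RECOVERS_AUTOMATICALLY)


# Placeholder results for illustration only.
results = {
    "Node Failure": Outcome.RECOVERS_AUTOMATICALLY,
    "Storage Failure": Outcome.FAILS_SILENTLY,
    "Memory Overcommit": Outcome.RECOVERS_AUTOMATICALLY,
    "InfiniBand Degradation": Outcome.MANUAL_INTERVENTION,
    "Scheduler Stress": Outcome.MANUAL_INTERVENTION,
    "Login Node Loss": Outcome.MANUAL_INTERVENTION,
}

for outcome, count in summarise(results).items():
    print(f"{outcome.value}: {count}")
print("Needs remediation:", ", ".join(needs_remediation(results)))
```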

05 HPC Training & Workshops

TRAIN

Workshop Topics

 

  • Slurm job scheduling: writing job scripts, resource requests, troubleshooting failed jobs
  • Linux for HPC: file systems, process management, environment modules, common pitfalls
  • Storage and I/O: understanding parallel filesystems, stripe configuration, avoiding bottlenecks
  • InfiniBand fundamentals: how it works, how to validate it, how to diagnose problems
  • HPC cluster administration: day-to-day operations for non-specialist admins
  • Benchmarking and acceptance: HPL, STREAM, IOR — what they measure and how to interpret results

Format

Half-day or full-day workshops. Remote via video call or on-site across India. Custom curricula available for specific workload environments (ML/AI, CFD, molecular dynamics, genomics). 

Who It's For

Research labs post-deployment, university IT teams taking on HPC responsibility, and PhD/postdoc researchers who use HPC but weren't trained on it. 

WAR ROOM

 Emergency HPC incident response. We're on a call within the hour — not a ticket system. Available evenings & weekends IST. 


init6

THE HPC COMPANY

Copyright © 2025 init6 - All Rights Reserved.

