BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is a prevalent cause of chronic liver disease and encompasses a broad spectrum of disorders, including simple steatosis, steatohepatitis, fibrosis, cirrhosis, and liver cancer. Because NAFLD is a global epidemic and invasive liver biopsy remains the gold standard for diagnosis, a more practical method for early diagnosis is needed, along with useful therapeutic targets; molecular biomarkers could readily serve both aims. To this end, we explored the hub genes and biological pathways involved in fibrosis progression in NAFLD patients. METHODS: Raw microarray data (GEO accession GSE49541) were downloaded from the Gene Expression Omnibus database, and the R packages affy and limma were used to identify differentially expressed genes (DEGs) between NAFLD patients at a low fibrosis stage (mild, fibrosis score 0-1) and a high fibrosis stage (severe, fibrosis score 3-4). Subsequently, pathway enrichment analysis of the significant DEGs was performed using Gene Ontology (GO), KEGG, and WikiPathways. To identify critical genes, a protein-protein interaction (PPI) network was constructed and visualized using the STRING database, with further analysis in Cytoscape and Gephi. Survival analysis was performed to assess the association of the hub genes with overall survival in the progression of NAFLD to hepatocellular carcinoma. RESULTS: A total of 311 significant DEGs were identified, of which 278 were upregulated and 33 downregulated in the high- vs. low-fibrosis group. Functional enrichment analysis showed that these genes are mainly involved in extracellular matrix (ECM)-receptor interaction, protein digestion and absorption, and the AGE-RAGE signaling pathway. The PPI network comprised 196 nodes and 572 edges, with a PPI enrichment p-value < 1.0e-16.
Based on this cut-off, we identified 12 genes with the highest scores across four centrality measures: degree, betweenness, closeness, and eigenvector. These 12 hub genes were CD34, THY1, CFTR, COL3A1, COL1A1, COL1A2, SPP1, THBS1, THBS2, LUM, VCAN, and VWF. Four of them, namely CD34, VWF, SPP1, and VCAN, showed a significant association with the development of hepatocellular carcinoma. CONCLUSIONS: This PPI network analysis of DEGs identified critical hub genes involved in fibrosis progression in NAFLD patients and the biological pathways through which they exert their effects. These 12 genes are promising candidates for further focused research to determine potential therapeutic targets.
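
ILLUSTRATION: The centrality-based hub selection described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline used in the study (which relied on STRING, Cytoscape, and Gephi). The edge list is a hypothetical toy network that merely reuses some of the gene symbols, only two of the four centralities (degree and closeness) are computed, and the helper names and the top-3 threshold are illustrative choices.

```python
from collections import deque

# Hypothetical toy edges for illustration only; these are NOT the STRING
# interactions reported in the study.
TOY_EDGES = [
    ("COL1A1", "COL1A2"), ("COL1A1", "COL3A1"), ("COL1A2", "COL3A1"),
    ("COL1A1", "LUM"), ("COL1A1", "THBS1"), ("THBS1", "THBS2"),
    ("THBS1", "VWF"), ("VWF", "CD34"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path distances (BFS)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring nodes; ties are broken alphabetically."""
    return set(sorted(scores, key=lambda g: (-scores[g], g))[:k])


graph = build_graph(TOY_EDGES)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)

# Candidate hubs: genes ranked in the top 3 by BOTH measures (assumed rule).
hubs = top_k(degree, 3) & top_k(closeness, 3)
print(sorted(hubs))
```

Taking the intersection of the top-ranked genes across measures is one simple way to require that a hub scores highly on every centrality; betweenness and eigenvector centrality could be added as further score dictionaries and intersected in the same way.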