
AI at its best, inspired by GitHub Copilot - Introducing English as the New Programming Language for Apache Spark


Spartan


Introduction

We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™, celebrated globally with over a billion annual downloads from 208 countries and regions, has significantly advanced large-scale data analytics. With the innovative application of Generative AI, our English SDK seeks to expand this vibrant community by making Spark more user-friendly and approachable than ever!

Motivation

GitHub Copilot has revolutionized AI-assisted code development. Powerful as it is, it expects users to understand the generated code before committing it, and reviewers need to understand that code as well in order to review it. This could limit its broader adoption. Copilot also occasionally struggles with context, especially when dealing with Spark tables and DataFrames. The attached GIF illustrates this point: Copilot proposes a window specification and references a non-existent 'dept_id' column, and spotting the problem requires some expertise.

 



Instead of treating AI as the copilot, why not make AI the chauffeur while we take the luxury back seat? This is where the English SDK comes in. We have found that state-of-the-art large language models know Spark really well, thanks to the Spark community, which over the past ten years has contributed a wealth of open, high-quality content such as API documentation, open source projects, questions and answers, tutorials, and books. Now we bake Generative AI's expert knowledge of Spark into the English SDK. Instead of having to understand complex generated code, you can get the result with a simple English instruction that many people understand:

 


transformed_df = df.ai.transform('get 4 week moving average sales by dept')

The English SDK, with its understanding of Spark tables and DataFrames, handles the complexity, returning a DataFrame directly and correctly!
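To make concrete what that one-line instruction has to compute, here is a minimal plain-Python sketch of a 4-week trailing moving average of sales per department. It assumes hypothetical rows of `(dept, week, sales)` with exactly one row per department per week. In PySpark the same logic would usually be a window partitioned by department, ordered by week, and framed with `rowsBetween(-3, 0)`; that is the kind of window specification the SDK is meant to spare you from writing and reviewing.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, float]  # (dept, week, sales); hypothetical layout


def moving_average_by_dept(rows: List[Row], window: int = 4) -> List[Row]:
    """Trailing moving average of sales over the last `window` weeks, per dept.

    Mirrors a window partitioned by dept, ordered by week, whose frame is the
    current row plus the (window - 1) preceding rows. Early weeks average over
    however many rows exist so far, as an avg over such a frame would.
    """
    by_dept: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for dept, week, sales in rows:
        by_dept[dept].append((week, sales))

    result: List[Row] = []
    for dept in sorted(by_dept):
        weekly = sorted(by_dept[dept])  # order by week within the department
        for i, (week, _) in enumerate(weekly):
            frame = weekly[max(0, i - window + 1): i + 1]  # trailing frame
            avg = sum(sales for _, sales in frame) / len(frame)
            result.append((dept, week, avg))
    return result


if __name__ == "__main__":
    sample = [
        ("toys", 1, 10.0), ("toys", 2, 20.0), ("toys", 3, 30.0),
        ("toys", 4, 40.0), ("toys", 5, 50.0), ("books", 1, 5.0),
    ]
    for row in moving_average_by_dept(sample):
        print(row)
```

For the sample data, the `toys` average for week 5 covers weeks 2 through 5 and comes out to 35.0.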

Our journey began with the vision of using English as a programming language, with Generative AI compiling English instructions into PySpark and SQL code. This approach is designed to lower the barriers to programming and flatten the learning curve. That vision drives the English SDK, and our goal is to broaden Spark's reach, making this already very successful project even more successful.

 


 

Features of the English SDK

The English SDK simplifies the Spark development process by offering the following key features:

  • Data Ingestion: The SDK can run a web search based on your description, use the LLM to pick the most appropriate result, and then load the chosen web data into Spark, all in a single step.
  • DataFrame Operations: On a given DataFrame, the SDK can transform, plot, and explain the data based on your English description. This makes your code more readable and efficient, and makes operations on DataFrames straightforward and intuitive.
  • User-Defined Functions (UDFs): The SDK streamlines UDF creation. With a simple decorator, you only need to provide a docstring, and the AI completes the code, letting you focus on defining the function while the AI takes care of the implementation.
  • Caching: The SDK uses caching to speed up execution, make results reproducible, and reduce cost.
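The docstring-driven UDF idea can be sketched in plain Python. This is not the SDK's implementation: `ai_udf` and `fake_llm` are hypothetical names, and `fake_llm` returns canned code in place of a real model call. The decorator builds a prompt from the stub's name, signature, and docstring, asks the generator for source code, and returns the function that code defines. Registering the result with Spark (for example with `pyspark.sql.functions.udf`) is left out.

```python
import inspect
from typing import Callable


def ai_udf(generate: Callable[[str], str]) -> Callable[[Callable], Callable]:
    """Hypothetical decorator: turn a docstring-only stub into a working function.

    `generate` stands in for an LLM call: it receives a prompt and returns
    Python source code defining a function with the stub's name.
    """
    def decorator(stub: Callable) -> Callable:
        doc = (stub.__doc__ or "").strip()
        if not doc:
            raise ValueError("a docstring describing the function is required")
        prompt = (
            f"Write a Python function `{stub.__name__}{inspect.signature(stub)}`.\n"
            f"It should: {doc}\n"
            "Return only the code."
        )
        namespace: dict = {}
        # Executes generated code; a real tool should review or sandbox it first.
        exec(generate(prompt), namespace)
        fn = namespace[stub.__name__]
        fn.__doc__ = stub.__doc__  # keep the human-written description
        return fn
    return decorator


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: ignores the prompt and returns canned code."""
    return (
        "def convert_grade(score: float) -> str:\n"
        "    if score >= 90:\n"
        "        return 'A'\n"
        "    if score >= 80:\n"
        "        return 'B'\n"
        "    if score >= 70:\n"
        "        return 'C'\n"
        "    return 'F'\n"
    )


@ai_udf(fake_llm)
def convert_grade(score: float) -> str:
    """Convert a percentage score to a letter grade: 90+ A, 80+ B, 70+ C, else F."""
    ...


if __name__ == "__main__":
    print([convert_grade(s) for s in (95, 85, 72, 40)])
```

The developer writes only the signature and the docstring; everything below the decorator line is supplied by the generator.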
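One plausible way to get the caching benefits listed above is to memoize generated code on the English instruction plus the DataFrame's schema: repeated requests skip the LLM call (speed and cost) and return identical code (reproducibility). The sketch below is an illustration, not the SDK's actual cache; `CodeCache` and the `generate` callable it wraps are hypothetical stand-ins.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

Schema = List[Tuple[str, str]]  # [(column_name, type_name), ...]


class CodeCache:
    """Memoize generated code keyed by (instruction, schema)."""

    def __init__(self, generate: Callable[[str, Schema], str]) -> None:
        self._generate = generate  # stand-in for the expensive LLM call
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(instruction: str, schema: Schema) -> str:
        # Collapse runs of whitespace so trivially different spacing shares an
        # entry; include the schema because the same instruction over different
        # columns needs different code.
        normalized = " ".join(instruction.split())
        payload = json.dumps([normalized, schema])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_code(self, instruction: str, schema: Schema) -> str:
        key = self._key(instruction, schema)
        if key not in self._store:  # cache miss: call the model once
            self._store[key] = self._generate(instruction, schema)
        return self._store[key]


if __name__ == "__main__":
    calls: List[str] = []

    def fake_generate(instruction: str, schema: Schema) -> str:
        calls.append(instruction)
        columns = ", ".join(name for name, _ in schema)
        return f"# code for {instruction!r} over columns: {columns}"

    cache = CodeCache(fake_generate)
    schema = [("dept", "string"), ("week", "int"), ("sales", "double")]
    first = cache.get_code("get 4 week moving average sales by dept", schema)
    again = cache.get_code("get 4 week  moving average   sales by dept", schema)
    print(first == again, len(calls))  # prints: True 1
```

This cache lives in memory only; persisting entries to disk would keep results reproducible across sessions as well.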

Not sure about this AI, but if I say "these are my columns, I want this output, give me the DAX for Power BI", it writes out the whole thing.

I have learned more Power BI through ChatGPT than from my training in Ameerpet.


11 hours ago, kevinUsa said:

Not sure about this AI, but if I say "these are my columns, I want this output, give me the DAX for Power BI", it writes out the whole thing.

I have learned more Power BI through ChatGPT than from my training in Ameerpet.

Check the output once; it's full of curse words.

