How do I call a UDF on a Spark DataFrame using Java?

Spark >= 2.3

A Scala-style udf can be invoked directly:

import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.expressions.UserDefinedFunction;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.Seq;

UserDefinedFunction mode = udf(
    (Seq<String> ss) -> ss.headOption(), DataTypes.StringType
);

df.select(mode.apply(col("vs"))).show();

Spark < 2.3

Even if we assume that your UDF is useful and cannot be replaced by a simple getItem call, it has an incorrect signature. Array columns are exposed using … Read more

Derive multiple columns from a single column in a Spark DataFrame

Generally speaking, what you want is not directly possible: a UDF can return only a single column at a time. There are two different ways you can overcome this limitation:

Return a column of complex type. The most general solution is a StructType, but you can consider ArrayType or MapType as well.

import org.apache.spark.sql.functions.udf

val df … Read more
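The excerpt cuts off before the Scala example; purely as an illustration of the StructType approach, here is a minimal PySpark sketch (the sample data, column names, and splitting logic are invented for the example):

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: one column whose values hold two pieces of information.
df = spark.createDataFrame([("a_1",), ("b_2",)], ["raw"])

# Declared schema of the struct the UDF returns.
schema = StructType([
    StructField("prefix", StringType(), False),
    StructField("suffix", StringType(), False),
])

# A single UDF call yields one struct column holding both derived values.
split_udf = udf(lambda s: tuple(s.split("_")), schema)

# Selecting out.* expands the struct into two ordinary columns.
df.select(split_udf(col("raw")).alias("out")).select("out.*").show()

Expanding the struct with out.* is what turns the one allowed return column into multiple derived columns.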

SQL Server 2008 – How do I return a User-Defined Table Type from a Table-Valued Function?

Even though you cannot return the UDTT from a function, you can return a table variable and receive it in a UDTT, as long as the schemas match. The following code is tested in SQL Server 2008 R2.

-- Create the UDTT
CREATE TYPE dbo.MyCustomUDDT AS TABLE
(
    FieldOne varchar(512),
    FieldTwo varchar(1024)
)
… Read more

Writing a UDF for lookups in a Map in Java giving "Unsupported literal type class java.util.HashMap"

You can pass the lookup map or array etc. to the UDF by using partial. Check out this example:

from functools import partial
from pyspark.sql.functions import udf

fruit_dict = {"O": "Orange", "A": "Apple", "G": "Grape"}
df = spark.createDataFrame([("A", 20), ("G", 30), ("O", 10)], ["Label", "Count"])

def decipher_fruit(label, fruit_map):
    label_names = list(fruit_map.keys())
    if label in … Read more
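Since the excerpt cuts off mid-function, here is a runnable sketch of the same partial pattern; the .get lookup body is an assumption standing in for the truncated original:

from functools import partial

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

fruit_dict = {"O": "Orange", "A": "Apple", "G": "Grape"}
df = spark.createDataFrame([("A", 20), ("G", 30), ("O", 10)], ["Label", "Count"])

def decipher_fruit(label, fruit_map):
    # Plain Python lookup; assumed body, the original is truncated.
    return fruit_map.get(label)

# Binding fruit_dict with partial avoids passing a HashMap literal
# (the source of the "Unsupported literal type" error) into the UDF.
decipher_udf = udf(partial(decipher_fruit, fruit_map=fruit_dict), StringType())

df.withColumn("Name", decipher_udf(col("Label"))).show()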

Spark Error: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

The source of the problem is that the object returned from the UDF doesn't conform to the declared type. np.unique not only returns a numpy.ndarray but also converts numerics to the corresponding NumPy types, which are not compatible with the DataFrame API. You can try something like this:

udf(lambda x: list(set(x)), ArrayType(IntegerType()))

or this (to keep order):

udf(lambda … Read more
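Run end to end, the first suggested replacement looks like this minimal sketch (sample data invented); set() deduplicates using plain Python ints, so the result actually conforms to the declared ArrayType(IntegerType()):

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import ArrayType, IntegerType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([([1, 2, 2, 3],)], ["xs"])

# Unlike np.unique, set() never produces NumPy scalar types,
# so the returned values match the declared element type.
dedup = udf(lambda x: list(set(x)), ArrayType(IntegerType()))

df.select(dedup(col("xs")).alias("unique_xs")).show()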

Returning from a function with OUT parameter

It would work like this:

CREATE OR REPLACE FUNCTION name_function(param_1 varchar, OUT param_2 bigint)
  LANGUAGE plpgsql AS
$func$
BEGIN
INSERT INTO table (collumn_seq, param_1)  -- "param_1" also the column name?
VALUES (DEFAULT, param_1)
RETURNING collumn_seq
INTO param_2;
END
$func$;

Normally, you would add a RETURN statement, but with OUT parameters, this is optional. Refer … Read more

How to find mean of grouped Vector columns in Spark SQL?

Spark >= 2.4

You can use Summarizer:

import org.apache.spark.ml.stat.Summarizer

val dfNew = df.as[(Int, org.apache.spark.mllib.linalg.Vector)]
  .map { case (group, v) => (group, v.asML) }
  .toDF("group", "features")

dfNew
  .groupBy($"group")
  .agg(Summarizer.mean($"features").alias("means"))
  .show(false)

+-----+--------------------------------------------------------------------+
|group|means                                                               |
+-----+--------------------------------------------------------------------+
|1    |[8.740630742016827E12,2.6124956666260462E14,3.268714653521495E14]   |
|6    |[2.1153266920139112E15,2.07232483974322592E17,6.2715161747245427E17]|
|3    |[6.3781865566442836E13,8.359124419656149E15,1.865567821598214E14]   |
|5    |[4.270201403521642E13,6.561211706745676E13,8.395448246737938E15]    |
|9    |[3.577032684241448E16,2.5432362841314468E16,2.3744826986293008E17]  |
|4    |[2.339253775419023E14,8.517531902022505E13,3.055115780965264E15]    |
|8    |[8.029924756674456E15,7.284873600992855E17,3.08621303029924E15]     |
|7    |[3.2275104122699105E15,7.5472363442090208E16,7.022556624056291E14]  |
… Read more
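The same Summarizer aggregation is also exposed in PySpark; a minimal sketch with invented toy data, using ml (not mllib) Vectors, which is what Summarizer expects:

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.stat import Summarizer

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, Vectors.dense(1.0, 2.0)),
     (1, Vectors.dense(3.0, 4.0)),
     (2, Vectors.dense(5.0, 6.0))],
    ["group", "features"],
)

# Summarizer.mean aggregates the vector column element-wise per group.
df.groupBy("group") \
  .agg(Summarizer.mean(df.features).alias("means")) \
  .show(truncate=False)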