Create random and unique result set with repeated info

dreamquartz · Post by **dreamquartz** » Sun Jan 24, 2021 2:16 am

Hi All,

I have a table indexed from 0-99.
The information is repeated after Index: 13.
This means that Index 23 holds the same information as index 9.

ID Information
-------------
0 D
1 G
2 F
3 N
4 A
5 J
6 M
7 B
8 I
9 C
10 H
11 E
12 K
13 L
14 D
15 G
16 F
17 N
18 A
19 J
20 M
21 B
22 I
23 C

I have to randomly select a total number of records with a min of 1 and a maximum of 14, where the information is unique.

Example: The total number of selected records is 7 records.
NOTE: The next time the total number of selected records is again 7, the resultset can be completely different.

The information can be anything between A and N, as long as the information is not repeated.

Example 1st time random selection
CORRECT; All information is UNIQUE

ID Information
-------------
0 D
17 N
13 L
6 M
9 C
22 B
10 H

Example 2nd time random selection
CORRECT; All information is UNIQUE

ID Information
-------------
3 N
4 A
8 I
21 B
19 J
11 E
0 D

Example of ramdom selection
INCORRECT; Index: 20 holds the same information as Index: 6

ID Information
-------------
20 M
17 N
13 L
6 M
9 C
22 B
10 H

Can anyone please indicate how to solve something like this.

Any suggestion is much appreciated,

Dream

chrisb · Post by **chrisb** » Sun Jan 24, 2021 9:41 pm

2 bits of code:
i have used the rownum() function so HSQLDB 2.x is required, i used 2.5.1.
replace all occurrences of "tData" with your table name.

Edit: 2021 Jan 24
updated code should have used floor & not ceiling:
have replaced
"RowNum" + (ceiling(y."InfoCount" * rand() - 1) * 14 - 1 id --ID of records we wish to select
with
"RowNum" + floor(y."InfoCount" * rand()) * 14 - 1 id --ID of records we wish to select
in both queries, output is identical.

Code: Select all

--qShowAllValues.
--should aid in understanding of the concept
--"InfoCount": number of occurrences of "Info" in source table i.e. sequences. (a sequence is 14 records)
--"ID": records which match the top whatever of ID will be selected from your table
select
--	top 7 --literal value of records to select i.e. 1 to 14
	y.*,
	"RowNum" + floor(y."InfoCount" * rand()) * 14 - 1 id --ID of records we wish to select
from
( --get row number of "Info" from our result set
	select
		z.*, rownum() "RowNum"
	from
	( --distinct "Info" & number of occurrences in source table
		select
			"Info", count(*) "InfoCount"
		from
			"tData"
		group by
			"Info"
	) z
) y
order by rand()

Code: Select all

--qFinalCode
select *
from "tData"
where ID in
(
	select
		top 7 --literal value of records to select i.e. 1 to 14
		"RowNum" + floor(y."InfoCount" * rand()) * 14 - 1 id --ID of records we wish to select
	from
	( --get row number of "Info" from our result set
		select
			z.*, rownum() "RowNum"
		from
		( --distinct "Info" & number of occurences in source table
			select
				"Info", count(*) "InfoCount"
			from
				"tData"
			group by
				"Info"
		) z
	) y
	order by rand()
)
order by "Info"

dreamquartz · Post by **dreamquartz** » Tue Jan 26, 2021 3:44 am

Thanks for your response, chrisb.

I am trying to implement your answer into my DataBase.
It appears to be working.

I do have a question, though.
The '7' used in

Code: Select all

TOP 7

is not a static number, but a dynamic one.
I chose '7' arbitrarily, for the example.

Is there a way to make it into a User Input?
The Query only runs in Direct SQL.
The information available about 'TOP" does not make it clear.

Keep you posted,

Dream

chrisb · Post by **chrisb** » Tue Jan 26, 2021 5:09 am

so you want a parameter query?

1) delete the line with 'top 7'.

2) insert:

Code: Select all

and rownum() <= :Num_Rows_1_To_14

before the final 'order by' clause.

3) enable the parser.

Edit: 2021 Jan 28 20:09
@dreamquartz
the original code with 'top 7' worked perfectly but you wanted a parameter query.
the base parser will not accept 'top', 'limit' or 'fetch' so i suggested using 'rownum()' but did not test.
we can use rownum() to resolve our issue but not in the way suggested.
we need to insert one more level ('select' statement) to our code.
i have tested the code below & it appears flawless.

Code: Select all

--qFinalCode_Updated
select *
from "tData"
where ID in
(
	select * from
	(
		select
			y."RowNum" + floor(y."InfoCount" * rand()) * 14 - 1 id --ID of records we wish to select
		from
		( --get row number of "Info" from our result set
			select
				z.*, rownum() "RowNum"
			from
			( --distinct "Info" & number of occurences in source table
				select
					"Info", count(*) "InfoCount"
				from
					"tData"
				group by
					"Info"
			) z
		) y
		order by rand()
	)
	where rownum() <= coalesce(:Num_Rows_1_To_14, 14)
)
order by "Info"

dreamquartz · Post by **dreamquartz** » Fri Jan 29, 2021 12:01 am

Hi chrisb,

Thanks for your response.

Tried the coding, but running it gives:

SQL Status: 42563
Error code: -5563

incompatible data type in operation in statement [SELECT * ....]

When trying to figure out why I could not get a resultset of larger than "ID" = 12, I realized that "Info" is hard linked to "ID". The need therefore is that not only "Info" is UNIQUE, but the combination of "ID"-"Info" must be UNIQUE ,where "Info" must be UNIQUE.

Code: Select all

SELECT *
 FROM
	 "tData"
 WHERE
	 ID in
		 (
			 SELECT
				 top 7 *,
				 "RowNum" + floor(y."InfoCount" * rand()) * 14 - 1 id
			 FROM
				 (
					 SELECT
						 z.*,
						 rownum() "RowNum"
					 FROM
						 (
							 SELECT
								 "Info",
								 count(*) "InfoCount"
							 FROM
								 "tData"
							 GROUP BY
								 "Info"
						 ) z
				 ) y
			 ORDER BY
				 rand()
		 )
ORDER BY
"Info"

does also not run due to

SQL Status: 42578
Error code: -5578

duplicate column name in derived table in statement [SELECT * .....]

Because of the 1st Error, I could run this part

Code: Select all

SELECT
	 TOP 7 y.*,
	 "RowNum" + FLOOR(y."InfoCount" * RAND()) * 7 -1 "ID"
 FROM
	 (
		 SELECT
			 z.*,
			 ROWNUM() "RowNum"
		 FROM
			 (
				 SELECT
					 "Info",
					 COUNT(*) "InfoCount"
				 FROM
					 "tData"
				 GROUP BY
					 "Info"
			 ) z
	 ) y
 ORDER BY
	 RAND()

, but this is one of the resultsets:

: resultset.png (7.5 KiB) Viewed 4833 times

The attachment shows that "ID" is repeated.

I am trying to figure out how to get the " ID" unique in the resultset, where the minimum is 1 and the maximum is 14 records.
For both these resultsets, the Query works.

Dream

chrisb · Post by **chrisb** » Fri Jan 29, 2021 2:09 am

i first responded to this thread on 24 Jan it's now 28 Jan.

my code does not contain top 7 *,
my code does not contain RAND()) * 7 -1

all you have to do is:
1) drag the mouse over my code. i suggest using 'qFinalCode_Updated' which i posted earlier today (it accepts user input), but you may choose any one of the three versions i posted (they all work).
2) hit right mouse button over the highlighted selection.
3) select Copy from the pop-up menu.
4) paste into a text editor (not a word processor), 'Notepad++' is very good.
5) search for "tData" & replace with "YourTableName" (you did not provide a table name).
6) search for "Info" & replace with "YourFieldName" (you provided 'Information' as the field name).
7) paste the amended text into a 'Base Query Design Window'.
8) execute.
9) you may now input a value between 1 & 14 then hit enter or just hit enter (defaults to 14).

assuming that you are using the code named 'qFinalCode_Updated':
to repeatedly view 14 records (max result set) click the 'Refresh' icon then hit enter. (14 rows of unique "Info" with their corresponding unique random ID's.

dreamquartz · Post by **dreamquartz** » Fri Jan 29, 2021 6:02 am

Hi chrisb,

Sorry about that.
Me trying to translate the principles behind the Queries into my DataBase.
Forgot to change the info back to your examples.
The table tData had to have the "ID" to Field Type: Double for your Queries to work.

In my DataBase, using my tables, however, it still does not work.
2 things are significantly different:
1. I used letters to distinct better between "ID" and "Information". In my DataBase "ID" "Information" are actually SMALLINT.
2. I simply created a sequential repeat of the "Information", to create the data example for this topic. In my DataBase that is completely random.
Both differences have apparently impact on the resultset I need to create.

Attached you will find the real relationship between "ID" and "Information".
Background:
"Test" stands for tests we provide to Workers
"Exam" stands for exams we provide to Workers.

There are more similar tests (22), grouped in groups of 2, than there are similar exams (12) grouped in groups of 2.
Test 1 is almost identical to test 2. This is throughout the test-bank. This is also the case for the exams.
Exams are similar to the tests.
The table has a 'hard coded' relationship between different tests and different exams.
That relationship can not be broken. The table will be expanded in the future with more tests and exams.

Edit: 01/29/2021
The number of tests and the number of exams is growing

I am trying to make sure that there as many random choices available, based on the groups of Workers (min 1, max 12) that need to be examined.
The criteria are:
1. worker is not allowed to get the same test as the exam
2. not the same exam in an exam group
3. different tests, to avoid cheating

Edit: 01/29/2021
4. random order of test/exam per exam group

Dream

chrisb · Post by **chrisb** » Sat Jan 30, 2021 12:27 am

i have provided a solution to the initial question & consider this thread solved.

you are now asking a different question, supplying different data and stipulating new criteria.
i assume ID is a primary key & could possibly have gaps in the sequence. i am unsure as to its relevance.
i assume that "Test" is uique, sequential, without gaps & always contains an even number of records which should be grouped in pairs (1 & 2, 3 & 4, 21 & 22 etc.) & that only 1 from a pair may be selected (never 2).
"Exam" contains values between 1 & 12 & are paired. it looks as though they are paired by "Test" i.e. "Test" 1 & 2 with "Exam" 3 & 5? therefore duplicates of "Exam" are allowed?

i do not understand:
worker is not allowed to get the same test as the exam

an example of the result you are looking for assuming 12 workers & using the data from the attachment would be a great help.

i am doing a lot of guessing here so try the code below, it's based on my comments above & if it's not what you are looking for i think that it's probably best that you start a new thread.

Code: Select all

--qOneOfPair
select a.* from
(
	select d1.ID, d1."Test", d1."Exam"
	from "tData" d1
	join
	(
		select
			case when rand() < .5 then "Test" else "Test" + 1 end "Test" --if rand() >= 0.5 add 1 to "Test" (second of pair)
		from "tData"
		where mod("Test", 2) = 1 --only select "Test" with odd value (first of pair)
	) d2
	on d1."Test" = d2."Test"
	order by rand()
) a
where rownum() <= :NumRows_1_to_12
order by "Test"

dreamquartz · Post by **dreamquartz** » Mon Feb 01, 2021 2:18 am

Hi chrisb,

I will have to leave the issue alone for a while, because of other responsibilities.
I will have limited access to internet.
I copied your response.
I will look at it offline.
You have provide me with very useful information and insight. I thank you for that.

When I am available again, I will be able to pick this issue up.

Dream

chrisb · Post by **chrisb** » Tue Feb 02, 2021 8:10 pm

@dreamquartz,

as i peer through the mist:
the initial data posted shows the field "Information" as 12 distinct units of text repeated in a defined sequence.
what you actually have is a field called "Exam" as 12 distinct units of type smallint which is repeated randomly.
these are 2 obviously different scenarios requiring different solutions.
as you want to use parameters(1-12) in order to vary output i need to use 2 queries (save both with parser enabled).
table: "tData".
fields: "ID", "Test", "Exam".

save as 'qUniqueExam_Setup'

Code: Select all

--qUniqueExam_Setup
select a.*, rownum()
from (select ID, "Test", "Exam" from "tData" order by "Exam", rand()) a

save as 'qUniqueExam_Output'

Edit: 2021 Feb 03. some how omitted 'order by rand()', now fixed.

Code: Select all

----qUniqueExam_Output
select a.* from
(
	select b.ID, b."Test", b."Exam"
	from
		(select "Exam", min(ROWNUM) r from "qUniqueExam_Setup" group by "Exam") a
	join
		(select * from "qUniqueExam_Setup") b
		on a.r = b.ROWNUM
	order by rand()
) a
where rownum() <= coalesce(:NumRows, 12)
order by "Exam"

dreamquartz · Post by **dreamquartz** » Sat Feb 27, 2021 7:55 pm

chrisb wrote:@dreamquartz,

as i peer through the mist:
the initial data posted shows the field "Information" as 12 distinct units of text repeated in a defined sequence.
what you actually have is a field called "Exam" as 12 distinct units of type smallint which is repeated randomly.
these are 2 obviously different scenarios requiring different solutions.
as you want to use parameters(1-12) in order to vary output i need to use 2 queries (save both with parser enabled).
table: "tData".
fields: "ID", "Test", "Exam".

save as 'qUniqueExam_Setup'
Code: Select all
--qUniqueExam_Setup
select a.*, rownum()
from (select ID, "Test", "Exam" from "tData" order by "Exam", rand()) a
save as 'qUniqueExam_Output'
Edit: 2021 Feb 03. some how omitted 'order by rand()', now fixed.
Code: Select all
----qUniqueExam_Output
select a.* from
(
	select b.ID, b."Test", b."Exam"
	from
		(select "Exam", min(ROWNUM) r from "qUniqueExam_Setup" group by "Exam") a
	join
		(select * from "qUniqueExam_Setup") b
		on a.r = b.ROWNUM
	order by rand()
) a
where rownum() <= coalesce(:NumRows, 12)
order by "Exam"

Hi chrisb,

Wow, I am so thankful.
This works.....

I have tried many variations, before asking....

I am now working on having both Exam and Test to be unique in a resultset.

So, I will leave it open for now instead of SOLVED.

Dream

Create random and unique result set with repeated info

Create random and unique result set with repeated info

Re: How 2 create random and unique resultset with repeat inf

Re: How 2 create random and unique resultset with repeat inf

Re: How 2 create random and unique resultset with repeat inf

Re: Create random and unique result set with repeated info

Re: Create random and unique result set with repeated info

Re: Create random and unique result set with repeated info

Re: Create random and unique result set with repeated info

Re: Create random and unique result set with repeated info

Re: Create random and unique result set with repeated info

Re: Create random and unique result set with repeated info